Setting up a script for SpamAssassin's learning and reporting systems

June 20, 2021 by Roberto Puzzanghera

Now that the spam filters are in place, we have to train our Bayesian system and report our spam to Razor, Pyzor and SpamCop.

The obvious approach at this point would be to call sa-learn and spamassassin --report in cascade when the user clicks the "Mark as Junk" button in the Roundcube webmail (look at the cmd_learn and multi_driver drivers of the markasjunk plugin), but this option has a couple of downsides:

  • the learning process, the resulting journal syncing and the connections to several filtering networks take up to 10 seconds, which is longer than our users are willing to wait.
  • even worse, users do not always click the "Mark as Junk" button on real spam. Think of the regular newsletters that they no longer want to read and that they conveniently label as spam instead of unsubscribing in the proper way.

Therefore it is better to run these two tasks from a cronjob every night, which solves the first issue, and to process only the messages stored in folders where the users have copied real spam or ham messages, which solves the second one as well.

Creating the "Teach" mailboxes

When you configured dovecot you prepared the autocreation of the TeachSpam and TeachNotSpam mailboxes as children of Junk. If this is not a fresh installation, or you configured dovecot some time ago, check your 15-mailboxes.conf file:

 mailbox "Junk.TeachSpam" { 
   auto = subscribe 
   autoexpunge = 5d 
 } 
 mailbox "Junk.TeachNotSpam" { 
   auto = subscribe 
   autoexpunge = 30d 
 }
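
For accounts that already existed before this configuration, it may be worth checking that both mailboxes are really there. The following is only a sketch: missing_teach_mailboxes is a helper name I made up for this example, and it simply filters a list of mailbox names (one per line) such as the one printed by doveadm mailbox list.

```shell
#!/bin/sh
# Hypothetical helper (not part of dovecot or of sa_cron.sh).
# Reads mailbox names, one per line, on stdin and prints the Teach
# mailboxes that are NOT in the list. Prints nothing when both exist.
missing_teach_mailboxes() {
    list=$(cat)
    for mbox in Junk.TeachSpam Junk.TeachNotSpam; do
        # -x matches the whole line, -F treats the name as a fixed string
        printf '%s\n' "$list" | grep -qxF "$mbox" || printf '%s\n' "$mbox"
    done
}
```

A possible usage, assuming doveadm is available and the account exists, is: doveadm mailbox list -u username@domain.tld 'Junk*' | missing_teach_mailboxes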

Cronjob setup

Now download my shell script

wget -O /usr/local/bin/sa_cron.sh https://notes.sagredo.eu/files/qmail/sa_cron.sh
chmod +x /usr/local/bin/sa_cron.sh

and set up a cronjob to run it every night, for example

45 2 * * * /usr/local/bin/sa_cron.sh >> /var/log/cron

If you run it with no arguments, the script will do the job for all users having the .Junk.TeachSpam and .Junk.TeachNotSpam mailboxes in their Maildirs.
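
To give an idea of how the users with both mailboxes could be discovered, here is a minimal sketch. It is not taken from sa_cron.sh: the function name find_teach_maildirs and the base directory passed to it are assumptions, and the real script may well do this differently.

```shell
#!/bin/sh
# Hypothetical helper, not the code of sa_cron.sh.
# Prints every Maildir below BASE that contains BOTH Teach mailboxes.
# BASE is an assumption: pass the directory that holds your mail domains.
find_teach_maildirs() {
    base=$1
    find "$base" -type d -name '.Junk.TeachSpam' 2>/dev/null |
    while IFS= read -r spamdir; do
        maildir=$(dirname "$spamdir")
        # Keep the Maildir only if the ham mailbox exists too
        if [ -d "$maildir/.Junk.TeachNotSpam" ]; then
            printf '%s\n' "$maildir"
        fi
    done | sort
}
```

Calling find_teach_maildirs with the directory of your domains would list one Maildir per line, skipping the users who have only one of the two mailboxes.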

If you want to test it for a single admin user you can run it in the following way:

sa_cron.sh username@domain.tld

Edit the script and set DELETE_TEACH_DATA=1 if you want to delete the messages after they have been processed. I commented out the line which deletes the messages in the TeachNotSpam mailbox because I'm not sure that deleting the ham messages is a good idea.

Set DEBUG=1 to run sa-learn and spamassassin in debug mode, so that the logs show everything.
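
To make the effect of these two switches more concrete, here is a rough sketch of what processing one Maildir could look like. It is not the content of sa_cron.sh. The function names and the DRY_RUN switch are invented for this example, and I am assuming that DEBUG maps to the -D option of sa-learn and spamassassin. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch only: NOT the contents of sa_cron.sh.
# DRY_RUN is a hypothetical switch for this example; DEBUG and
# DELETE_TEACH_DATA reuse the variable names mentioned in the text.
DRY_RUN=${DRY_RUN:-1}
DEBUG=${DEBUG:-0}
DELETE_TEACH_DATA=${DELETE_TEACH_DATA:-0}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Learn and report the messages found in one user's Maildir.
process_maildir() {
    maildir=$1
    dbg=
    # -D goes before the other options so that it cannot swallow a path
    if [ "$DEBUG" = 1 ]; then dbg=-D; fi

    for sub in cur new; do
        spam="$maildir/.Junk.TeachSpam/$sub"
        ham="$maildir/.Junk.TeachNotSpam/$sub"
        if [ -d "$spam" ]; then run sa-learn $dbg --spam "$spam"; fi
        if [ -d "$ham" ]; then run sa-learn $dbg --ham "$ham"; fi
    done

    # Report each spam message, then optionally delete it.
    # Ham messages are never deleted in this sketch.
    for msg in "$maildir"/.Junk.TeachSpam/cur/* "$maildir"/.Junk.TeachSpam/new/*; do
        [ -f "$msg" ] || continue
        run spamassassin $dbg --report "$msg"
        if [ "$DELETE_TEACH_DATA" = 1 ]; then run rm -f "$msg"; fi
    done
    return 0
}
```

Running process_maildir on a Maildir path with the defaults only lists the sa-learn, spamassassin and rm commands that would be executed, which is a cheap way to check the paths before setting DRY_RUN=0.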

Logrotate

Set up logrotate for the script's log files:

cat > /etc/logrotate.d/spam_reports << __EOF__
/var/log/spamassassin/spamassassin.log /var/log/spamassassin/sa_learn.log {
su root apache
rotate 5
daily
missingok
notifempty
delaycompress
create 664 root apache 
sharedscripts
}
__EOF__
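
If the log files do not exist yet, something like the following sketch could create them with the same 664 mode used in the create directive above. prepare_logs is a name invented for this example, and the owner and group are left untouched because changing them requires root.

```shell
#!/bin/sh
# Hypothetical helper, not part of logrotate or of sa_cron.sh.
# Creates the two log files named in the logrotate stanza if they are
# missing and sets their mode to 664. The directory defaults to the one
# used in the stanza; ownership (root:apache) is left to the caller.
prepare_logs() {
    log_dir=${1:-/var/log/spamassassin}
    mkdir -p "$log_dir" || return 1
    for f in spamassassin.log sa_learn.log; do
        # Create an empty file only when it does not exist yet
        [ -e "$log_dir/$f" ] || : > "$log_dir/$f" || return 1
        chmod 664 "$log_dir/$f" || return 1
    done
}
```

After that, running logrotate -d /etc/logrotate.d/spam_reports as root shows what logrotate would do with the new configuration without rotating anything.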

Comments

log file duplication?

Hi,

In the installation step a log is set up for spamd, and I don't know whether the /var/log/spamassassin/spamassassin.log set up here has to be separate or whether it can point to the other one.

log file duplication?

It can be the same, but I prefer to keep the spamd log separate from these ones.

Edit: if needed, you have to set the log file inside the script.