Setting up a script for SpamAssassin's learning and reporting systems

June 20, 2021 Roberto Puzzanghera

Now that the spam filters are in place, we have to train our Bayesian system and report our spam to Razor, Pyzor and SpamCop.

The obvious thing that comes to mind at this point is to call sa-learn and spamassassin --report in cascade when the user clicks the "Mark as Junk" button in the Roundcube webmail (look at the cmd_learn and multi_driver drivers of the markasjunk plugin), but this option has a couple of downsides:

  • the learning process, the resulting journal syncing and the connections to several filtering networks take up to 10 seconds, a delay that our users will not want to wait through.
  • even worse, users do not always click the "Mark as Junk" button on real spam. Think, for example, of the regular newsletters that they no longer want to read and that they conveniently label as spam instead of unsubscribing in the proper way.

Therefore it is better to run these two tasks from a cron job every night (which solves the first issue), processing only the messages stored in folders where the users have copied real spam or ham messages (which solves the second as well).
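As a rough sketch of what such a nightly job boils down to for one user, the two tasks map to sa-learn --spam (or --ham) on the folder and spamassassin --report on each message. This is not the script offered further down: the RUN variable, the function names and the folder argument are assumptions made for illustration only.

```shell
#!/bin/sh
# Illustrative sketch only; names and paths are assumptions.
# RUN defaults to "echo", so the commands are printed instead of executed.
# Export RUN="" to really run them.
RUN=${RUN-echo}

# Feed a Maildir folder to the Bayesian database.
# $1 = spam|ham, $2 = Maildir folder (the one containing cur/ and new/)
learn_folder() {
    $RUN sa-learn "--$1" "$2/cur" "$2/new"
}

# Report every message of a folder to the filtering networks.
# spamassassin --report reads one message from standard input.
report_folder() {
    for msg in "$1"/cur/* "$1"/new/*; do
        [ -f "$msg" ] || continue   # skip unmatched globs
        $RUN spamassassin --report < "$msg"
    done
}
```

A dry run such as learn_folder spam /path/to/Maildir/.Junk.TeachSpam only prints the sa-learn command line, which makes it easy to check the paths before letting cron execute anything for real.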

Creating the "Teach" mailboxes

When you configured dovecot, you prepared the code for the automatic creation of the TeachSpam and TeachNotSpam mailboxes as children of Junk. If this is not a fresh installation, or if you configured dovecot some time ago, check your 15-mailboxes.conf file:

 mailbox "Junk.TeachSpam" {
   auto = subscribe
   autoexpunge = 5d
 }
 mailbox "Junk.TeachNotSpam" {
   auto = subscribe
   autoexpunge = 30d
 }
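A quick way to verify that both definitions are present is to search the file for them. The following is a minimal sketch; the function name is hypothetical, and the location of 15-mailboxes.conf depends on how dovecot was installed, so the path is passed as an argument rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper: report whether each "Teach" mailbox is declared.
# $1 = path to your 15-mailboxes.conf (location varies per installation)
check_teach_mailboxes() {
    for name in Junk.TeachSpam Junk.TeachNotSpam; do
        # -F: match the string literally, so "." is not a wildcard
        if grep -qF "mailbox \"$name\"" "$1"; then
            echo "$name: found"
        else
            echo "$name: missing"
        fi
    done
}
```

Any mailbox reported as missing has to be added to the file before the folders can be created automatically.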

Cronjob setup

Now download my shell script

cd /usr/local/bin
chmod +x

and set up a cron job to run it every night, for example

45 2 * * * /usr/local/bin/ >> /var/log/cron

If you run it with no arguments, the script will do the job for all users having the .Junk.TeachSpam and .Junk.TeachNotSpam mailboxes in their Maildirs.
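To preview which accounts would be picked up, you can look for Maildirs that contain both folders. This is only a sketch of the idea and not necessarily how the script does it; the function name is hypothetical, and the base directory (for example /home/vpopmail/domains) is an assumption about where your Maildirs live.

```shell
#!/bin/sh
# Hypothetical helper: list the Maildirs that contain both "Teach" folders.
# $1 = base directory to search (assumed, e.g. /home/vpopmail/domains)
list_teach_maildirs() {
    find "$1" -type d -name '.Junk.TeachSpam' | sort |
    while IFS= read -r spamdir; do
        maildir=$(dirname "$spamdir")
        # Keep only the accounts that also have the ham folder
        if [ -d "$maildir/.Junk.TeachNotSpam" ]; then
            echo "$maildir"
        fi
    done
}
```

An account that shows up in this list has both mailboxes in place; one that does not probably never had them created.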

If you want to test it for a single admin user you can run it in the following way: username@domain.tld

Edit the script and set DELETE_TEACH_DATA=1 if you want to delete the messages after they have been processed. I commented out the line which deletes the messages in the TeachNotSpam mailbox because I'm not sure that deleting the ham messages is a good idea.
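The effect of such a switch can be pictured as follows. This is a minimal sketch under stated assumptions, not the code of the script itself: purge_folder is a hypothetical name, and only the DELETE_TEACH_DATA variable comes from the text above.

```shell
#!/bin/sh
# DELETE_TEACH_DATA mirrors the switch described in the text;
# purge_folder is a hypothetical helper used for illustration.
DELETE_TEACH_DATA=${DELETE_TEACH_DATA:-0}

# Remove the processed messages of a Maildir folder, keeping cur/ and new/.
# $1 = Maildir folder (the one containing cur/ and new/)
purge_folder() {
    [ "$DELETE_TEACH_DATA" = 1 ] || return 0   # switch off: do nothing
    find "$1/cur" "$1/new" -type f -exec rm -f {} +
}
```

Following the caution expressed above, you would call it on the TeachSpam folder only and leave TeachNotSpam alone.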

Set DEBUG=1 to run sa-learn and spamassassin in debug mode, so that the logs will show everything.


Set up logrotate for the above log files:

cat > /etc/logrotate.d/spam_reports << __EOF__
/var/log/spamassassin/spamassassin.log /var/log/spamassassin/sa_learn.log {
su root apache
rotate 5
create 664 root apache
}
__EOF__
