Spamassassin TxRep Reputation and Bayesean (SQL)

May 27, 2020 Roberto Puzzanghera14 comments

TxRep was designed as an enhanced replacement of the AutoWhitelist plugin. TxRep, just like AWL, tracks scores of messages previously received, and adjusts the current message score, either by boosting messages from senders who send ham or penalizing senders who have sent spam previously. This not only treats some senders as if they were whitelisted but also treats spammers as if they were blacklisted. Each message from a particular sender adjusts the historical total score which can change them from a spammer if they send non-spam messages. Senders who are considered non-spammers can become treated as spammers if they send messages which appear to be spam. Simpler told TxRep is a score averaging system. It keeps track of the historical average of a sender, and pushes any subsequent mail towards that average.

I assume that you have a "spamassassin" DB and user as already done in the previous page.

Changes in v. 3.4.3

It was pointed out (tx Tony Fung) that the count column was renamed in v. 3.4.3 of spamassassin, so you should run this query after the upgrade:

ALTER TABLE `txrep` CHANGE `count` `msgcount` INT(11) NOT NULL DEFAULT '0';

Create the DB tables

# mysql -h [mysql-IP] -u root -p

USE spamassassin;
CREATE TABLE txrep (
 username varchar(100) NOT NULL default '',
 email varchar(255) NOT NULL default '',
 ip varchar(40) NOT NULL default '',
 ​msgcount int(11) NOT NULL default '0',
 totscore float NOT NULL default '0',
 signedby varchar(255) NOT NULL default '',
 last_hit timestamp NOT NULL default CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 PRIMARY KEY (username,email,signedby,ip),
 KEY last_hit (last_hit)
) ENGINE=InnoDB;

CREATE TABLE bayes_expire (
  id int(11) NOT NULL default '0',
  runtime int(11) NOT NULL default '0',
  KEY bayes_expire_idx1 (id)
) ENGINE=InnoDB;

CREATE TABLE bayes_global_vars (
  variable varchar(30) NOT NULL default '',
  value varchar(200) NOT NULL default '',
  PRIMARY KEY  (variable)
) ENGINE=InnoDB;

INSERT INTO bayes_global_vars VALUES ('VERSION','3');

CREATE TABLE bayes_seen (
  id int(11) NOT NULL default '0',
  msgid varchar(200) binary NOT NULL default '',
  flag char(1) NOT NULL default '',
  PRIMARY KEY  (id,msgid)
) ENGINE=InnoDB;

CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY  (id, token),
  INDEX bayes_token_idx1 (token),
  INDEX bayes_token_idx2 (id, atime)
) ENGINE=InnoDB;

CREATE TABLE bayes_vars (
  id int(11) NOT NULL AUTO_INCREMENT,
  username varchar(200) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  token_count int(11) NOT NULL default '0',
  last_expire int(11) NOT NULL default '0',
  last_atime_delta int(11) NOT NULL default '0',
  last_expire_reduce int(11) NOT NULL default '0',
  oldest_token_age int(11) NOT NULL default '2147483647',
  newest_token_age int(11) NOT NULL default '0',
  PRIMARY KEY  (id),
  UNIQUE bayes_vars_idx1 (username)
) ENGINE=InnoDB;

Configure

Enable TxRep editing the file local.cf

use_txrep 1
txrep_factory Mail::SpamAssassin::SQLBasedAddrList

 and commenting out this line in the /etc/mail/spamassassin/v310.pre file:

loadplugin Mail::SpamAssassin::Plugin::AWL

Add this to your /etc/mail/spamassassin/90-sql.cf (you should have created 90-sql.cf while configuring userprefs) and adjust to your needs:

# txrep
txrep_factory                   Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn                    DBI:mysql:spamassassin:localhost
user_awl_sql_username           spamassassin
user_awl_sql_password           SApassword
user_awl_sql_table              txrep

# bayesean
bayes_store_module              Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn                   DBI:mysql:spamassassin:localhost
bayes_sql_username              spamassassin
bayes_sql_password              SApassword

Purging txrep table

The awl table is going to grow day after day depending on the traffic on your mail server. Most of the records are single spam event that will rarely produce another hit so that you can decide to clean out them to optimize the volume of data stored in that table and speed up the mysql query consequently.

Thus, let's create a file which stores the MySQL query. Modify this example entering the MySQL executable and the spamassassin MySQL account:

cd /usr/local/etc

cat > txrep_purge.sql << __EOF__
USE spamassassin;
DELETE FROM txrep WHERE last_hit <= (now() - INTERVAL 120 day);
__EOF__

cat > txrep_purge.sh << __EOF__
#!/bin/sh
/usr/bin/mysql -uspamassassin -p[password] < /usr/local/etc/txrep_purge.sql
exit 0
__EOF__

So "spamassassin" is the myql user and "[password]" is the password (this account must have the priviledges for the "spamassassin" DB both from the mail server's IP, from the apache's IP (userprefs via Roundcube) and now from the mysql host (localhost). Don't add spaces after -u and -p.

Now assign only to the mysql user the read priviledges for the two files:

chown root:mysql /usr/local/etc/txrep*
chmod ug+x /usr/local/etc/txrep_purge.sh
chmod o-rwx /usr/local/etc/txrep*

Finally edit the crontab

crontab -e

and add a cronjob like this:

#minute hour    mday    month   wday    command
1       1       25      *       *      /usr/local/etc/txrep_purge.sh

Comments

Failed to parse line

Hi Roberto,

I am getting this error:

May 27 01:09:53 mail spamd[10301]: config: failed to parse line, skipping, in "/etc/mail/spamassassin/sql.cf": auto_whitelist_factory Mail::SpamAssassin::SQLBasedAddrList

Cheers.

Reply | Permalink

Failed to parse line

did you commented this line?

#auto_whitelist_factory          Mail::SpamAssassin::SQLBasedAddrList

Reply | Permalink

Failed to parse line

Hi Roberto,

The error is in your guide. Where you have:

auto_whitelist_factory          Mail::SpamAssassin::SQLBasedAddrList

it should be

txrep_factory          Mail::SpamAssassin::SQLBasedAddrList

Thanks.

Reply | Permalink

Failed to parse line

Thank you. Actually I modified that line in my server but forgot to do the same in this page

Reply | Permalink

A small observation

and commenting out this line:

loadplugin Mail::SpamAssassin::Plugin::AWL

This line is located in the v310.pre file. So the text should read:

and commenting out this line in the /etc/mail/spamassassin/v310.pre file:

loadplugin Mail::SpamAssassin::Plugin::AWL

Reply | Permalink

Is MySQL really required?

Hi Roberto,

We have here bayes and TxRep enabled without MySQL, with all data being written to /etc/mail/spamassassin/.spamassassin, since we have no interest in using MySQL as we don't need userpref here.

Do you think the MySQL approach is really necessary?

I'd rather keep things simple here.

Cheers

Reply | Permalink

Is MySQL really required?

Hi, I choosed the mysql approach because I find it easier to purge the database by means of an SQL query, but I think you can get rid of mysql if you do the same with a command line script...

Reply | Permalink

Spamassassin 3.4.3 table column name changed

Hi Roberto,

The column "count" in table "txrep" is renamed to "msgcount" from version 3.4.3. Look into section "TxRep and Awl plugins has been modified..." at https://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_release_3_4_3/UPGRADE.

Please update your guide as underneath when creating new table:

CREATE TABLE txrep (
  username varchar(100) NOT NULL default '',
  email varchar(255) NOT NULL default '',
  ip varchar(40) NOT NULL default '',
  msgcount int(11) NOT NULL default '0',
  totscore float NOT NULL default '0',
  signedby varchar(255) NOT NULL default '',
  last_hit timestamp NOT NULL default CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (username,email,signedby,ip),
  KEY last_hit (last_hit)
) ENGINE=InnoDB;

Or modifiy the table with the following command to upgrade from prior version:

ALTER TABLE `txrep` CHANGE `count` `msgcount` INT(11) NOT NULL DEFAULT '0';

Otherwise, the following error shall be recorded in spam log:

SQL error: Unknown column 'msgcount' in 'field list'

Reply | Permalink

Spamassassin 3.4.3 table column name changed

Thank you, corrected.

Reply | Permalink

cleanup old data

By adding

ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

to the awl table definition you can easily spot old entries and delete them.

echo -e 'USE spamassassin;
DELETE FROM awl WHERE ts < (NOW()-(28*24*3600));' | mysql -uspamassassin -pPASSWORD

Reply | Permalink

Oops, the sql statement for

Oops, the sql statement for clean up should be

DELETE FROM awl WHERE ts < (NOW() - INTERVAL 28 DAY)

Reply | Permalink

yes, that's even better. I'll

yes, that's even better. I'll update this page as soon as possible

Reply | Permalink

AWL and Bayesean

hi,

Ended up with this error message:

Mon Jan 17 21:14:27 2011 [5769] info: config: failed to parse line, skipping, in "(no file)": bayes_auto_learn_threshold_spa 10
Mon Jan 17 21:14:27 2011 [5769] info: config: failed to parse line, skipping, in "(no file)": bayes_auto_learn_threshold_spa 10
Mon Jan 17 21:14:27 2011 [5769] info: spamd: processing message <4D34A31D.7050002@domain.xyz for user@domain.xyz:5002
Mon Jan 17 21:14:27 2011 [5769] info: spamd: identified spam (99.3/4.0) for user@domain.xyz:5002 in 0.0 seconds, 1226 bytes.


Reply | Permalink

Userprefs

Hi,

it's seems like the messages was rejected correctly because the sender is blacklisted, as the score is close to 100. Was it rejected?

Reply | Permalink