e-mail indexing with Solr FTS Engine

April 21, 2022 Roberto Puzzanghera

Solr is an indexing server built on Lucene. Dovecot communicates with it using HTTP/XML queries. With this indexing server in place, you can run full-text searches on your emails.

Installing

Solr is a Java servlet which requires OpenJDK 8 or later. Make sure that the java binary is in your path, for example:

PATH=$PATH:/usr/lib64/java/bin/
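Before going further it may help to confirm that the java binary found in the path is recent enough. The following is only a sketch: java_major is a hypothetical helper name, and the parsing assumes the usual version "x.y.z" line printed by java -version.

```shell
# Hypothetical helper: print the major version for a Java version string,
# e.g. "1.8.0_312" -> 8 and "11.0.14" -> 11.
java_major() {
  v=$1
  case $v in
    1.*) v=${v#1.} ;;   # legacy numbering: 1.8 means Java 8
  esac
  # Drop everything from the first non-digit character onwards.
  echo "${v%%[!0-9]*}"
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; keep only the quoted version string.
  ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  major=$(java_major "$ver")
  if [ "${major:-0}" -ge 8 ]; then
    echo "Java $major found: OK"
  else
    echo "Java version '$ver' looks too old or could not be parsed"
  fi
else
  echo "java not found in PATH"
fi
```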

Download the binary release of Solr:

cd /usr/local/src
wget "https://www.apache.org/dyn/closer.lua/lucene/solr/8.11.1/solr-8.11.1.tgz?action=download" -O solr-8.11.1.tgz

Extract the installer from the archive and run it. The installer will work for most Linux distributions based on systemd.

tar xzf solr-8.11.1.tgz solr-8.11.1/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-8.11.1.tgz

The server will be launched by systemd at boot time.

Installing on Slackware

If you are a Slackware user like me, the installer will not work. Use my modified install_solr_slackware.sh script to do the installation instead:

wget https://notes.sagredo.eu/files/qmail/install_solr_slackware.sh
bash ./install_solr_slackware.sh solr-8.11.1.tgz

Configuration

To use Solr with Dovecot, create a dedicated core named dovecot:

sudo -u solr /opt/solr/bin/solr create -c dovecot

The location of the server is /opt/solr, while the data and the configuration files for Dovecot are under /var/solr/data/dovecot. You can find the logs in /var/solr/logs.

In my installation I noticed that the /opt/solr/bin/solr.in.sh configuration file was not in place, as it was named solr.in.sh.orig. So I renamed it:

mv /opt/solr/bin/solr.in.sh.orig /opt/solr/bin/solr.in.sh

I enabled these options. The first one allows connections from my DMZ:

SOLR_IP_WHITELIST="127.0.0.1, 10.0.0.0/24"
SOLR_SECURITY_MANAGER_ENABLED=true

At this point the official Dovecot documentation recommends replacing a couple of configuration files:

cd /var/solr/data/dovecot/conf
rm -f schema.xml managed-schema solrconfig.xml
wget https://raw.githubusercontent.com/dovecot/core/master/doc/solr-config-7.7.0.xml -O solrconfig.xml
wget https://raw.githubusercontent.com/dovecot/core/master/doc/solr-schema-7.7.0.xml -O schema.xml
chown solr:solr solrconfig.xml schema.xml

The managed-schema file is generated based on schema.xml.

Setting the limits

Solr runs as the user solr:solr. At startup it asks for the open file limit of this user to be raised to at least 65000. On my Slackware the default limit is 1024. To increase the limit for the solr user, edit your /etc/security/limits.conf or add a solr.conf file in your limits.d folder with the following lines:

solr    soft    nofile  65536 
solr    soft    nproc   65536 
solr    hard    nofile  65536 
solr    hard    nproc   65536

In my case solr lives in an LXC unprivileged container, so the above limit had to be set not only for the solr user inside the container, but also in the host for the user who runs the container itself. In addition, the container must have this option in its config file:

lxc.prlimit.nofile = 65536
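To see whether the new limit is actually in effect, the value reported by ulimit -n can be compared with the 65000 that Solr asks for. This is a minimal sketch; nofile_ok is a hypothetical helper name, and the check only covers the shell it runs in.

```shell
# Hypothetical helper: succeed if an open-file limit meets a minimum.
# $1 = limit as printed by "ulimit -n", $2 = minimum (default 65000).
nofile_ok() {
  limit=$1
  min=${2:-65000}
  # "unlimited" always passes.
  [ "$limit" = "unlimited" ] && return 0
  [ "$limit" -ge "$min" ]
}

current=$(ulimit -n)
if nofile_ok "$current"; then
  echo "open file limit $current is high enough"
else
  echo "open file limit $current is below 65000"
fi
```

Since limits.conf is applied when a session is opened, the check is meaningful only when run in a session started as the solr user (and, for the LXC case, as the user who runs the container on the host).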

Running

We are ready to run the server, or to restart it if it's already running:

sudo systemctl stop solr
sudo systemctl start solr
sudo systemctl status solr

Slackware users can do as follows (remember to add the start command to your rc.local):

/etc/init.d/solr start
sleep 5
/etc/init.d/solr status
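The fixed sleep 5 is a guess at how long Solr needs to come up. A possible alternative is to poll until the server answers. The sketch below assumes that curl is installed and that Solr listens on localhost:8983; wait_for is a hypothetical helper name, not part of Solr.

```shell
# Hypothetical helper: retry a command until it succeeds.
# $1 = maximum number of attempts, $2 = seconds to sleep after a failure,
# remaining arguments = the command to run.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (assumed URL), to be run after starting the service:
#   wait_for 15 2 curl -sf http://localhost:8983/solr/admin/info/system \
#     && echo "Solr is up" || echo "Solr did not answer in time"
```

Note that once authentication is enabled an anonymous request is rejected, so the curl call in the example would need credentials as well.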

Dovecot plugin

We already compiled Dovecot with Solr support (the --with-solr configure option).

Add the plugin in the 20-imap.conf file:

mail_plugins = $mail_plugins fts fts_solr

and add the configuration in the plugin{...} block inside the 90-plugin.conf file:

plugin {
  fts = solr
  fts_solr = url=https://solr.mydomain.tld:8983/solr/dovecot/

  ...
}
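Mail that was delivered before the plugin was enabled may not be in the index yet. As a sketch, the doveadm fts rescan and doveadm index commands can be wrapped in a small function to reindex one account. The function name reindex_user and the DOVEADM variable are hypothetical, introduced here only for illustration.

```shell
# Path to doveadm; can be overridden from the environment (assumption:
# doveadm is in the PATH of the user running this).
DOVEADM=${DOVEADM:-doveadm}

# Hypothetical helper: mark the FTS index of one user for rescan,
# then queue all of that user's mailboxes for indexing.
reindex_user() {
  "$DOVEADM" fts rescan -u "$1" && "$DOVEADM" index -u "$1" -q '*'
}

# Example, with the placeholder account used in this page:
#   reindex_user user@mydomain.tld
```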

Apache control panel

General settings are also available via a web panel, at the same address that we already set for the Dovecot plugin. This is how to configure Apache to serve the control panel. I have set up a proxy so that I can connect via the regular port 443.

<VirtualHost *:443>
       # SSL stuff here
       ServerName solr.mydomain.tld

       ErrorLog ${LOGDIR}/solr_error.log
       LogLevel warn
       CustomLog ${LOGDIR}/solr_access.f2b.log combined

       SSLProxyEngine On
       ProxyRequests Off
       ProxyPass        / http://solr.mydomain.tld:8983/
       ProxyPassReverse / http://solr.mydomain.tld:8983/
</VirtualHost>

When you browse to https://solr.mydomain.tld you will see that the control panel does not require any login and that it complains about the lack of any security policy.

Securing

Check where solr.home points by looking at the dashboard (/var/solr/data in my case).

Change to that directory and install a security.json file, which defines your authentication and authorization plugins (audit logging can be configured there as well):

cat > /var/solr/data/security.json << '__EOF__'
{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": {"solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="},
    "realm": "My Solr users",
    "forwardCredentials": false
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [{"name": "security-edit", "role": "admin"}],
    "user-role": {"solr": "admin"}
  }
}
__EOF__

chown solr:solr /var/solr/data/security.json
chmod o-r /var/solr/data/security.json

Now restart the server and you should be able to log in. The login user is 'solr', with the temporary password 'SolrRocks'. Change it after your first login.
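A quick way to confirm that authentication is enforced is to compare the HTTP status code of an anonymous request with that of an authenticated one. The sketch below assumes curl is installed and uses the /solr/admin/info/system endpoint on localhost:8983 as the test URL; solr_http_code is a hypothetical helper name.

```shell
# Hypothetical helper: print the HTTP status code returned for a URL.
# $1 = URL, $2 = optional "user:password" for basic authentication.
solr_http_code() {
  if [ -n "${2:-}" ]; then
    curl -s -o /dev/null -w '%{http_code}' -u "$2" "$1"
  else
    curl -s -o /dev/null -w '%{http_code}' "$1"
  fi
}

# Example (assumed URL). With blockUnknown set to true, the first call
# should print 401 and the second 200:
#   solr_http_code http://localhost:8983/solr/admin/info/system
#   solr_http_code http://localhost:8983/solr/admin/info/system solr:SolrRocks
```

Keep in mind that a password passed on the command line is visible in the process list and in the shell history, so this is best done with the temporary password only.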

The Security section of the control panel suggests fixing some permission issues. In my setup TLS is disabled, since the connection between the Apache proxy and the Solr server goes over plain http.

Testing

# telnet 0 143 
Trying 0.0.0.0... 
Connected to 0. 
Escape character is '^]'. 
* OK [CAPABILITY IMAP4rev1 SASL-IR LOGIN-REFERRALS ID ENABLE IDLE LITERAL+ STARTTLS AUTH=PLAIN AUTH=LOGIN] Dovecot ready. 
a login user@mydomain.tld password
a OK [CAPABILITY IMAP4rev1 SASL-IR LOGIN-REFERRALS ID ENABLE IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS THREAD=ORDEREDSUBJECT MULTIAPPEND URL-PARTIAL CATENATE UNSELECT CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS BINARY MOVE SNIPPET=FUZZY PREVIEW=FUZZY PREVIEW STATUS=SIZE SAVEDATE LITERAL+ NOTIFY SPECIAL-USE QUOTA] Logged in
a select Inbox 
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft NonJunk $MDNSent Junk $label3 $Forwarded) 
* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft NonJunk $MDNSent Junk $label3 $Forwarded \*)] Flags permitted. 
* 308 EXISTS 
* 0 RECENT 
* OK [UIDVALIDITY 1285590712] UIDs valid 
* OK [UIDNEXT 19895] Predicted next UID 
* OK [HIGHESTMODSEQ 31675] Highest 
a OK [READ-WRITE] Select completed (0.001 + 0.000 secs). 
a SEARCH text "Dovecot"  
* SEARCH 35 50 51 55 56 57 58 62 63 74 75 76 77 121 129 130 146 150 151 158 163 164 165 173 196 201 202 203 213 225 226 227 228 230 231 232 249 250 262 263 264 289 
a OK Search completed (0.309 + 0.001 + 0.043 secs). 
a logout
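The IMAP search shows that Dovecot answers, but not how much Solr has indexed. One way to look at the index itself is a select query with rows=0, whose JSON response carries the document count in its numFound field. The following is a sketch only: solr_count_url is a hypothetical helper, and the base URL and the PASSWORD placeholder in the example are assumptions.

```shell
# Hypothetical helper: build a select URL that returns only the number of
# matching documents (rows=0 suppresses the documents themselves).
# $1 = core URL, with or without trailing slash, $2 = Solr query.
solr_count_url() {
  printf '%s/select?q=%s&rows=0\n' "${1%/}" "$2"
}

# Example (assumed URL and credentials), counting every indexed document:
#   curl -s -u solr:PASSWORD "$(solr_count_url http://localhost:8983/solr/dovecot '*:*')"
```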

This is my first installation of Solr and this page is under review. Feel free to send your suggestions and improvements in the comments.
