E-mail indexing with Solr FTS Engine

March 4, 2024 by Roberto Puzzanghera

Solr is a Lucene indexing server. Dovecot communicates with it using HTTP/XML queries. With this indexing server you can run full text searches on your emails.

Upgrading to version 9.5.0

Before starting, check that your Java version is at least 11.
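The version check can also be scripted. This is only a minimal sketch: the function names java_major and java_is_at_least are my own, and it assumes that the first line of the `java -version` output has the usual `version "x.y.z"` form.

```shell
#!/bin/sh
# Extract the major Java version from the first line of `java -version` output.
# Handles the modern scheme ("17.0.10" -> 17) and the legacy one ("1.8.0_292" -> 8).
java_major() {
    # $1: a line such as: openjdk version "17.0.10" 2024-01-16
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case $ver in
        1.*) ver=${ver#1.} ;;          # legacy scheme: drop the leading "1."
    esac
    printf '%s\n' "${ver%%[!0-9]*}"    # keep only the leading digits
}

# Succeed when the version line reports at least the required major version.
java_is_at_least() {
    major=$(java_major "$1")
    [ -n "$major" ] && [ "$major" -ge "$2" ]
}

# Usage (java prints its version on stderr):
#   java_is_at_least "$(java -version 2>&1 | head -n 1)" 11 || echo "Java 11 or later is required"
```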

Download version 9.5.0:

SOLR_VER=9.5.0
wget "https://www.apache.org/dyn/closer.lua/solr/solr/${SOLR_VER}/solr-${SOLR_VER}.tgz?action=download" -O solr-${SOLR_VER}.tgz

Then stop your Solr server and run the upgrade with the -f (upgrade) and -n (do not start the server when finished) options:

tar xzf solr-${SOLR_VER}.tgz solr-${SOLR_VER}/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-${SOLR_VER}.tgz -f -n

Slackware users will have to do:

wget https://notes.sagredo.eu/files/qmail/solr/install_solr_slackware.sh
chmod +x install_solr_slackware.sh
./install_solr_slackware.sh solr-${SOLR_VER}.tgz -f -n

Now download and install the new schema and configuration files for Dovecot:

cd /var/solr/data/dovecot/conf
rm -f schema.xml managed-schema.xml solrconfig.xml
wget https://notes.sagredo.eu/files/qmail/solr/9.5/solr-schema-${SOLR_VER}.xml -O schema.xml
wget https://notes.sagredo.eu/files/qmail/solr/9.5/solrconfig-${SOLR_VER}.xml -O solrconfig.xml
chown solr:solr solrconfig.xml schema.xml

The new configuration file replaces LRUCache with CaffeineCache and changes the location of the .jar libraries (diff here).

Review your /etc/default/solr.in.sh file, as many options have changed. Then restart the Solr server.

Finally, upgrade the indexes (edit the downloaded script to insert your Dovecot password):

wget https://notes.sagredo.eu/files/qmail/solr/solr_rescan_index.sh
chmod +x solr_rescan_index.sh
chown root:root solr_rescan_index.sh
chmod o-wrx solr_rescan_index.sh

./solr_rescan_index.sh
Stopping Dovecot 
. 
<?xml version="1.0" encoding="UTF-8"?> 
<response> 

<lst name="responseHeader"> 
 <int name="status">0</int> 
 <int name="QTime">20</int> 
</lst> 
</response> 
Starting Dovecot.

If the script does not return errors (status=0) you are OK. If you get errors, double check the authorization settings and the credentials of Solr's dovecot user.
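If you want to check the status from a script rather than by eye, something like the following can do. It is just a sketch: solr_status and solr_response_ok are names I made up, and it assumes the response has the XML layout shown above.

```shell
#!/bin/sh
# Print the numeric "status" found in a Solr XML response read from stdin.
solr_status() {
    sed -n 's/.*<int name="status">\([0-9][0-9]*\)<\/int>.*/\1/p' | head -n 1
}

# Succeed only when the response read from stdin reports status 0.
solr_response_ok() {
    [ "$(solr_status)" = "0" ]
}

# Usage:
#   ./solr_rescan_index.sh | solr_response_ok || echo "Solr returned an error"
```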

Installing

Solr is a Java servlet which requires OpenJDK 11 or later. Solr 9.5.0 is also tested against version 17.

# java -version        
openjdk version "17.0.10" 2024-01-16 
OpenJDK Runtime Environment (build 17.0.10+7) 
OpenJDK 64-Bit Server VM (build 17.0.10+7, mixed mode, sharing)

Be sure that you have the java binary in your path and that you have defined the variable JAVA_HOME, for example

export PATH=$PATH:/usr/lib64/java/bin/
export JAVA_HOME=/usr/lib64/java/

Download the binary version of Solr:

SOLR_VER=9.5.0
wget "https://www.apache.org/dyn/closer.lua/solr/solr/${SOLR_VER}/solr-${SOLR_VER}.tgz?action=download" -O solr-${SOLR_VER}.tgz

Extract the installer from the archive and run it. The installer will work for most Linux distributions based on systemd.

tar xzf solr-${SOLR_VER}.tgz solr-${SOLR_VER}/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-${SOLR_VER}.tgz

The server will be launched by systemd at boot time.

Installing on Slackware

If you are a Slackware user like me, the installer will not work. Use my modified install_solr_slackware.sh script instead:

wget https://notes.sagredo.eu/files/qmail/solr/install_solr_slackware.sh
chmod +x install_solr_slackware.sh
./install_solr_slackware.sh solr-${SOLR_VER}.tgz

Configuration

The location of the server is /opt/solr, while the logs are in /var/solr/logs.

The server's configuration file is shipped as /opt/solr/bin/solr.in.sh.orig and is conveniently copied to /etc/default/solr.in.sh, so that we can keep it after a future upgrade. Be aware that the double quotes are important in this file.

SOLR_TIMEZONE="Europe/Rome"
SOLR_IP_ALLOWLIST="127.0.0.1, 10.0.0.0/24"
SOLR_JETTY_HOST="10.0.0.152"
SOLR_SECURITY_MANAGER_ENABLED=true
SOLR_OPTS="$SOLR_OPTS -Dsolr.allowUrls=http://solr.yourdomain.tld:8983"
SOLR_PID_DIR="/var/solr/data"
SOLR_HOME="/var/solr/data"

SOLR_IP_ALLOWLIST allows connections from my DMZ. Be aware that SOLR_IP_ALLOWLIST was SOLR_IP_WHITELIST before version 9.
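When upgrading from an older release it is easy to leave the old option name in the file. The following sketch looks for it; the function name find_old_allowlist_option is hypothetical, and the only option it knows about is the rename mentioned above.

```shell
#!/bin/sh
# List the active (uncommented) lines that still set the pre-9 option name.
# Succeeds, printing "line-number:line", when at least one such line exists.
find_old_allowlist_option() {
    # $1: path to the include file, for example /etc/default/solr.in.sh
    grep -n '^[[:space:]]*SOLR_IP_WHITELIST=' "$1"
}

# Usage:
#   if find_old_allowlist_option /etc/default/solr.in.sh; then
#       echo "rename SOLR_IP_WHITELIST to SOLR_IP_ALLOWLIST"
#   fi
```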

Finding the right value for SOLR_JETTY_HOST="10.0.0.152" (the LAN IP of the Solr server) took me a while, because I run Solr in a different virtual server than qmail and Apache. It makes Solr accept connections from other hosts. If you have all the servers on the same host, you can leave SOLR_JETTY_HOST=127.0.0.1 commented out.

SOLR_PID_DIR="/var/solr/data" solved an issue where the pid file could not be saved in solr/bin due to privilege problems.

SOLR_HOME="/var/solr/data" makes Solr store the core data (the dovecot data in particular) in a separate directory. This eases future upgrades, because this directory does not have to be relocated.

Setting the limits

Solr runs as the user solr:solr, which is created for you during the installation process. At startup Solr asks for the open file limit to be raised to 65000. On my Slackware the default limit is 1024. To increase the limit for the solr user, edit your /etc/security/limits.conf or add a solr.conf file in your limits.d folder with the following lines:

solr    soft    nofile  65536 
solr    soft    nproc   65536 
solr    hard    nofile  65536 
solr    hard    nproc   65536

In my case solr lives in an LXC unprivileged container, so the above limit had to be set not only for the solr user inside the container, but also in the host for the user who runs the container itself. In addition, the container must have this option in its config file:

lxc.prlimit.nofile = 65536
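To verify that the new limit is really in effect, you can compare the output of `ulimit -n` with the value Solr asks for. A small sketch, where the helper name limit_at_least is my own invention:

```shell
#!/bin/sh
# Succeed when a limit value, as printed by `ulimit -n`, is at least the
# required number. The word "unlimited" always passes.
limit_at_least() {
    case $1 in
        unlimited) return 0 ;;
        ''|*[!0-9]*) return 2 ;;   # not a number: report an error
    esac
    [ "$1" -ge "$2" ]
}

# Usage, to be run as the solr user (inside the container, if you use one):
#   limit_at_least "$(ulimit -n)" 65000 || echo "open file limit too low: $(ulimit -n)"
```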

Running

We are ready to start the server, or to restart it if it's already running:

sudo systemctl stop solr
sudo systemctl start solr
sudo systemctl status solr

Slackware users will do as follows (the install script already installed the init script in your rc.local):

/etc/init.d/solr start
sleep 5
/etc/init.d/solr status

For our convenience let's create a symbolic link to the init script:

ln -s /etc/init.d/solr /usr/local/bin/solrctl
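Instead of the fixed `sleep 5` used above, a script can poll until the server answers. This is a generic sketch: wait_until is a hypothetical helper, and the usage line assumes that `/etc/init.d/solr status` exits with a non-zero status while Solr is down, which you should verify on your system.

```shell
#!/bin/sh
# Run a command repeatedly until it succeeds or the attempts run out.
# $1: number of attempts, $2: seconds to sleep between attempts, rest: the command.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage:
#   /etc/init.d/solr start
#   wait_until 10 1 /etc/init.d/solr status || echo "Solr did not come up in time"
```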

Dovecot core setup

To use Solr with Dovecot, a dedicated core must be created (Solr must be running):

sudo -u solr /opt/solr/bin/solr create -c dovecot

The data and the configuration files for Dovecot are under /var/solr/data/dovecot.

The official Dovecot documentation recommends replacing a couple of configuration files:

cd /var/solr/data/dovecot/conf
rm -f schema.xml managed-schema.xml solrconfig.xml
wget https://notes.sagredo.eu/files/qmail/solr/9.5/solr-schema-9.5.0.xml -O schema.xml
wget https://notes.sagredo.eu/files/qmail/solr/9.5/solrconfig-9.5.0.xml -O solrconfig.xml 
chown solr:solr solrconfig.xml schema.xml

The managed-schema file is generated based on schema.xml.

We already compiled Dovecot with the Solr support (--with-solr configuration). Add the plugin in the 10-mail.conf file:

mail_plugins = $mail_plugins fts fts_solr

and add the configuration in the plugin{...} block inside the 90-plugin.conf file:

plugin {
fts = solr
fts_solr = url=http://solr.mydomain.tld:8983/solr/dovecot/
# optionally add debug to the previous line

...
}

where solr.mydomain.tld is the domain where Solr is reachable (defined in the configuration file).

Control panel setup

General settings are also available via a web panel. You have to set up a virtual domain where Solr will be listening. This is how to configure an Apache virtual host to serve the control panel. I have set up a proxy to connect via the regular port 443.

<VirtualHost *:443>
       # SSL stuff here
       ServerName solr.mydomain.tld

       ErrorLog ${LOGDIR}/solr_error.log
       LogLevel warn
       CustomLog ${LOGDIR}/solr_access.f2b.log combined

       SSLProxyEngine On
       ProxyRequests Off
       ProxyPass        / http://solr.mydomain.tld:8983/
       ProxyPassReverse / http://solr.mydomain.tld:8983/
</VirtualHost>

When you navigate to https://solr.mydomain.tld you will see that the control panel does not require any login and that it complains of the lack of any security policy.

Securing

Open the dashboard and look up where your solr.home is set.

Change to that directory and install a security.json file which will define your plugins for authentication, authorization and auditlogging:

cat > /var/solr/data/security.json << __EOF__
{ 
"authentication":{ 
"blockUnknown": true, 
"class":"solr.BasicAuthPlugin", 
"credentials":{"solr":"IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}, 
"realm":"My Solr users", 
"forwardCredentials": false 
}, 
"authorization":{ 
"class":"solr.RuleBasedAuthorizationPlugin", 
"permissions":[{"name":"security-edit","role":"admin"}], 
"user-role":{"solr":"admin"} 
}, 
"auditlogging":{ 
"class": "solr.SolrLogAuditLoggerPlugin", 
"async": true, 
"blockAsync" : false, 
"numThreads" : 2, 
"queueSize" : 4096, 
"eventTypes": ["REJECTED", "ANONYMOUS_REJECTED", "UNAUTHORIZED", "COMPLETED", "ERROR"] 
} 
}
__EOF__

chown solr:solr /var/solr/data/security.json
chmod o-r /var/solr/data/security.json
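Since this file holds credentials, it may be worth double checking its permissions. The sketch below reads the mode string printed by `ls -l`; the function name no_access_for_others is made up for this example, and it is stricter than the chmod above because it also rejects write and execute permission for others.

```shell
#!/bin/sh
# Succeed when "other" users have no read, write or execute permission.
# Characters 8-10 of the `ls -l` mode string (e.g. "-rw-r-----") describe "others".
no_access_for_others() {
    mode=$(ls -ld "$1" | cut -c8-10)
    [ "$mode" = "---" ]
}

# Usage:
#   no_access_for_others /var/solr/data/security.json || echo "security.json is accessible by others"
```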

Now restart the server and you should be able to log in. The login user is 'solr', with a temporary password 'SolrRocks'. Change it after your first login.

The security section suggests fixing some permission issues. In addition, you have to create a "dovecot" user and grant it read and update permissions so that it can perform its tasks. Do not use special characters like "@" in your password, because they can cause parsing failures in the Dovecot connection.

This is my set up:

I have TLS disabled since the connection between the apache proxy and the Solr server is via http.

From now on, you can manage users, roles and permissions acting on the control panel rather than editing the above security.json file, which will be modified for you.

Now that we have secured the Dovecot user's connection to the Solr server, we must adjust the 90-plugin.conf file:

plugin {
fts = solr
fts_solr = url=http://dovecot:password@solr.mydomain.tld:8983/solr/dovecot/
# optionally add debug to the previous line

...
}

Note the dovecot:password part, which holds the credentials of Solr's dovecot user.
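Because the password ends up inside a URL, characters such as "@" are a problem. A conservative check, sketched below with a hypothetical function name, accepts only ASCII letters and digits. This is stricter than necessary, since I have not tested which other characters Dovecot tolerates.

```shell
#!/bin/sh
# Succeed when the string is non-empty and contains only ASCII letters and
# digits, so that it can sit in the user:password@host part of a URL as-is.
# The body runs in a subshell so that LC_ALL does not leak to the caller.
is_url_safe_password() (
    LC_ALL=C
    case $1 in
        ''|*[!A-Za-z0-9]*) exit 1 ;;
    esac
    exit 0
)

# Usage:
#   is_url_safe_password "$SOLR_DOVECOT_PW" || echo "pick a password made of letters and digits"
```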

Testing

# telnet 0 143 
Trying 0.0.0.0... 
Connected to 0. 
Escape character is '^]'. 
* OK [CAPABILITY IMAP4rev1 SASL-IR LOGIN-REFERRALS ID ENABLE IDLE LITERAL+ STARTTLS AUTH=PLAIN AUTH=LOGIN] Dovecot ready. 
a login user@mydomain.tld password
a OK [CAPABILITY IMAP4rev1 SASL-IR LOGIN-REFERRALS ID ENABLE IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS THREAD=ORDEREDSUBJECT MULTIAPPEND URL-PARTIAL CATENATE UNSELECT CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS BINARY MOVE SNIPPET=FUZZY PREVIEW=FUZZY PREVIEW STATUS=SIZE SAVEDATE LITERAL+ NOTIFY SPECIAL-USE QUOTA] Logged in 
a select Inbox 
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft NonJunk $MDNSent Junk $label3 $Forwarded) 
* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft NonJunk $MDNSent Junk $label3 $Forwarded \*)] Flags permitted. 
* 308 EXISTS 
* 0 RECENT 
* OK [UIDVALIDITY 1285590712] UIDs valid 
* OK [UIDNEXT 19895] Predicted next UID 
* OK [HIGHESTMODSEQ 31675] Highest 
a OK [READ-WRITE] Select completed (0.001 + 0.000 secs). 
a SEARCH text "Dovecot"  
* SEARCH 35 50 51 55 56 57 58 62 63 74 75 76 77 121 129 130 146 150 151 158 163 164 165 173 196 201 202 203 213 225 226 227 228 230 231 232 249 250 262 263 264 289 
a OK Search completed (0.309 + 0.001 + 0.043 secs). 
a logout

Open your Dovecot log and make sure that there are no errors. Then look at Solr's logs. This is a clean /var/solr/logs/solr.log:

2023-01-04 13:42:44.423 INFO  (qtp1141500277-21) [   dovecot] o.a.s.c.S.Request webapp=/solr path=/select params={q={!lucene+q.op%3DAND}(hdr:Dovecot+OR+body:Dovecot)&fl=uid,score&sort=uid+asc&fq=%2Bbox:8af5d82b9ae4c94cbe610000364df272+%2Buser:user@mydomain.tld&rows=21842&wt=xml} hits=44 status=0 QTime=15 
2023-01-04 13:42:44.424 INFO  (audit-31-thread-1) [   ] o.a.s.s.SolrLogAuditLoggerPlugin type="COMPLETED" message="Completed" method="GET" status="200" requestType="SEARCH" username="dovecot" resource="/select" queryString="wt=xml&fl=uid,score&rows=21842&sort=uid+asc&q=%7b!lucene+q.op%3dAND%7d(hdr:Dovecot+OR+body:Dovecot)&fq=%2Bbox:8af5d82b9ae4c94cbe610000364df272+%2Buser:user@mydomain.tld" collections=[]

and this is the request on the log file /var/solr/logs/YYYY_MM_DD.request.log. You can see that it was served correctly with a final 200 code:

10.0.0.4 - - [04/Jan/2023:13:42:44 +0000] "GET /solr/dovecot/select?wt=xml&fl=uid,score&rows=21842&sort=uid+asc&q=%7b!lucene+q.op%3dAND%7d(hdr:Dovecot+OR+body:Dovecot)&fq=%2Bbox:8af5d82b9ae4c94cbe610000364df272+%2Buser:user@mydomain.tld HTTP/1.1" 200 4550

If you see a 401 (or 403) error code you have to double check the authorizations.
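To spot failed requests quickly you can filter the request log by status code. A rough sketch, assuming the log format shown above, where the status is the first field after the quoted request line. The function name non_200_requests is mine.

```shell
#!/bin/sh
# Print the request-log lines whose HTTP status is not 200.
# Splitting on double quotes puts " 200 4550" into the third field.
non_200_requests() {
    awk -F'"' 'NF >= 3 { split($3, f, " "); if (f[1] != "200") print }' "$@"
}

# Usage (replace the file name with the log of the day you are interested in):
#   non_200_requests /var/solr/logs/YYYY_MM_DD.request.log
```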

Comments

some errors ?

sudo -u solr /opt/solr/bin/solr create -c dovecot 

fails if security.json is active, so move

sudo -u solr /opt/solr/bin/solr create -c dovecot

before explaining and activating security.json

rm -f schema.xml managed-schema solrconfig.xml 

must be:

rm -f schema.xml managed-schema.xml solrconfig.xml

else managed-schema.xml overrides the downloaded schema and solr throws an undefined field error

maybe to add /etc/init.d/solr start & in rc.local - directly in install_solr_slackware.sh


some errors ?

Thank you, I'll check it out

I think that this solr installation can be simplified a lot, also getting rid of the web interface
