SEO friendly URLs with Apache mod_rewrite

March 27, 2011 Roberto Puzzanghera

I assume that the reader is familiar with the concept of "SEO friendly URLs". If not, you can start by reading these:

At the cost of beating my own drum, I will consider a real case: http://wildzone.it. It is an image gallery where the contents are organized in a structure like this:

LANGUAGE -> CATEGORY -> GALLERY -> IMAGE

So the smartest way to write the URLs would be: http://wildzone.it/en/italy-4/seaside-of-sardinia-19/orosei-117f.html (external URL). The internal URL, which is the one actually handled by the CMS and is more concise, is: http://wildzone.it?l=en&cat=4&gal=19&id=117. As you can see, the external URL is SEO and user friendly. It embeds the language code and the id numbers of the category, gallery and photo. In addition, reading the URL gives the user an understanding of the site's topic and organization.
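To make the structure of the external URL concrete, here is a minimal Python sketch that assembles it from titles and ids. It is only an illustration: the function names slugify and external_url are hypothetical, and the titles ("Italy", "Seaside of Sardinia", "Orosei") are guessed from the example URL rather than taken from the real CMS.

```python
import re

def slugify(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))

def external_url(base, lang, category, gallery, photo):
    """Build LANGUAGE/CATEGORY/GALLERY/IMAGE; each item is a (title, id) pair."""
    parts = [f"{slugify(title)}-{ident}" for title, ident in (category, gallery, photo)]
    # The trailing "f" marks a photo page, as in the example URL.
    return f"{base}/{lang}/{parts[0]}/{parts[1]}/{parts[2]}f.html"

url = external_url(
    "http://wildzone.it", "en",
    ("Italy", 4), ("Seaside of Sardinia", 19), ("Orosei", 117),
)
print(url)  # http://wildzone.it/en/italy-4/seaside-of-sardinia-19/orosei-117f.html
```

Note that this simple slugify drops any character outside a-z and 0-9, so accented letters would need extra handling.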

Using mod_rewrite to perform the transformation

mod_rewrite can convert the SEO friendly URL requested by the visitor into the internal URL (the one with the query string), which is more suitable for a CMS.

If you compile Apache from source, you have to enable the module at configure time:

./configure --enable-rewrite

In addition, you have to allow .htaccess files to override the Options and FileInfo settings inside the virtual host configuration (the mod_rewrite directives fall under FileInfo):

AllowOverride Options FileInfo

And these are the instructions that I put in my .htaccess file to accomplish the task:

<IfModule mod_rewrite.c>
 RewriteEngine on

 # photo page: URLs ending in "f.html"
 RewriteCond %{REQUEST_URI} .*f\.html$
 RewriteRule ^(\w+)/.*-(\w+)/.*-(\w+)/.*-(\w+)f\.html$ ?l=$1&sez=$2&gal=$3&id=$4 [L]

 # thumbnails page: URLs ending in "g.html"
 RewriteCond %{REQUEST_URI} .*g\.html$
 RewriteRule ^(\w+)/.*-(\w+)/.*-(\w+)g\.html$ ?l=$1&sez=$2&gal=$3 [L]
</IfModule>
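Before touching the server, it can help to try the two patterns offline. The following Python sketch is only a rough simulation, not Apache itself: it assumes that Python's re module behaves like Apache's regex engine for these simple patterns, and the rewrite function and the thumbnails path are made up for the example. The RewriteCond lines are not modelled, because each rule pattern already requires the same "f.html" or "g.html" ending.

```python
import re

# The two RewriteRule patterns, in the same order as in the .htaccess file.
# Parameter names (l, sez, gal, id) are copied verbatim from the rules.
RULES = [
    (re.compile(r"^(\w+)/.*-(\w+)/.*-(\w+)/.*-(\w+)f\.html$"), r"?l=\1&sez=\2&gal=\3&id=\4"),
    (re.compile(r"^(\w+)/.*-(\w+)/.*-(\w+)g\.html$"), r"?l=\1&sez=\2&gal=\3"),
]

def rewrite(path):
    """Return the internal query string for a path, or None if no rule matches."""
    for pattern, template in RULES:
        match = pattern.match(path)
        if match:
            # Stop at the first matching rule, mimicking the [L] flag.
            return match.expand(template)
    return None

print(rewrite("en/italy-4/seaside-of-sardinia-19/orosei-117f.html"))
# ?l=en&sez=4&gal=19&id=117
print(rewrite("en/italy-4/seaside-of-sardinia-19g.html"))
# ?l=en&sez=4&gal=19
print(rewrite("en/about.html"))
# None
```

The paths are written without a leading slash because, in a .htaccess file, the rule pattern is matched against the path relative to the directory.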

Let's go through the regular expressions in detail.

The following condition selects the requests which end with "f.html", so that the rule after it is applied only to them (the "f" character is used to distinguish photo pages from the others).

RewriteCond %{REQUEST_URI} .*f\.html$

If the above condition is matched, this rewrite rule is executed (each part of its regular expression is explained below):

RewriteRule

^ the expression must match from the beginning of the string

( beginning of a capturing group

\w+ a sequence of one or more word characters (letters, digits or underscore)

) end of the capturing group. The group (\w+) matches the language string ("en" or "it"). It is available in the substitution as $1.

/ a slash character is expected

.*- zero or more characters (the category's title) followed by a hyphen (which precedes the category's id)

(\w+) another capturing group, which matches the category's id. It is available as $2

/ a slash character

.*- zero or more characters (the gallery's title) followed by a hyphen

(\w+) capturing group which matches the gallery's id. It is available as $3

/ a slash character

.*- zero or more characters (the photo's title) followed by a hyphen

(\w+) capturing group which matches the photo's id. It is available as $4

f the "f" character identifies the rule for the photo pages

\.html the backslash is the escape character, needed because an unescaped dot matches any character. Here a literal ".html" string is expected.

$ the expression must match up to the end of the string

?l=$1&sez=$2&gal=$3&id=$4 this is how the URL will be rewritten. $1, $2, $3 and $4 are, respectively, the four captured groups.

[L] tells Apache to stop processing the rewrite rules for that request.
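The breakdown above can be checked piece by piece with a small Python sketch. It is only an approximation of what Apache does, assuming that Python's re module treats this pattern the same way; the name PHOTO_RULE is chosen just for the example. The pattern is written in verbose mode so that each piece carries its own comment.

```python
import re

PHOTO_RULE = re.compile(r"""
    ^           # match from the beginning of the path
    (\w+)       # group 1: language string, e.g. "en"
    /
    .*-         # category title, then the hyphen before the id
    (\w+)       # group 2: category id
    /
    .*-         # gallery title and hyphen
    (\w+)       # group 3: gallery id
    /
    .*-         # photo title and hyphen
    (\w+)       # group 4: photo id
    f           # marker letter for photo pages
    \.html      # a literal ".html"
    $           # match up to the end of the path
""", re.VERBOSE)

match = PHOTO_RULE.match("en/italy-4/seaside-of-sardinia-19/orosei-117f.html")
lang, category_id, gallery_id, photo_id = match.groups()
print(lang, category_id, gallery_id, photo_id)  # en 4 19 117
```

The last group first grabs "117f", since "f" is also a word character, and then gives the "f" back so that the rest of the pattern can match.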
