https Redirects

In my last post, I did a brief content audit of what was on my old site and noted what I wanted to redirect and to where. I also noted some issues with how the old site ran which can be fixed with redirects.

https

The first was that I wanted to redirect all traffic to secure pages via https.

Many pages I visited had the same code and told visitors to place it within their .htaccess file.

RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

but I couldn't find an explanation of what each line meant. I've spent too many years blindly copying and pasting (and doing things wrong) and I don't want to continue down that route. Let's get some understanding happening.

The .htaccess file usually lives at the root of your website on your server and provides a set of rules for how your server should behave when someone's trying to access it.

You should be careful when editing your .htaccess file. One wrong move and your site can go down, so always make a backup.

RewriteEngine On

This tells your server to switch on its rewriting engine (Apache's mod_rewrite module), which is what lets the rules below rewrite and redirect URLs.

RewriteCond %{HTTPS} off

This threw me when I first read it. I want HTTPS ON so why am I setting it to off? Well dear reader, this isn't a setting, it's a logical condition.

This line of code is a test. The RewriteCond portion tells the server that we're asking a question. The %{...} wrapper denotes a server variable, in this case HTTPS. And then the final part, off, is the value we're comparing it against.

This line of code doesn't say Switch HTTPS off, it says IS the server variable HTTPS off?

If the variable is off, then the following code is executed. If it's on, it's skipped.
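If it helps to see that question written another way, here's a rough Python sketch of the test. It isn't how Apache does it internally; should_redirect and its https argument are names I've made up to stand in for the RewriteCond and the %{HTTPS} variable.

```python
import re


def should_redirect(https: str) -> bool:
    """Mimic `RewriteCond %{HTTPS} off`: a question, not a setting."""
    # The condition only checks the variable's value against the pattern "off".
    # It never changes the value.
    return re.search(r"off", https) is not None


print(should_redirect("off"))  # True: plain http request, so the rule runs
print(should_redirect("on"))   # False: already https, so the rule is skipped
```

Nothing gets switched on or off here. The function only answers yes or no, which is all the RewriteCond does too.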

RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

This is the code that's run if the previous question is true and is broken down into 4 parts.

RewriteRule defines the rule to be followed, The Rule.

^(.*)$ defines what we're looking for, The Pattern.

https://%{HTTP_HOST}%{REQUEST_URI} is what we will replace the pattern with, The Substitution.

[L,R=301] are any special instructions, the Flags.

The Rule

So to break this down even further, the first part (The Rule) tells the server that this is what we're going to run if our RewriteCond is true.

The Pattern

^(.*)$ is the next part (The Pattern) of this string and, at first glance, looks like a cat trod on the keyboard, but it's actually a Perl-compatible regular expression.

^ means the beginning of a string, affectionately called an anchor.

() groups everything inside the brackets together and captures whatever it matches, so we can reuse it later (we'll see this again as $1).

.* is the string within the above brackets and is actually two elements. The first, the . period, matches any single character and the second, the * asterisk, repeats the previous element zero or more times, as many times as it can.

And our final part is the $ dollar sign which matches the end of the string, or the other anchor.

So now we know what each of the characters in the mess the cat made, ^(.*)$, means, we can piece it together: match all of the characters in the requested path.

An example would be if I visited http://mrqwest.co.uk/blog. Because it's http, our RewriteCond checks to see if https is off (it is, because we've not requested it). Our Pattern is tested against the path part of the URL (in an .htaccess file that's the bit after the domain, here blog) and matches all of it. The period matches the first character and the asterisk repeats it, so it continues to match each consecutive character until it reaches the end, denoted by the other anchor.
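You can try the same pattern outside of Apache. Python's re module isn't the engine Apache uses, but for a simple pattern like this it behaves the same way. The sample path blog/my-post is made up, and I'm assuming the pattern is given the path without its leading slash, which is how the Apache docs describe things for .htaccess files.

```python
import re

# The same pattern as in the RewriteRule.
pattern = re.compile(r"^(.*)$")

# Hypothetical path for a request to /blog/my-post.
match = pattern.match("blog/my-post")

print(match.group(0))  # the whole match: blog/my-post
print(match.group(1))  # what the brackets captured: blog/my-post

# An empty path still matches, because * allows "zero or more" characters.
empty = pattern.match("")
print(empty is not None)  # True
```

Because the pattern matches anything (including nothing), it never stops the rule from running. The RewriteCond is what decides whether the rule applies.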

The Substitution

https://%{HTTP_HOST}%{REQUEST_URI}

This looks a bit scary but it isn't. This is what we're replacing our pattern with. We're basically recreating the URL that you see in the address bar.

https:// is us telling the server to use https://

%{HTTP_HOST} works like the variable we mentioned above: the %{...} wrapper denotes a server variable, in this instance HTTP_HOST, which holds the host name. If we went to http://mrqwest.co.uk/blog, the mrqwest.co.uk portion is the host name.

%{REQUEST_URI} is similar to above. It's another server variable but pulls in the URI of the file you're looking for. Using the same example as above, if we went to http://mrqwest.co.uk/blog, the /blog portion is the URI.

Piecing all of this together means we're substituting our pattern with a newly formed URL which includes https:// at the beginning, and the rest comes from the server variables, which in turn come from the original URL in the address bar.
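As a quick sanity check, here's the same piecing together as a small Python sketch. build_target and its two arguments are names I've invented to stand in for the substitution and the two server variables; Apache fills those in from the request for you.

```python
def build_target(http_host: str, request_uri: str) -> str:
    """Mimic the substitution https://%{HTTP_HOST}%{REQUEST_URI}."""
    # REQUEST_URI already starts with a slash, so no extra one is needed.
    return "https://" + http_host + request_uri


print(build_target("mrqwest.co.uk", "/blog"))  # https://mrqwest.co.uk/blog
```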

The Flags

Lastly, the flags.

[L,R=301]

The RewriteRule Flags page in the Apache docs has a full list of flags that can be used here, but let's review what we've got and not run before we can walk.

L means Last and tells the server to stop processing any further rules for this URL once this one has been applied.

R=301 means Redirect and the 301 tells the server it's a permanent redirect.

www to non-www

Whilst we're here, we may as well remove the www. from the domain too. The aim is to avoid the same content being reachable at two different addresses.

RewriteEngine on
RewriteCond %{HTTP_HOST} ^www.example.com [NC]
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301,NC]

Here's the code we're looking at. It's looking fairly similar to the earlier code so it should be easier to understand.

First point, though: if your .htaccess already has a RewriteEngine On line, you don't need to add it again.

The first line we're looking at is the logical condition.

RewriteCond %{HTTP_HOST} ^www.example.com [NC]

But instead of asking if https is enabled, we're asking whether the host name (held in the %{HTTP_HOST} server variable) matches the pattern that follows. In the example above, we're asking: does the domain entered start with www.example.com?

The [NC] flag at the end of the line stands for "no case" and tells the server to ignore upper & lower case when matching. You could type WwW.eXaMpLe.CoM into the URL bar and it'll still match.
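Here's a small Python sketch of that host check, with re.IGNORECASE playing the part of [NC]. It only imitates the condition; the list of hosts is made up for the example.

```python
import re

# re.IGNORECASE stands in for the [NC] flag.
host_pattern = re.compile(r"^www.example.com", re.IGNORECASE)

for host in ["www.example.com", "WwW.eXaMpLe.CoM", "example.com"]:
    print(host, host_pattern.search(host) is not None)
# www.example.com True
# WwW.eXaMpLe.CoM True
# example.com False
```

One thing worth knowing: the dots in the pattern are regular expression dots, so each one matches any single character, not just a full stop. Writing them as \. would make the check stricter.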

The final line is the RewriteRule. Similar to the old one, we're matching the requested path using the expression ^(.*)$ and then replacing it with a string. There's no server variable that hands us the host name with the www. already removed, so we need to enter the domain here ourselves.

So my RewriteRule will be

RewriteRule ^(.*)$ https://mrqwest.co.uk/$1 [L,R=301,NC]

The $1 which follows the domain is a back-reference to whatever the brackets in the Pattern, (.*), captured. In other words, it takes the part of the URL that came after the host name and adds it back on afterwards.
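To see the $1 idea in action, here's a rough Python equivalent. The rewrite function is a name I've made up, and group(1) is Python's way of asking for what Apache calls $1.

```python
import re


def rewrite(path: str) -> str:
    """Mimic `RewriteRule ^(.*)$ https://mrqwest.co.uk/$1`."""
    match = re.match(r"^(.*)$", path)
    # group(1) is whatever the brackets captured, i.e. Apache's $1.
    return "https://mrqwest.co.uk/" + match.group(1)


print(rewrite("blog"))  # https://mrqwest.co.uk/blog
print(rewrite(""))      # https://mrqwest.co.uk/
```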

Combining the two

So now we've got a whole bunch of conditions and rules but can we combine the two?

We've now got two RewriteCond calls in our code.

RewriteCond %{HTTPS} off

and

RewriteCond %{HTTP_HOST} ^www.example.com [NC]

The first checks for https and the second checks for www.

We can combine the two using [OR]. If either of the above conditions is true, then the rule will be applied.

RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www.mrqwest.co.uk [NC]
RewriteRule ^(.*)$ https://mrqwest.co.uk/$1 [L,R=301,NC]

I've also tweaked the actual RewriteRule and hard-coded the URL to avoid any calamities with any non-www or www links.
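To convince myself the combined version covers every case, here's a rough Python simulation of the two conditions and the rule. It's only a sketch of the logic, not of Apache itself: redirect_target and the sample requests are made up, and it ignores things like query strings.

```python
import re
from typing import Optional

# re.IGNORECASE stands in for the [NC] flag on the host condition.
WWW_HOST = re.compile(r"^www.mrqwest.co.uk", re.IGNORECASE)


def redirect_target(https: str, host: str, path: str) -> Optional[str]:
    """Return the redirect URL, or None when neither condition matches."""
    https_is_off = re.search(r"off", https) is not None  # %{HTTPS} off [OR]
    host_has_www = WWW_HOST.search(host) is not None     # %{HTTP_HOST} ^www...
    if https_is_off or host_has_www:
        return "https://mrqwest.co.uk/" + path           # https://mrqwest.co.uk/$1
    return None


requests = [
    ("off", "mrqwest.co.uk", "blog"),      # http, no www
    ("off", "www.mrqwest.co.uk", "blog"),  # http, www
    ("on", "www.mrqwest.co.uk", "blog"),   # https, www
    ("on", "mrqwest.co.uk", "blog"),       # https, no www: already correct
]
for https, host, path in requests:
    print(https, host, redirect_target(https, host, path))
```

The first three requests all end up at https://mrqwest.co.uk/blog, and the last one returns None because it's already where we want it, so no redirect happens.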

Conclusion

When I first looked at https redirects with htaccess, I wanted a bit more than to just copy & paste some code and hope for the best. I wanted (as with everything now) to understand what I was adding to my site.

And now my site redirects from http to https from the off and I actually understand how it's doing it! It's also removing the www from URLs, which just seems cleaner.