Archiving Old Content with 301 Redirects
In my last post, we looked at redirecting all traffic to https for a more secure web. This came about following a content audit from my friend Simon, who reported a mix of http & https content in my site crawl.
Now that's fixed, we'll use a similar technique to redirect all old content to the archive.
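First, make sure the RewriteEngine is enabled. Strictly speaking, Redirect is a mod_alias directive and works without it, but any RewriteRule in the same file needs this line: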
RewriteEngine On
Next, we set up a redirect. The simplest method I've seen is this:
Redirect 301 "/oldpage.html" "/newpage.html"
And that's it. This goes in .htaccess, and whenever a user visits oldpage.html the server knows to redirect that user to newpage.html.
Because I've got a lot of content to redirect, we're going to need a lot of redirects. Fortunately, when Simon carried out the site audit, he was able to supply me with an Excel spreadsheet of every internal link on my website.
I took the entire list of URLs and pasted it into a new spreadsheet. Because of the previous mix of http & https domains, I decided to strip the domain out of each URL. I'm sure there are easier ways of doing this, but I searched for /blog/ and replaced it with the same text plus a comma at the beginning: ,/blog/. A minor change, but it allowed me to use Text to Columns and split on the comma we added, which separated the host from the rest of the URL.
I deleted the host column, leaving me with just the path or URI, which will be our old URL:
/blog/100/ie6
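If you'd rather not do the search-and-replace trick in a spreadsheet, the same step can be sketched in a few lines of Python. This is only a rough alternative, not what I actually did: the function name path_only and the two sample URLs are made up for illustration, and it assumes the audit export is a plain list of full URLs.

```python
from urllib.parse import urlsplit


def path_only(url: str) -> str:
    """Return just the path of a URL, dropping the scheme and host."""
    return urlsplit(url).path


# Illustrative sample: the same page reached via http + www and via https.
urls = [
    "http://www.mrqwest.co.uk/blog/100/ie6",
    "https://mrqwest.co.uk/blog/100/ie6",
]

# A set removes the duplicates caused by the http/https mix.
old_paths = sorted({path_only(u) for u in urls})
print(old_paths)
```

Collecting the paths in a set means each old URI appears only once, however many host variants showed up in the crawl.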
I then used the CONCAT Excel function in the next column to form our new URL. CONCAT joins several cells together, but you can also use it to add new strings to your cells, which is what we're going to do here by adding /archive to our old URL.
=CONCAT("/archive",A1)
This assumes A1 holds our old URL. What this gives us is a new URI:
/archive/blog/100/ie6
This is where the archived content will live now.
We should have two columns in our Excel sheet now, one with the old URI and one with our new URI.
I then moved to the next empty column over and used CONCAT again to build my redirect string.
=CONCAT("Redirect 301 ",A1," ",B1)
The formula takes 4 arguments, separated by commas, and joins them into one string.
The first part, "Redirect 301 ", is our server instruction telling the server to redirect permanently with a 301 status.
The second part, A1, references our old URI in cell A1.
The third part is important but easily missed in the above function. It's a pair of quote marks wrapping a blank space, " ". This tells Excel to add a space between cells A1 & B1, which our server redirect needs to tell the old & new URLs apart.
The fourth and final part, B1, references our new URI in cell B1.
So that formula will then provide us with our server redirect string:
Redirect 301 /blog/100/ie6 /archive/blog/100/ie6
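For anyone who prefers a script to spreadsheet formulas, the two CONCAT steps can be sketched in Python as well. Again, this is a rough equivalent under a few assumptions rather than the method I used: redirect_line is a hypothetical helper name, and the paths are assumed to contain no spaces (a path with spaces would need wrapping in quotes).

```python
def redirect_line(old_path: str, prefix: str = "/archive") -> str:
    """Build one Redirect directive that sends old_path to prefix + old_path."""
    new_path = prefix + old_path  # same job as =CONCAT("/archive",A1)
    # Same job as =CONCAT("Redirect 301 ",A1," ",B1)
    return f"Redirect 301 {old_path} {new_path}"


print(redirect_line("/blog/100/ie6"))
# prints: Redirect 301 /blog/100/ie6 /archive/blog/100/ie6
```

Running this over every old path and joining the results with newlines would give the same block of text that comes out of the spreadsheet column.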
Copying the formula down for all the old URLs gave me a long list (515 in total) of redirect commands, which I copied from the Excel spreadsheet directly into my .htaccess file.
I then selected a few random links and tested them to make sure the redirects worked.
You can try the example above (and test the other redirects we've set up) by copying and pasting the following URL into your browser:
http://www.mrqwest.co.uk/blog/100/ie6
You'll notice that the link above includes not only http://, which should redirect to https://, but also the www., which we redirected in our last post as well.
Hitting that link in your browser will actually take you to:
https://mrqwest.co.uk/archive/blog/100/ie6?s=100
You'll notice there's no http://, no www. and the added /archive, which tells us the redirects are working.