Archiving Old Content with 301 Redirects

In my last post, we looked at redirecting all traffic to https for a more secure web. This came about following a content audit from my friend Simon, who reported a mix of http & https content in my site crawl.

Now that's fixed, we'll be using a similar technique to redirect all old content to the archive.

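First, make sure the RewriteEngine is enabled. Strictly speaking, the Redirect directive used below is handled by mod_alias and works without it, but any RewriteRule lines in the same file need it.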

RewriteEngine On

Next, we're setting up a redirect, and the simplest method I've seen is this:

Redirect 301 "/oldpage.html" "/newpage.html"

and that's it. This goes in .htaccess, and whenever a user visits /oldpage.html the server knows to redirect that user to /newpage.html. Note that the old path has to start with a slash, otherwise the rule won't match.

Because I've got a lot of content to redirect, we're going to be looking at a lot of redirects. Fortunately, when Simon carried out the site audit, he was able to supply me with an Excel spreadsheet of every internal link on my website.

I took the entire list of URLs and pasted it into a new spreadsheet. Because of the previous mix of http & https domains, I decided to strip the domain out of each URL. I'm sure there are easier ways of doing this, but I did a find and replace on /blog/, replacing it with the same text preceded by a comma: ,/blog/. A minor change, but it allowed me to use Text to Columns and split on the comma we added, which separated the host from the rest of the URL.

I deleted the host column, leaving me with just the path, or URI, which will be our old URL:

/blog/100/ie6
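If you'd rather not do the comma trick in a spreadsheet, the same host-stripping step can be sketched in a few lines of Python. This is only a rough alternative, not what I actually did. The function name path_only is made up for illustration, and the second URL in the list is just there to show that http and https versions of the same page collapse into one path.

```python
from urllib.parse import urlsplit


def path_only(url: str) -> str:
    """Return just the path of a URL, dropping the scheme and host."""
    return urlsplit(url).path


# Illustrative input: the same page crawled under two different hosts.
urls = [
    "http://www.mrqwest.co.uk/blog/100/ie6",
    "https://mrqwest.co.uk/blog/100/ie6",
]

# A set removes the duplicates left over once the hosts are gone.
paths = sorted({path_only(u) for u in urls})
print(paths)
```

Using a set here means each old path only appears once, which avoids ending up with duplicate redirect lines later on.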

I then used Excel's CONCAT function in the next column to form our new URL. CONCAT joins several cells together, but you can also use it to add new strings to your cells, which is what we're going to do here by adding /archive to the front of our old URL.

=CONCAT("/archive",A1)

This assumes A1 holds our old URL. What this gives us is a new URI:

/archive/blog/100/ie6

This is where the archived content will live now.

We should have two columns in our Excel sheet now, one with the old URI and one with our new URI.

I then moved to the next empty column and used CONCAT again to build my redirect string.

=CONCAT("Redirect 301 ",A1," ",B1)

This creates a new string made up of four parts, which are separated by commas inside the formula.

The first part, "Redirect 301 " (note the trailing space inside the quotes), is our server instruction, telling the server to redirect permanently with a 301.

The second part, A1, references our old URI in cell A1.

The third part is important but easily missed in the above function. It's a pair of quote marks wrapping a blank space, " ". This tells Excel to add a space between cells A1 & B1, which our server redirect needs in order to tell the old & new URLs apart.

The fourth and final part, B1, references our new URI in cell B1.

So that formula will then provide us with our server redirect string

Redirect 301 /blog/100/ie6 /archive/blog/100/ie6
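For anyone who prefers a script to a spreadsheet, here's a minimal Python sketch that does the job of both CONCAT formulas in one go. The name redirect_line and its prefix parameter are my own inventions for the example, and it assumes the paths contain no spaces (a path with a space would need wrapping in quotes, which this doesn't handle).

```python
def redirect_line(old_path: str, prefix: str = "/archive") -> str:
    """Build one 'Redirect 301 old new' line for .htaccess."""
    new_path = prefix + old_path  # same job as =CONCAT("/archive",A1)
    # Same job as =CONCAT("Redirect 301 ",A1," ",B1)
    return f"Redirect 301 {old_path} {new_path}"


# Illustrative input: in practice this would be the full list of old paths.
old_paths = ["/blog/100/ie6"]

for path in old_paths:
    print(redirect_line(path))
```

Run against the example path, this prints the same redirect string as the spreadsheet formula.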

Copying the formula down for all the old URLs gave us a long list (515 in total) of redirect commands, which I copied from the Excel spreadsheet directly into my .htaccess file.

I then selected a few random links and tested them to make sure they worked.
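Spot-checking live links only covers the handful you click on, so the generated lines themselves could also be given a quick offline once-over. The sketch below is a rough idea of how that might look, not something I ran at the time. The function check_redirects is hypothetical, and it assumes every line follows the exact "Redirect 301 old new" shape with no quoted paths.

```python
def check_redirects(lines, prefix="/archive"):
    """Return a list of problems found in 'Redirect 301 old new' lines."""
    problems = []
    seen = set()
    for n, line in enumerate(lines, start=1):
        parts = line.split()
        # Expect exactly: Redirect, 301, old path, new path.
        if len(parts) != 4 or parts[:2] != ["Redirect", "301"]:
            problems.append(f"line {n}: unexpected format")
            continue
        old, new = parts[2], parts[3]
        if not old.startswith("/"):
            problems.append(f"line {n}: old path should start with /")
        if new != prefix + old:
            problems.append(f"line {n}: target is not {prefix} + old path")
        if old in seen:
            problems.append(f"line {n}: duplicate old path {old}")
        seen.add(old)
    return problems


lines = ["Redirect 301 /blog/100/ie6 /archive/blog/100/ie6"]
print(check_redirects(lines))
```

An empty list means nothing looked off. It doesn't prove the server behaves as expected, so it's worth testing a few real URLs as well.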

You can try the example above (and test the other redirects we've set up along the way) by copying & pasting the following URL into your browser.

http://www.mrqwest.co.uk/blog/100/ie6

You'll notice that the link above includes http://, which should redirect to https://, and that I've also included the www., which we redirected in our last post.

Hitting that link in your browser will actually take you to

https://mrqwest.co.uk/archive/blog/100/ie6?s=100

You'll notice there's no http://, no www., and /archive has been added, which tells us the redirects are working.