Content Audits
As part of my move from my old Perch CMS system to my current 11ty setup, I want to archive my old content for several reasons.
- It's old content, most of it stretching back nearly 10 years, so a lot of what is written is irrelevant or no longer reflects modern practice.
- Whilst most of the content is redundant now, I don't simply want to delete it. I do want to keep it for posterity and whilst it's redundant to me, it may be useful to someone.
- Converting it all over to markdown for 11ty to pick up is a bit of a daunting task, but maybe that's something for a rainy day.
The majority of the old content is blog posts, so I could simply redirect all visits to mrqwest.co.uk/blog over to mrqwest.co.uk/archive/blog, but that would mean I can no longer use the /blog directory because it'll be redirected to the archive. I did toy with storing all my new posts under a separate header, /writing perhaps, or /words, but it didn't feel right.
So the only alternative is to add 301 redirects for each page.
My friend Simon was able to crawl my existing site and produce a content report which lists all internal pages, nicely saved in a spreadsheet. This has given me some great insight into how poorly my old site was written (sorry!).
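With a list of old URLs like the one in the report, the per-page redirects don't have to be written by hand. Here's a minimal sketch of how that could look. It assumes the server understands Apache-style Redirect 301 lines and that everything moves under /archive; both are assumptions for illustration, and the function names are made up.

```python
from urllib.parse import urlsplit

# Assumption: old posts end up under /archive, as in the /archive/blog idea.
ARCHIVE_PREFIX = "/archive"


def redirect_rule(old_url):
    """Build one Apache-style 'Redirect 301' line for an old URL."""
    path = urlsplit(old_url).path
    return f"Redirect 301 {path} {ARCHIVE_PREFIX}{path}"


def redirect_rules(old_urls):
    """Return one rule per unique path, in first-seen order."""
    seen = set()
    rules = []
    for url in old_urls:
        path = urlsplit(url).path
        if path in seen:
            continue  # http and https variants share the same path
        seen.add(path)
        rules.append(redirect_rule(url))
    return rules


crawled = [
    "http://mrqwest.co.uk/blog/168/2011-07-19-insites-brighton",
    "https://mrqwest.co.uk/blog/168/2011-07-19-insites-brighton",
    "http://mrqwest.co.uk/blog/168/insites-brighton",
]

for rule in redirect_rules(crawled):
    print(rule)
```

Because the rules are keyed on the path only, the http and https copies of a URL collapse into a single redirect.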
What happened?
My last site build was a hasty effort. I had wanted to redesign for a long time but I wasn't happy with what I was designing. Several designs later, I realised I was wasting time and that I'd never be happy so instead of working on one design to get it right and then launch, I pulled all design out of the live site and started building it back slowly and in public rather than building the site on my local server first. I did this to try and force my hand a bit.
What actually happened though is that I pulled the design and then rushed through bits and pieces to make it work and look presentable, with the view of going back at a later date and fixing it... That was several years ago and here we are now. The code was sloppy, the responsiveness wasn't great and the site never felt coherent.
So what are the problems?
Of which, there were many. I hadn't realised how badly my previous site behaved until I looked over it properly. How bad is that?
Duplicate content
Simon's report had over 1,100 entries of content on my site. I knew this to be incorrect because Perch tells me I've got 250 blog posts in the database.
The report did pick up some sub-directories but none of the actual content, so that removed some of the links. It had also picked up all of my blog tags and the corresponding archive pages for each one, so that removed some more, but I still had over 1,000 left.
Looking through the links, it became apparent that all of the content from my previous previous site was being crawled too.
An example would be a post I wrote back in 2011 about a small web conference I attended in Brighton called Insites. This was being crawled 3 separate times:
- http://mrqwest.co.uk/blog/168/2011-07-19-insites-brighton
- https://mrqwest.co.uk/blog/168/2011-07-19-insites-brighton
- http://mrqwest.co.uk/blog/168/insites-brighton
All content pulled in from my old old site had the slug prepended with the post date, in this instance 2011-07-19, and was linked both via http & https, and then the post was crawled again without the post date.
There were no canonical tags set up to try and unite everything and it's just a mess.
Weirdly, the post is only in the database once, so it's not duplicated there. It's just that there are multiple URLs to access the same content, with no rule to define the real URL.
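To see how much of the report is really the same post under different addresses, the crawled URLs can be collapsed onto one "real" URL each. The sketch below is only one way to do it: it assumes the preferred form is https with the date prefix stripped from the slug, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Matches a leading YYYY-MM-DD- date on a slug, e.g. "2011-07-19-".
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}-")


def canonical_url(url):
    """Force https and drop a date prefix from the last path segment."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    segments[-1] = DATE_PREFIX.sub("", segments[-1])
    return "https://" + parts.netloc + "/".join(segments)


def group_duplicates(urls):
    """Map each canonical URL to every crawled URL that resolves to it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return dict(groups)


crawled = [
    "http://mrqwest.co.uk/blog/168/2011-07-19-insites-brighton",
    "https://mrqwest.co.uk/blog/168/2011-07-19-insites-brighton",
    "http://mrqwest.co.uk/blog/168/insites-brighton",
]

for canonical, variants in group_duplicates(crawled).items():
    print(canonical, len(variants))
```

All three Insites URLs land on the same key, which is also the value a canonical link on that post would point at. URLs with a trailing slash aren't handled here.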
Moving forward, 11ty allows you to provide a slug for each post which should remove this issue.
And all pages will be https moving forward, as they should be.
Malformed links
Simon's report also advised that a lot of my pages had malformed links. Upon review, my title links were broken and resulted in 404s.
Where was my testing?
Headings
Simon also advised that all my page titles were in fact h3 tags where they should be h1. Not sure what my reasoning was before, but I had the 'MrQwest' logo as an h1 on every page, no h2s, and the title for each page was an h3. Ooof.
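A check like this is easy to automate so it doesn't slip through again. Here's a rough sketch using Python's built-in HTML parser. The two rules it applies (exactly one h1, and no skipped levels) are a common convention rather than a hard requirement, and the markup fed in is made up to mirror the description above.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every h1-h6 tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return a list of human-readable heading hierarchy problems."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected exactly one h1, found {levels.count(1)}")
    # A heading may go at most one level deeper than the one before it.
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"h{previous} is followed by h{current}")
    return problems


# Made-up markup: logo as h1, page title as h3, no h2 in between.
old_page = "<h1>MrQwest</h1><h3>Insites Brighton</h3>"
print(heading_problems(old_page))
```

Note that the logo-as-h1 setup passes the "one h1" rule on its own; it's the jump straight to h3 that gets flagged.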
Conclusion
So following the site audit, I need to do the following:
- Remove some of the irrelevant posts and redirect each one to a moved page
- Create a moved page
- Set canonical links for each post
- Redirect http to https
- Perhaps understand content hierarchy a bit better
- Give past me a swift kick.
- Redirect all old content posts to the archive ready to start afresh!