Content Audits

As part of my move from my old Perch CMS system to my current 11ty setup, I want to archive my old content for several reasons.

  1. It's old content, most of it stretching back nearly 10 years, so a lot of what's written is irrelevant or no longer reflects modern practice.
  2. Whilst most of the content is redundant now, I don't simply want to delete it. I want to keep it for posterity, and whilst it's redundant to me, it may be useful to someone.
  3. Converting it all over to markdown for 11ty to pick up is a bit of a daunting task, but maybe that's something for a rainy day.

The majority of the old content is blog posts, so I could simply redirect all visits to mrqwest.co.uk/blog over to mrqwest.co.uk/archive/blog. But that would mean I can no longer use the /blog directory, because it would be redirected to the archive. I did toy with storing all my new posts under a separate directory, /writing perhaps, or /words, but it didn't feel right.

So the only alternative is to add 301 redirects for each page.
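As a rough sketch of how those per-page redirects could be generated from a list of crawled URLs, something like the following would do. The `/archive` prefix comes from the idea above; the function names and the plain "from to status" output format are assumptions for illustration, and the real format depends on whatever the host or server expects.

```python
from urllib.parse import urlsplit

# Assumed target prefix, taken from the /archive/blog idea above.
ARCHIVE_PREFIX = "/archive"


def redirect_rule(path: str) -> str:
    """Return one 'from to status' line for a single old path."""
    return f"{path} {ARCHIVE_PREFIX}{path} 301"


def build_redirects(old_urls):
    """Build one 301 rule per unique path in a list of crawled URLs."""
    seen = set()
    rules = []
    for url in old_urls:
        # Only the path matters, so http and https variants collapse into one rule.
        path = urlsplit(url).path or "/"
        if path in seen:
            continue
        seen.add(path)
        rules.append(redirect_rule(path))
    return rules
```

Feeding it the URL column from the crawl report would give one line per old page, ready to be adapted to the server's own redirect syntax.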

My friend Simon was able to crawl my existing site and produce a content report which lists all internal pages, nicely saved in a spreadsheet. This has given me some great insight into how poorly my old site was written (sorry!).

What happened?

My last site build was a hasty effort. I had wanted to redesign for a long time but I wasn't happy with what I was designing. Several designs later, I realised I was wasting time and that I'd never be happy. So instead of working on one design until it was right and then launching, I pulled all the design out of the live site and started building it back slowly and in public, rather than building the site on my local server first. I did this to try and force my hand a bit.

What actually happened, though, is that I pulled the design and then rushed through bits and pieces to make it work and look presentable, with a view to going back at a later date and fixing it... That was several years ago and here we are now. The code was sloppy, the responsiveness wasn't great and the site never felt coherent.

So what are the problems?

There were many. I hadn't realised how badly my previous site behaved until I looked over it properly. How bad is that?

Duplicate content

Simon's report had over 1,100 entries of content on my site. I knew this to be incorrect because Perch tells me I've got 250 blog posts in the database.

The report did pick up some sub-directories but none of the actual content, so that removed some of the links. It had also picked up all of my blog tags and the corresponding archive pages for each one, so that removed some more, but I still had over 1,000.

Looking through the links, it became apparent that all of the content from my previous previous site was being crawled too.

An example would be a post I wrote back in 2011 about a small web conference I attended in Brighton called Insites. This was being crawled 3 separate times.

All content pulled in from my old old site had the slug prepended with the post date, in this instance 2011-07-19, and was linked via both http & https. The post was then crawled again without the post date.

There were no canonical tags set up to try and unite everything, so it's just a mess.

Weirdly, the post is only in the database once, so it's not duplicated there. It's just that there are multiple URLs to access the same content, with no rule to define the real URL.
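To get a feel for how many of the crawled entries are really the same page, the URLs can be grouped by a normalised key. This is only a sketch under assumptions: the date prefix is taken to look like `2011-07-19-` at the start of the slug (the separator is a guess), and `canonical_key` and `group_duplicates` are hypothetical helper names, not part of any crawl tool.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: a date prefix looks like "2011-07-19-" at the start of the slug.
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}[-_]?")


def canonical_key(url: str) -> str:
    """Reduce a URL to a path with no scheme, host or date prefix."""
    path = urlsplit(url).path.rstrip("/")
    head, _, slug = path.rpartition("/")
    return f"{head}/{DATE_PREFIX.sub('', slug)}"


def group_duplicates(urls):
    """Map each canonical key to the URLs that share it, keeping only duplicates."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_key(url)].append(url)
    # A key reached by a single URL is not a duplicate, so drop it.
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Run over the crawl report, each group with more than one URL is a page that needs a single canonical address and redirects from the rest.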

Moving forward, 11ty lets you set the permalink for each post, so every post should end up with a single URL, which should remove this issue.

And all pages will be https moving forward, as they should be.

Malformed links

Simon's report also advised that a lot of my pages had malformed links. Upon review, my title links were broken and resulted in 404s.

Where was my testing?

Headings

Simon also advised that all my page titles were, in fact, h3 tags where they should be h1. Not sure what my reasoning was before, but I had the 'MrQwest' logo as an h1 on every page, no h2s, and the title for each page was an h3. Ooof.
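A quick check for this sort of heading problem can be sketched with Python's built-in HTML parser. The rules it applies (exactly one h1, no skipped levels) are common guidelines rather than anything from Simon's report, and `HeadingCollector` and `heading_problems` are names made up for this example.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every h1-h6 tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # Tags arrive lower-cased; match h1..h6 only (so not <hr>).
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html: str):
    """Return a list of human-readable heading issues for one page."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected exactly one h1, found {levels.count(1)}")
    # Compare each heading with the one before it to spot skipped levels.
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"h{previous} is followed by h{current}, skipping a level")
    return problems
```

It won't say whether the h1 is the right element (a logo rather than the page title, say), but it would have flagged the jump straight from h1 to h3.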

Conclusion

So, after the site audit, I need to do the following: