Reducing website errors with HTTP 301 redirects

This content is 12 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

A couple of weeks ago, I wrote about a WordPress plugin called Redirection. I mentioned that I’ve been using it to highlight HTTP 404 errors on my site, but I’ve also been using the crawl errors logged by Google’s Webmaster Tools to track down a number of issues resulting from the various changes made to the site over the years, then creating HTTP 301 redirects to patch them.

Redirections as a result of other people’s mistakes

One thing that struck me was how other people’s content can affect my site – for example, many forums seem to abbreviate long URLs with … in the middle. That’s fine until the HTML anchor gets lost (e.g. in a cut/paste operation) and so I was seeing 404 errors from incomplete URLs like http://www.markwilson.co.uk/blog/2008/12/netboo…-file-systems.htm. These were relatively easy for me to track down and create a redirect to the correct target.

Unfortunately, there is still one inbound link that includes an errant apostrophe that I’ve not been able to trap – even using %27 in the redirect rule seems to fail. I guess that one will just have to remain.

Locating Post IDs

Some 404s needed a little more detective work – for example http://www.markwilson.co.uk/blog/2012/05/3899.htm is a post where I forgot to add a title before publishing and, even though I updated the WordPress slug afterwards, someone is linking to the old URL. I used phpMyAdmin to search for post ID 3899 in the wp_posts table of the database, from which I could identify the post and create a redirect.

Pattern matching with regular expressions

Many of the 404s were being generated based on old URL structures from either the Blogger version of this site (which I left behind several years ago) or changes in the WordPress configuration (mostly after last year’s website crash). For these I needed to do some pattern matching, which meant an encounter with regular expressions, which I find immensely powerful, fascinating and intimidating all at once.

Many of my tags were invalid because, at some point, I obviously changed the tags from /blog/tags/tagname to /blog/tag/tagname, but I also had a hierarchy of tags in the past (possibly when I was still mis-using categories), which was creating some invalid URLs (like http://www.markwilson.co.uk/blog/tag/apple/ipad). The hierarchy had to be dealt with on a case-by-case basis, but the RegEx for dealing with the change in URL for the tags was fairly simple:

  • Source RegEx: (\/tags\/)
  • Target RegEx: (\/tag\/)

Using the Rubular Ruby RegEx Editor (thanks to Kristian Brimble for the suggestion – there were other tools suggested but this was one I could actually understand), I was able to test the RegEx on an example URL and, once I was happy with it, that was another redirection created.  Similarly, I redirected (\/category\/) to (\/topic\/).
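To illustrate how these two substitutions behave, here is a minimal Python sketch. It is not how the Redirection plugin is implemented (that is PHP and I have not reproduced its internals); the rule list, the `rewrite` helper and the example paths are my own assumptions, used only to test the patterns in the same way Rubular does.

```python
import re

# Hypothetical rule table mirroring the source/target pairs above.
# The forward slashes do not need escaping in Python.
RULES = [
    (re.compile(r"/tags/"), "/tag/"),
    (re.compile(r"/category/"), "/topic/"),
]


def rewrite(path: str) -> str:
    """Apply each rule at most once and return the rewritten path."""
    for pattern, target in RULES:
        path = pattern.sub(target, path, count=1)
    return path


# Example paths are illustrative, not taken from the site's logs.
print(rewrite("/blog/tags/apple"))          # /blog/tag/apple
print(rewrite("/blog/category/wordpress"))  # /blog/topic/wordpress
print(rewrite("/blog/tag/apple"))           # already correct, so unchanged
```

Because the pattern includes both slashes, a path that already uses /tag/ is left alone.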

I also created a redirection for legacy .html extensions, rewriting them to .htm:

  • Source RegEx: (.*).html
  • Target RegEx: $1.htm

Unfortunately, my use of a “greedy” wildcard meant this also substituted html in the middle of a URL (e.g. http://www.markwilson.co.uk/blog/2008/09/creating-html-signatures-in-apple-mail.htm became http://www.markwilson.co.uk/blog/2008/09/creating-.htm-signatures-in-apple-mail.htm), so I edited the source RegEx to (.*).html$, anchoring the match to the end of the URL.
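The effect of the end-of-string anchor can be checked with a short Python sketch. This is only an approximation of the rule: Python’s `re` module is not the plugin’s engine, so the exact mangled URL it produces for the unanchored pattern may differ from the one I saw. The `to_htm` helper is hypothetical, and in the anchored version I have also escaped the dot (`\.`) so that it only matches a literal full stop, which goes a step beyond the rule quoted above.

```python
import re

# The rule as first written: no anchor, and an unescaped dot.
UNANCHORED = re.compile(r"(.*).html")
# A tightened variant: literal dot, and only at the end of the URL.
ANCHORED = re.compile(r"(.*)\.html$")


def to_htm(pattern: re.Pattern, url: str) -> str:
    """Rewrite a URL using the given source pattern and the $1.htm target."""
    return pattern.sub(r"\1.htm", url)


good = "http://www.markwilson.co.uk/blog/2008/09/creating-html-signatures-in-apple-mail.htm"
legacy = good + "l"  # the same post with a legacy .html extension

# The unanchored rule damages a URL that merely contains "html".
print(to_htm(UNANCHORED, good) == good)    # False

# The anchored rule leaves it alone and still fixes the legacy extension.
print(to_htm(ANCHORED, good) == good)      # True
print(to_htm(ANCHORED, legacy) == good)    # True
```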

More complex regular expressions

The trickiest pattern I needed to match was for archive pages using the old Blogger structure.  For this, I needed some help, so I reached out to Twitter:

Any RegEx gurus out there who fancy a challenge, please can you help me convert /blog/archive/yyyy_mm_01_archive.htm to /blog/yyyy/mm ?
@markwilsonit
Mark Wilson

and was very grateful to receive some responses, including one from Dan Delaney that led me to create this rule:

Source RegEx: /blog\/([a-zA-Z\/]+)([\d]+)(\D)(\d+)(\w.+)
Target RegEx: /blog/$2/$4/

Dan’s example helped me to understand a bit more about how match groups are used, taking the second and fourth matches here to use in the target, but I later found a tutorial that might help (most RegEx tutorials are quite difficult to follow but this one is very well illustrated).
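To see which part of the URL ends up in each match group, the rule can be tried out in Python. This is a sketch under a few assumptions: the example path is just the yyyy_mm pattern filled in with a made-up date, Python writes the target’s $2 and $4 as \2 and \4, and `STRICT_RULE` is my own tighter alternative rather than anything Dan suggested.

```python
import re

# The source RegEx from the rule above, unchanged.
DAN_RULE = re.compile(r"/blog\/([a-zA-Z\/]+)([\d]+)(\D)(\d+)(\w.+)")

# Hypothetical stricter alternative: only matches Blogger archive pages.
STRICT_RULE = re.compile(r"^/blog/archive/(\d{4})_(\d{2})_01_archive\.htm$")

old_path = "/blog/archive/2008_12_01_archive.htm"  # illustrative date

# Inspect the five groups: year is group 2, month is group 4.
match = DAN_RULE.search(old_path)
print(match.groups())
# ('archive/', '2008', '_', '12', '_01_archive.htm')

# Apply the target, using groups 2 and 4.
print(DAN_RULE.sub(r"/blog/\2/\4/", old_path))     # /blog/2008/12/

# The stricter pattern needs only two groups.
print(STRICT_RULE.sub(r"/blog/\1/\2/", old_path))  # /blog/2008/12/
```

The stricter version trades flexibility for safety: it cannot accidentally match anything other than an archive page, but it would need editing if the old URLs turned out to vary.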

A never-ending task

It’s an ongoing task – the presence of failing inbound links due to incorrect URLs means that I’ll have to keep an eye on Google’s crawl errors but, over time, I should see the number of 404s drop on my site. That in itself won’t improve my search placement but it will help to signpost users who would otherwise have been turned away – and every little bit of traffic helps.

Redirection – an essential plug-in for WordPress users

This content is 12 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

Last year, a combination of a loss of service from my hosting provider and my appalling backups meant that this website was temporarily wiped off the face of the Internet. It’s never recovered – at least not in terms of revenue – and it taught me an important lesson about backups (it’s all too easy to forget the hours of effort that go into a “hobby” site like this one…).

The blog posts were restored, and I took the opportunity to apply a new theme to the site (it’s probably due another one now…), but some of the images had gone AWOL along the way. I’ve been ignoring that (mostly) but decided I really should do something about it when an old post was picked up by a journalist today and I realised it had a missing graphic.

I remembered a WordPress plugin that I used on another site recently, for managing redirects when access to the .htaccess file is not available. The plug-in, written by John Godley, is called Redirection, and one of its modules will report on HTTP 404 errors, like the ones that my missing graphics will create. I know there are other tools that can do this for me (Google’s Webmaster Tools, for example, or trawling through the web logs) but it’s an easy way to see when a 404 has been returned in order to investigate accordingly.  So far this afternoon, I’ve tracked down and replaced around 8 missing graphics and one broken permalink using the logs from Redirection.  I’m now scanning through the rest of John’s plugins to see what else I’m missing and will certainly be donating later…

Disabling comments for all posts on a WordPress blog

This content is 13 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

Long-time readers of my blog will know that I used to manage the Fujitsu UK and Ireland CTO Blog (which we’ve recently closed, but have left the content in place for posterity) and I’m still getting the comment notifications (mostly spam). Many of the posts have HTTP 301 redirects to either my blog or David Smith’s (I found a great WordPress plugin for that – Redirection) but, for those that remain, I wanted to turn off comments. Doing this individually for each post seemed unnecessarily clunky but there is, apparently, no way to do it in bulk from the WordPress user interface (with database access it would have been straightforward but I don’t have that level of access).

There is a plug-in that globally disables all comments – named, rather aptly, Disable Comments – except that the blog is part of a multi-site (network) install and I’m not sure what the broader impact would be…

No bother, I found a workaround – simply set all of the posts to close comments after a certain number of days. The theme that someone has applied to the site (since I stopped working with it) doesn’t seem to respect that, and still leaves a comment button visible, but anyone with a well-developed theme should be OK…

Deleting large quantities of Facebook notes

This content is 13 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

A few years ago, I followed the example of a “social media guru” and set Facebook up to consume my blog’s RSS feed and republish each post as a note.

This was A Bad Thing for a number of reasons, not least:

  1. Copyright – I’m sure that when I upload anything to Facebook, I give them some rights over it (which is why my images are still on Flickr).
  2. Traffic – reproducing content on Facebook might get eyeballs, but it takes that traffic away from your own website and only Facebook gains any revenue. This may be OK if you are selling goods/services that can be monetised via Facebook links but my revenue is from ads: ads on my site = revenue for me; ads on a Facebook copy = revenue for Facebook.
  3. Layout – invariably, despite my best efforts to write good XHTML, the blog posts look better on my site than when scraped into Facebook as notes.

I turned off the feed but deleting the notes was far from trivial. There is no bulk delete option that I can find, and that meant opening each note, scrolling down, clicking delete, etc. In a word, tedious.

I forgot about the notes until last week, when I switched over to timeline view. Arghh. Yes. Must delete those…

…and then I found another method – much quicker – using the iOS Facebook app.

By opening the Notes section of the Facebook app on my iPad, a quick swipe and press was all it took to delete each note. Still tedious, but a lot quicker to get through…

Adding extra social sharing services to WordPress with JetPack (ShareDaddy)

This content is 13 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

Last night, as part of the rebuild of this site, I reinstated the social sharing links for each post. In the old site they had been implemented as bespoke code using each social network’s recommended approach (e.g. Twitter or Facebook’s official button codes) but presentation becomes problematic, with each button having a slightly different format and needing some CSS trickery to get it right.

I looked into a variety of plugins but they all had issues – either with formatting or functionality – until I stumbled across reference to WordPress.com’s social sharing capabilities.  If only I could have that functionality on a self-hosted (WordPress.org) site…

…As it happens, I can – WordPress.com’s social sharing is based on the ShareDaddy plugin, which is part of a collection called JetPack. ShareDaddy is also available as a freestanding plugin but now I have JetPack installed I’m finding some of the other functionality it gives me useful (and it’s not possible to activate ShareDaddy if you have JetPack installed).

I need to make some changes (like working out how to hack the code and turn off the count next to my Tweet/Like/+1 buttons – it’s embarrassing when the number is small!) but I’m happy enough with the result for now.  One thing I did need to do though was to add some services that are not yet in the JetPack version of the plugin (one of the major advantages of ShareDaddy is how simple it is to do this).

Rebuilding my site: please excuse the appearance

This content is 13 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

Regular readers may have noticed that this site is looking a little… different… right now.

Unfortunately, my hosting provider told me last night that they had a disk failure on the server. Normally that wouldn’t be a problem (that’s why servers have redundant components, right? Like RAID on the disks?) but it seems this “server” is just a big PC. I can’t get too mad though… the MySQL database backup scripts have been failing for a month and it was my sloppiness that didn’t chase that up, and it was me who hadn’t made sure I had a recent copy of the file system…

So, as things stand:

  • I think I have restored all posts from 2004 until almost the end of August 2011;
  • I need to restore the later posts and comments (using copies from FeedBlitz, Google Reader, etc.);
  • There were no plugins at first (so things looked odd); some of the plugins have since been reinstalled (but things may still look odd);
  • There were no graphics at first (they were hosted outside WordPress); I’ve since restored most of the graphics and other external media but there are still some I need to track down;
  • I have not restored the theme (so I’m using the WordPress defaults and there is no mobile theme);
  • The theme I’m using does not specify UTF-8 encoding, so there were lots of spurious characters; there are still some appearing on some pages;
  • There are fewer ads (which you might be happy about, but I do still need to pay the bills).

Please bear with me whilst I get things back… it may take some time as it needs to fit in between other activities but it might also be a good thing (a new theme has been long overdue and I might even get smarter about my backups…).

And, if you spot another problem, please let me know.

[Updated at various points as the site has been restored]

Attempting to track RSS subscribers on a WordPress blog

This content is 14 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

As well as my own website (which has precious little content these days due to my current workload), I also manage the Fujitsu UK and Ireland CTO Blog. Part of that role includes keeping an eye on a number of metrics to make sure that people are actually interested in what we have to say (thankfully, they seem to be…). Recently though, I realised that, whilst I’m tracking visitors to the blog, I’m missing hits on the RSS feed (because it’s not actually a page with the tracking script included) - and that’s a problem.

There are ways around this (I use Google Feedburner on my own blog, or it’s possible to put a dummy page with a meta refresh in front of the feed to pick up some metrics) but they have their own issues (for example, the meta refresh method breaks autodiscovery for some RSS readers) and will only help with new subscribers going forwards, not with my legacy issue of how many subscribers I have right now.

There is another approach though: using a popular web-based RSS subscription service like Google Reader to see how many subscribers it tracks for our feed (the same metrics are available from Google’s Webmaster Tools).  The trouble is, that’s not all of the subscribers (for example, a good chunk of people use Outlook to manage their feeds, or other third-party RSS readers). If I use my own blog as an example, Google Reader shows that I have 247 subscribers but Feedburner says I have 855.  Those subscribers come from all manner of feed readers and aggregators, email subscription services and web browsers (Firefox accounts for almost 20% of them) so it’s clear that I’m not getting the whole picture from Google’s statistics. 

Google Reader Subscribers

Google Feedburner Subscribers

Does anyone have any better ideas for getting some subscriber stats for RSS feeds on a WordPress blog using Google Analytics? Or maybe from the server logs?

Google Analytics: Honing in on the visits that count

This content is 14 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

Every week I create a report that looks at a variety of social media metrics, including visits to the Fujitsu UK and Ireland CTO Blog.  It’s developing over time – I’m also working on a parallel activity with some of my marketing colleagues to create a social media listening dashboard – but my Excel spreadsheet with metrics cobbled together from a variety of sources and measuring against some defined KPIs seems to be doing the trick for now.

One thing that’s been frustrating me is that I know a percentage of our visits are from employees and, frankly, I don’t care about their visits to our blog.  Nor for that matter do I want my own visits (mostly administrative) to show in the stats that I take from Google Analytics.

I knew it should be possible to filter internal users and, earlier this week, I had a major breakthrough.

I created an advanced segment that checked the page (to filter out one blog from the rest of the content on the site) and the source (to filter anyone whose referral source contained certain keywords – for example our company name!).  I then tested the segment and, hey presto – I can see how many results apply to each of the queries and the overall result – now I can concentrate on those visits that really matter.

Google Analytics advanced segment settings to remove internal referrals

Of course, this only relates to referrals, so it doesn’t help me where internal users access the content from an email link (even if I could successfully filter out all the traffic via the company proxy servers, which I haven’t managed so far, some users access the content directly whilst working from home), but it’s a start.
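The logic of the segment can be sketched in a few lines of Python. This is not how Google Analytics evaluates segments, and everything named here is a stand-in: the `Visit` record, the blog path prefix and the keyword list are all hypothetical, chosen only to show the two conditions being combined.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    page: str    # path of the page that was viewed
    source: str  # referral source reported for the visit


# Hypothetical values: the real path and keywords are not given in this post.
BLOG_PATH_PREFIX = "/cto-blog/"
INTERNAL_KEYWORDS = ("examplecorp",)


def in_segment(visit: Visit) -> bool:
    """Keep visits to the blog whose source does not look internal."""
    on_blog = visit.page.startswith(BLOG_PATH_PREFIX)
    internal = any(word in visit.source.lower() for word in INTERNAL_KEYWORDS)
    return on_blog and not internal


visits = [
    Visit("/cto-blog/some-post/", "twitter.com"),
    Visit("/cto-blog/some-post/", "intranet.examplecorp.com"),
    Visit("/products/", "google"),
    Visit("/cto-blog/some-post/", "(direct)"),
]

kept = [v for v in visits if in_segment(v)]
print(len(kept))  # 2
```

The last visit shows the limitation described above: a colleague who follows an email link arrives with no referral keyword to match, so the visit stays in the segment.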

The other change was one I made a few months ago, by defining a number of filters to adjust the reporting:

Unfortunately filters do not apply retrospectively, so it’s worth defining these early in the life of a website.

London Bloggers Meetup (#LBM): January 2011 – 10 lessons and tips for blogging

This content is 14 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

A couple of nights ago, I went along to the London Bloggers Meetup – which is basically a big social event for bloggers! It’s weird: most of the bloggers I normally meet are in tech, and I’d never stopped to think that an event like this doesn’t just attract geeks like me (duh!). I’m a bit shy at these things, but I did meet some great people – as well as lusting after the Dell Vostro V130 laptop that was given away.

The highlight of the evening though was Andy Bargery’s short presentation giving 10 lessons and tips for blogging.  Andy has shared the Prezi and I’ve embedded it below:

Blog Recap: 2010 in review

This content is 14 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

A couple of days ago, SQL Server MVP, Brent Ozar took a look back at what he’d been posting on his blog in 2010. I thought that was a good idea, so I’m shamelessly stealing his idea to highlight some of the key posts from the last twelve months on this blog. There were many more, technically-focused, ones but these are a good summary of the year’s events:

January

February

March

April

May

June

July

  • Move along folks, nothing to see here (well, there were a couple of posts, but nothing really worth shouting about)…

August

September

October

November

December

  • Tumbleweed (and some geekery) – although there are plenty of posts in the pipeline for next year.

Even though 2010 was a quiet year on the blog (120 posts this year is a record low – especially when considering I averaged almost one a day in 2008!), I did win a Computer Weekly Blog Award, and I have been busy elsewhere:

As for 2011, well, expect this blog to remain one of my main online activities but, as I spend less time working directly with technology and more working on strategic IT issues, the focus is changing.  Indeed, some people think blogging is dead (it’s not) – others say it is now more about content marketing! Whatever the semantics, I’ll be here for a while yet. Thanks to everyone who reads my “stuff” and engages with me – whether it’s as a blog comment, an e-mail or a tweet – and have a happy and prosperous 2011.