Since the middle of last year, I’ve been using a sitemap to help spiders to crawl around my little bit of the web. After looking into the various options, the easiest method for me (by far) was to use the XML-Sitemaps generator but the free version is limited to 500 pages. Upgrading to the paid version was the best $19.99 I ever spent as Google now indexes all my pages (therefore increasing my exposure on the ‘net and hence my advertising revenue, which may be small but is worth having).
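For anyone unfamiliar with the format, a sitemap is simply an XML file listing the URLs you want crawled, with optional hints about how often each one changes. A minimal example (the URL and values here are purely illustrative) looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/some-page.htm</loc>
    <lastmod>2007-05-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>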
Unfortunately, when I tried to run the generator yesterday, it refused to index my blog (which, at the time of writing, represents 98.8% of my website’s pages), but luckily (and this is another reason for having the paid XML-Sitemaps generator) within a few hours I had an answer to my problem from the administrator of the XML Sitemaps Forum – for some reason, my blog pages contained the following tag:
<meta name="ROBOTS" content="NOINDEX,NOFOLLOW">
It’s no wonder that the pages were being skipped: this is a directive telling robots not to index the page and not to follow any of its links!
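For comparison, a page that should be crawled normally either omits the robots meta tag altogether or carries the permissive version, which looks like this:

<meta name="ROBOTS" content="INDEX,FOLLOW">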
Now, I didn’t add that tag… so how did it get into my code? It seems that it was added by Blogger. Blogger uses a system of template tags to generate content, one of which is <$BlogMetaData$>, used to insert all of the blog’s meta data. This has been working for me up to now, but it seems that the upgrade has added the directive for robots not to index my pages, nor to follow links. According to Blogger’s help text, this is only inserted if a blog is set not to be added to listings, but mine is very definitely set to yes (I do want to be listed).
After replacing the template tag with the correct (manually-inserted) meta data, I was able to crawl the site successfully and create an updated sitemap.
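I won’t reproduce my full template here, but the replacement was essentially the meta data that <$BlogMetaData$> would normally generate, written out by hand; something along these lines (the description and feed address are placeholders, not my actual values):

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="description" content="A placeholder description of the blog">
<link rel="alternate" type="application/atom+xml" title="Atom feed" href="http://www.example.com/atom.xml">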
I’m not denying that Blogger is a great system for people starting out with their own blog (and many of the new features are good for more advanced bloggers too), but it seems to me that, considering it’s owned by Google (a company with many products that seem to be in perpetual beta), it has more than its fair share of problems, and it looks as if a major upgrade has been rushed out of the door (I’ve already had to apologise to subscribers that old posts are creeping back into the Atom and RSS feeds). I wanted to stay on the old platform for as long as possible, but when I logged in a few days back I was given no choice but to upgrade.
Thankfully, my pages didn’t drop out of the Google index (as I upload the sitemap manually and so spotted the error) but this directive may well have affected the way in which other search engines index my site… luckily I caught it within a few days of the offending code being inserted.