So you thought that old version of your website was gone forever? It may have been a little naive of me, but I figured that once I put up a new version of my website, that was it: the old one was overwritten.
Not so, it seems – today I stumbled across the Internet Archive Wayback Machine, a service that lets people enter a URL, select a date range, and then browse an archived version of the website. Scarily, I was able to pull up versions of my website going back several years. Not everything is in there: it takes a while to load, many graphics are missing, and if a site wasn't picked up by the Internet Archive's crawler then it just won't appear – but how about seeing old versions of www.microsoft.com?
I guess this can be useful. For example, I used to work for a company called ICL. That name is long since consigned to the history books (they are now trading as Fujitsu Services), but it is still available on the wayback machine. I managed to find a press release from back when the BBC and ICL jointly announced BBC Online in September 1996; as well as what ICL was saying about millennium date compliance in the middle of 1997.
Most web administrators will know that they can control web crawlers (like the one behind the Internet Archive) using a robots.txt file in the root of the site (there is even an online robots.txt generator). Once a robots.txt file that blocks the archive's crawler is in place at the root of the webserver, the Wayback Machine will pick up the new file the next time the site is crawled and remove the site's archived documents from view.
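As a rough sketch, a robots.txt that singles out the Internet Archive's crawler (which historically identified itself with the user-agent `ia_archiver` – worth double-checking against current documentation) while leaving other crawlers alone might look like this:

```
# Block only the Internet Archive's crawler from the whole site
User-agent: ia_archiver
Disallow: /

# All other crawlers may index everything
User-agent: *
Disallow:
```

The file must sit at the root of the site (e.g. www.example.com/robots.txt); crawlers do not look for it in subdirectories.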
Now it seems I need to go and update the robots.txt files on my websites…