Over the last month or so, I have gone Twitter crazy. I’ve been transformed from someone who didn’t “get it” into someone who uses Twitter as his main source of news… leaving behind a big pile of unread RSS feeds from blogs (which is exactly why this blog integrates with my Twitter feed). I’d like to integrate Twitter with this blog further (using something like Twitter Tools) but I’m still on an old release of WordPress and still have a way to go on testing the new site (although you can catch a sneak preview as I inch forward in my development).
In the meantime, I wanted to archive my “tweets” in order to keep a backup as well as to manually transpose the useful ones (not all of the inane babble) into a blog post – sort of like the ones that come from my Delicious feed (although I use Postalicious for that).
I tried various scripts in Python (this one looked hopeful, but it uses a deprecated API call) and PowerShell (incidentally, James O’Neill and Joe Pruitt have done some interesting stuff using PowerShell to interface with Twitter), but eventually I realised that a simple curl command could pull all of my Twitter status updates into one or more local XML files. Stage 2 is working out how to apply XSLT (or some other developer magic) to the XML and present it the way I would like (there’s a rough sketch of the idea below), but at least I know I have a local copy of my tweets. The command I used is simple:
curl -O -k -u username:password "https://twitter.com/statuses/user_timeline.xml?count=100&page=[1-32]"
(Thanks to Damon Cortesi for posting this – more information on the statuses/user_timeline method can be found in the Twitter API documentation.)
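As for stage 2, I haven’t settled on an approach yet, but to illustrate the sort of thing I have in mind, here’s a rough Python sketch (rather than XSLT) that reads the XML files curl saved and writes the tweets out as a simple HTML list. Treat it as a starting point rather than a finished script – the script name is made up, you pass in whatever file names curl produced, and the <statuses>/<status> element structure is simply what the user_timeline method returned for me:

# tweets_to_html.py – a rough sketch of “stage 2”: turn the downloaded XML
# files into a simple HTML list. Pass the file names curl saved on the command line.
import sys
import xml.etree.ElementTree as ET
from html import escape

rows = []
for filename in sys.argv[1:]:
    tree = ET.parse(filename)
    # each file contains a <statuses> root with one <status> element per tweet
    for status in tree.getroot().findall("status"):
        created = status.findtext("created_at", default="")
        text = escape(status.findtext("text", default=""))
        rows.append("<li><em>%s</em> %s</li>" % (created, text))

with open("tweets.html", "w") as out:
    out.write("<ul>\n%s\n</ul>\n" % "\n".join(rows))

Run it with whatever file names curl saved as arguments and it spits out a tweets.html file that you can tidy up (or restyle) from there.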
I’d like to give one more piece of advice though: the Twitter API restricts the number of calls you can make in an hour to 150. With TweetDeck polling every minute or so, and this command pulling multiple pages of updates through the API, it didn’t take long for me to hit my limit during testing, so you may like to use the maximum page size of 200 tweets (up to 16 times to pull the maximum of 3200 updates that Twitter allows):
curl -O -k -u username:password "https://twitter.com/statuses/user_timeline.xml?count=200&page=[1-16]"
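Alternatively, if you’d rather not fire off all of the requests in one burst (or you want something you can schedule), a small script can pace the calls instead. This is just a sketch – the script name, output file names and the five-second pause are arbitrary choices of mine – but it uses the same basic authentication and user_timeline URL as the curl command above:

# fetch_tweets.py – pull up to 16 pages of 200 tweets, pausing between requests
# so the backup doesn’t eat too far into the hourly API allowance.
import time
import urllib.request

USERNAME = "username"   # replace with your Twitter credentials
PASSWORD = "password"

# set up basic authentication for twitter.com
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://twitter.com/", USERNAME, PASSWORD)
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(password_mgr))

for page in range(1, 17):
    url = "https://twitter.com/statuses/user_timeline.xml?count=200&page=%d" % page
    data = opener.open(url).read()
    with open("user_timeline_page%02d.xml" % page, "wb") as f:
        f.write(data)
    time.sleep(5)   # a short pause between calls, to stay well clear of the limit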
This gives me the data in XML format, but I noticed that I can also get hold of it in JSON, RSS or Atom format – unfortunately I can’t seem to retrieve results based on multiple parameters (e.g. http://twitter.com/statuses/user_timeline.rss?count=200&screen_name=markwilsonit) so Google Reader (or another RSS reader) is limited to the last 20 updates.
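For what it’s worth, the JSON version is just as easy to pull down and work with programmatically. Here’s a quick sketch along the same lines as the script above (again, the field names are simply what the API returned for me, so treat it as illustrative rather than definitive):

# tweets_json.py – fetch one page of the timeline in JSON and print each tweet
import json
import urllib.request

USERNAME = "username"   # replace with your Twitter credentials
PASSWORD = "password"

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://twitter.com/", USERNAME, PASSWORD)
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(password_mgr))

data = opener.open("https://twitter.com/statuses/user_timeline.json?count=200&page=1").read()
for status in json.loads(data.decode("utf-8")):
    print(status["created_at"], status["text"])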
Just before I sign off, I’ll mention that, as I was writing this post, I saw that I’ve even begun to open my colleagues’ eyes to the power of Twitter… David Saxon (@dmsaxon) has just joined the party (mind you, he pretty much had to after asking our IT Security guys to remove the proxy server restrictions on Twitter use during core working hours today…). Welcome to the fold, Dave.
You can follow me on Twitter @markwilsonit.
Hey Mark, check out http://tweetake.com/ – I’ve used it a few times. It may be a little more difficult to automate than curl, but it does make it easy to get your tweet “data” in an .xls file.
Bill
@Bill – thanks for that tip… it looks like a good solution. There are a few others out there too, but I was concerned about anything that didn’t give me a local file (like the .XLS from Tweetake) in case the sites ever ceased to exist! M
In addition to Bill’s suggestion of using Tweetake, Johann Burkard’s TwitterBackup automates the process of backing up Tweets to an XML file.