5 “stars” to linked open data

This content is 12 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

Every now and again I have a peek into the world of linked and open data. It’s something that generates a lot of excitement for me in that the possibilities are enormous but, as a non-developer and someone whose career has tended to circle around infrastructure architecture rather than application or information architectures, it’s not something I get to do much work with (although I did co-author a paper earlier this year looking at linked data in the context of big data).

Earlier this year (or possibly last), I was at a British Computer Society (BCS) event that aimed to explain linked data to executives, with promises of building a business case. At that event Antonio Acuna, Head of Data at data.gov.uk, presented a great overview of linked and open data*. Although I did try, I was unable to get a copy of Antonio’s slides (oh, the irony!) but one of them sprang to mind when I saw a tweet from Deirdre Lee (@deirdrelee) earlier today:

Star rating of #opendata can be improved sequentially. Describe metadata using #RDF even if content isn't yet #dcat #LinkedData #datadrive
@deirdrelee
Deirdre Lee

The star rating that Deirdre is referring to is Sir Tim Berners-Lee’s 5 star model for linked open data. Sir Tim’s post has a lot more detail but, put simply, the star ratings are as follows:

  • No star web data: Available on the web (whatever format) without an open licence
  • One star open web data: Available on the web (whatever format) but with an open licence, to be Open Data
  • Two star open web data: Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
  • Three star open web data: As for 2 stars, but in a non-proprietary format (e.g. CSV instead of Excel)
  • Four star open web data: All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that “people can point at your stuff”
  • Five star open web data: All the above, plus: link your data to other people’s data to provide context
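
Because each star builds on the ones before it, the model can be read as a simple checklist. The following is a minimal Python sketch of that reading, not an official scoring tool: the Dataset class, its field names and the star_rating function are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (field names are illustrative)."""
    on_web: bool
    open_licence: bool
    machine_readable: bool
    non_proprietary_format: bool
    uses_w3c_standards: bool   # e.g. RDF and SPARQL, with URIs identifying things
    links_to_other_data: bool


def star_rating(ds: Dataset) -> int:
    """Return 0-5 stars; each star requires all of the previous ones."""
    # Without being on the web under an open licence, nothing else counts.
    if not (ds.on_web and ds.open_licence):
        return 0
    stars = 1
    # Stars two to five are earned in order; stop at the first unmet criterion.
    for criterion_met in (
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_w3c_standards,
        ds.links_to_other_data,
    ):
        if not criterion_met:
            break
        stars += 1
    return stars
```

Under these assumptions, an openly licensed Excel workbook would score two stars, and re-publishing the same data as CSV would lift it to three, which is the "improve sequentially" point made in Deirdre's tweet.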

It all sounds remarkably elegant – and is certainly a step-by-step approach that can be followed to opening up and linking data, without trying to “do everything in one go”.

*Linked and open data are not the same but they are closely related. In the context of this post we can say that open data is concerned with publishing data sets (with an open license) and linked data is concerned with creating links between data sets (open or otherwise) to form a semantic web.

Attribution: The data badges used on this post are from Ireland’s Digital Enterprise Research Institute (DERI), licensed under a Creative Commons Attribution 3.0 License.

 

The annotated world – the future of geospatial technology? (@EdParsons at #DigitalSurrey)

Tonight’s Digital Surrey was, as usual, a huge success with a great speaker (Google’s @EdParsons) in a fantastic venue (Farnham Castle). Ed spoke about the future of geospatial data – about annotating our world to enhance the value that we can derive from mapping tools today but, before he spoke of the future, he took a look at how we got to where we are.

What is geospatial information? And how did we get to where we are today?

Geospatial information is very visual, which makes it powerful for telling stories, and one of the most famous and powerful images is that of the Earth viewed from space – the “blue marble”. This emotive image has been used many times but has only been personally witnessed by around 20 people, starting with the Apollo 8 crew, 250,000 miles from home, looking at their own planet. We see this image with tools like Google Earth, which allows us to explore the planet and look at humankind’s activities. Indeed about 1 billion people use Google Maps/Google Earth every week – that’s about a third of the Internet population, roughly equivalent to Facebook and Twitter combined [just imagine how successful Google would be if they were all Google+ users…]. Using that metric, we can say that geospatial data is now pervasive – a huge shift over the last 10 years as it has become more accessible (although much of the technology has been around longer).

The annotated world is about going beyond the image and pulling out otherwise invisible information, so, in a digital sense, it’s now possible to have a map at 1:1 scale or even beyond. For example, in Google Maps we can look at StreetView and even see annotations of buildings. This can be augmented with further information (e.g. restrictions in the directions in which we can drive, details about local businesses) to provide actionable insight. Google also harvests information from the web to create place pages (something that could be considered ethically dubious, as it draws people away from the websites of the businesses involved) but it can also provide additional information from image recognition – for example identifying the locations of public wastebins or adding details of parking restrictions (literally from text recognition on road signs). The key to the annotated web is collating and presenting information in a way that’s straightforward and easy to use.

Using other tools in the ecosystem, mobile applications can be used to easily review a business and post it via Google+ (so that it appears on the place page); or Google MapMaker may be used by local experts to add content to the map (subject to moderation – and the service is not currently available in the UK…).

So, that’s where we are today… we’re getting more and more content online, but what about the next 10 years?

A virtual (annotated) world

Google and others are building a virtual world in three dimensions. In the past, Google Earth pulled data from many sets (e.g. building models, terrain data, etc.) but future 3D images will be based on photographs (just as, apparently, Nokia have done for a while). We’ll also see 3D data being used to navigate inside buildings as well as outside. In one example, Google is working with John Lewis, who have recently installed Wi-Fi in their stores – to use this to determine a user’s location and combine this with maps to navigate the store. The system is accurate to about 2-3 metres [and sounds similar to Tesco’s “in store sat-nav” trial] and apparently it’s also available in London railway stations, the British Museum, etc.

Father Ted would not have got lost in the lingerie department if he had Google's mapping in @! says @ #DigitalSurrey
@markwilsonit
Mark Wilson

Ed made the point that the future is not driven by paper-based cartography, although there were plenty of issues taken with this in the Q&A later, highlighting that we still use ancient maps today, and that our digital archives are not likely to last that long.

Moving on, Ed highlighted that Google now generates map tiles on the fly (it used to take 6 weeks to rebuild the map) and new presentation technologies allow for client-side rendering of buildings – for example St Paul’s Cathedral, in London. With services such as Google Now (on Android), contextual info may be provided, driven by location and personality.
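
To give a feel for what serving map tiles involves, here is a minimal Python sketch of the widely used Web Mercator ("slippy map") tile addressing scheme, which turns a latitude, longitude and zoom level into a tile's x/y index. Ed didn't describe Google's internals, so this is an illustration under that assumption rather than a description of their pipeline; the function name lat_lon_to_tile and the approximate coordinates used for St Paul's are hypothetical choices for the example.

```python
import math


def lat_lon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the Web Mercator tile containing a point.

    At zoom level z the world is divided into 2**z by 2**z tiles, with
    x counted eastwards from longitude -180 and y counted southwards
    from the top of the map.
    """
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge cases (e.g. longitude exactly 180) stay on the grid.
    x = min(n - 1, max(0, x))
    y = min(n - 1, max(0, y))
    return x, y


# Approximate location of St Paul's Cathedral (assumed coordinates).
print(lat_lon_to_tile(51.5138, -0.0984, 10))
```

Each extra zoom level quadruples the number of tiles, which hints at why rebuilding the whole map in one batch used to take weeks and why rendering only the tiles that are actually requested is attractive.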

With Google’s Project Glass, that becomes even more immersive with augmented reality driven by the annotated world:

Although someone also mentioned to me the parody which also raises some good points:

Seriously, Project Glass makes Apple’s Siri look way behind the curve – and for those who consider the glasses to be a little uncool, I would expect them to become much more “normal” over time – built into a normal pair of shades, or even into prescription glasses… certainly no more silly than those Bluetooth earpieces that we used to use!

Of course, there are privacy implications to overcome but, consider what people share today on Facebook (or wherever) – people will share information when they see value in it.

Big data, crowdsourcing 2.0 and linked data

At this point, Ed’s presentation moved on to talk about big data. I’ve spent most of this week co-writing a book on this topic (I’ll post a link when it’s published) and nearly flipped when I heard the normal big data marketing rhetoric (the 3 Vs) being churned out. Putting aside the hype, Google should know quite a bit about big data (Google’s search engine is a great example and the company has done a lot of work in this area) and the annotated world has to address many of the big data challenges including:

  • Data integration.
  • Data transformation.
  • Near-real-time analysis using rules to process data and take appropriate action (complex event processing).
  • Semantic analysis.
  • Historical analysis.
  • Search.
  • Data storage.
  • Visualisation.
  • Data access interfaces.

Moving back to Ed’s talk, what he refers to as “Crowdsourcing 2.0” is certainly an interesting concept. Citing Vint Cerf (Internet pioneer and Google employee), Ed said that there are an estimated 35bn devices connected to the Internet – and our smartphones are great examples, crammed full of sensors. These sensors can be used to provide real-time information for the annotated world: average journey times based on GPS data, for example; or even weather data if future smartphones were to contain a barometer.
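
The journey-time example boils down to grouping many individual reports and averaging them. A minimal Python sketch of that idea follows, assuming each phone reports a route identifier and a duration in seconds; the function name, the report format and the min_reports threshold (a crude way to avoid publishing a figure based on a single traveller) are all assumptions made for illustration, not anything Ed described.

```python
from collections import defaultdict
from statistics import mean


def average_journey_times(reports, min_reports=1):
    """Average crowdsourced journey durations per route.

    reports: iterable of (route_id, duration_seconds) pairs.
    Routes with fewer than min_reports samples are left out.
    """
    by_route = defaultdict(list)
    for route_id, seconds in reports:
        by_route[route_id].append(seconds)
    return {
        route: mean(times)
        for route, times in by_route.items()
        if len(times) >= min_reports
    }


# Made-up sample data: two reports for one route, one for another.
sample = [("A31", 600), ("A31", 720), ("A3", 300)]
print(average_journey_times(sample, min_reports=2))
```

A real system would also have to work out which route a stream of GPS points belongs to and cope with stale or noisy readings, which is where the big data challenges listed above come in.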

Linked data is another topic worthy of note, which, at its most fundamental level, is about making the web more interconnected. A lot of work has been done on ontologies, categorising content, etc. [Plug: I co-wrote a white paper on the topic earlier this year] but Google, Yahoo, Microsoft and others are supporting schema.org as a shared vocabulary of tags (embedded using microdata attributes such as itemprop) that websites can use to mark up content in a way that’s recognised by major search providers. For example, a tag like <span itemprop="addressCountry">Spain</span> might be used to indicate that Spain is a country, with further tags to show that Barcelona is a city, and that the Nou Camp is a place to visit.
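
To show how a consumer of this markup might read it, here is a minimal Python sketch that pulls itemprop values out of a fragment of HTML using only the standard library. It is deliberately simplistic (it only handles flat elements containing plain text, not nested items or values held in attributes) and the class name, function name and sample fragment are hypothetical; real search engines use far more complete microdata parsers.

```python
from html.parser import HTMLParser


class ItemPropParser(HTMLParser):
    """Collect (itemprop, text) pairs from simple, non-nested elements."""

    def __init__(self):
        super().__init__()
        self._current = None   # itemprop name of the element being read
        self._text = []        # text fragments seen inside that element
        self.properties = []

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        if name and self._current is None:
            self._current = name
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag ends the current property.
        if self._current is not None:
            value = "".join(self._text).strip()
            self.properties.append((self._current, value))
            self._current = None


def extract_itemprops(html: str):
    parser = ItemPropParser()
    parser.feed(html)
    parser.close()
    return parser.properties


fragment = (
    '<div itemscope itemtype="https://schema.org/PostalAddress">'
    '<span itemprop="addressLocality">Barcelona</span>, '
    '<span itemprop="addressCountry">Spain</span>'
    "</div>"
)
print(extract_itemprops(fragment))
```

The point is that the page still reads normally to a person, while a program can recover the fact that "Spain" is a country without having to guess.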

Ed’s final thoughts

Summing up, Ed reiterated that paper maps are dead and that they will be replaced with more personalised information (of which, location is a component that provides content). However, if we want the advantages of this, we need to share information – with those organisations that we trust and where we know what will happen with that info.

Mark’s final thoughts

The annotated world is exciting and has stacks of potential if we can overcome one critical stumbling point that Ed highlighted (and I tweeted):

In order to create a more useful, personal, contextual web, organisations need to gain our trust to share our information #DigitalSurrey
@markwilsonit
Mark Wilson

Unfortunately, there are many who will not trust Google – and I find it interesting that Google is an advocate of consuming open data to add value to its products but I see very little being put back in terms of data sets for others to use. Google’s argument is that it spent a lot of money gathering and processing that data; however it could also be argued that Google gets a lot for free and maybe there is a greater benefit to society in freely sharing that information in a non-proprietary format (rather than relying on the use of Google tools). There are also ethical concerns with Google’s gathering of Wi-Fi data, scraping website content and other such issues but I expect to see a “happy medium” found, somewhere between “Don’t Be Evil” and “But we are a business after all”…

Thanks as always to everyone involved in arranging and hosting tonight’s event – and to Ed Parsons for an enlightening talk!