This year’s OpenTech was, as usual, full of interesting and inspiring talks. It’s interesting seeing how it’s come on from its roots in NTK’s Festival of Inappropriate Technology and NotCon. It’s a bit tidier and shinier than its predecessors – no more loyalty card swapping or best carrier bag competitions, and the infamous iPod Shuffle Shuffle was nowhere to be seen. Perhaps it’s a sign of how the UK tech scene has grown up, just a little, as this year’s OpenTech was more serious and down to business, but nevertheless as earnest and excited as ever. Unlike other years I’m not going to cover what happened in painstaking detail (not when you can just follow the hashtag on Twitter) but I have scribbled down some random thoughts spilling off my brain…
The political subtext of government data
One of the reasons we can be more serious these days is that the years of relentless campaigning have paid off, and we are now getting more and more open data from the government, and other sources, to mash up, with a variety of results such as traffic injury maps, finding postboxes or visualising spending cuts.
This is, by and large, fab – getting the full potential out of the data that has been gathered by the government at public expense by letting the public explore it. It’s even possible to envision how this data can be used to disrupt or even disprove party political beliefs and theories by exposing them to cold, hard data. But a deluge of data does not mean the end of theory (as Chris Anderson has expounded). Data is not some cut-and-dried artefact free of politics or prejudices.
Hadley Beeman’s talk covered the challenges facing those playing with such data. She explained that there are often things such as reference numbers and acronyms – a whole unspoken culture behind the data – which can get stripped out when it is presented to the wider public. And this got me thinking about the subtext behind datasets: what are the unspoken assumptions being made in their collection, or in the process behind the design of the system that has collected them?
For example, let’s take performance data from schools; usually visualised as a league table, it becomes the focus of obsession by parents. The league tables have, however, become the focus of much ire from within the education profession. BERG have attempted to make more of the data with their Schooloscope project, which looks lovely and more user-friendly than columns of figures in newspapers, but misses the point. The league tables aren’t demonised because of their format, but because they may not accurately represent the performance of a school: they may ignore social disadvantages a school’s intake may suffer from, ignore extra-curricular activity, or give “soft” subjects the same prominence as “hard” subjects. Which of these factors you think really matters will largely be down to your political beliefs, and conversely, the decisions that led to this data being recorded and the way it was assessed, broken down and analysed will also be politically influenced.
There is a feedback loop as well – recording this data in a particular way can end up affecting the very thing we’re trying to assess. After years of being incentivised to perform better in league tables, schools are now accused by some of being little more than coaching centres for children to pass exams, rather than providing them with a full and rich education to prepare them for life. This is not a universally-agreed fact either, but an opinion shaped and refracted by the critic’s political beliefs; even if you agree it’s happening, you may disagree on whether it’s a good or bad thing. In short, the whole process of collecting data – supposedly simple, neutral and objective – opens a can of political worms and can create polarised debate. Simply opening up data and casting many eyes over it is not going to make the controversies about that data go away. And in fact, by doing so without questioning the subtext, we can end up unwittingly complying with the social and political aims of those who collect it.
This might sound a bit paranoid, and wanky, and so I’ll stress that this should not dampen enthusiasm for doing more with our data. The data being opened up (not just by the government, but by the BBC, the Guardian and many other providers) has so many potential uses and ways of enriching us socially. But at the same time we should always be questioning the provenance of data, thinking about the decisions that had to be made in structuring that data, and asking not just about the data we have got, but what useful data might be missing.
Context, failure and hindsight
Another thing that Hadley mentioned, and one worth considering, is that civil servants occasionally make mistakes collating data. At the moment it can be difficult to annotate the data to explain where a mistake has happened and how it was made – and this is something we need to encourage, for else how will organisations learn? But to demand this we also need to possess a degree of tolerance ourselves. Hindsight is 20/20, and it’s easy to apportion blame quickly (especially in a post-Twitter age of instant reaction), but doing so may end up being counterproductive. Indeed, a culture of fear may already be encouraging civil servants and politicians to stop recording controversial meetings or opinions for fear of being found out later with an FoI request (as anecdotes from Ireland have hinted at). Less apoplectic rage and a greater tolerance of sharing stories of failure are needed if we want free thought and debate inside our governments – indeed, that’s a lesson that has lots of applications outside of government data as well.
Futureproofing and archiving
Bill Thompson gave a stirring talk on the need for archiving our analogue past digitally, before we become so detached from analogue that we don’t think any of it is worth saving. Bill took the pessimistic view that our kids might not archive our analogue stuff, which I think is a little unfair on them, but if that warning spurs us on to get it done then the ends justify the means, I guess.
The obsession with archiving now has struck me as somewhat odd – we live in an era where storage space is near-infinitely abundant and yet we are more worried about losing our culture than any other age in history. Did the scribes of the Lindisfarne Gospels factor in the possibility their work would still be around 1,300 years in the future? Even once a cultural artefact has been deemed a classic, preservation has often not been on the minds of those in charge of it – Michelangelo’s David was left outside exposed to the elements for centuries. Even attempts to preserve works, such as with Da Vinci’s The Last Supper, or Stonehenge, can involve damaging or radically altering the original so it is no longer the same as what it was, leaving us potentially with a Ship of Theseus rather than a “genuine” cultural artefact. Then again, in an age where people make money selling fake versions of forgeries, maybe that doesn’t matter so much?
As information has become less scarce (we now apparently produce more bits in two days than we did in all human history up to 2003), paradoxically we’ve become increasingly obsessed with preserving it. Maybe it has something to do with the volatility of our storage – all it takes is for your hard disk to be corrupted and you could lose years of your work. Or maybe the internet, by putting information at our fingertips, means we’re now capable of knowing what we would lose if these archives disappeared. Or maybe it’s just hindsight and a selective memory – we lament all those thoughtlessly-wiped episodes of Doctor Who, and are now much more sensitive to data loss, but we’re not so fussed about all the editions of The Cliff Richard Show that got deleted too.
Or maybe it’s because digital archiving implies, with the limitless copying it allows, perfection and immortality. Once a cultural artefact is scanned, ripped and uploaded, then we can make as many copies of the digital version as we like, and that digital version will be perfect – so we don’t have to worry about losing or mutilating the original. But then that relies on an awful lot of assumptions. How long will the hard drives or DVDs we store them on stay true, and will we always have device drivers for them? Will HTTP, or the JPEG compression algorithm, exist in 100 years’ time? Will we even think of computer data as something stored on machines as 1s and 0s by then?
There is a warning from the not-too-distant past. The BBC Domesday Project of 1986 was an attempt to record Britain digitally on laserdisc, in the spirit of the original Domesday Book of 1086, yet within 15 years the discs were almost unreadable due to a lack of suitable equipment (thankfully, geeks have now made sure the format lives on). A working group I once attended at Cambridge discussed points like this when talking about approaches to digitising the university’s library; the magnitude of time one person was talking about was in the tens of thousands of years. It was exciting stuff – rarely do we ever consider our future as long as that – but also sobering.
Fortunately, the BBC are more forward-thinking than some other organisations, and the lessons of Domesday were learned a long time ago; judging from their posts about archiving, futureproofing digital formats is foremost in their thinking. Digital doesn’t necessarily entail persistent. As an aside, this can act as a reassurance to those worried that youthful digital transgressions could ruin their future lives. Most of the stuff I’ve created online up until 2003 (when I started this blog and properly archived stuff) has now disappeared into the aether, maybe only accessible through dipping into archive.org; whatever youthful transgressions there may have been are now gone. A lot of data can and does get lost over time.
So yes, let’s archive as much of our analogue past as we can while we still can; we will be the richer for it culturally, but let’s not think it necessarily means it will live forever. And while we’re at it, we should become more comfortable with the notion that it’s okay if we lose some stuff from time to time – it’s a fact of life, and if we get too obsessed with preserving everything, we’ll never have time to make anything new.
Excellent stuff to look out for
In short: Ben Goldacre’s launching a project to keep track of abandoned or never-published medical trials. Keep an eye out for Rob McKinnon’s Whoslobbying.com as well. The guys at Young Rewired State showed that despite the relatively poor provision of teaching code in schools, there are some great, talented and enthusiastic young hackers coming up and making things like this. I missed the talk about Frontline SMS but really like the idea – not everyone has a fancy smartphone after all (see also Terence’s excellent talk on designing for all phones). Finally, I will probably be playing a bit with Scraperwiki and the datasets on data.gov.uk, amongst other things…