Archive for January, 2009

So you want to be a geek…

23 January 2009

Charles Arthur had a nice post at the weekend entitled: If I had one piece of advice to a journalist starting out now, it would be: learn to code. As any modern journalist is able to Google around for facts, Charles tells any budding journo to set themselves above and beyond the normal set of “IT skills”; being able to get a more powerful grip on data is now becoming part of what a journalist should know:

None of which is saying you shouldn’t be talking to your sources, and questioning what you’re told, and trying to find other means of finding stuff out from people. But nowadays, computers are a sort of primary source too. You’ve got to learn to interrogate them effectively – and quote them meaningfully – too.

It’s great advice – playing with data and getting a feel for how to get the best out of it not only helps you find new things out, but also helps open your mind up to a more healthy appreciation of data. It allows you to explore the possibilities of data as well as its flaws, when it can be trusted and when it should be taken with a pinch of salt. And it’s things like this that contribute towards a sense of joyful skepticism that any self-respecting geek should possess (and you thought it was just about watching every episode of Battlestar Galactica).

I gave up programming as a full-time career more than three years ago but have still kept my hand in programming since, either for fun or to make work quicker and easier. Working in the digital and social media PR sector isn’t just about going to the pub (truth be told, it’s actually about going to very expensive pubs) but also about dealing with vast quantities of information – so you can see how programming can help. Making tasks faster is part of it, but the programming mindset is equally if not more important: it has taught me skills such as looking to optimise and make things quicker, filtering noise from the signal, reusing what you have to save effort in the future, and not being surprised by the unexpected.

So I’m going to say what Charles Arthur said, but bigger. If you work in any information industry, or are thinking about a career in it, learn to code. And by code I don’t mean learn something hardcore like Java or C++, or even learn a full programming language (as you’ll see below). But it does mean getting above the usual abstractions you see – your web browser, Word, Excel – and getting involved at a deeper level: getting to appreciate what the data you’re reading actually is, and realising it’s not just something to look at.

So where would I recommend getting started with programming, and with what? From my own weary geek’s viewpoint, here are six ways of getting into it – three of which really aren’t strictly programming at all:

Regular expressions. I cannot begin to think how many times these have bailed me out of an otherwise unrecoverable situation. Regular expressions are ways of finding and replacing text that are much more powerful than the bog-standard search. For example, you might want to get all the telephone numbers or postcodes out of a document, but they are all different, so a simple search wouldn’t be able to do it and you would have to do it by hand (and might miss one out).

A regular expression, on the other hand, can say “find me any group of eleven digits that begins with a 0, and matches either the pattern 0xx xxxx xxxx or 0xxxx xxxxxx” – and bingo, you have all your phone numbers. Get clever and you can even tell it to not worry about whether it’s a space or a hyphen in between the groups of numbers. Be careful – they can get complicated, so build them up slowly, step by step – and they can do unexpected things, so always back stuff up.
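As a rough sketch of the phone number example, here is how it might look in Python (one of the languages recommended further down). The pattern covers only the two layouts described above, and the function name and sample text are made up for illustration.

```python
import re

# Two layouts from the text: 0xx xxxx xxxx and 0xxxx xxxxxx.
# [ -] accepts either a space or a hyphen between the groups of digits.
PHONE_RE = re.compile(
    r"\b(?:0\d{2}[ -]\d{4}[ -]\d{4}|0\d{4}[ -]\d{6})\b"
)

def find_phone_numbers(text):
    """Return every substring of text that matches one of the two layouts."""
    return PHONE_RE.findall(text)

# Made-up sample text: two numbers that should match, two that should not.
sample = "Call 020 7946 0123 or 01632-960123, not 12345 or 0207946."
print(find_phone_numbers(sample))
```

Real documents will contain layouts this pattern misses (brackets, +44 prefixes, no separators at all), which is exactly why it pays to build the expression up slowly and check what it catches.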

CSV. Many people work with Excel spreadsheets, and while Excel is great for tabulating data, its files aren’t a very portable format. Often you want to copy data in or out of Excel into other applications and it ends up being a horrible mess of numbers separated by spaces and tabs that you have to re-align yourself. CSV (comma-separated values) is the very boring but portable way of getting data in and out of Excel – it just consists of text with no styles, with commas marking the boundary between columns.

CSV looks like shit but it makes up for it by being extremely portable and lightweight. Combine it with the regular expressions above and you’re able to take the useful data out of a horrible mess, replace everything between it with commas, and import it straight into a spreadsheet. Or vice-versa – extracting numbers out of the spreadsheet and allowing other apps to play with them (like I did with the general election map).
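To make that concrete, here is a minimal Python sketch of the "horrible mess to CSV" trick. The input data and the function name are invented for the example, and it assumes no value contains a space or tab of its own.

```python
import csv
import io
import re

# Hypothetical messy input: columns separated by uneven runs of spaces and tabs.
messy = "Leeds   \t 42  17\nYork\t\t8   23\nHull  15\t 4\n"

def messy_to_csv(text):
    """Turn whitespace-separated columns into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Any run of spaces or tabs counts as a single column break.
        writer.writerow(re.split(r"[ \t]+", line))
    return out.getvalue()

print(messy_to_csv(messy))
```

Using the csv module to write the rows, rather than joining with commas by hand, means any value that happens to contain a comma gets quoted properly.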

Yahoo! Pipes. I am still waiting for Yahoo to piss this one up against the wall like they have done with Technorati and Flickr. So much of the web already runs on RSS (Really Simple Syndication) – streams of links and articles – that being able to manipulate them like this is a real boon. Yahoo! Pipes takes RSS feeds and allows you to merge them together, filter them, cross-reference them and more. When I was looking for a job last year I used a series of Pipes to pull feeds from various job websites, filter out the kinds of jobs I didn’t like, and then remove the duplicates so I wasn’t wasting my time – all delivered to Google Reader for easy perusal, as and when they came in. The interface is as reasonably usable as you could expect and has led to some really useful apps being created.
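Pipes itself is a drag-and-drop tool with no code involved, but the merge, filter and de-duplicate idea behind that job search is easy to sketch in Python. This is not how Pipes works internally; the item structure, function name and example URLs are all assumptions, and the feed items are assumed to have been fetched and parsed already.

```python
def merge_filter_dedupe(feeds, unwanted_words):
    """Merge several lists of items, drop unwanted ones, remove duplicates."""
    seen_links = set()
    result = []
    for feed in feeds:
        for item in feed:
            title = item["title"].lower()
            # Filter: skip anything whose title mentions an unwanted word.
            if any(word in title for word in unwanted_words):
                continue
            # De-duplicate: the same link on two sites counts once.
            if item["link"] in seen_links:
                continue
            seen_links.add(item["link"])
            result.append(item)
    return result

# Made-up items standing in for two job sites' feeds.
site_a = [
    {"title": "Python developer", "link": "http://example.com/jobs/1"},
    {"title": "Telesales executive", "link": "http://example.com/jobs/2"},
]
site_b = [
    {"title": "Python Developer (London)", "link": "http://example.com/jobs/1"},
    {"title": "Web editor", "link": "http://example.org/jobs/9"},
]

jobs = merge_filter_dedupe([site_a, site_b], unwanted_words=["telesales"])
for job in jobs:
    print(job["title"], job["link"])
```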

JavaScript. JavaScript is an indispensable part of the web, although originally it seemed destined for little more than launching popups and stupid messages in your status bar. Now virtually every page is interactive in some way, and JavaScript’s true power is being exploited. One of the most obvious ways of getting it to work for you is Greasemonkey, which allows you to add scripts to change the behaviour of what appears on your screen – such as making Google Reader more readable, getting rid of Xeni Jardin, or (with the help of regular expressions) turning postcodes into links to maps.

Python. The biggie and the one I use the most. Python’s strengths lie in its simplicity – it’s human-friendly and runs on pretty much anything. It also has a sensible structure and organisation, which teaches you to code well and clearly. Finally, the vast libraries available mean you can play with pretty much any data format; BeautifulSoup, for example, allows you to extract data from webpages easily. Python’s one drawback is its relatively poor documentation & tutorials, with some honourable exceptions such as Mark Pilgrim, so do hunt around and don’t let the technicalese put you off.
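To give a flavour of pulling data out of a webpage, here is a small sketch that collects every link on a page. It deliberately uses html.parser from Python’s standard library rather than BeautifulSoup, so it runs without installing anything; the class name, function name and sample page are made up for the example.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    """Return the link targets found in a string of HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

# A made-up scrap of HTML standing in for a real page.
page = '<p>See <a href="http://example.com/a">this</a> and <a href="/b">that</a>.</p>'
print(extract_links(page))
```

BeautifulSoup does the same job with less ceremony and copes better with badly broken markup, which is most of the web.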

Finally, PHP gets an honourable mention – an easy enough language to learn and widely used, but with so many evolutions and a complicated past the language is a mess, and it teaches several bad programming habits.

I wouldn’t recommend doing all six at once, or even ever, nor would I set expectations too high. In some respects, it’s not even about the code or the results you get – it’s as much about the philosophy and understanding it brings with it: that data is not a static thing but ours to play with, making us able to create wonderful new things or change society for the better.


Single-serving Tumblelogs

13 January 2009

Last year, Jason Kottke charted the rise of the single-serving site – “web sites comprised of a single page with a dedicated domain name and do only one thing”, as he puts it. They range from the facetious (IsItChristmas.com), to the possibly useful (IsLostARepeat.com), to the downright marvellous (ItsNotLup.us). They’re almost the anti-Web 2.0 – uninteractive, dry-looking – devoted to a single psychotic purpose, neatly spelt out in the URL.

Since then, another kind of site with a single purpose has sprung up – they feature content – typically photos – and just that. They’re updated, but unlike blogs focused on single subjects, there’s no commentary, no snarking, no linky enthusiasm. It’s just content shovelled up for you straight away – and a lot of them use Tumblr. Brokers With Hands on their Faces, Garfield Minus Garfield and Fuck Yeah Sharks are three of my favourites, while Bale Yeah! and White People Trying to Look Serious get honourable mentions. The less commentary the better – let the content do the talking. Dude Totally Punched A Horse slips on this last one – it should just be the videos. Ditto for Arrested Development Stills – just leave them be!

Tumblr is ideal for this kind of streaming – having managed a super-secret Tumblr myself these past few months (I’ll reveal all sooner or later), the interface is a cinch, and as comments are not enabled by default it saves the messiness of moderating or dealing with the content. It’s halfway between single-serving and full-on blogging, leaving you free to obsess about whatever you obsess about. It’s light, fun and often hilarious. Long may it continue.


The Voice of Fate

9 January 2009

Note: Mostly written while watching the film version of V for Vendetta over Christmas with a hangover, spoilers galore for both it and the book within, so proceed at your own risk.

Of the many things wrong with the Wachowski Brothers’ flawed adaptation of V for Vendetta, the omission of the computer Fate is by far the biggest. Fate is the computer that runs the society in V’s alternate future; it hooks into the surveillance systems used throughout British society and makes all the decisions. As the novel progresses, the high chancellor Adam Susan, supposedly the fascist dictator in charge of society, turns out to be in thrall to Fate’s machinations, believing it to be a goddess; with it his truly wretched and lonely character is revealed.

From Fate and her omniscience and omnipotence, all the best complexities of the characters come – for example, the curious hidden nature of Lewis Prothero, the “Voice of Fate”, a sociopathic concentration camp commandant with a nevertheless seductively charismatic voice (and a natty line in girls’ dolls). In the book he is the human voice of the computer, broadcasting sonorously to the nation, but in the film, robbed of his duality he gets turned into a shitty cross between Richard Littlejohn and Bill O’Reilly, ranting away incoherently on national television every night.

Despite being set in Britain, the Wachowskis’ adaptation is very Americocentric (as demonstrated by the recharacterisation of Prothero); it details a narrative based on opposition to the neoconservative agenda in America and the resulting foreign policy; the film is peppered with references to the Iraq war, Islamophobia and homophobia, and the bioterror plot within is a little reminiscent of the 9/11 conspiracy theories. The film is very much a product of the early 2000s – and with the crushing defeat of the neoconservatives in the US mid-term and presidential elections, it already seems a little dated.

With this in mind, the more I think about it, the better allegory for our times comes not from the ostentatious authoritarianism of the post-war-on-terror era but from the Fate plotline, with its more insidious system of control. Successive governments have become increasingly in thrall to mass surveillance, but it has especially been the case with the present one – whether it be CCTV cameras, the national identity register, DNA databases (even if you’re innocent), mass-snooping of emails and phone calls, or even outright hacking of your computer without a warrant.

And thrall is the right word to use here, as decisions are made not on evidence of their efficacy but on an ideology that more is more: the more data the government has, the more able it is to govern. Focusing on the quantity rather than the relevance of data has various unfortunate consequences. We run the risk of garbage in, garbage out: supposedly reliable databases turn out to be heavily flawed. It leads to a greater risk of security breaches, whether they be accidental or malicious. And most importantly it leads to a system of governance where everybody is treated as a datapoint – and thus governments manipulate people just as they would like to manipulate datapoints. The end result is a dehumanised and rather bleak polity, with every facet of public service characterised by targets, performances and star ratings, human beings reduced to automata in a fabulous number-crunching system.

There are another twenty blog posts I could write on this theme, but I won’t for now. But do check Adam Curtis’ The Trap as a primer on it from a philosophical/psychological point of view, and The Tiger That Isn’t by Michael Blastland and Andrew Dilnot for a mathematical examination; no equivalent from a sociotechnical or economic perspective exists, as far as I know.

Anyway, back to V for Vendetta, and Alan Moore. The comic was set against the spectre of nuclear war (from which the putative fascist Britain would rise), with a hint of a warning about where the right-wing agenda of the early Thatcher government could take us. And through this system, the monstrous system of Fate is created, and we are beholden to it. The odd thing is that we’re being taken towards the end without going through the intermediate stages – which is a relief in some ways (eating dead rats out of radioactive rusty hubcaps is never a good thing) but also oddly chilling. The pessimistic conclusion is that supreme control and omniscience is the goal of anyone in power and with the technology at hand it’s an inevitability. The optimistic conclusion is that despite the steady encroachment, it’s never too late to turn it back, if only we have the will. What’s it going to be? Hopefully it is not a matter of Fate.


Daily Mail-o-matic: Now updated

7 January 2009
ARE ASYLUM SEEKERS GIVING PROPERTY PRICES CANCER?

Back in 2003, before I even started blogging, I created the Daily Mail Headline Generator, and within a couple of weeks a friend suggested some extra things to put in it. “Good idea”, I said, “I’ll update the code when I have the time”.

I’m nothing if not prompt, so a mere six years later, I’ve finally got round to doing it. Hey, it still beats Duke Nukem Forever. The code’s updated – it’s now in JavaScript, not Flash, and GPL-licensed (source). I’ve updated the dictionary for a more contemporary feel (out goes Tony Blair, in comes Russell Brand) and it can handle the past and present tenses now. Suggestions for extra things to put in are welcome (add them in the comments below).

And while I’m on the subject of the Daily Mail – I’d normally refrain from even telling you it has a website, let alone linking to it, the hatemongering bogroll that it is – something in the latest column by homo-obsessed walking shitbag Richard Littlejohn slipped in unnoticed by the Mail’s irony detectors, it seems:

Apparently, my column is a constant reminder of why they did the right thing in emigrating to New Zealand.