So you want to be a geek…

23 January 2009

Charles Arthur had a nice post at the weekend entitled: If I had one piece of advice to a journalist starting out now, it would be: learn to code. As any modern journalist is able to Google around for facts, Charles tells any budding journo to set themselves above and beyond the normal set of “IT skills”; being able to get a more powerful grip on data is now becoming part of what a journalist should know:

None of which is saying you shouldn’t be talking to your sources, and questioning what you’re told, and trying to find other means of finding stuff out from people. But nowadays, computers are a sort of primary source too. You’ve got to learn to interrogate them effectively – and quote them meaningfully – too.

It’s great advice – playing with data and getting a feel for how to get the best out of it not only helps you find new things out, but also helps open your mind up to a more healthy appreciation of data. It allows you to explore the possibilities of data as well as its flaws, when it can be trusted and when it should be taken with a pinch of salt. And it’s things like this that contribute towards a sense of joyful skepticism that any self-respecting geek should possess (and you thought it was just about watching every episode of Battlestar Galactica).

I gave up programming as a full-time career more than three years ago but have still kept my hand in programming since, either for fun or to make work quicker and easier. Working in the digital and social media PR sector isn’t just about going to the pub (truth be told, it’s actually about going to very expensive pubs) but also about dealing with vast quantities of information – so you can see how programming can help. Making tasks faster is part of it, but the programming mindset is equally if not more important: it has taught me skills such as looking to optimise and make things quicker, filtering noise from the signal, reusing what you have to save effort in the future, and not being surprised by the unexpected.

So I’m going to say what Charles Arthur said, but bigger. If you work in any information industry, or are thinking about a career in it, learn to code. And by code I don’t mean learn something hardcore like Java or C++, or even learn a full programming language (as you’ll see below). But it does mean getting past the usual abstractions you see – your web browser, Word, Excel – and getting involved at a deeper level, getting to appreciate what the data you’re reading actually is, and realising it’s not just something to look at.

So where would I recommend getting started with programming, and with what? From my own weary geek’s viewpoint, here are six ways of getting into it – three of which really aren’t strictly programming at all:

Regular expressions. I cannot begin to think how many times these have bailed me out of an otherwise unrecoverable situation. Regular expressions are ways of finding and replacing text that are much more powerful than the bog-standard search. For example, you might want to get all the telephone numbers or postcodes out of a document, but they are all different, so a simple search wouldn’t be able to do it and you would have to do it by hand (and might miss one out).

A regular expression, on the other hand, can say “find me any group of eleven digits that begins with a 0, and matches either the pattern 0xx xxxx xxxx or 0xxxx xxxxxx” – and bingo, you have all your phone numbers. Get clever and you can even tell it not to worry about whether it’s a space or a hyphen in between the groups of numbers. Be careful – they can get complicated, so build them up slowly and step by step – and they can do unexpected things, so always back stuff up.
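As a rough illustration of the phone number example above, here is a minimal sketch in Python. The pattern only covers the two layouts described, the function name is my own invention, and the sample numbers are made up, so treat it as a starting point rather than a complete UK phone number matcher.

```python
import re

# The two layouts from the text: 0xx xxxx xxxx and 0xxxx xxxxxx.
# [ -] accepts either a space or a hyphen between the groups.
PHONE_PATTERN = re.compile(
    r"\b0\d{2}[ -]\d{4}[ -]\d{4}\b"   # 0xx xxxx xxxx
    r"|\b0\d{4}[ -]\d{6}\b"           # 0xxxx xxxxxx
)

def find_phone_numbers(text):
    """Return every substring of text that matches one of the two layouts."""
    return PHONE_PATTERN.findall(text)

# Made-up sample text; the numbers are fictional.
sample = "Ring 020 7946 0018 before noon, or 01632-960123 after. Ref 12345."
print(find_phone_numbers(sample))
```

The `\b` markers stop the pattern from matching the middle of a longer run of digits, which is exactly the kind of unexpected behaviour worth testing on a copy of your data first.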

CSV. Many people work with Excel spreadsheets, and while Excel is great for tabulating data, its files aren’t a very portable format. Often you want to copy data in or out of Excel into other applications and it ends up being a horrible mess of numbers separated by spaces and tabs that you have to re-align yourself. CSV (comma-separated values) is the very boring but portable way of getting data in and out of Excel – it just consists of text with no styles, with commas marking the boundary between columns.

CSV looks like shit but it makes up for it by being extremely portable and lightweight. Combine it with the regular expressions above and you’re able to take the useful data out of a horrible mess, replace everything between it with commas, and import it straight into a spreadsheet. Or vice-versa – extracting numbers out of the spreadsheet and allowing other apps to play with it (like I did with the general election map).
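To make the “horrible mess into CSV” idea concrete, here is a small sketch using Python’s standard `re` and `csv` modules. It assumes columns are separated by a tab or by two or more spaces, which will not hold for every messy file; the function name and the sample data are invented for illustration.

```python
import csv
import io
import re

def messy_to_csv(text):
    """Convert whitespace-aligned text into CSV, one row per non-empty line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: a tab, or a run of two or more whitespace characters,
        # marks a column boundary; single spaces stay inside a field.
        writer.writerow(re.split(r"\s{2,}|\t", line))
    return out.getvalue()

# Invented sample: columns separated by a mixture of tabs and spaces.
sample = "Name\tConstituency   Majority\nA. Smith    Northtown\t1,204\n"
print(messy_to_csv(sample))
```

Letting the `csv` module do the writing, rather than joining fields with commas by hand, means a value that itself contains a comma (like `1,204`) gets quoted properly.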

Yahoo! Pipes. I am still waiting for Yahoo to piss this one up against the wall like they have done with Technorati and Flickr. So much of the web already runs on RSS (Really Simple Syndication) – streams of links and articles – that being able to manipulate them like this is a real boon. Yahoo! Pipes takes RSS feeds and allows you to merge them together, filter them, cross-reference them and more. When I was looking for a job last year I used a series of Pipes to pull feeds from various job websites, filter out the kinds of jobs I didn’t like, and then remove the duplicates so I wasn’t wasting my time – all delivered to Google Reader for easy perusal, as and when they came in. The interface is as reasonably usable as you could expect and has led to some really useful apps being created.
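Pipes itself is a visual, drag-and-drop tool, so there is no code to show for it, but the merge, filter and de-duplicate steps described above can be sketched in plain Python. This is only an analogy, not how Pipes works internally: the feeds here are hypothetical lists of dictionaries rather than real RSS, and the function name is my own.

```python
def merge_filter_dedupe(feeds, unwanted_words):
    """Merge several feeds, drop unwanted items, and remove duplicate links."""
    seen_links = set()
    result = []
    for feed in feeds:
        for item in feed:
            title = item["title"].lower()
            # Filter: skip any item whose title contains an unwanted word.
            if any(word in title for word in unwanted_words):
                continue
            # De-duplicate: the same job often appears on more than one site.
            if item["link"] in seen_links:
                continue
            seen_links.add(item["link"])
            result.append(item)
    return result

# Hypothetical job feeds, already parsed into dictionaries.
feed_a = [
    {"title": "Python developer", "link": "http://example.com/1"},
    {"title": "Sales executive", "link": "http://example.com/2"},
]
feed_b = [
    {"title": "Python developer", "link": "http://example.com/1"},
    {"title": "Web editor", "link": "http://example.com/3"},
]
jobs = merge_filter_dedupe([feed_a, feed_b], unwanted_words=["sales"])
print([job["link"] for job in jobs])
```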

JavaScript. JavaScript is an indispensable part of the web, although originally it seemed destined for little more than launching popups and stupid messages in your status bar. Now virtually every page is interactive in some way, and JavaScript’s true power is being exploited. One of the most obvious ways of getting it to work for you is Greasemonkey, which allows you to add scripts to change the behaviour of what appears on your screen – such as making Google Reader more readable, getting rid of Xeni Jardin, or (with the help of regular expressions) turning postcodes into links to maps.

Python. The biggie and the one I use the most. Python’s strengths lie in its simplicity – it’s human-friendly and runs on pretty much anything. It also has a sensible structure and organisation, which teaches you to code well and clearly. Finally, the vast range of libraries available means you can play with pretty much any data format; BeautifulSoup, for example, allows you to extract data from webpages easily. Python’s one drawback is its relatively poor documentation & tutorials, with some honourable exceptions such as Mark Pilgrim, so do hunt around and don’t let the technicalese put you off.
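To give a flavour of pulling data out of a webpage, here is a minimal sketch that collects every link on a page. BeautifulSoup is a separate download, so to keep the example self-contained I have swapped in the standard library’s `html.parser` instead; the class and function names are my own, and the HTML is a made-up fragment.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    """Return the link targets found in an HTML string, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

# Made-up fragment standing in for a downloaded page.
page = '<p>See <a href="http://example.com/a">A</a> and <a href="http://example.com/b">B</a>.</p>'
print(extract_links(page))
```

A dedicated library such as BeautifulSoup is generally more forgiving of badly formed pages, which is most of the web, so this is a sketch of the idea rather than a replacement for it.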

Finally, PHP gets an honourable mention – an easy enough language to learn and widely used, but with so many evolutions and a complicated past the language is a mess, and it teaches several bad programming habits.

I wouldn’t recommend doing all six at once, or even ever, nor would I set expectations too high. In some respects, it’s not even about the code or the results you get – it’s as much about the philosophy and understanding it brings with it: that data is not a static thing but ours to play with, making us able to create wonderful new things or change society for the better.


Single-serving Tumblelogs

13 January 2009

Last year, Jason Kottke charted the rise of the single-serving site – “web sites comprised of a single page with a dedicated domain name and do only one thing”, as he puts it. They range from the facetious (IsItChristmas.com), to the possibly useful (IsLostARepeat.com), to the downright marvellous (ItsNotLup.us). They’re almost the anti-Web 2.0 – uninteractive, dry-looking – devoted to a single psychotic purpose, neatly spelt out in the URL.

Since then, another kind of site with a single purpose has sprung up – they feature content – typically photos – and just that. They’re updated, but unlike blogs focused on single subjects, there’s no commentary, no snarking, no linky enthusiasm. It’s just content shovelled up for you straight away – and a lot of them use Tumblr. Brokers With Hands on their Faces, Garfield Minus Garfield and Fuck Yeah Sharks are three of my favourites, while Bale Yeah! and White People Trying to Look Serious get honourable mentions. The less commentary the better – let the content do the talking. Dude Totally Punched A Horse slips on this last one – it should just be the videos. Ditto for Arrested Development Stills – just leave them be!

Tumblr is ideal for this kind of streaming – having managed a super-secret Tumblr myself these past few months (I’ll reveal all sooner or later), I can say the interface is a cinch, and as comments are not enabled by default it saves the messiness of moderating or dealing with the content. It’s halfway between single-serving and full-on blogging, leaving you free to obsess about whatever you obsess about. It’s light, fun and often hilarious. Long may it continue.


The Voice of Fate

9 January 2009

Note: Mostly written while watching the film version of V for Vendetta over Christmas with a hangover, spoilers galore for both it and the book within, so proceed at your own risk.

Of the many things wrong with the Wachowski Brothers’ flawed adaptation of V for Vendetta, the omission of the computer Fate is by far the biggest. Fate is the computer that runs the society in V’s alternate future; it hooks into the surveillance systems used throughout British society and makes all the decisions. As the novel progresses, the high chancellor Adam Susan, supposedly the fascist dictator in charge of society, turns out to be in thrall to Fate’s machinations, believing it to be a goddess; with it his truly wretched and lonely character is revealed.

From Fate and her omniscience and omnipotence, all the best complexities of the characters come – for example, the curious hidden nature of Lewis Prothero, the “Voice of Fate”, a sociopathic concentration camp commandant with a nevertheless seductively charismatic voice (and a natty line in girls’ dolls). In the book he is the human voice of the computer, broadcasting sonorously to the nation, but in the film, robbed of his duality he gets turned into a shitty cross between Richard Littlejohn and Bill O’Reilly, ranting away incoherently on national television every night.

Despite being set in Britain, the Wachowskis’ adaptation is very Americocentric (as demonstrated by the recharacterisation of Prothero); it details a narrative based on opposition to the neoconservative agenda in America and the resulting foreign policy; the film is peppered with references to the Iraq war, Islamophobia and homophobia, and the bioterror plot within is a little reminiscent of the 9/11 conspiracy theories. The film is very much a product of the early 2000s – and with the crushing defeat of the neoconservatives in the US mid-term and presidential elections, it already seems a little dated.

With this in mind, the more I think about it, the better allegory for our times comes not from the ostentatious authoritarianism of the post-war-on-terror era but from the Fate plotline, with its more insidious system of control. Successive governments have become increasingly in thrall to mass surveillance, but it has especially been the case with the present one – whether it be CCTV cameras, the national identity register, DNA databases (even if you’re innocent), mass-snooping of emails and phone calls, or even outright hacking of your computer without a warrant.

And thrall is the right word to use here, as decisions are made not on evidence of their efficacy but on an ideology that more is more: the more data the government has, the more able it is to govern. Focusing on the quantity rather than the relevance of data has various unfortunate consequences. We run the risk of garbage in, garbage out: supposedly reliable databases turn out to be heavily flawed. It leads to greater risk of security breaches, whether they be accidental or malicious. And most importantly it leads to a system of governance where everybody is treated as a datapoint – and thus governments manipulate people just like they would like to manipulate datapoints. The end result is a dehumanised and rather bleak polity, with every facet of public service characterised by targets, performances and star ratings, human beings reduced to automata in a fabulous number-crunching system.

There’s another twenty blog posts I could write on this theme, but I won’t for now. But do check Adam Curtis’ The Trap as a primer on it from a philosophical/psychological point of view, and The Tiger That Isn’t by Michael Blastland and Andrew Dilnot for a mathematical examination; no equivalent from a sociotechnical or economic perspective exists, as far as I know.

Anyway, back to V for Vendetta, and Alan Moore. The comic was set against the spectre of nuclear war (from which the putative fascist Britain would rise), with a hint of a warning about where the right-wing agenda of the early Thatcher government could take us. And through this system, the monstrous system of Fate is created, and we are beholden to it. The odd thing is that we’re being taken towards the end without going through the intermediate stages – which is a relief in some ways (eating dead rats out of radioactive rusty hubcaps is never a good thing) but also oddly chilling. The pessimistic conclusion is that supreme control and omniscience is the goal of anyone in power and with the technology at hand it’s an inevitability. The optimistic conclusion is that despite the steady encroachment, it’s never too late to turn it back, if only we have the will. What’s it going to be? Hopefully it is not a matter of Fate.


Daily Mail-o-matic: Now updated

7 January 2009
ARE ASYLUM SEEKERS GIVING PROPERTY PRICES CANCER?

Back in 2003, before I even started blogging, I created the Daily Mail Headline Generator, and within a couple of weeks a friend suggested some extra things to put in it. “Good idea”, I said, “I’ll update the code when I have the time”.

I’m nothing if not prompt, so a mere six years later, I’ve finally got round to doing it. Hey, it still beats Duke Nukem Forever. The code’s updated – it’s now in JavaScript, not Flash, and GPL-licensed (source). I’ve updated the dictionary for a more contemporary feel (out goes Tony Blair, in comes Russell Brand) and it can handle the past and present tenses now. Suggestions for extra things to put in are welcome (add them in the comments below).

And while I’m on the subject of the Daily Mail – and I’d normally refrain from even telling you it has a website, let alone linking to it, the hatemongering bogroll that it is, but something in the latest column by homo-obsessed walking shitbag Richard Littlejohn slipped in unnoticed by the Mail’s irony detectors, it seems:

Apparently, my column is a constant reminder of why they did the right thing in emigrating to New Zealand.


Andy Burnham – in ur internetz, classifyin ur sitez

28 December 2008

In the quietness of the holiday season, the Secretary of State for Culture, Andy Burnham, has come forth with plans to age-certify the web, as is currently done with films and DVDs. Coming in the wake of the IWF’s horribly misguided attempt to block Wikipedia, this is another hamfisted approach that regulates the Internet as if it were old media and solves very little.

So what will it look like? It certainly won’t look like a BBFC for the web. First, there’s the question of scale: the total number of sites (not counting subdomains) alone is around 156 million, while the Google index is in the billions of individual pages and there may be up to a trillion unique URLs on the web. Compared to the 639 films and 11,439 videos and DVDs that the BBFC classified in 2008, that’s more than just a few orders of magnitude. No human-oriented solution would be able to get the job done – it would have a hard enough job coping with the 120,000 blogs created every day. So any such system will be automated.
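For anyone who wants to check the scale argument, the sums can be done in a few lines of Python. The figures are the ones quoted above; the variable and function names are mine, and spreading the BBFC’s workload evenly over 365 days is a simplifying assumption.

```python
import math

websites = 156_000_000             # sites, not counting subdomains
unique_urls = 1_000_000_000_000    # "up to a trillion" unique URLs
bbfc_2008 = 639 + 11_439           # films plus videos and DVDs classified in 2008
new_blogs_per_day = 120_000

def orders_of_magnitude(big, small):
    """How many powers of ten separate the two quantities."""
    return math.log10(big / small)

# Assumption: the BBFC's yearly workload is spread evenly across the year.
bbfc_per_day = bbfc_2008 / 365

print("Sites vs BBFC titles:", round(orders_of_magnitude(websites, bbfc_2008), 1))
print("URLs vs BBFC titles:", round(orders_of_magnitude(unique_urls, bbfc_2008), 1))
print("New blogs per BBFC title, per day:", round(new_blogs_per_day / bbfc_per_day))
```

On these numbers the gap is roughly four orders of magnitude for whole sites and nearly eight for individual URLs.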

Secondly, of course, there’s the international dimension. How is a site in Russia or Tuvalu going to be compelled to undergo certification by a UK body? Answer: it isn’t. So the idea of a website being clearly labelled “PG” or “18” like a DVD can go right out of the window – expect it all to be done at the ISP level as it comes into the UK, filtered as you access the site.

Oddly enough, automated filtering like this has existed for years, in corporate firewalls and in software specifically targeted at parents such as CyberPatrol and NetNanny. You pay for a licence and it monitors what comes in and out, a bit like a virus scanner, for specific keywords or pictures that might look like nudity. These are hideously imperfect and tend to be over-zealous – how do you prevent them filtering out information about sex education, or other health issues such as breast cancer, for example? But an imperfect solution is better than none for some parents, so why not fork out on the software if you’re worried about your kids, and leave the rest of us be?
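The over-zealousness problem is easy to reproduce. The sketch below is a deliberately naive keyword filter of my own devising, with a hypothetical blocklist; it is not how CyberPatrol, NetNanny or any real product works, but it shows why matching on words alone catches health information too.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKED_WORDS = {"sex", "breast"}

def is_blocked(text):
    """Return True if any word in the text appears on the blocklist."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKED_WORDS for word in words)

# Invented page titles.
for title in [
    "Breast cancer screening: what to expect",
    "Sex education resources for schools",
    "Saturday football results",
]:
    print(title, "->", "blocked" if is_blocked(title) else "allowed")
```

The filter has no idea what a page is about, only which words it contains, so the two health pages are blocked along with whatever the list was actually aimed at.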

Censorship of legal but possibly offensive material in this way is a private, not a public, good – most of us are adults and want our access unfettered. But rather than just tell parents to buy a copy of censorware and install it, Burnham wants ISPs to spend millions at the network level to implement it. This is a fairly idiotic waste of money, but then the more you look at what Burnham says, the clearer it is that he hasn’t got a full grasp of the facts on the issue:

Mr Burnham said: “If you look back at the people who created the internet they talked very deliberately about creating a space that Governments couldn’t reach.”

This is utter bollocks. If you’re talking about the ARPANET, the Internet’s predecessor, it was created by the United States Department of Defense. Burnham is probably thinking of John Perry Barlow’s A Declaration of the Independence of Cyberspace, with its famous quote:

“Governments of the Industrial World, you weary giants of flesh and steel, I come from Cyberspace, the new home of Mind. On behalf of the future, I ask you of the past to leave us alone. You are not welcome among us. You have no sovereignty where we gather.”

But this was written in 1996, long after the Internet had taken hold. Barlow was not a creator of the Internet, far from it: while TCP/IP was being proposed and the early Internet assembled, he was writing lyrics for the Grateful Dead. Back to Burnham:

I think we are having to revisit that stuff seriously now. It’s true across the board in terms of content, harmful content, and copyright. Libel is [also] an emerging issue.

Libel online is an emerging issue? The first Internet libel case, Godfrey v. Demon Internet, was over eleven years ago (and a dangerous precedent it set too). It has since become clear, with cases such as the Alisher Usmanov blog silencing, that as in all libel cases the plaintiff has an unfair advantage. Far from the Internet being lawless, it’s all too easy for the rich and powerful to silence anything online that is in the UK’s jurisdiction.

“There is content that should just not be available to be viewed. That is my view. Absolutely categorical. This is not a campaign against free speech, far from it; it is simply there is a wider public interest at stake when it involves harm to other people. We have got to get better at defining where the public interest lies and being clear about it.”

Categorical as he likes to be, Burnham burbles over what exact content should not be available, or to whom – is he still talking about child protection here or is he going further? Far from being clear about it he’s muddying the water, starting off by talking about child protection but now touching on the wider issues of freedom of speech and what content can be seen by anyone.

“It worries me – like anybody with children,” he says. “Leaving your child for two hours completely unregulated on the internet is not something you can do.”

Well then don’t do it. Supervise your own bloody kids. Or cough up for some supervisory software. Or learn about what’s out there and talk to them about it first.

“I think there is definitely a case for clearer standards online,” he said. “More ability for parents to understand if their child is on a site, what standards it is operating to. What are the protections that are in place?”

Actually, most sites children use online (such as Bebo, Habbo or MySpace) have quite clear and helpful parental advice sections which, if he took the time to read them, could be quite edifying.

“This isn’t about turning the clock back. The internet has been empowering and democratising in many ways but we haven’t yet got the stakes in the ground to help people navigate their way safely around what can be a very, very complex and quite dangerous world.”

You could start with yourself, minister. This bit tickles me the most:

He is planning to negotiate with Barack Obama’s incoming American administration to draw up new international rules for English language websites.

Given how web-savvy the Obama administration is, I expect their response to be mostly along the lines of “WTF?”.

So, in conclusion, the minister for fun doesn’t really have much of a clue – all he knows is there is a problem of some sort and he must be seen to be doing something about it. And the truth is there are already plenty of cheap software solutions which, flawed as they may be, offer a quick fix to the problem. But rather than telling people to fork out themselves, his plan will eventually cost all Internet users both money and convenience.

A better solution is to not let kids go online alone without educating yourself about what sites are good and what child protection policies they have, talking to your kids about it and showing them how to use the web safely. But in this government’s bizarre world, telling parents how to bring up their kids would be seen as nannying and intrusive, while quietly classifying & censoring everything they download is nothing more than a matter of course.

Extra: John has some extra good points over at Sore Eyes while Tom Watson MP is clever enough to open up discussion to everyone on his blog, with a promise he’ll feed them back to Burnham. Now there’s Government 2.0 for you.

And a bit more: Alex has an excellent rant – although I don’t quite agree with him it’s purely a class-based thing, the English-language bit is an excellent point I hadn’t picked up on. Terence meanwhile argues it’s merely the fear of the new and unknown. Finally – Richard Clayton has a rather excellent summary of the problems with age ratings and content filtering.


Little bit of Boxing Day geekery

26 December 2008

Here’s a little bit of Christmas fun for you – using Wordle to make tag clouds of major Christmas speeches. Compare & contrast the Queen’s Christmas Message with the Channel 4 Alternative Christmas Message by Mahmoud Ahmadinejad:

Perhaps unsurprisingly, Ahmadinejad’s is not only full of religious rhetoric (Christ, Jesus, prophets, almighty) but also lots of stirring politicalisation (Humanity, nations, justice, demands). The Queen’s, on the other hand, is a lot quieter – words such as service, family, life, as well as a curious verbal tic in overusing the word “many”. It’s also worth comparing both with Barack Obama’s address, which is overwhelmingly optimistic and positive (although not much use of the word “hope”) – it’s as if there was nothing wrong with the world right now:

Anyway, there’s not much insight one can really draw from the above – my original intention was instead to look at the Queen’s own messages to the nation(s) over the years to see if there were changes in her outlook. As with all such looks, this is just a bit of fun as we’re just picking a few samples – but it’s still interesting to see the clouds from over the years – one every ten years from the past half-century:

1958 was very much about family and domesticity – perhaps not surprising as Her Maj herself was mother of two young children at this point in time.

1968 was much more a message of peace and reconciliation, after a turbulent year of social unrest and change.

1978 was a weird speech, as it contained many excerpts from the Queen’s father and grandfather. Excluding those, it looked very much to the future, no doubt influenced by the birth of her first grandchild the previous year.

1988, on the other hand, turned about 180 degrees, looking back historically and talking about the many anniversaries and commemorations (including the 400th of the Spanish Armada and 300th of the Glorious Revolution).

Finally, 1998, and again the Queen is talking about family and what different generations can learn from each other. Interestingly, this is the one that least mentions Christmas.

And back to 2008 again. Still a lot of family stuff but more touching on religious themes than in previous decades.

So what is there to learn from the above? Firstly, although the Queen is sensitive to world events (as seen in 1968 and 1988), she seems to be more influenced by her family and her immediate surroundings than she may think. And finally, her outlook on the world seems to have shrunk over the years to being much closer to home.
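If you fancy trying this on other speeches, the counting behind a tag cloud is simple enough to sketch in Python. This is my own approximation of the general idea, not Wordle’s actual method: the stopword list is a tiny hand-picked assumption, and the sample text is invented rather than taken from any real speech.

```python
import re
from collections import Counter

# A very small, hand-picked stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "is", "that", "for", "it"}

def top_words(text, n=5):
    """Return the n most frequent non-stopwords with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(n)

# Invented sample text, not a quotation.
sample = "Family and service, family and life, and many, many, many thanks"
print(top_words(sample, n=2))
```

A tag cloud then simply draws each word at a size proportional to its count.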

