Sunday afternoon in Oxford: what is the probability that 200 students will attend a statistics lecture? Rather good, apparently.
Nate Silver is a 35-year-old statistician surfing a wave of popular acclaim, having correctly predicted the outcome of the US Presidential race last year, in all fifty states. His computer model for the election, an elaborate poll-of-polls, gave Obama a 50.2% chance of winning Florida: statistician-speak for “I haven’t got a clue”. Luckily for sales of his book, the coin landed the right way up.
Being accurate on election nights has made Nate something of a celebrity among political junkies. He writes the popular FiveThirtyEight.com blog on the New York Times website and, as he proudly demonstrated with a PowerPoint slide, last November his name briefly got more Google searches than US Vice-President Joe Biden. Somewhat tragically, Justin Bieber remained three times more popular than either of them. It has not been easy gaining credibility in this crowded field. Nate Silver’s passions of baseball and gambling meant his early training came from developing a baseball prediction system called PECOTA, which became very successful in the US. But while baseball and statistics are a perfect match, opinion polling in the real world is messy, labour-intensive, and stressful when thousands of people start scouring your blog daily.
On the technical side, Silver is well qualified, having studied at Chicago and LSE. Most of his work involves Bayes’ theorem, a “mathematical algebraic formula for how you weigh new information against old information, and update your beliefs over time.” The human brain, Nate argues, is Bayesian too. Silver’s method for predicting elections is to revise his estimates very slightly whenever a new poll is released. The effect of that new data could vary: “some polls are more reliable and they get more weight.” On 31st May last year, Nate thought that Obama had a 60 per cent chance of winning re-election; he revised his estimate every single day until the 6th November, eventually declaring that Obama would win with 90 per cent certainty. Lots of pundits accused him of ‘cheating’ by endlessly revising his estimates. They missed the point. Just as you would expect the opinions of 311 million people to vary, Nate’s estimates changed frequently and displayed inertia.
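For readers who want to see what “weighing new information against old” can look like in practice, here is a minimal sketch in Python. It is not Silver’s model, whose details were not covered in the talk; it is a textbook Beta-binomial update in which each poll’s sample size is scaled by a reliability weight. The `Poll` class, the `update` and `mean` functions, the prior and all the numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    share: float        # candidate's share of respondents in this poll, 0..1
    sample_size: int    # number of respondents
    reliability: float  # hypothetical weight in (0, 1]; 1.0 = fully trusted


def update(alpha: float, beta: float, poll: Poll) -> tuple[float, float]:
    """Fold one poll into a Beta(alpha, beta) belief about the candidate's share.

    The poll counts as `sample_size * reliability` respondents, so a less
    reliable poll moves the estimate less than a trusted one of the same size.
    """
    effective_n = poll.sample_size * poll.reliability
    return (alpha + poll.share * effective_n,
            beta + (1.0 - poll.share) * effective_n)


def mean(alpha: float, beta: float) -> float:
    """Current best estimate of the candidate's share."""
    return alpha / (alpha + beta)


# Illustrative prior: roughly "100 respondents' worth" of belief in a 50/50 race.
prior = (50.0, 50.0)

# Two made-up polls with the same headline figure but different reliability.
shaky = Poll(share=0.52, sample_size=1000, reliability=0.5)
solid = Poll(share=0.52, sample_size=1000, reliability=1.0)

after_shaky = mean(*update(*prior, shaky))
after_solid = mean(*update(*prior, solid))
print(f"after the shaky poll: {after_shaky:.4f}")
print(f"after the solid poll: {after_solid:.4f}")
```

Both polls pull the estimate from 50 per cent towards 52 per cent, but the more reliable one pulls harder. Each new poll nudges the running estimate rather than replacing it, which is why a forecast built this way changes every day yet still shows inertia.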
Romney supporters whinged endlessly about Silver’s methodology, mocking the idea that he could claim to be objective while publicly professing to be a Democrat. He’d overlooked minority turnouts; he’d used biased weighting; he’d cherry-picked results; and anyhow, people make up their minds in the queue before they vote so the polls are irrelevant. Silver quipped in response, “The Republican Party has moved away from empirical analysis in general.”
Nate says he enjoys working in areas where the existing quality of statistics is terrible, because a little bit of careful work can make you look like a relative genius. He says people are “perceiving information in a very often partisan, bias and jaded way”. The FiveThirtyEight model by contrast is “really simple”, largely empirical, and freely admits to its large uncertainties. Those things made it considerably more accurate than other polls-of-polls.
Silver was in Oxford to promote his first book, The Signal and the Noise, which is a popular science adventure in the tradition of Freakonomics. It’s an entertaining collection of statistical cock-ups and applications – the financial crisis makes an early appearance – and is accessible without dumbing down.
In the book and in person, he seemed keen to talk about the psychology of statistics. The main alternative to Bayesian methods is the frequency-based approach, developed by Ronald Fisher. “Frequentists are too pristine,” he says. They’ll say: we’re more objective, because as a Bayesian you allow your own judgement about the prior state of the world to cloud all your future predictions. Silver thinks their mistake is to treat political questions as you might physics or biology. He argues that when you explicitly leave a gap for a human assumption in a statistical model, rather than implicitly as under other methods, you become acutely aware of its effects. Consequently, you make better predictions.
Silver borrows Isaiah Berlin’s classification of ‘Foxes and Hedgehogs’ to make this point. A ‘hedgehog’ personality likes to boil the world down to one simple theme (e.g., Nietzsche). A ‘fox’ accumulates evidence from many fields, and is sceptical (e.g., Aristotle). When studied, foxes seem to make the best predictions while hedgehogs allow everything to get “trumped up to be highly significant”. All too predictably, the people we elect into politics and find entertaining on TV tend to have the hedgehog personality.
Some optimism is in order for the way Silver’s approach is transforming fields like medicine. Big Data, the new zeitgeist, wants to begin helping doctors diagnose conditions by digesting more published research and online information than a human ever could. Doctors could then supplement their memories with a digital research assistant that makes them aware of the range of evidence available. It’s not as far-fetched as it sounds. Big internet businesses are already categorising your spending and social behaviour online, all in the name of marketing.
As a young, aspiring public intellectual, Nate Silver is taking a risk as he branches out to other fields. A cynic would point out that he has gained his major following thanks to just two data points: correctly calling 49/50 states in the 2008 US election, and all 50 last year in 2012. Two strong results shouldn’t normally be enough to change your mind but, with a compelling talk at the Union this week, Nate Silver convinced me he was the genuine article, and no Paul the Octopus.
Nate Silver’s book The Signal and the Noise is published by Penguin, £7