Thursday, November 6, 2008

The Amazing Nate Silver Projection Machine

First, I want no political talk in the comments section. This is about polling practices and data aggregation.

Nate Silver designed the Baseball Prospectus computer program PECOTA. PECOTA has pulled off some nifty feats in its day.

This year, Nate turned his talents toward polling and election predictions at his site, FiveThirtyEight, and as you might expect, he proved to be superb once again:

Shortly before Tuesday's vote, Chief Numba-Cruncher posted his final prediction for the 2008 Presidential Election: Barack Obama would win the election with 52.3% of the popular vote, while John McCain would collect 46.2%. The final vote tally as of this morning? Obama 52.4%. McCain 46.2%. One-hundred-and-twenty million votes were cast and the dude was off by one-tenth of one percent. (He also called 49 of the 50 states correctly.) Holy. Crap.

I've been walking around for the last month or so telling people that Obama was a lock. I did this because of FiveThirtyEight. The first time that I set eyes on the site, I wondered why no one had done this before. I don't mean to denigrate Silver or his work with that statement. Works of genius often seem obvious after the fact, and I would call FiveThirtyEight a genuine work of genius.

So what did Nate do? I might get this a bit wrong, as these are just my observations and I don't have any special insight into PECOTA or FiveThirtyEight (especially as PECOTA is largely proprietary), but I think I can boil it down.

PECOTA uses comparables and what I would describe as a Bayesian algorithm to predict career paths for players. In short, you take a bunch of similar guys, you decide how similar they are (probabilistically speaking), and then you dump all of those probabilities into a big computer and run simulations, Monte Carlo style.

Elections are perfect for this type of analysis, because other institutions will provide those probabilities for you, and the value of wins is fixed by the electoral college. Nate did not just take poll numbers, of course. Most poll numbers are terrible. He aggregated (Wisdom of Crowds style) and then corrected for bias. This is where it gets interesting, and why Obama was much more of a sure thing than most pundits acknowledged.
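To make "aggregate, then correct for bias" a little more concrete, here is a minimal sketch in Python. It is not Nate's actual method, which I don't know in detail. The pollster names, the numbers, the `house_effects` table, and the choice to weight each poll by the square root of its sample size are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str
    margin: float      # candidate A minus candidate B, in percentage points
    sample_size: int


def aggregate_polls(polls, house_effects):
    """Weighted mean margin after subtracting each pollster's assumed lean."""
    weighted_sum = 0.0
    total_weight = 0.0
    for poll in polls:
        # Assumption: a pollster's lean is a fixed number of points we already know.
        corrected = poll.margin - house_effects.get(poll.pollster, 0.0)
        # Assumption: bigger samples count more, in proportion to sqrt(n).
        weight = poll.sample_size ** 0.5
        weighted_sum += weight * corrected
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no polls to aggregate")
    return weighted_sum / total_weight


# Made-up polls and made-up house effects, purely for illustration.
polls = [
    Poll("Pollster A", 8.0, 400),
    Poll("Pollster B", 4.0, 900),
    Poll("Pollster C", 6.0, 1600),
]
house_effects = {"Pollster A": 2.0, "Pollster B": -1.0}

print(round(aggregate_polls(polls, house_effects), 2))
print(round(aggregate_polls(polls, {}), 2))
```

With these invented numbers, the corrected average comes out a bit lower than the uncorrected one, because the pollster with the biggest lead for candidate A is also the one assumed to lean toward A.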

Two candidates can be fairly close in the popular vote while remaining far apart in the electoral count. The media tends to focus on the popular vote, but the electoral count is really all that matters. Most states are not swing states and can be counted upon, with near certainty, to go Republican or Democrat almost every time. This situation leaves certain paths available to the candidates for victory. If you apply the probabilities you've created by aggregating polling data to the "swing state pathways" necessary for each candidate to win, you can determine how likely the candidate is to win.
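The idea of feeding state-level probabilities into a simulation can be sketched in a few lines of Python. This is only a toy under stated assumptions: the three swing states, their electoral votes, and the safe-vote totals are invented, and each state is treated as an independent coin flip, which real elections are not (polling errors tend to move together across states). The function name `simulate_win_probability` is mine, not anything from FiveThirtyEight.

```python
import random


def simulate_win_probability(safe_votes, swing_states, votes_to_win=270,
                             trials=20_000, seed=0):
    """Estimate how often a candidate reaches votes_to_win.

    swing_states maps a state name to (win_probability, electoral_votes)
    from this candidate's point of view.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        total = safe_votes
        for win_prob, electoral_votes in swing_states.values():
            # Assumption: each state is decided independently of the others.
            if rng.random() < win_prob:
                total += electoral_votes
        if total >= votes_to_win:
            wins += 1
    return wins / trials


# Hypothetical map: every swing state is a pure toss-up worth 20 votes.
swing_states = {
    "State X": (0.5, 20),
    "State Y": (0.5, 20),
    "State Z": (0.5, 20),
}

leader = simulate_win_probability(250, swing_states)    # needs any one state
trailer = simulate_win_probability(228, swing_states)   # needs all three
print(leader, trailer)
```

Even though every swing state here is a coin flip, the candidate starting with 250 safe votes wins roughly seven times out of eight, because any single state is enough, while the candidate starting with 228 has to sweep all three.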

Let's say, for example, that McCain had 30 (to choose a round number) paths to victory available, but that 20 of them required him to win Pennsylvania. If Obama has a 55% chance of winning Pennsylvania, that doesn't just add some votes to Obama's popular total or 21 votes to his electoral vote total. What it does is significantly reduce the number of ways that John McCain could win. McCain needed so much to go right for him that it was a near impossibility. FiveThirtyEight had Obama winning in over 95% of scenarios for most of the last month because there just were not many paths available for McCain. Even if he had picked up Pennsylvania (which was always unlikely) he STILL would have needed a bunch of other states to break his way. Obama simply had more outs. While most pundits were reciting poll numbers of 52-48, or something like that, making the race sound reasonably close, Nate had it (for all intents and purposes) 95-5, which is more in tune with the electoral vote blowout that ensued.
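The "paths to victory" argument can also be worked out exactly when there are only a handful of swing states, by listing every combination. The sketch below does that for a hypothetical trailing candidate. The state names, vote counts, and probabilities are invented (I borrowed the 21 votes and the 45% chance from the Pennsylvania example above for "Big State"), the states are assumed independent, and `winning_paths` is just a name I picked.

```python
from itertools import product


def winning_paths(safe_votes, swing_states, votes_to_win=270):
    """List every combination of swing states that reaches votes_to_win.

    Returns (set_of_states_carried, probability_of_that_exact_outcome) pairs.
    """
    names = list(swing_states)
    paths = []
    for outcome in product([True, False], repeat=len(names)):
        votes = safe_votes
        probability = 1.0
        carried = set()
        for name, won in zip(names, outcome):
            win_prob, electoral_votes = swing_states[name]
            if won:
                votes += electoral_votes
                probability *= win_prob
                carried.add(name)
            else:
                probability *= 1.0 - win_prob
        if votes >= votes_to_win:
            paths.append((frozenset(carried), probability))
    return paths


# Hypothetical trailing candidate: 230 safe votes, so 40 more are needed.
swing_states = {
    "Big State": (0.45, 21),
    "State B": (0.40, 15),
    "State C": (0.40, 10),
    "State D": (0.40, 9),
}

paths = winning_paths(230, swing_states)
need_big_state = [p for p in paths if "Big State" in p[0]]
total_probability = sum(prob for _, prob in paths)
print(len(paths), len(need_big_state), round(total_probability, 4))
```

In this made-up map there are only four winning combinations out of sixteen, every one of them runs through Big State, and they add up to a 15.84% chance of winning. No single state is worse than 40-60 for this candidate, but needing several of them at once is what sinks him.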

Anyway, Silver is an excellent prognosticator, and all of his new-found recognition is well deserved and well earned. Moreover, this is an excellent advertisement for BP and PECOTA for next year, which is an absolute necessity for any real baseball fan.

For more on Bayes' theorem, you could read this, or the prologue to The Wisdom of Crowds (the part about The Scorpion), or you could understand the Monty Hall problem. Then read Anathem.


E.S.K. said...

I had no idea he was behind fivethirtyeight. Damn good prognosticator.

DannyNoonan said...

Nate Silver is definitely near the top of my "Guys I'd like to have a beer with" list. Nothing more interesting than math, baseball and politics.

What the hell is that video at the end? I'm about 100 pages into Anathem and it's really awesome, but that video is almost enough to make me want to put it down. Is it just some fan-made promo?

E.S.K. said...

I must have missed that in my first read. You're a Neal Stephenson fan? And you call me a geek for building computers?

This is a terrible injustice.

PaulNoonan said...

I thought the promo was funny. Especially the saeculum in football uniforms. E, I've never claimed to be less nerdy than you.