Thursday, November 3, 2016

That Didn't Last Long

I was trying to ride the high of the Cubs win a little longer instead of stressing about the election, but around noon today, I caved and began checking election statistics and forecasts about who would win. I've been following FiveThirtyEight's election forecast pretty closely, and I'm really liking some of the figures they use to display the results.

Right now, they give Clinton a 65.9% chance of winning - note that, depending on when you check the link above, the figure may have changed, since they continually update their models as data become available. Their map of the US looks a little too red for my taste, though. Cue momentary tachycardia. Just keep scrolling.

After displaying some data on trends in opinion polling, as well as a chart of state-by-state results, we get to their electoral vote figures. I really liked this particular figure, not just because it reminds me of a board game, but because of how well it communicates a large amount of data:

At the bottom of the page, they provide some probabilities of different outcomes, including Electoral College deadlock (1.3%), Clinton wins popular vote (77.5%), and map exactly the same as 2012 (0.2%). You can also read more about the methodology used in the election forecast here. Not only does their model aggregate polling data from across the country, and use prediction variables found to be useful in previous elections, they even account for uncertainty in polling:
Our probabilities are based on the historical accuracy of election polls since 1972. When we say a candidate has a 30 percent chance of winning despite being down in the polls, we’re not just covering our butts. Those estimates reflect the historical uncertainty in polling.
So they're factoring in how often, and by how much, polls have been off in previous elections. The model is also more reactive to fluctuations closer to the election. So while changes in the polls over the summer moved the model very little, changes in the polls now result in more radical changes in the predictions.
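To make that "30 percent chance despite being down in the polls" idea concrete, here's a rough sketch of how a polling lead plus a historical polling error could turn into a win probability. This is not FiveThirtyEight's actual model: I'm assuming a simple normal distribution for polling error, and the function name and the numbers (a 2-point deficit, a 4-point historical error) are made up for illustration.

```python
import math


def win_probability(poll_lead, historical_error_sd):
    """Chance a candidate wins, given their polled lead (in points).

    Assumption for illustration only: the true margin is normally
    distributed around the polled lead, with a standard deviation equal
    to how far off polls have been historically.
    """
    z = poll_lead / historical_error_sd
    # Standard normal CDF evaluated at z, written with math.erf.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical numbers: trailing by 2 points, polls historically off by ~4.
print(round(win_probability(-2.0, 4.0), 2))  # 0.31
```

So under these made-up numbers, a candidate who is behind still wins about 3 times in 10, purely because polls have missed by that much before.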

Finally, they use a process known as bootstrapping, where the analysis program randomly draws a large number of samples from the data (something on the order of 20,000 in this case) to generate a range of values. This helps minimize the influence of sampling error - drawing a sample that doesn't represent the population well. There's also some interesting info on which probability distributions they use and their rationale. If you're a stats-lover, you'll probably enjoy those details.
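If you've never seen bootstrapping in action, here's a minimal, generic sketch of the idea: resample the data with replacement many times, compute the statistic each time, and look at the spread. This is a textbook bootstrap of a mean, not FiveThirtyEight's procedure; the poll margins below are invented, and `bootstrap_interval` is just a name I picked.

```python
import random
import statistics


def bootstrap_interval(sample, n_resamples=20000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(sample)
    # Draw n values with replacement, take the mean, repeat many times.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n))
        for _ in range(n_resamples)
    )
    # Cut off the tails to get the middle `level` share of resampled means.
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical poll margins (candidate's lead, in points); not real data.
margins = [3.0, 1.5, 4.0, -0.5, 2.0, 5.5, 0.0, 2.5]
low, high = bootstrap_interval(margins)
print(f"Mean lead: {statistics.fmean(margins):.2f}")
print(f"95% bootstrap interval: {low:.2f} to {high:.2f}")
```

The width of that interval is the useful part: it shows how much the average could move just from which polls happened to land in the sample.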

1 comment:

  1. Keep in mind that there's an inverse correlation between how big a state is and its population. Montana, the Dakotas, and Wyoming are a big chunk of the middle of the country--but altogether they account for fewer electoral college votes than tiny New Jersey.