Nate Silver: What Big Data can't predict
April 26, 2013: 2:20 PM ET
Fortune talked to the statistics whiz about the limit of data's impact on business.
By Kurt Wagner, reporter
FORTUNE -- Statistician Nate Silver isn't famous because he's a mathematical genius. (Although he is.) Silver is well known because he knows how to apply his craft to the real world. The country's most popular data cruncher is known for his spot-on election predictions -- he accurately called the winner in all 50 states of November's presidential election; in 2008, he went 49 for 50 -- but Silver's big data analytics have also translated to the worlds of sports (March Madness, Major League Baseball), gambling (Silver will play in his third World Series of Poker event this summer), and even dating. Silver once wrote for the baseball website Baseball Prospectus but has since expanded his offerings; he is now a published author, a political pundit, and the creator of his very own New York Times blog, FiveThirtyEight.
Silver was in San Francisco Thursday to talk analytics as the keynote speaker at Lithium Technologies' annual LiNC Conference. Fortune sat down with him to talk about big data's limitations, its role in the stock market, how it applies to dating, and even his predictions for the 2016 presidential election. A lightly edited transcript follows.
Fortune: I'm sure you get people coming up to you all the time to discuss how you helped them win their NCAA March Madness pool.
Nate Silver: I went against my bracket in my own pool because I thought other people would be using it. I would have gotten second place if I had taken my own advice.
Maybe take a small royalty fee next year?
Absolutely. Or we need to put out a fake bracket [first], and then put out a real one [later]. Oops, there was a coding error! [Laughs]
You started out using stats to better understand and predict success in baseball -- why did you move towards politics?
Of course it's easy to say in retrospect why you did certain things instead of what rational motivations were pushing you in that direction in real time, but I think part was that I was involved working for Baseball Prospectus for about five years -- 2003 to 2008 -- and you saw a great amount of progress in the baseball industry during that time. The start of that era was the era described in [the book-turned-movie] Moneyball where you really had a lot of tension between stat-heads and traditionalists. People were terrified that nerds would come over and take their jobs. And really now that's been totally reversed, where it's not just that you have some stat-head that you've hired and have locked into a closet somewhere, but that every team -- almost every team, there are some exceptions -- understands analytics at different levels of the organization.
But seeing how quickly that progressed in a span of just a few years, and how behind politics coverage seemed to be where it's all about the narrative -- there's a lot of bullshit basically both in the news coverage of politics and from politicians themselves -- so it seemed like it was ripe to apply some very basic analytics tools to the coverage of elections.
Is it hard to keep your own political beliefs separate from your work predicting elections?
It's always hard for us to be objective in any walk of life. None of us has a monopoly on reality, we all have rather jaded points of view. I do think the sports training helps though, where I can be a Detroit Tigers fan as I am [and was] growing up, I still thought Mike Trout [Los Angeles Angels] should have won the MVP award last year. What I think differentiates politics a bit is that you have an industry full of people who not only have views but are [also] used to manipulating public opinion. They're used to thinking they can create their own reality. That's why I think you have such trouble on the uptake there. People think that, well, if I can spin a fact a certain way or spin polls a certain way, [the problem] goes away. When you have a political press where some people are very good, but some other people are very compliant and happy to pass along spin from the campaigns, I think that's the issue. People aren't used to getting a reality check in politics as much as in sports.
So how are you able to sift through that information then to pick out the BS?
The idea is to ignore what the politicians say and stick with publicly available data. The record shows that in general, most political observers tend to overrate the importance of a gaffe or a debate -- there are always exceptions -- but in general the polls provide a pretty reliable benchmark. And the public, who have real lives and are not constantly consuming political news, are [sometimes] weighing things in a very sophisticated way where they're looking at things like the economy or are we involved in any stupid wars or major scandals from the administration. Those are the things that explain a lot about who wins the elections and not so much the petty stuff that the political pundits can focus on.
There is more data now than ever before. How are you able to determine which information to pull in order to properly answer your question?
Part of it is that you do need -- as Vegas might say -- you do need a system instead of an ad hoc way of doing it. So we have a model that we designed in 2008 that was updated for 2012 that was designed to account for every single poll. Some polls, if they're from a pollster that has a better track record, get more weight in the system. It doesn't mean that others are ignored. So it's not like we're just looking at a poll and sticking our fingers up in the air and saying, "Oh that poll is important, and that poll's not." Basically all the hard work and all the decision-making process comes from designing this model before the fact. Based on theory and practice and past experience, what are a good set of rules for processing this information? And then sticking to that. We don't make any alterations to the model once we launch it in June every year, unless there's a bug, which fortunately there hasn't been. But the principles are always the same, and then you have a disciplined way to analyze data in that context.
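The weighting idea Silver describes, where every poll counts but pollsters with better track records count for more, can be illustrated with a simple weighted average. The sketch below is not the FiveThirtyEight model; the `Poll` class, the pollster names, the poll numbers, and the weights are all hypothetical, chosen only to show how fixed rules, set before the data arrives, might process each poll the same way.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str           # hypothetical pollster name
    candidate_share: float  # percent support for one candidate
    weight: float           # hypothetical track-record weight (higher = more trusted)


def weighted_average(polls):
    """Combine every poll into one estimate; no poll is ignored."""
    total_weight = sum(p.weight for p in polls)
    if total_weight <= 0:
        raise ValueError("polls must carry positive total weight")
    return sum(p.candidate_share * p.weight for p in polls) / total_weight


# Illustrative, made-up numbers: pollster A is assumed to have the best record.
polls = [
    Poll("A", 52.0, 2.0),
    Poll("B", 48.0, 1.0),
    Poll("C", 50.0, 1.0),
]
estimate = weighted_average(polls)
print(f"Weighted estimate: {estimate:.1f}%")
```

In this toy version the rules live entirely in `weighted_average` and the weights, which mirrors the discipline Silver describes: the judgment calls are made when the system is designed, not poll by poll.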
Are there any questions out there that can't be answered using data and analytics?
So I think it all exists along a spectrum. It's important to know, too, that there's a difference between how good we are relative to our potential and how intrinsically predictable something might be. So for example if you look at baseball where analytics have come an awful long way, well it's still the case that the best baseball teams only win two-thirds of their games. The best hitters only get on base about 40% of the time. So it's still intrinsically unpredictable in a sense, but we have a good way of measuring and knowing what we know and what we don't know.
But there are a lot of fields where analytics have not come very far. I discuss earthquake forecasting in my book [The Signal and the Noise: Why So Many Predictions Fail -- but Some Don't] for instance, where people have been trying for centuries. We know something -- there are more earthquakes here in California than in New Jersey -- but the ability to anticipate a particular earthquake with any precision at a particular moment in time has not gone very well at all. Even economics -- when we try to do long-term economic forecasting, it has been pretty poor for the most part.
Are there any industries out there that are overlooking the possible impact of big data analytics?
It's sometimes industries that aren't very sexy necessarily, so big retail businesses for example have tons of records on every consumer transaction they have [completed]. They have a ton of data on supply chain management, so things like that in terms of optimal inventory strategies or optimal pricing strategies or robust strategies for disruption in the supply chain. Not super-sexy stuff, but those people have really good data sets, often high quality data, and can make better decisions as a result. I'm sure that some are doing it already, and there are some efficiencies there that you weren't seeing before.
There are also cases like if you look at how people consume television, for example. I think the advertising industry is something that's gotten more sophisticated in terms of targeting customers. The irony is that efficiency has become harmful in some ways for media companies, because what was the old adage? "Half your advertising budget is well spent, but you don't know which half." Now people might know which half, so they're only spending half as much.
Can people use data or analytics to accurately predict the stock market in any way?
The problem is the stock market is this whole contest where you're competing against other traders. So the question is: Are there some traders that are better than others? I think the answer is probably yes. I'm not a pure markets guy, I played poker long enough which I think has parallel skills to trading in many respects where you know that some people are better over the long-term and better at accounting for uncertainty and so forth. But there's a lot of volatility and a lot of luck where a market cycle can last for months or years. There are a lot of perverse incentives that get in the way. So while I think there are some good traders, in the near term, even over a period of five or 10 years, it will mostly be dictated by luck, so it's tricky.
Have you ever applied your model to dating?
I did one little analysis with OkCupid for a New York Times Magazine piece a couple of years ago where we were trying to figure out the best night of the week to go out and get laid basically. So OkCupid collected the data of some status reports of people who were out and about using their mobile application. And we looked at the ratio of people who wanted a long-term relationship vs. people who wanted just something for [that night]. On Wednesday apparently you get the highest ratio of people who are just looking for something quick and dirty.
2016 presidential election: Who should we be looking out for?
So I guess I'm disappointingly in line with the conventional wisdom here. If Hillary [Clinton] runs, it's hard to see her not winning the Democratic nomination. On the GOP side I don't think there's any way to avoid having a big messy primary, they have some good candidates and some bad candidates. But no one has a monopoly on that party right now, so they're going to have to fight it out. And then the general election of course depends on who wins the primaries, but people should be a little wary because look, Hillary Clinton might be a very good candidate if she wins [the primary], and she'd be a slight favorite, but still it's hard for any party to win the White House three terms in a row. If you have a poor economy by 2016, or if Obama's approval ratings are 38% or something like that, then it's hard for a Democratic successor to prevail even if it is Hillary. Trying to make predictions at this point is a little early. Actually, it's a lot early.