By Joshua Klein
FORTUNE -- Big Data and the cloud are putting supercomputer capabilities into everyone's hands. But what's getting lost in the mix is that the tools we use to interpret and apply this tidal wave of information often have a fatal flaw. Much of the data analysis we do rests on erroneous models, meaning mistakes are inevitable. And when our outsized expectations exceed our capacity, the consequences can be dire.
This wouldn't be such a problem if Big Data weren't so very, very big. But the amount of data that we have access to lets even flawed models produce what are often useful results. The trouble is that we frequently mistake those results for omniscience. We're falling in love with our own technology, and when the models fail it can be pretty ugly, especially when the mistakes all that data produces are correspondingly large.
Part of the issue is oversimplification of the models computer programs are based on, rather than actual errors in their programming. For example, in early April 2011, Peter Lawrence's The Making of a Fly, a classic work in developmental biology that many biologists consult regularly, was listed on Amazon.com (AMZN) as having 17 copies for sale: 15 used from $35.54, and two new from $23,698,655.93 (plus $3.99 shipping).
The book, last published in 1992, is now out of print, but that doesn't quite explain the multimillion-dollar price tag. What had happened was that two automated programs, one run by seller "bordeebook" and one by seller "profnath," were engaged in an iterative and incremental pricing war. Once a day profnath would set its price to 0.9983 times bordeebook's listed price. Several hours later, bordeebook would increase its price to 1.270589 times profnath's latest amount. Because 0.9983 times 1.270589 is roughly 1.268, each daily round pushed both prices up by about 27%, and neither program had a ceiling to stop the climb.
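To see how quickly two such rules compound, here is a minimal sketch of the loop in Python. Only the two multipliers come from the account above; the starting price, the function names, and the once-a-day ordering of the two updates are assumptions made for illustration, not a reconstruction of either seller's actual software.

```python
from decimal import Decimal

# Multipliers reported for the two sellers' repricing rules.
PROFNATH_FACTOR = Decimal("0.9983")      # undercut bordeebook slightly
BORDEEBOOK_FACTOR = Decimal("1.270589")  # price well above profnath

# Combined effect of one full round (both sellers update once).
DAILY_GROWTH = PROFNATH_FACTOR * BORDEEBOOK_FACTOR


def simulate(start_price, days):
    """Return a list of (profnath, bordeebook) prices, one pair per day.

    start_price is bordeebook's listed price before the first round;
    its value is a hypothetical input, not a figure from the article.
    """
    bordeebook = Decimal(start_price)
    history = []
    for _ in range(days):
        profnath = bordeebook * PROFNATH_FACTOR
        bordeebook = profnath * BORDEEBOOK_FACTOR
        history.append((profnath, bordeebook))
    return history


def days_to_exceed(start_price, target, max_days=10_000):
    """Count the rounds until bordeebook's price passes target."""
    price = Decimal(start_price)
    for day in range(1, max_days + 1):
        price *= DAILY_GROWTH
        if price > target:
            return day
    return None  # never exceeded within max_days


if __name__ == "__main__":
    # Hypothetical starting price of $100.
    for day, (p, b) in enumerate(simulate(Decimal("100"), 5), start=1):
        print(f"day {day}: profnath ${p:.2f}, bordeebook ${b:.2f}")
```

Neither rule is unreasonable on its own. One seller wants to be slightly cheaper than the competition, the other wants a margin over a price it can match. The runaway only appears when the two rules are run against each other, which is exactly the interaction neither model accounted for.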
It's a classic example of how unanticipated factors can foil even the best-prepared computer models, and it's not an isolated incident.
Does this sound anything like the subprime mortgage crisis? Before 2008, the best minds with the best technology running the most advanced hypothetical scenarios completely missed the looming crisis and then failed to understand its severity. The more broadly a model is scoped, the more possibilities for error it includes. It sounds obvious, but we often miss the fact that those models are not, and will never be, as accurate as reality itself.
Here's another example. One T-shirt seller on Amazon.co.uk put up a shirt for sale emblazoned with the statement, "Keep Calm and Rape a Lot." One might wonder who thought such a shirt would be a good idea. But Solid Gold Bomb, the company that made the shirt, wasn't necessarily aware that it was even selling it. The company apologized publicly and copiously, but in its defense the only mistake it made was a small coding error. That's because the shirt wasn't designed by anyone. Nor were the shirts even necessarily ever printed. Solid Gold Bomb's business isn't in artfully designing T-shirts. Instead, it writes code that takes libraries of words that slot into popular phrases (such as "Keep Calm and Carry On," which enjoyed a brief memetic popularity online) to make derivations that get dropped onto a template of a T-shirt and automatically get posted as an Amazon item for sale. Its mistake was overlooking a single word in a list of 4,000 or so others (the company was lucky no other offensive words or phrases made it onto the site). The problem was context.
Again, a simple model, with serious social consequences. The program that made the Solid Gold Bomb T-shirt isn't aware of how its intended audience perceives the concept of rape, let alone how the business process that rendered the T-shirt works. And yet that context turned a one-word oversight into a massively damaging event.
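The mechanism is simple enough to sketch in a few lines of Python. What follows is a hypothetical illustration, not Solid Gold Bomb's code: the template string, the word lists, and the `blocked` set are all assumptions, and the words used here are deliberately harmless.

```python
from itertools import product

# Hypothetical phrase template with two slots to fill.
TEMPLATE = "Keep Calm and {verb} {tail}"


def generate_slogans(verbs, tails, blocked=frozenset()):
    """Fill the template with every verb/tail pair.

    Any pair containing a word in `blocked` (compared in lowercase)
    is skipped. The generator has no idea what the words mean; the
    blocked set is the only "context" it ever sees.
    """
    slogans = []
    for verb, tail in product(verbs, tails):
        if verb.lower() in blocked or tail.lower() in blocked:
            continue
        slogans.append(TEMPLATE.format(verb=verb, tail=tail))
    return slogans


if __name__ == "__main__":
    verbs = ["Carry", "Dance", "Sing"]
    tails = ["On", "A Lot"]
    for slogan in generate_slogans(verbs, tails, blocked={"sing"}):
        print(slogan)
```

With a few thousand words in each list, the output runs to millions of listings that no person ever reads. A blocked-word set helps, but it is itself a model of what is offensive, and it is only as complete as the person who wrote it.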
In both these instances, a failure to anticipate how the program would interact with other programs, or the broader context in which it would operate, caused significant harm. Those are just two ways in which a model on which code is based can be flawed.
Big Data still has big issues. For example, the information we're gathering is often not being properly normalized (put into a format where all data is apples-to-apples), the models we're making are seldom peer tested or reviewed (witness the problems with the ranking tool Klout as a standard for social media influence), and, most crucially, the information itself is usually siloed inside large corporations instead of being democratically available and verifiable.
Which isn't to say our technology is doomed. Most of the applications we use every day work tremendously well, and some really do produce amazing capabilities that improve our lives in countless ways. But it behooves us to examine the models that underpin them. Because someday, somehow, they will fail.
Joshua Klein is a hacker, consultant, television host, and author of Reputation Economics: Why Who You Know is Worth More than What You Have (Palgrave Macmillan), from which this essay is adapted.
"We are on the cusp of a smarter planet," says Godfrey Sullivan.