Netflix, Facebook — and the NSA: They’re all in it together
NSA, Netflix, Facebook and other e-commerce goliaths are collaborating on tools that track us in very intimate ways
By Andrew Leonard
On June 9, the Wall Street Journal reported that for the last few years the National Security Agency has been relying on a software program “with the quirky name Hadoop” to help it make sense of its enormous collections of data. Named after a toy elephant that belonged to the child of one of the original developers of the program, “Hadoop,” reported the Journal, is a crucial part of “a computing and software revolution … a piece of free software that lets users distribute big-data projects across hundreds or thousands of computers.”
“Revolution” is probably the most overused word in the chronicle of Internet history, but if anything, the Wall Street Journal undersold the real story. Hadoop’s importance to how we live our lives today is hard to overstate. By making it economically feasible to extract meaning from the massive streams of data that increasingly define our online existence, Hadoop effectively enabled the surveillance state.
And not just in the narrowest, Big Brother, government-is-watching-everyone-all-the-time sense of that term. Hadoop is equally critical to private sector corporate surveillance. Facebook, Twitter, Yahoo, Amazon, Netflix — just about every big player that gathers the trillions of data “events” generated by our everyday online actions employs Hadoop as a part of their arsenal of Big Data-crunching tools. Hadoop is everywhere — as one programmer told me, “it’s taken over the world.”
The Journal’s description of Hadoop as “a piece of free software” barely scratches the surface of the significance of this particular batch of code. In the past half-decade Hadoop has emerged as one of the triumphs of the non-proprietary, open-source software programming methodology that previously gave us the Apache Web server, the Linux operating system and the Firefox browser. Hadoop belongs to nobody. Anyone can copy, modify and extend it as they please. Funny, that: A software program developed collaboratively by programmers who believe that their code should be shared in as open and transparent a process as possible has resulted in the creation of tools that everyone from the NSA to Facebook uses to annihilate any semblance of individual privacy. But what’s even more ironic, and fascinating, is the sight of intelligence agencies like the NSA and CIA joining in and becoming integral players in the world of open source big data software. The NSA doesn’t just use Hadoop. NSA programmers have improved and extended Hadoop and donated their changes and additions back to the larger community. The CIA actively invests in start-ups that are commercializing Hadoop and other open source projects.
They’re all in it together. The spooks and the social media titans and the online commerce goliaths are collaborating to improve data-crunching software tools that enable the tracking of our behavior in fantastically intimate ways that simply weren’t possible as recently as four or five years ago. It’s a new military industrial open source Big Data complex. The gift economy has delivered us the surveillance state.

Hadoop’s earliest roots go back to 2002, when Doug Cutting, then the search director at the Internet Archive, and Michael Cafarella, a graduate student at the University of Washington, started working on an open-source search engine called “Nutch.” But the project did not get serious traction until Cutting joined Yahoo and began to merge his work into Yahoo’s larger strategic goal of improving its search engine technology so as to better compete with Google. Significantly, Yahoo executives decided not to make the project proprietary. In 2006, they blessed the formation of Hadoop, an open-source project managed under the auspices of the Apache Software Foundation. (For a much more detailed look at the history of Hadoop, please read this four-part history of Hadoop at GigaOm.)
Hadoop is basically a nifty hack. The definition, per Wikipedia, is surprisingly simple: “It supports the running of applications on large clusters of commodity hardware.” Bottom line, Hadoop provides a means for distributing both the storage and processing of an enormous amount of data over lots and lots of relatively inexpensive computers. Hadoop turned out to be cheap, fast and scalable, meaning it could expand smoothly in capacity as the flows of data it was crunching burgeoned in size, simply through plugging extra computers into the network. Hadoop was also fundamentally modular: different parts of it could be easily replaced by custom-designed chunks of software, making it seamlessly adaptable to the individual circumstances of different corporations or government agencies.
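To make the idea of spreading work over many cheap machines concrete, here is a minimal, single-process sketch of the map/shuffle/reduce pattern commonly associated with Hadoop's processing layer. It is an illustration only, not Hadoop's actual API: the function names (split_into_chunks, map_phase, shuffle, reduce_phase, run_job) and the sample event lines are hypothetical, and the "machines" are simulated as slices of a Python list.

```python
from collections import defaultdict
from itertools import chain


def split_into_chunks(records, n_workers):
    """Deal records round-robin into n_workers chunks (stand-ins for machines)."""
    return [records[i::n_workers] for i in range(n_workers)]


def map_phase(chunk):
    """Each simulated worker emits a (user, 1) pair for every event it holds."""
    return [(line.split()[0], 1) for line in chunk]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(records, n_workers=3):
    """Run the whole pipeline: split, map each chunk, shuffle, reduce."""
    chunks = split_into_chunks(records, n_workers)
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


# Hypothetical event log: "<user> <action>" per line.
events = [
    "alice click",
    "bob view",
    "alice purchase",
    "carol click",
    "bob click",
    "alice view",
]

counts = run_job(events)
print(counts)
```

Changing n_workers alters only how the records are dealt out, not the totals. That independence between the size of the cluster and the answer is what lets capacity grow by plugging in more computers.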
Hadoop’s debut was timely, addressing not only the problems Yahoo faced in managing the enormous amounts of data produced by its users, but also those that the entire Internet industry was simultaneously struggling to cope with. Basically, the Internet had become a victim of its own success. The enormous flows of data generated by users of the likes of Facebook and Twitter far overwhelmed the ability of those companies to make sense of it. There was too much coming in too fast. Hadoop helped companies cope with the tsunami — it was, in the words of Jeff Hammerbacher, an early employee of Facebook, “our tool for exploiting the unreasonable effectiveness of data.”
Before Hadoop, you were at the mercy of your data. After Hadoop, you were in charge. You could figure out all kinds of interesting things. You could recognize patterns in the data and start to make inferences about what might happen if you made tweaks to your product. What did users do when the interface was adjusted like this? What kinds of ads made them more likely to pull out their credit cards? What did that batch of millions of Verizon calls reveal about the formation of a potential terrorist cell? Facebook wouldn’t be able to exploit the insights of its so-called social graph without tools like Hadoop.
“Hadoop has become the de facto standard tool for cost-effectively processing Big Data,” says Raymie Stata, who served as chief technology officer at Yahoo before eventually starting his own Hadoop-focused start-up, Altiscale. And the significance of being able to cheaply process Big Data, to accurately “measure” what your users are doing, he added, is a “big deal.”
“Once you can measure what’s happening ‘out there’ — [you can] then use those measurements to understand and ultimately influence what’s happening out there.”
With engineers at multiple companies recognizing that Hadoop offered solutions to the specific challenges they faced on a daily basis, Hadoop quickly secured the critical mass of cross-industry support necessary for an open-source software program to become an essential part of Internet infrastructure. Even engineers at Google chipped in, although Hadoop, at its core, was basically an attempt to reverse-engineer proprietary Google technology. But that’s just how the Internet has historically worked. For decades, so-called gift economy collaboration, in which the community as a whole benefits from the freely donated contributions of its members, has been a potent driver of Internet software evolution. As I wrote 16 years ago, when chronicling the birth of the Apache Web server, the success of open source software “testifies to the enduring vigor of the Internet’s cooperative, distributed approach to solving problems.” Hadoop, which down to its fundamental structural essence is a distributed approach to solving problems, emblematized this philosophy at its core.
So, in a sense, Hadoop’s success was just the same old story. But back in the mid-’90s, around the time that one of the first open source success stories, the Apache Web server, was taking off, I’m not sure that anyone would have predicted that the National Security Agency and CIA would end up becoming stalwart participants in the gift economy. Even though it makes total sense, in principle, that the fruits of government-funded software development should be shared with the general public, there’s still something cognitively disjunctive about intelligence agencies that shroud their every activity in great secrecy contributing to projects built on openness and transparency. On the one hand, employees of the NSA are appearing at conferences discussing how they have adapted Hadoop to solve the problems of dealing with unimaginably huge data sets, but on the other hand, we’re not supposed to know anything about what they are actually doing with that data.
The intertwining of the intelligence agencies with the larger open source software community could hardly be more incestuous. In 2008, a group of Yahoo employees that eventually included Doug Cutting formed a start-up designed to commercialize Hadoop called Cloudera. The CIA, through its In-Q-Tel (named after James Bond’s Q character) venture capital arm, was an early investor in, and customer of, Cloudera. The NSA built a significant piece of software that works “on top” of Hadoop called Accumulo designed to add sophisticated security controls managing how data could be accessed, and then promptly donated that code to the Apache Software Foundation. Later, a group of NSA software engineers formed another spinoff company, Sqrrl, to commercialize Accumulo.
What all this means is that the improvements to tools that the NSA is making, with the aim of more efficiently catching terrorists, are propagating into the private sector where they will be used by Facebook and Netflix and Yahoo to more accurately target ads or influence our purchasing behavior or provide us with content algorithmically shaped to our very specific desires. And vice versa. Innovations and increased capabilities pioneered by private companies trickle back to the NSA. The collective boot-strapping never stops.
Again, in principle, there is nothing necessarily wrong going on here. There is no one to blame. Some of the fiercer apologists for unfettered free markets might complain that government involvement in open source projects unfairly competes with private sector proprietary businesses, but a much stronger case can be made that any software development work that is funded by taxpayer money should by definition be considered freely sharable with the wider public. The NSA should probably be applauded for helping to improve Hadoop. And if the capabilities unlocked by Hadoop result in the prevention of some horrific terrorist act, then every programmer who contributed a line of code to the project justly deserves some congratulation.
But there’s also an intriguing inversion occurring here of what, for better or worse, we might call the purpose of the Internet. The Internet was initially created by the U.S. government to facilitate the sharing of information between geographically separate research centers. The Internet took off in the mid-’90s in large part because the general public recognized it as a phenomenal tool for sharing information with each other. The fact that so much of the Internet’s infrastructure was also built from code that was freely shared seemed like a pleasing match of form and function.
Free software and open-source software evolution is frequently driven not so much by hope for financial gain but by individuals looking to solve their immediate engineering problems. Over time, on the Internet at large, one of those problems has turned out to be the gnarly challenge of how to manage all the data created by all those people sharing so promiscuously with each other. Hadoop can justly be seen as the natural response to all that promiscuous sharing. And it certainly helped solve the problems faced by engineers at Facebook and elsewhere.
But what ended up getting enabled by the success of Hadoop is something significantly different than good old peer-to-peer sharing. The ability to make sense out of petabytes of data isn’t necessarily useful to you or me. But it’s god’s gift to the profit-minded corporations and terrorist-seeking intelligence agencies seeking to leverage the data we generate for their own purposes, to measure our behavior and ultimately to influence it. That could mean Netflix figuring out exactly what combination of plot twists and acting talent proves irresistible to streaming video watchers or Facebook figuring out exactly how to stock our newsfeeds with advertisements that generate acceptable click-through or Twitter knowing exactly where we are on the surface of the planet so it can pop up a sponsored tweet pushing a coupon for a happy hour at the bar just down the street — or the NSA spotting a peculiar pattern of pressure cooker purchases. This is no longer about sharing information with each other; it’s about manipulation, control and punishment. It’s about keeping stock prices up. We’re a long, long way here from the ideal gift economy, where everyone brings their home-cooked delicacy to the potlatch. We’ve arrived at a destination where the tools offer more power to them than to us.
I posed a version of this analysis to Michael Cafarella, one of the original authors of Hadoop, now a computer scientist at the University of Michigan. He conceded that “there’s a certain irony that the open ideas of open source have enabled the construction of systems that can undermine openness so substantially.”
But Raymie Stata, who has been closely involved with the growth of Hadoop for the last seven years, warned against “conflating ‘open source software’ with ‘Open Society.’”
“Everyone involved with Hadoop in the early days certainly did believe that Hadoop, as a piece of open source software, would make the world a better place. I can’t say, back then, that we saw Hadoop moving from cyberspace to the real world, but we did recognize that it would become foundational to building Internet applications of the future, and we wanted to contribute to advancing that agenda.
“But individuals who find common ground in contributing to open source projects do not, as a whole, share beliefs on what constitutes the ideal ‘Open Society,’” said Stata. “Is using Big Data to make inferences about people a Bad Thing at all, no matter who does it? Or is it no big deal? Or does it depend on who’s doing it, and for what reason (and with what transparency)? Should we be more worried about Big Business, or Big Government?”
“I guess in some ways this incident is evidence that it’s hard to encode ideals in a piece of software,” said Cafarella. “The right way to do that is via legislation.”
Cafarella’s point is hard to dispute. Brian Behlendorf, one of the founders of the Apache Software Foundation, told me that at one juncture, contributors to the various software projects managed by Apache had argued over whether the license that determined the rules for how their code could be shared should include restrictions against organizations using that code for purposes deemed morally or ethically unacceptable by the open source software programmer community. But it was relatively quickly determined that attempting such restrictions would open up an impossible-to-resolve, subjective can of worms. Society at large has to figure out what limits it wants to put on the surveillance state, on what either Facebook or the NSA is allowed to do.
It’s also important to acknowledge that as users of online services, we benefit in many ways from our instant-gratification, access-to-everything, always-on lives. But still: When we first started to log on, did we realize what the tradeoffs would be? Did we know that we were entering the Panopticon? That we would be making it substantially easier than ever before for governments and businesses to track our behavior and monitor our every whim?
Behlendorf says we kind of did. He recalls his days, fresh out of college in 1995, working for HotWired, Wired magazine’s first foray into online publishing. AT&T was running an ad on HotWired, under the theme “Imagine the Future,” that pictured an arm with a “wrist-watch phone” on it.
“Someone printed it out,” said Behlendorf, “put it up on the wall, and wrote in black marker over the top of the ad, ‘NSA primate tracking device.’”
And guess what? We went ahead and built it.
Andrew Leonard is a staff writer at Salon. On Twitter, @koxinga21.