Netflix, Facebook — and the NSA: They’re all in it together

NSA, Netflix, Facebook and other e-commerce goliaths are collaborating on tools that track us in very intimate ways

Topics: NSA, gift economy, open source software, Hadoop, Surveillance, surveillance state, Privacy, online privacy, Yahoo, Facebook, CIA, Big Data, Editor's Picks

Kevin Spacey as Francis Underwood in "House of Cards," Edward Snowden

On June 9, the Wall Street Journal reported that for the last few years the National Security Agency has been relying on a software program “with the quirky name Hadoop” to help it make sense of its enormous collections of data. Named after a toy elephant that belonged to the child of one of the original developers of the program, “Hadoop,” reported the Journal, is a crucial part of “a computing and software revolution … a piece of free software that lets users distribute big-data projects across hundreds or thousands of computers.”

“Revolution” is probably the most overused word in the chronicle of Internet history, but if anything, the Wall Street Journal undersold the real story. Hadoop’s importance to how we live our lives today is hard to overstate. By making it economically feasible to extract meaning from the massive streams of data that increasingly define our online existence, Hadoop effectively enabled the surveillance state.

And not just in the narrowest, Big Brother, government-is-watching-everyone-all-the-time sense of that term. Hadoop is equally critical to private sector corporate surveillance. Facebook, Twitter, Yahoo, Amazon, Netflix — just about every big player that gathers the trillions of data “events” generated by our everyday online actions employs Hadoop as a part of their arsenal of Big Data-crunching tools. Hadoop is everywhere — as one programmer told me, “it’s taken over the world.”

The Journal’s description of Hadoop as “a piece of free software” barely scratches the surface of the significance of this particular batch of code. In the past half-decade Hadoop has emerged as one of the triumphs of the non-proprietary, open-source software programming methodology that previously gave us the Apache Web server, the Linux operating system and the Firefox browser. Hadoop belongs to nobody. Anyone can copy, modify or extend it as they please. Funny, that: A software program developed collaboratively by programmers who believe that their code should be shared in as open and transparent a process as possible has resulted in the creation of tools that everyone from the NSA to Facebook uses to annihilate any semblance of individual privacy. But what’s even more ironic, and fascinating, is the sight of intelligence agencies like the NSA and CIA joining in and becoming integral players in the world of open source big data software. The NSA doesn’t just use Hadoop. NSA programmers have improved and extended Hadoop and donated their changes and additions back to the larger community. The CIA actively invests in start-ups that are commercializing Hadoop and other open source projects.

They’re all in it together. The spooks and the social media titans and the online commerce goliaths are collaborating to improve data-crunching software tools that enable the tracking of our behavior in fantastically intimate ways that simply weren’t possible as recently as four or five years ago. It’s a new military industrial open source Big Data complex. The gift economy has delivered us the surveillance state.

Hadoop’s earliest roots go back to 2002, when Doug Cutting, then the search director at the Internet Archive, and Michael Cafarella, a graduate student at the University of Washington, started working on an open-source search engine called “Nutch.” But the project did not get serious traction until Cutting joined Yahoo and began to merge his work into Yahoo’s larger strategic goal of improving its search engine technology so as to better compete with Google. Significantly, Yahoo executives decided not to make the project proprietary. In 2006, they blessed the formation of Hadoop, an open-source project managed under the auspices of the Apache Software Foundation. (For a much more detailed look at the history of Hadoop, please read this four-part history of Hadoop at GigaOm.)

Hadoop is basically a nifty hack. The definition, per Wikipedia, is surprisingly simple: “It supports the running of applications on large clusters of commodity hardware.” Bottom line, Hadoop provides a means for distributing both the storage and processing of an enormous amount of data over lots and lots of relatively inexpensive computers. Hadoop turned out to be cheap, fast and scalable, meaning it could expand smoothly in capacity as the flows of data it was crunching burgeoned in size, simply by plugging extra computers into the network. Hadoop was also fundamentally modular: Different parts of it could be easily replaced by custom-designed chunks of software, making it seamlessly adaptable to the individual circumstances of different corporations, or government agencies.
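For readers who want to see the shape of the idea, here is a toy sketch of the split-the-work pattern that Hadoop's processing model, MapReduce, is built around. It is an illustration only: Hadoop itself is written in Java and runs across real machines, while this runs in a single Python process and merely pretends that each chunk of data lives on a separate computer. Every function name and the sample "events" are hypothetical, invented for this example; none of it is Hadoop's actual API.

```python
from collections import defaultdict


def split_into_chunks(records, num_workers):
    """Deal records round-robin into one chunk per pretend machine."""
    chunks = [[] for _ in range(num_workers)]
    for i, record in enumerate(records):
        chunks[i % num_workers].append(record)
    return chunks


def map_phase(chunk):
    """Each machine independently turns its records into (user, 1) pairs."""
    return [(user, 1) for user, _action in chunk]


def shuffle(mapped_chunks):
    """Gather the pairs from every machine and group them by user."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each user's list of 1s into a single event count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_events_per_user(records, num_workers=3):
    """Run the whole split -> map -> shuffle -> reduce pipeline."""
    chunks = split_into_chunks(records, num_workers)
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


# Hypothetical click-stream "events": (user, action)
events = [
    ("alice", "click"),
    ("bob", "view"),
    ("alice", "purchase"),
    ("carol", "click"),
    ("alice", "view"),
    ("bob", "click"),
]

counts = count_events_per_user(events)
print(counts)
```

The point of the pattern is that the answer does not depend on how many machines share the work. Changing `num_workers` changes only how the records are dealt out, which is why capacity can grow by adding hardware rather than rewriting the analysis.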

Hadoop’s debut was timely, addressing not only the problems Yahoo faced in managing the enormous amounts of data produced by its users, but also those that the entire Internet industry was simultaneously struggling to cope with. Basically, the Internet had become a victim of its own success. The enormous flows of data generated by users of the likes of Facebook and Twitter far overwhelmed the ability of those companies to make sense of it. There was too much coming in too fast. Hadoop helped companies cope with the tsunami — it was, in the words of Jeff Hammerbacher, an early employee of Facebook, “our tool for exploiting the unreasonable effectiveness of data.”

Before Hadoop, you were at the mercy of your data. After Hadoop, you were in charge. You could figure out all kinds of interesting things. You could recognize patterns in the data and start to make inferences about what might happen if you made tweaks to your product. What did users do when the interface was adjusted like this? What kinds of ads made them more likely to pull out their credit cards? What did that batch of millions of Verizon calls reveal about the formation of a potential terrorist cell? Facebook wouldn’t be able to exploit the insights of its so-called social graph without tools like Hadoop.

“Hadoop has become the de facto standard tool for cost-effectively processing Big Data,” says Raymie Stata, who served as chief technology officer at Yahoo before eventually starting his own Hadoop-focused start-up, Altiscale. And the significance of being able to cheaply process Big Data, to accurately “measure” what your users are doing, he added, is a “big deal.”

“Once you can measure what’s happening ‘out there’ — [you can] then use those measurements to understand and ultimately influence what’s happening out there.”

With engineers at multiple companies recognizing that Hadoop offered solutions to the specific challenges they faced on a daily basis, Hadoop quickly secured the critical mass of cross-industry support necessary for an open-source software program to become an essential part of Internet infrastructure. Even engineers at Google chipped in, although Hadoop, at its core, was basically an attempt to reverse-engineer proprietary Google technology. But that’s just how the Internet has historically worked. For decades, so-called gift economy collaboration, in which the community as a whole benefits from the freely donated contributions of its members, has been a potent driver of Internet software evolution. As I wrote 16 years ago, when chronicling the birth of the Apache Web server, the success of open source software “testifies to the enduring vigor of the Internet’s cooperative, distributed approach to solving problems.” Hadoop, which down to its fundamental structural essence is a distributed approach to solving problems, emblematized this philosophy at its core.

So, in a sense, Hadoop’s success was just the same old story. But back in the mid-’90s, around the time that one of the first open source success stories, the Apache Web server, was taking off, I’m not sure that anyone would have predicted that the National Security Agency and CIA would end up becoming stalwart participants in the gift economy. Even though it makes total sense, in principle, that the fruits of government-funded software development should be shared with the general public, there’s still something cognitively disjunctive about intelligence agencies that shroud their every activity in great secrecy contributing to projects built on openness and transparency. On the one hand, employees of the NSA are appearing at conferences discussing how they have adapted Hadoop to solve the problems of dealing with unimaginably huge data sets, but on the other hand, we’re not supposed to know anything about what they are actually doing with that data.

The intertwining of the intelligence agencies with the larger open source software community could hardly be more incestuous. In 2008, a group of Yahoo employees that eventually included Doug Cutting formed Cloudera, a start-up designed to commercialize Hadoop. The CIA, through its In-Q-Tel (named after James Bond’s Q character) venture capital arm, was an early investor in, and customer of, Cloudera. The NSA built Accumulo, a significant piece of software that works “on top” of Hadoop and is designed to add sophisticated security controls managing how data can be accessed, and then promptly donated that code to the Apache Software Foundation. Later, a group of NSA software engineers formed another spinoff company, Sqrrl, to commercialize Accumulo.

What all this means is that the improvements to tools that the NSA is making, with the aim of more efficiently catching terrorists, are propagating into the private sector where they will be used by Facebook and Netflix and Yahoo to more accurately target ads or influence our purchasing behavior or provide us with content algorithmically shaped to our very specific desires. And vice versa. Innovations and increased capabilities pioneered by private companies trickle back to the NSA. The collective boot-strapping never stops.

Again, in principle, there is nothing necessarily wrong going on here. There is no one to blame. Some of the fiercer apologists for unfettered free markets might complain that government involvement in open source projects unfairly competes with private sector proprietary businesses, but a much stronger case can be made that any software development work that is funded by taxpayer money should by definition be considered freely sharable with the wider public. The NSA should probably be applauded for helping to improve Hadoop. And if the capabilities unlocked by Hadoop result in the prevention of some horrific terrorist act, then every programmer who contributed a line of code to the project justly deserves some congratulation.

But there’s also an intriguing inversion occurring here of what, for better or worse, we might call the purpose of the Internet. The Internet was initially created by the U.S. government to facilitate the sharing of information between geographically separate research centers. The Internet took off in the mid-’90s in large part because the general public recognized it as a phenomenal tool for sharing information with each other. The fact that so much of the Internet’s infrastructure was also built from code that was freely shared seemed like a pleasing match of form and function.

Free software and open-source software evolution is frequently driven not so much by hope for financial gain but by individuals looking to solve their immediate engineering problems. Over time, on the Internet at large, one of those problems has turned out to be the gnarly challenge of how to manage all the data created by all those people sharing so promiscuously with each other. Hadoop can justly be seen as the natural response to all that promiscuous sharing. And it certainly helped solve the problems faced by engineers at Facebook and elsewhere.

But what ended up getting enabled by the success of Hadoop is something significantly different than good old peer-to-peer sharing. The ability to make sense out of petabytes of data isn’t necessarily useful to you or me. But it’s god’s gift to the profit-minded corporations and terrorist-seeking intelligence agencies seeking to leverage the data we generate for their own purposes, to measure our behavior and ultimately to influence it. That could mean Netflix figuring out exactly what combination of plot twists and acting talent proves irresistible to streaming video watchers or Facebook figuring out exactly how to stock our newsfeeds with advertisements that generate acceptable click-through or Twitter knowing exactly where we are on the surface of the planet so it can pop up a sponsored tweet pushing a coupon for a happy hour at the bar just down the street — or the NSA spotting a peculiar pattern of pressure cooker purchases. This is no longer about sharing information with each other; it’s about manipulation, control and punishment. It’s about keeping stock prices up. We’re a long, long way here from the ideal gift economy, where everyone brings their home-cooked delicacy to the potlatch. We’ve arrived at a destination where the tools offer more power to them than to us.

I posed a version of this analysis to Michael Cafarella, one of the original authors of Hadoop, now a computer scientist at the University of Michigan. He conceded that “there’s a certain irony that the open ideas of open source have enabled the construction of systems that can undermine openness so substantially.”

But Raymie Stata, who has been closely involved with the growth of Hadoop for the last seven years, warned against “conflating ‘open source software’ with ‘Open Society.’”

“Everyone involved with Hadoop in the early days certainly did believe that Hadoop, as a piece of open source software, would make the world a better place. I can’t say, back then, that we saw Hadoop moving from cyberspace to the real world, but we did recognize that it would become foundational to building Internet applications of the future, and we wanted to contribute to advancing that agenda.

“But individuals who find common ground in contributing to open source projects do not, as a whole, share beliefs on what constitutes the ideal ‘Open Society,’” said Stata. “Is using Big Data to make inferences about people a Bad Thing at all, no matter who does it? Or is it no big deal? Or does it depend on who’s doing it, and for what reason (and with what transparency)? Should we be more worried about Big Business, or Big Government?”

“I guess in some ways this incident is evidence that it’s hard to encode ideals in a piece of software,” said Cafarella. “The right way to do that is via legislation.”

Cafarella’s point is hard to dispute. Brian Behlendorf, one of the founders of the Apache Software Foundation, told me that at one juncture, contributors to the various software projects managed by Apache had argued over whether the license that determined the rules for how their code could be shared should include restrictions against organizations using that code for purposes deemed morally or ethically unacceptable by the open source software programmer community. But it was relatively quickly determined that attempting such restrictions would open up a subjective, impossible-to-resolve can of worms. Society at large has to figure out what limits it wants to put on the surveillance state, on what either Facebook or the NSA is allowed to do.

It’s also important to acknowledge that as users of online services, we benefit in many ways from our instant-gratification, access-to-everything, always-on lives. But still: When we first started to log on, did we realize what the tradeoffs would be? Did we know that we were entering the Panopticon? That we would be making it substantially easier than ever before for governments and businesses to track our behavior and monitor our every whim?

Behlendorf says we kind of did. He recalls his days, fresh out of college in 1995, working for HotWired, Wired magazine’s first foray into online publishing. AT&T was running an ad on HotWired, under the theme “Imagine the Future,” that pictured an arm with a “wrist-watch phone” on it.

“Someone printed it out,” said Behlendorf, “put it up on the wall, and wrote in black marker over the top of the ad, ‘NSA primate tracking device.’”

And guess what? We went ahead and built it.

Andrew Leonard

Andrew Leonard is a staff writer at Salon. On Twitter, @koxinga21.
