Why is there no record of Bradley Cooper or Jessica Alba tipping their cabbies during New York City taxi rides back in 2013? Why was Cooper near a Mediterranean restaurant in Greenwich Village? Why was Alba at a ritzy hotel in Soho?
We don't know the answers, but we do know exactly when and where the movie stars were going, and we also know there's no record of them forking over any gratuity. What's worrisome, say privacy experts, is that we know all of this not from some special government sting operation but from publicly available data about millions of people's movements throughout New York City.
That information, released in response to an open records request, validates the concerns of those who argue that consumers' digital metadata may seem anonymous but actually isn't. It takes just one or two other pieces of information to turn seemingly anonymous tranches of metadata into specific information about individuals -- and not just those who are famous.
"The more computing power and publicly available data, the easier it becomes to identify individuals in the data," says Utrecht University's Stefan Kulk. "In a time when even government institutions upload large online data sets for the sake of open-data policies, the scale of the problem of de-anonymized data providing insights into everyone’s day-to-day life will only increase."
In the case of the taxi info, data analyst Christopher Whong filed an open records request in March 2014 for New York's database of cab fare, tip and location information after seeing a tweet from the city's Taxi and Limousine Commission. Though that database of 174 million cab rides in 2013 includes no passenger names, software engineer Vijay Pandurangan was able to link the data to other publicly available information about license plates, cab driver identities and taxi companies' medallion numbers.
Then, to show the individualized surveillance power of the seemingly anonymous data, Anthony Tockar of Neustar Research cross-referenced the information with publicly available photos of celebrities getting into cabs with identifiable license plates. That allowed Tockar to declare that Cooper's "cab took him to Greenwich Village, possibly to have dinner at Melibea, and that he paid $10.50, with no recorded tip." He also revealed that "Alba got into her taxi outside her hotel, the Trump SoHo, and somewhat surprisingly also did not add a tip to her $9 fare." (If Cooper or Alba tipped with cash, then that might not show up on the records.)
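The cross-referencing described above amounts to a simple join: match a cab identifier and an approximate time from an outside source against the trip records. The following is a minimal sketch of that idea, not Tockar's actual method or code. Every name and value in it (the Trip and Sighting records, the medallion strings, the fares, the five-minute window) is made up for illustration and does not come from the released database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Trip:
    """One row of a hypothetical trip log. It holds no passenger name."""
    medallion: str
    pickup_time: datetime
    dropoff_area: str
    fare: float
    tip: float


@dataclass(frozen=True)
class Sighting:
    """A hypothetical outside fact: who was seen entering which cab, and when."""
    person: str
    medallion: str
    seen_at: datetime


def match_sightings(trips, sightings, window=timedelta(minutes=5)):
    """Pair each sighting with trips by the same cab that began within `window`."""
    # Index trips by cab so each sighting only scans that cab's rides.
    by_medallion = {}
    for trip in trips:
        by_medallion.setdefault(trip.medallion, []).append(trip)

    matches = []
    for sighting in sightings:
        for trip in by_medallion.get(sighting.medallion, []):
            if abs(trip.pickup_time - sighting.seen_at) <= window:
                matches.append((sighting.person, trip))
    return matches


# Invented sample data, for illustration only.
TRIPS = [
    Trip("MED-0001", datetime(2013, 7, 8, 19, 2), "Area X", 12.0, 0.0),
    Trip("MED-0001", datetime(2013, 7, 8, 23, 15), "Area Y", 8.5, 2.0),
    Trip("MED-0002", datetime(2013, 7, 8, 19, 1), "Area Z", 20.0, 4.0),
]
SIGHTINGS = [
    Sighting("Passenger A", "MED-0001", datetime(2013, 7, 8, 19, 0)),
]

for person, trip in match_sightings(TRIPS, SIGHTINGS):
    print(person, trip.dropoff_area, trip.fare, trip.tip)
```

The trip log alone names no one. A single outside fact, the cab and the rough time, is enough to attach a destination, a fare and a recorded tip to a specific person.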
To dispel any notions that such information could be used only to track celebrities, Tockar showed how the same data could be employed to pinpoint the home addresses -- and possible identities -- of frequent visitors to Larry Flynt’s Hustler Club.
News of taxi metadata being turned into individual-specific information follows similar stories that emerged in the wake of Edward Snowden's disclosures about the National Security Agency vacuuming up metadata.
Last year, for example, Stanford University researchers showed how medical, financial and other personal information could be disclosed just by cross-referencing phone metadata with publicly available databases. Similarly, Susan Landau, former Sun Microsystems engineer and author of the book "Surveillance or Security?" told the New Yorker that metadata can reveal details about everything from upcoming corporate transactions to journalists' sources to political negotiations.
To illustrate that, Duke University associate professor Kieran Healy published a now-legendary essay explaining how British forces could have come to target Paul Revere -- and potentially snuff out the American Revolution -- if they had access to the same kind of metadata the NSA collects.
But, then, it’s not just the NSA that's vacuuming up data -- local governments and corporations are doing it, too.
Of course, they may not all have nefarious motives for collecting data. The problem, though, is that the data itself can be used in nefarious ways.