Big Tech can't handle Big Data: There's too much information, and no way to use it

We have vastly more information than ever before. But no one knows how to use it to stop trolls or terrorists

Published November 3, 2017 5:00AM (EDT)

 (Salon/Ilana Lidagoster)

The overall takeaway from the recent congressional hearings, where executives from three of the world's top tech firms told members of the House and Senate about their efforts to detect and examine Russian trolling operations, is just how little everyone involved seems to understand the situation.

While the members of Congress seemed duly sensitive to the larger security implications of the issues involved, many of them seemed to lack a grasp of the technological problems. Sen. John Kennedy, a Louisiana Republican, provided perhaps the best example of this when he asked Google attorney Richard Salgado whether or not his company was "the biggest newspaper in 92 countries." Salgado had to inform him that "we're not a newspaper." (Google does, however, own YouTube and operate its own news search engine.)

Across three separate hearings, the politicians conveyed a rightful sense of concern that Google, Facebook and Twitter had allowed their massive social platforms to become tools for trolls employed by various agencies funded or assisted by the Russian government. Those agencies apparently sought to use propaganda techniques honed at home to destabilize American politics, beginning in 2015. Virginia Sen. Mark Warner, the Senate Intelligence Committee's ranking Democrat, was one of several elected officials who noted that not only had the tech companies failed to take the issue seriously in the past, they had failed to respond to congressional requests for information.

Most of the Capitol Hill politicians, however, seemed to think that Russian trolling efforts continue because Silicon Valley has refused to stop them. No one seemed focused on the real problem: Technology companies have become too dependent upon computers to process data.

In the old days, meaning before a large portion of the world had access to the internet (believe it or not, we're only at 51 percent right now), this wasn't as much of a problem. There simply wasn't as much data. But as the web has grown and as social media have become more integral in people's lives, it's become overwhelmingly clear that for all the talk about machine learning or data fusion, the simple reality is that no one has the capacity to comprehend or even locate the massive amount of information out there. Not even close.

Google, which says it wants to organize the world’s information, has been estimated to have data on only about 50 billion pages in its index. But the vast majority of data on the web is not public and cannot be found by Google or any other search engine.

Whether in the form of webmail that requires a password or secret hacker sites accessed only by an IP address, the "Deep Web" has been estimated to be about 500 times the size of the regular web. Even Facebook profiles that have been set to private would technically be classified as Deep Web pages.

The National Security Agency, the U.S. intelligence organization charged with intercepting foreign communications, likely has a much larger database than Google’s. All available evidence indicates that it too has the same problem: too much data and not enough understanding of it.

Both government agencies and web services companies have made a dramatic shift away from human intelligence, in the hopes that hoarding data would enable them to make sense of it.

It was a sad coincidence that the House Judiciary Committee’s hearing about Big Tech's failure to understand its data was interrupted by news that a man who had affiliated himself with ISIS had committed a terrorist act in New York. As in numerous past cases, data collected by the NSA has proven useful in investigating the background of the accused killer, Sayfullo Saipov. But no one successfully connected the dots beforehand. Like the Russian trolls, Saipov had been at his activities for some time, and had apparently become radicalized in part by watching ISIS videos online.

The underlying problem in dealing with both trolling and terrorism is that the organizations that control the data refuse to spend the money that would be necessary to get human intelligence to decipher it. Computers are great for storing vast quantities of information and helping humans find correlations within it -- but they can only do what they are programmed to do. Despite the tech executives’ promises to do better, the reality is that their work even now is only about discerning patterns in past behavior instead of diagnosing and responding to present or future circumstances. Algorithms cannot solve human problems.

Unfortunately, many tech firms seem to have deliberately isolated themselves from human input. As secretary of state, Hillary Clinton issued warnings as early as 2011 about online manipulation by Russian president Vladimir Putin, and there have been any number of articles and books about the phenomenon. For all their pretenses of operating “social networks,” Facebook and Twitter seem to have been completely oblivious to social trends outside the United States. That’s particularly egregious, as Business Insider’s Linette Lopez noted last month, because both companies were explicitly called out during a congressional hearing in 2014, long before Donald Trump declared his presidential candidacy.

“The Kremlin also funds ‘troll farms,’ regime-funded companies which hire people to spread messages on social media, using Facebook, Twitter, newspaper comment sections and many other spaces,” Peter Pomerantsev testified three years ago before the House Foreign Affairs Committee. People who were paying attention knew what Russia was doing. The tech world refused to listen to them.

“Big Data,” to use the industry buzzword, is of no use without people to interpret it. The NSA and the rest of America's intelligence agencies have become addicted to "signals intelligence" gained from electronic eavesdropping instead of "human intelligence" gained from people on the ground. Not only does this snooping generate more data than the NSA can possibly analyze, it also diverts intelligence agencies' funds and people away from areas of the world that ought to be much more worthy of their time than keeping tabs on an 80-year-old emailing her daughter who happens to live in Bangkok.

Social media firms have the exact same dilemma, and the only solution lies in investing more in people than machines to monitor their platforms for dangerous propaganda and fake news while also avoiding the "false positives" of flagging valid, even less-popular, opinions as somehow being inappropriate. Computers can help at all of this, of course, but the human touch is needed more than ever to keep social networks sociable.

Facebook seems to have partially figured out what's necessary, having committed recently to hiring 1,000 more people for an internal team to review submitted advertisements. Given that the social network now has over 1.3 billion daily users, however, this number isn't nearly enough to keep pace. Just by comparison, as Axios' Christopher Matthews has observed, America's emaciated auto companies employ far more people than do their much more valuable technology counterparts.

[Graphic: employees vs. jobs]

The immediate rejoinder to the above graphic might be that web services companies aren't producing physical products across the world as Ford or GM do, which is true. It's also irrelevant. Last year, Facebook reported that its users spent an average of 50 minutes a day scrolling around on its site. Given the age limits on driving, this means that people spend more time on social media than they do in their cars. If Facebook and its rivals aren't willing to spend drastically more to improve the user experience, then they deserve to lose their market share. Already, people under 18 are expected to use Facebook less in 2017. Without significant reforms, Russian trolls' goal of polluting the web in the hopes of driving people into informational isolation is sure to succeed.


By Matthew Sheffield

A writer, web developer, and former tv producer, Matthew Sheffield covers politics, media, and technology for Salon. You can email him via m.sheffield@salon.com or follow him on Twitter.
