On Aug. 28, 2001, a 33-year-old Egyptian flight-school student named Mohamed Atta walked into a Kinko’s copy shop in Hollywood, Fla., and sat down at a computer with Internet access. He logged on to American Airlines’ Web site, punched in a frequent-flyer account number he’d signed up for three days before, and ordered two first-class, one-way e-tickets for a Sept. 11 flight from Boston to Los Angeles. Atta paid for the tickets — one of which was for Abdulaziz Alomari, a Saudi flight student also living in Florida — with a Visa card he had recently been issued.
The next day, Hamza Alghamdi, a Saudi man who was also training to become a pilot, went to the same Kinko’s. There, he used a Visa debit card to purchase a one-way seat on United Airlines Flight 175, another Sept. 11 flight from Boston to Los Angeles. The day after that, Ahmed Alghamdi, Hamza’s brother, used the same debit card to purchase a business-class seat on Flight 175; he might have done it from the Hollywood Kinko’s, too. And at around the same time, all across the country, 15 other Arab men, several of them flight students, were also buying seats on California-bound flights leaving on the morning of Sept. 11. Six of the men gave the airlines Atta’s home phone number as a principal point of contact. Some of them paid for the seats with the same credit card. A few used identical frequent-flyer numbers.
It’s now obvious that there was a method to what the men did that August; had someone been on their trail, their actions would have seemed too synchronized, and the web of connections between them too intricate, to have been dismissed as mere coincidence. Something was up. And if the authorities had enjoyed access, at the time, to the men’s lives — to their credit card logs, their bank records, details of their e-mail and cellphone usage, their travel itineraries, and to every other electronic footprint that people leave in modern society — the government might have seen in the disparate efforts of 19 men the makings of the plot they were to execute on Sept. 11, 2001. Right?
We could have predicted it. That’s the underlying assumption of Total Information Awareness, a new Defense Department program that aims to collect and analyze mountains of personal data — on foreigners as well as Americans — in the hope of spotting the sort of “suspicious” behavior that preceded the attacks on New York and Washington. The effort, sponsored by the Defense Advanced Research Projects Agency, or DARPA, is at this point only a vaguely defined research project; officials at the agency have so far declined to fully brief the public on the program and its potential cost, and the few documents made available have stressed that technologists will need several years to achieve many of TIA’s goals.
Civil libertarians, not unexpectedly, are already raising a ruckus, their temper brought to a flaring point by the man tapped to head the program: John Poindexter, Ronald Reagan’s national security advisor, who was convicted of lying to Congress during the Iran-Contra scandal (the convictions were later overturned on appeal). The invasion of privacy implied by the very name “Total Information Awareness” is also sure to raise constitutional questions. But computer scientists who specialize in the kinds of technologies necessary to make something like TIA work are intrigued — even as they express concern. For some, the threat posed by terrorism is so great that the need for a comprehensive response can be equated to the need for the Manhattan Project. It’s a comparison meant to convey both how dangerous and how vital to our society constant data collection may be.
“Frankly, I don’t see any other way for us to survive as a civilization,” says Jeffrey Ullman, a computer scientist at Stanford University and an expert on database theory. “We’re heading for a world where any creep with a grudge can build himself a dirty bomb. Al-Qaida has just broken new ground, but you can’t see these things as a unique phenomenon. We have to have in place a system that makes it very hard for individuals anywhere to do such things.”
But can a system like TIA ever work? There are obvious, huge technical problems, including the sheer amount of data that will have to be analyzed; the difficulties in integrating disparate databases; and the challenge of predicting unprecedented terrorist threats. The whole idea might seem, to a non-expert, like just another unwieldy, expensive, and dangerous bit of American military excess.
Specialists in “data mining” technologies, including people who are critical of the Bush administration, are, however, guardedly hopeful. They worry about many aspects of such a program: the “false positives,” the harm to privacy, the possibility that personal information will be misused, the almost inevitable codification of racial and religious profiling. They stress that there should be strict laws governing the collection of data. But most of them think it could work and should at least be researched — a conclusion that on at least one level is not too surprising: Funding for TIA means more funding for computer scientists.
Public outcry has so far been muted. People already feel constantly monitored, and one may wonder why the FBI shouldn’t know you prefer Paul Newman’s brand of marinara when your supermarket is well aware that you do. Privacy experts provide an obvious response: Your supermarket can’t put you in jail. They also say that it’s still early and that once the scope of TIA becomes widely known, there will be widespread agitation over its invasiveness and the consequences of its misuse.
They could be right. But what if TIA does work? What if it can spot the kind of trail the 9/11 hijackers left in their wake — the test flights, the car rentals, the gym memberships, the flight schools, the public Internet terminals, the driver’s licenses with fake addresses, the one-way tickets — actions that are completely innocent when one person takes them, but that could raise flags when several people who know each other take them at around the same time? Would the public support a system that helps find terrorists, despite its concerns over civil liberties?
Shortly after Sept. 11, 2001, Stanford’s Ullman posted on his Web site a long essay he’d written reacting to the attacks. The piece was mostly political; Ullman criticized religious fundamentalism, Palestinians who think terrorism will buy them freedom, and the misplaced zeal of our drug war (which he says can stand in the way of the war on terrorism). There was only one part that had anything to do with his research:
“Modern technology has given criminals and terrorists many new and deadly options,” he wrote. “Just about the only defensive weapon to come out of the developments of the past 50 years is information technology: our ability to learn electronically what evils are being planned. If we use it wisely, we can keep our personal freedom, yet use information effectively against its enemies.” Ullman says that he’s been thinking about such a system since 1998, when al-Qaida bombed two U.S. embassies in Africa, and the New York and Washington attacks only firmed up his convictions. He now thinks that a system like TIA is critical to our safety.
The specific information technology that Ullman believes will be our salvation is called data mining. If you tend to use such modern conveniences as credit cards, supermarkets and online bookstores, chances are you’ve been helped — or, depending on how you see it, hurt — by data mining. Broadly speaking, the phrase means the process of looking at a heap of information and finding something you think you might want. It implies a “fuzziness” about your search, a hunt for patterns buried in the data that are not obvious. Credit card companies use a form of data mining to determine whether your purchases look “unusual” and may, therefore, be fraudulent. Amazon.com uses it to recommend books by looking at other books you’ve purchased. When you hand over your discount-club card at a grocery store checkout, you’re actually letting the store keep data on your personal shopping habits; some chains are finding ways to mine that data.
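In miniature, the “unusual purchase” check a credit card company runs can be sketched in a few lines of Python. (This is a toy illustration, not how any real fraud system works; the data and the three-standard-deviations threshold are invented, and production systems use far richer models. The core idea is the same, though: learn what “normal” looks like, then score new events against it.)

```python
# Toy anomaly check: flag a purchase as "unusual" if it falls far
# outside a customer's normal spending pattern, measured in
# standard deviations from the mean of past purchases.
from statistics import mean, stdev

def is_unusual(history, new_amount, threshold=3.0):
    """Return True if new_amount is more than `threshold` standard
    deviations away from the customer's average purchase."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return new_amount != mu
    return abs(new_amount - mu) / sigma > threshold

past_purchases = [22.50, 31.00, 18.75, 27.10, 24.30, 19.99]
print(is_unusual(past_purchases, 25.00))    # → False (a typical purchase)
print(is_unusual(past_purchases, 4800.00))  # → True (wildly out of pattern)
```

The same learn-the-baseline, score-the-newcomer pattern underlies the grocery-card and recommendation examples as well; only the features being modeled change.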
Total Information Awareness uses a data-mining system that DARPA calls Evidence Extraction and Link Discovery (EELD). According to the TIA site, the system will have “detection capabilities to extract relevant data and relationships about people, organizations, and activities from message traffic and open source data. It will link items relating potential terrorist groups or scenarios, and learn patterns of different groups or scenarios to identify new organizations or emerging threats.”
What that means, specifically, is illustrated on the TIA Web site by a graphic showing several workers at a uranium plant who’ve been “recruited to steal the uranium.” Three of the workers have been contacted — apparently without each other’s knowledge — by a “black market” dealer who wants uranium. That dealer has ties to another dealer, who, through a middle man, recruits a dump-truck operator to transport the load. Altogether, about half a dozen people are involved in the scheme, and the TIA site suggests that one way authorities might have pieced together the whole thing is by monitoring everyone to “discover relationships and learn patterns of activity.”
But the graphic offers a disingenuous example of TIA in action. By focusing on a uranium plant and not on, say, the settings where the 9/11 hijackers or the U.S.S. Cole bombers or Tim McVeigh planned their attacks — regular, everyday places — the graphic tends to downplay the scope of the TIA program. Presumably, everyone who works at a nuclear fuel plant is already heavily scrutinized; it’s the rest of us, the people who wouldn’t know uranium from plutonium but who might have stepped into a Kinko’s once or twice, that TIA will want to monitor. The main difference between what TIA will do and what other surveillance programs already do comes down to this: TIA will work only by monitoring everyone. Though it may put extra emphasis on people who work at high-risk places like uranium plants, that is not its main function. Its main function is to ferret out a picture of a threat from a confluence of what seem to be normal activities, and the more of these normal activities it has recorded — that is, the more people it is monitoring — the better it works.
In such a scenario, a non-farmer who buys fertilizer that could be used for making a bomb, or a flight-school student with an Arab surname, or someone who does something as seemingly innocuous as buy a book about the Taliban, might raise a TIA warning flag. And someone who did all three would likely get a visit from the FBI.
“Collecting everything — that’s what would give it its power,” explains Raghu Ramakrishnan, a computer scientist at the University of Wisconsin at Madison. To determine whether an individual might be a threat, the system would look at all of his activities and all his relationships, and “you would ask if there is statistically significant evidence that these activities are ‘suspicious,’” Ramakrishnan says. “If three things occur together, you might be able to make the statement that they are ‘highly correlated’ — that in, say, 99.9 percent of the cases where I found these two activities occurring together, I would also find this other thing happening.”
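The statistic Ramakrishnan describes is a conditional rate: of the people who did two activities together, what fraction also did a third? A toy version can be computed directly. (The activities and records below are invented for illustration; a real system would work over billions of records and would need to establish statistical significance, not just raw rates.)

```python
# Sketch of the co-occurrence statistic: estimate how often a third
# activity accompanies two others in a table of per-person activities.
def conditional_rate(records, given, then):
    """P(person did `then` | person did everything in `given`)."""
    matches = [r for r in records if given <= r]  # set containment
    if not matches:
        return 0.0
    return sum(1 for r in matches if then in r) / len(matches)

# Each person's record is the set of activities they engaged in.
people = [
    {"one_way_ticket", "flight_school", "gym_membership"},
    {"one_way_ticket", "flight_school", "car_rental"},
    {"one_way_ticket", "gym_membership"},
    {"flight_school"},
]

rate = conditional_rate(people, {"one_way_ticket", "flight_school"}, "car_rental")
print(rate)  # → 0.5: of the 2 people with both activities, 1 also rented a car
```

At Ramakrishnan’s “99.9 percent of the cases” level, that returned rate would be 0.999 rather than 0.5.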
Take as an example the purchase of one-way airline tickets. For years, airlines have known that this is one signal of dangerous activity, but that it does not in and of itself indicate a sure threat. (Before 9/11, only international passengers using one-way tickets were deemed a high security risk; domestic passengers going one way, even on a ticket purchased at the counter with cash, weren’t seen as much of a problem at all, which is one reason why some of the hijackers weren’t more closely examined.) Buying a one-way ticket could be one flag in TIA — an indication of a marginally higher risk. But when TIA notices that someone has purchased a one-way ticket, it might also look to see if he has associated with anyone else who has done the same. Have they all recently done other things — enrolled in flight schools, purchased weapons, etc. — that would make them even more suspicious? (Pointing to Richard Reid, the British man who pleaded guilty to an attempt last year to blow up an airliner with a bomb hidden in his shoe, Ullman said that TIA might not be any more effective if it took a person’s ethnicity into account; it’s not clear whether TIA will consider race and religion, but it could if it wanted to.)
TIA would be set up to do its work automatically and in close to real time: The suspect buys the one-way ticket, his past activities and affiliations are examined, and then, if his risk factor meets a certain threshold, an intelligence or law enforcement analyst is notified. According to the Web site, TIA “provides focused warnings within an hour after a triggering event occurs or an evidence threshold is passed.”
If TIA works this cleanly, many say that the chief problem it raises — its knowledge about you, personally — is not much of a problem at all: After all, it has information about you only so it can determine what a good guy looks like. You, as an innocent, are in the database mainly as an example of someone who’s not a terrorist: the guy who buys a one-way ticket every once in a while because of some emergency business. John Poindexter would call you “noise.” In an interview with the Washington Post, he described TIA as a giant filter to separate noise from what he calls “signal.”
To hear Poindexter describe it, the system sounds almost elegant; and if you take it to its technological extreme, there’s also a supernatural aspect to it. TIA would know everything; TIA would predict evil; TIA could save the world. Indeed, some of TIA’s research projects sound as though they’ve been copied from the Psychic Friends Network. One program, “Wargaming the Asymmetric Environment,” for instance, would try to predict the “behavior of specific terrorists by examining their behavior in the broader context of their political, cultural and ideological environment,” according to the site. It goes on to say that “indication and warning models have been tested historically, and in some cases operationally, to predict an active terrorist group’s next action (attack/no attack, target characteristics, location characteristics, tactical characteristics, timeframes, and motivating factors), and test results have been shown to be statistically significant.”
You can see why more than a few pundits have compared TIA to the notion of “precrime” imagined in the Philip K. Dick short story (and Tom Cruise movie) “Minority Report.” The comparison is not meant to be a compliment.
There are several technological and mathematical reasons why TIA can’t become truly oracular. Its main limitation is that it could never really know everything. Indeed, how much it could conceivably know — and how fast it could know it — is at this point unclear; a database on a huge scale that’s meant to be as dynamic as TIA has never been set up before, experts say, and nobody knows if it’s even possible. But even if DARPA does manage to create the database, TIA will face another limitation: It can only know what you do, not what you think. And, though it would have some idea — maybe even a good idea — of what a terrorist plan “looks” like, TIA would be limited to terrorist attacks it has seen in the past. And it’s not clear that all new terrorism will look like old terrorism. Before Sept. 11, the possibility that a data-mining system might have predicted that four planes would be simultaneously hijacked and slammed into buildings would have been close to nil — and the likelihood that terrorists will come up with new, unprecedented threats seems close to 100 percent.
TIA will most likely get its information from currently existing databases — from banks, airlines, retail chains, etc. But giving TIA access to that data presents a significant database problem, engineers say. “You have lots of headaches in the management of the data,” says Wisconsin’s Ramakrishnan. “It’s going to be copied from a multiplicity of sources at different schedules, and how do you keep track of where what came from when?”
Another problem is that two databases can be as different as two languages — you can translate from one to the other, but the translation doesn’t always make sense. A bank might reference every person by his account number, for example, while the DMV will do it by driver’s license number. How will TIA know that the person named Mohamed Atta with a certain bank account number is the same Mohamed Atta with a certain driver’s license? Michael Franklin, a database expert in the computer science department at the University of California at Berkeley, says that for many years, businesses have been looking for good ways to address this data-integration problem. “It’s really an age-old problem, and companies have been trying to do it for years and years because even inside a company they tend to have lots of different databases,” he says. “There are some ways to do it. If I was a credit card company and I made an agreement with an airline company, we could together figure out how to cobble together the databases. But what’s missing is some larger way to do it.” A good part of the short-lived, late-’90s boom in “business to business” (or B2B) companies was aimed at fixing this problem, Franklin says, and some part of the push for “Web services” is as well.
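In its simplest form, the matching step Franklin describes is known as record linkage: when two databases use different keys, matching falls back on normalized identifying fields such as name and date of birth. A toy sketch follows (the field names, key formats, and records are invented for illustration; real linkage systems must also cope with misspellings, aliases, and transliteration, which is exactly why the problem is hard):

```python
# Toy record linkage: pair bank and DMV records that agree on a
# normalized (name, date-of-birth) key, since the two databases
# use incompatible primary keys (account vs. license number).
def normalize(name, dob):
    """Lowercase the name and collapse extra whitespace."""
    return (" ".join(name.lower().split()), dob)

bank_records = [
    {"account": "4417-08", "name": "Mohamed  Atta", "dob": "1968-09-01"},
]
dmv_records = [
    {"license": "A300-540", "name": "mohamed atta", "dob": "1968-09-01"},
]

def link(bank, dmv):
    """Return (bank, dmv) record pairs sharing a normalized key."""
    index = {normalize(r["name"], r["dob"]): r for r in dmv}
    return [(b, index[k]) for b in bank
            if (k := normalize(b["name"], b["dob"])) in index]

pairs = link(bank_records, dmv_records)
print(len(pairs))  # → 1: the two records refer to the same person
```

Doing this reliably across hundreds of independently designed databases, rather than two, is the “larger way to do it” that Franklin says is still missing.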
DARPA’s involvement in a research area tends to accelerate advances in that field, and the group’s stated goal for TIA is to do things that have never been done before. In its solicitation for research ideas for TIA, the agency asks for ideas “that enable revolutionary advances in science, technology or systems.” It’s possible that DARPA could hit on some new, easy way of integrating information, which scientists say would be a good side benefit of the project. “I’m retiring,” says Stanford’s Jeffrey Ullman, “so I’m not trying to use you as a way to get more money for my research project. But I think the government has made a huge mistake in not funding computer scientists, and this is an area — the information-integration part of it — which has good commercial use as well.” (The Internet can be considered a good side benefit of DARPA research.)
After it puts together its database, TIA would then set about looking for hints of terrorism hidden in its data. An important question arises: What are some hints of terrorism? According to the TIA site, the system will have in its memory “patterns that cover 90 percent of all previously known foreign terrorist attacks.” (TIA hasn’t said what “foreign” means in that sentence — is it referring only to attacks that occurred outside the U.S., or to all attacks perpetrated by foreign groups? If it means the former, 9/11 wouldn’t be in its database; if it means the latter, the Oklahoma City bomb and the Unabomber’s attacks would be left out.)
In other words, for TIA to single them out, potential terrorists would need to be doing many of the same things that some other terrorists have done before. Is that likely? Probably — after all, every terrorist organization needs to communicate, shop for equipment, and participate in the financial system. The problem is that innocent people need to do those things, too. Thus, one of the main challenges John Poindexter will face in building his noise filter will be its calibration: Should TIA look at more specific, narrow traits of terrorism in an effort to reduce the false positives, while risking the chance that some novel disaster will slip through? Or should it do the opposite — look for the more general characteristics of terrorists and risk pursuing thousands (or millions) of innocent people?
“That’s a good question,” says Gregory Piatetsky-Shapiro, a data-mining expert who runs KDnuggets, an online newsletter devoted to the subject. The answer, he says, “is that in general you do still want to protect against past attacks — so you would look for the kinds of things that happen there and try to stop those. But also, there are general things that you would look for in other attacks” — things that are statistically unlikely in the general population.
Ullman says, “You ask it about all of the unusual coincidences of people who are known to be involved with al-Qaida. The system should be able to notice that four guys have enrolled in different flight schools, and you have to distinguish that from noticing that four guys in al-Qaida have bought jeans at Macy’s.”
But what about regular people — people who aren’t suspected of being in al-Qaida? “That’s where it becomes a hard algorithm problem and a good research problem,” Ullman says. “This is something that requires the brightest minds in computer science.”
But could even the brightest minds prevent TIA from fingering innocent people? Not long after he heard about the system, Bobby Gladd, a statistician and self-described “political pain in the ass” who lives in Las Vegas, set out to determine how many false positives a system like TIA would produce. It turns out that you don’t need an advanced degree in statistics to do the calculation Gladd did to determine that even if TIA is very good, it will still be frequently wrong.
Gladd figures that if TIA has a scheme that can correctly identify as innocent 99.9 percent of the innocent people it sees — an exceptionally high percentage that is probably not achievable — then it will still end up with about 240,000 falsely accused Americans. (That is, 0.1 percent of the 240 million adult Americans.) If you reduce the percentage to 80 — more reasonable but probably still too high — the number of false positives becomes 48 million!
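Gladd’s back-of-the-envelope arithmetic can be checked directly: a system that correctly clears some fraction of innocent people still flags the remaining fraction of the whole adult population.

```python
# Gladd's false-positive arithmetic: even a very accurate filter,
# applied to everyone, wrongly flags a huge number of people.
ADULT_POPULATION = 240_000_000  # the article's figure for adult Americans

def false_positives(specificity):
    """Innocent people wrongly flagged, given the fraction of
    innocents the system correctly identifies as innocent."""
    return round(ADULT_POPULATION * (1 - specificity))

print(false_positives(0.999))  # → 240000 falsely accused
print(false_positives(0.80))   # → 48000000 falsely accused
```

This is the classic base-rate problem: because actual terrorists are a vanishingly small fraction of the population, even a tiny error rate on innocents swamps the true positives.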
“I am offended by the constitutional implications of it,” Gladd says, “but at the same time I’m calling attention to it on the basis of what I do. This is a waste of time, and it’s going to take away resources.” Like many other critics of the system, Gladd points out that intelligence analysts missed 9/11 not because they had too little information — it turns out that, in retrospect, there were many “unconnected dots” pointing to an attack — but because they didn’t have the capability to analyze it. Gladd says that government money would be more wisely spent on information analysis. “Every dollar spent on TIA is going to be a dollar not spent on fighting terrorism,” he says.
But Piatetsky-Shapiro says that we have to remember that law enforcement already falsely follows a lot of innocent people. Anyone who’s seen “Law & Order” knows this. The recent hunt for the Washington sniper proved this too, as thousands of calls poured into hotlines, almost all of them pointing to people uninvolved with the crime. “I think we’ll never be able to eliminate false positives,” Piatetsky-Shapiro said, “but maybe this tool can improve the ratio.”
It’s true there are significant dangers to Total Information Awareness, and the computer scientists all said they were worried about them. The whole thing may be unconstitutional; even if it’s not, it would still be what many people consider an invasion of privacy, and there would need to be new rules governing its use. Can such rules be set up — and will they be? And do we trust the people setting them up? Despite these questions, the computer scientists also said they think of TIA as a long-term research project. As such, they say, both the policy that will govern it and the technology that will run inside it need to be publicly debated.
Up to this point, the Defense Department has been circumspect about TIA and what rules will govern it. At a briefing in late November, Edward Aldridge, the undersecretary of defense for acquisition, logistics and technology, told reporters that in the experimental phase of TIA, the system would use phony data. He also said that “in order to preserve the sanctity of individual privacy, we’re designing this system to ensure complete anonymity of uninvolved citizens, thus focusing the efforts of law enforcement officials on terrorist investigations. The information gathered would then be subject to the same [legal protections] currently in place for the other law enforcement activities.” But he didn’t say what those rules were.
A reporter asked: “It sounds like every time I would enter or a citizen would enter a credit card, any banking transaction, any medical — I go see my doctor, any prescription — all of those things become part of this database. Right? Hypothetically?”
“Hypothetically they would,” Aldridge said. “Although the data that would go along with personal information such as bank accounts, that would all be protected in the Privacy Act just as it is today. Individuals would not be associated with that.”
Aldridge was then asked if the department would need search warrants to add personal data to its files. He sidestepped the question: “First of all, we are developing the technology of a system that could be used by the law enforcement officials, if they choose to do so. It is a technology that we’re developing. We are not using this for this purpose. It is technology.” He was asked again about search warrants. “They would have to go through whatever legal proceedings they would go through today to protect the individuals’ rights, yes,” Aldridge said. (A few experts have said that a system like TIA that would require search warrants would be unworkable.)
It’s this sort of secrecy that gives Ullman, of Stanford, the willies. He’s for a system like TIA, but he deeply mistrusts the people in power. “For it to work,” he says, “you’d have to get Republicans agreeing to not use it to track drug dealers and other civil crimes that are not acts of war. Whether a Republican administration would ever contemplate this, I don’t know. Because Ashcroft wants to catch drug dealers.”
But Ullman also adds that “once you get the right laws passed, the thing that makes it a crime to misuse the data there, then you’re OK. Why doesn’t the military stage a coup? Because there’s a tradition built up over 200 years that doesn’t let this thing happen. You have to get that here, that tradition.”
In the end, the debate over TIA, if it comes, may hang on this point: Are the rules good enough? For some people, no set of safeguards may ever be enough. Lee Tien, of the Electronic Frontier Foundation, for example, says: “I can’t possibly say yes based on what I know now. I’d have to be convinced there would be a commitment to privacy from the get-go, and we just don’t see that now. This administration is known for its secrecy. They are as bad as Nixon, maybe worse. We certainly cannot trust them with this system.”
He added that “one of my biggest fears is that they are working on this stuff and they have some breakthroughs, and then something happens — an attack — and all of a sudden TIA’s riding the white horse to the rescue. And then it’s, ‘Gee we haven’t worked out the privacy,’ and ‘We haven’t had new legal protections, but the exigencies are such that we need it now.’”
That’s probably a valid fear. But so is the fear of terrorism, says Ramakrishnan. “You know, not to make it sound grandiose, but I think there is a battle here, and we’re facing the kinds of things the people who invented the atom bomb were thinking. It’s probably not whether we should — I don’t think we have a choice. I would rather that we understood this and took the time to enforce reasonable safeguards. To the extent that we do this in the open and have in place an array of legal, legislative guidelines, I’d be much happier with that.”