Ace Sarich, a Vietnam vet and former Navy SEAL, caught the last commercial flight out of Kuwait City, Kuwait, on March 19, the day before the American and British invasion of Iraq began.
He’d spent two weeks visiting military encampments up and down the border, training soldiers, military police and medical personnel on how to talk to Iraqis in their own language without speaking a word of Arabic.
Sarich is the vice president of VoxTec in Annapolis, Md., a division of Marine Acoustics, a military contractor. He was in Kuwait training troops to use a handheld device called the “Phraselator,” a one-way language translator that’s the size of a large PDA and weighs about a pound.
The Phraselator operates like an audio version of a tourist phrasebook. It’s loaded with hundreds of stock phrases such as: “Halt.” “Put your hands above your head.” “Sit down.” “Open your bag.” “Show me your identification.” “Do you speak English?” and the all-purpose: “We are the U.S. military. We are here to help you.”
Speak one of these sentences into the device, or choose the phrase from a menu with a stylus, and the Phraselator emits a prerecorded Arabic translation. About 100 of the one-way speech machines are deployed in the current conflict at a cost of more than $2,000 per device. Different modules can be loaded into the device depending on the language encountered or the task at hand.
The Phraselator uses speech-recognition technology called Dynaspeak, developed by SRI International. It recognizes phrases phonetically, and then matches them to the prerecorded Arabic phrases. It’s a nifty idea, and to some wishful thinkers, it offers a vision of Star Trek-ish universal translators that will eventually make all language barriers disappear. But we’re still a long way from that future. The Phraselator doesn’t yet signify the breakthrough in machine translation that generations of artificial-intelligence researchers have hoped for. Instead, it’s a step along the way, a little piece of the puzzle. It’s also a sign that where once Cold War dynamics and the space race pushed new technological advances, now it’s the war on terrorism that is spurring research forward.
“This is not attempting to solve the problem of general translation,” says Horacio Franco, a developer at SRI International who worked on the device. “This addresses a specific need, when you really want to do something very constrained.”
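The constrained approach Franco describes can be sketched in a few lines of Python. This is only an illustration, not VoxTec's or SRI's code: it assumes the recognizer has already turned speech into text, and it swaps in fuzzy string matching from Python's standard difflib module for Dynaspeak's phonetic recognition. The phrase table, the audio file names and the cutoff value are all invented for the example.

```python
import difflib

# Hypothetical phrase module: English stock phrase -> prerecorded audio clip.
# The clip file names are invented for illustration.
ARABIC_MODULE = {
    "halt": "ar_halt.wav",
    "sit down": "ar_sit_down.wav",
    "open your bag": "ar_open_your_bag.wav",
    "show me your identification": "ar_show_id.wav",
    "do you speak english": "ar_speak_english.wav",
}


def normalize(text):
    """Lowercase and drop punctuation so 'Sit down.' matches 'sit down'."""
    kept = [ch for ch in text.lower() if ch.isalnum() or ch.isspace()]
    return " ".join("".join(kept).split())


def match_phrase(heard, module, cutoff=0.75):
    """Return (phrase, clip) for the closest stock phrase, or None."""
    candidates = difflib.get_close_matches(
        normalize(heard), list(module), n=1, cutoff=cutoff
    )
    if not candidates:
        return None  # nothing in the module is close enough to play
    phrase = candidates[0]
    return phrase, module[phrase]


if __name__ == "__main__":
    print(match_phrase("Sit down.", ARABIC_MODULE))
    print(match_phrase("What time is the next train?", ARABIC_MODULE))
```

Anything outside the loaded module returns nothing rather than a guess, which is the trade-off of a one-way, fixed-phrase design: loading a different dictionary changes the language or the task, but the device never translates a sentence it has not been given.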
The Phraselator first saw battle in Afghanistan, where it communicated in four different languages: Pashtu, Dari, Urdu, and Arabic. In Kandahar, Army military police used it to communicate with prisoners of war. Troops requested Russian and Chinese modules to communicate with detainees.
Since the war in Iraq began, Sarich has received e-mail from troops who’ve put the device to use on the battlefield. One Army soldier with the Special Forces wrote that he’d used it to communicate with some children in Basra who were able to clue him in to the location of a weapons cache. The device has also made cameo appearances in some news reports from the front.
The voice-recognition technology in the Phraselator is just a precursor of a much more technically ambitious initiative that the Defense Advanced Research Projects Agency (DARPA) calls the Babylon project.
The goal: true, two-way, multilingual communication. VoxTec and SRI International together make up one of four groups around the country that have received DARPA funding for Babylon project research. Through the Information Awareness Office, home of the controversial Total Information Awareness system, the military is funding research into two-way translation that focuses on “low population, high-terrorist risk languages.”
It’s not enough to train military and intelligence personnel in Arabic. Not only can’t it be done fast enough, but in the new era of the American war on terrorism, there are too many languages, too many different theaters and too many different enemies for anything but a machine to understand and speak all the necessary tongues.
“Prior to the end of the Cold War, we had a monolith theater,” says Sarich. “We had Russian and Eastern European linguists. Now, we find ourselves involved in a lot of new things where we don’t speak the language, and finding reliable linguists is very difficult.”
The Phraselator currently does not accept responses — for the obvious reason that, say, Iraqi peasants will have no idea what phrases the machine has been programmed to translate. So the first step to real two-way communication is a machine that will accept and translate a very limited number of responses, such as “yes” and “no” from a respondent. Then, soldiers or doctors can be trained to ask only questions that will elicit those types of answers.
The next level of complexity: training the machine to pick out specific, important words like “doctor” or “danger” spoken by a respondent. That way, something sensible might be communicated, even if full translation can’t be achieved.
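Those two intermediate steps, accepting a yes or no and then spotting a handful of important words, might look something like the following Python sketch. It assumes a recognizer that hands back the reply as romanized text, which is an assumption made for the example and not a description of any Babylon prototype. The two small vocabularies use rough romanizations of Arabic words and are purely illustrative.

```python
# Hypothetical vocabularies, using rough romanizations of Arabic words.
YES_NO = {"naam": "yes", "la": "no"}
KEYWORDS = {"tabib": "doctor", "khatar": "danger"}


def interpret_response(transcript):
    """Classify a recognized reply as yes/no and pick out known keywords."""
    words = transcript.lower().split()
    # First yes/no word heard wins; None means no usable answer was heard.
    answer = next((YES_NO[w] for w in words if w in YES_NO), None)
    # Keyword spotting: ignore everything except the words we know.
    spotted = [KEYWORDS[w] for w in words if w in KEYWORDS]
    return {"answer": answer, "keywords": spotted}


if __name__ == "__main__":
    print(interpret_response("la khatar"))
    print(interpret_response("shukran"))
```

Everything the machine does not recognize is simply dropped, so the questioner learns only whether an expected word was said, not what else the respondent may have meant.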
But the real goal of the Babylon project is not just limited phrase translation, but so-called “free input” — two-way translation of actual conversation. “The problem is that free input is still too ambitious for current technology today,” says Franco. “The technology is not powerful enough to allow us to recognize any speech in any domain, because the error would be too high.” Instead, researchers are focusing on free input in a very limited context: such as a doctor communicating with a patient.
At Carnegie Mellon University, a group of computer scientists working with three local Pittsburgh companies, Mobile Technologies, Multi Modal Technologies and Cepstral, last year created a prototype of a device called the “speechlator” that translates medical interviews from English to Egyptian Arabic and back.
The speech-recognition component translates the English to “interlingua,” a computer-readable intermediate language. “That language is a mathematical language almost like a logic,” says Alan Black, a research computer scientist with Carnegie Mellon.
What’s the value of introducing a third language to the equation? It means that instead of building a separate translator for every pair of languages (English to Spanish, Spanish to French, French to English and so on), each language only has to be translated into and out of interlingua. From interlingua, the translation engine generates the Arabic text, then a speech synthesizer vocalizes the Arabic.
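The arithmetic behind that design choice is easy to check. The short Python sketch below counts the translation components needed with and without a pivot language, under the simplifying assumption that each direction between two languages is its own component and that each language needs one analyzer into the pivot and one generator out of it. The function names are made up for the example.

```python
def direct_translators(n_languages):
    """One translator for every ordered pair of distinct languages."""
    return n_languages * (n_languages - 1)


def pivot_translators(n_languages):
    """One analyzer into the pivot and one generator out of it per language."""
    return 2 * n_languages


if __name__ == "__main__":
    for n in (3, 6, 10):
        print(n, direct_translators(n), pivot_translators(n))
```

With three languages the two approaches tie at six components each. The gap opens as languages are added: ten languages would need 90 direct translators but only 20 pivot components, which matters for a project aimed at many languages at once.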
Even within the relatively constrained realm of a medical interview, there’s endless variation. “We have to have linguists write rules to describe the variations that exist in the language,” says Black. “‘Does your X hurt?’ Then, listing all the possible things that X will be.”
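Black's “Does your X hurt?” example amounts to a template with a slot. A toy Python version, assuming a single hand-written rule and a short invented list of body parts, shows both directions: listing every sentence the rule covers, and recovering X from a sentence that fits. It is a sketch of the general idea only, not the Carnegie Mellon group's grammar.

```python
# Hypothetical rule and slot values; linguists would supply far more of both.
TEMPLATE = "Does your {part} hurt?"
BODY_PARTS = ["head", "stomach", "arm", "chest"]


def expand(template, parts):
    """List every sentence the rule covers."""
    return [template.format(part=p) for p in parts]


def parse(question, template=TEMPLATE, parts=BODY_PARTS):
    """Return the body part if the question fits the rule, else None."""
    cleaned = question.strip().lower()
    for p in parts:
        if cleaned == template.format(part=p).lower():
            return p
    return None


if __name__ == "__main__":
    print(expand(TEMPLATE, BODY_PARTS))
    print(parse("does your arm hurt?"))
```

A question that falls outside the listed values for X is rejected, which is why the rules have to enumerate the variations by hand.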
The prototype hasn’t been tested yet to see how it might perform in real-world conditions; the researchers are awaiting more funding from DARPA later this year.
Of course, the usual problems of machine translation aren’t the only challenges for these devices. As any user of desktop voice-recognition software knows, such systems can be trained for the vagaries of a specific user’s accent and inflection, improving the more they’re used. But battlefield devices need to be largely user-neutral: ready for use by any number of people. The limited memory of a handheld device is also a real barrier when you’re dealing with the variables of real conversation.
Although, how much memory does it really take to say “Take me to your weapons of mass destruction”?