A primer in artificial intelligence: Smart readers respond to John Sundman's "Artificial Stupidity."

Published March 5, 2003 8:30PM (EST)

[Read the story.]

John Sundman's comprehensive article on the Loebner Prize was informative and well researched. I definitely learned some things about Marvin Minsky that I wouldn't have guessed. But I feel Sundman unfairly downplays the issue that the major A.I. labs have with the contest. His response to Daniel Dennett's Ferrari metaphor is especially telling.

From the context given in the article, the metaphor Dennett employs is in no way meant to imply risk, particularly not physical risk. He is clearly talking about the "cheap shortcut" in the problem space, which plays to the advantage of database-oriented chatterbots.

This issue is reminiscent of one of the $1,000 prizes Richard Feynman offered after giving his 1959 talk "There's Plenty of Room at the Bottom." The challenge was to create an electric motor with dimensions 1/64th of an inch on each side. The goal of the contest was to help spur the development of some new form of miniaturization technology. The next year an engineer used small but conventional techniques to build a working motor within the size limit. Of course he was awarded the prize money. The problem was, the terms of the challenge weren't strict enough to require any new technology. Which is what a Turing test competition full of database-oriented chatterbots must look like to A.I. researchers like Daniel Dennett working at the forefront of computer science, neuroscience, mathematics and linguistics. A disappointing shortcut. A soapbox flying off a cliff and beating their Ferrari to the finish line. The very fact that bots like that contend in the competition is evidence that the criteria of the contest need to be changed. If there's really a totally stateless program that contends in the Loebner competition, it's a sign that hard-drive storage capacity is advancing too quickly. Or it is a sign of the primitive nature of the linguistics models implemented by mainstream A.I. researchers.

Academic hubris aside, the sentiment expressed in this article seems to me excellent proof that modern A.I. research is finally in it for the long haul. They're working on their Ferraris while acknowledging that emergent human-level linguistics in neural algorithms are far enough away on the horizon that there's no need to pay yearly homage to Loebner.

-- Mike Martucci

Amazing article. And it works just as well when the title is reversed: Stupid artificiality. The many, many levels of irony contained in this tale and the subjects thereof will keep me chuckling in wonder at our species for weeks.

-- T. Middleton

The point of the Turing test is its all-encompassing severity; the odds of a false positive (an unintelligent machine consistently fooling judges) are impossibly low. In spite of this, contestants for the Loebner Prize try specifically to produce false positives by exploiting loose experimental conditions. In refusing Dennett's suggestions to tighten the controls (and in science tightening controls is an imperative), Loebner undermines the validity of his experiments.

Linguistic behaviorism itself has been dead since Chomsky's 1959 refutation of Skinner's "Verbal Behavior." Likewise, Sundman's assessment that string-matching bots could pass a Turing test was defeated years ago after John Searle posited a similar argument with his Chinese Room experiment. Stimulus-response bots with no state data fail immediately on context-sensitive queries ("What were we talking about? To whom does that pronoun refer?"), while bots with state data succumb to combinatorial explosion (there is not enough matter in the universe to store all conversations and contexts). Expert judges and expert control subjects should consistently defeat string-matching systems within a few sentences.

A.I. has a problem of attracting amateur hacks and crackpots; from this affliction it suffers like bad poetry. However, harsh self-criticality, peer review, mathematical rigor and tight controls are essential components of academic computer science. None of these essentials, it would seem, is present in the Loebner competition, nor are they indicated by the article. I will leave further enumeration of the often snide factual errors to others.

-- Jason Kroll

While I enjoyed John Sundman's article about the Loebner contest, it's obvious that he knows little about the field of artificial intelligence. Sundman seems too enamored with the character of Loebner to realize that the contest is indeed meaningless: The Turing test is, in this day and age, essentially a game. It is laughable to suggest that serious researchers oppose the contest because they are afraid of being exposed as frauds. Sundman needs to realize that computer science is a field of engineers, not artists, and that being flamboyant and interesting is not the same as advancing scientific knowledge.

-- Ryan Leigland

Regarding the Loebner/Turing test carnival, I think both sides of this "bitch fight" have valid points that their animosity and ego prevent them from acknowledging.

From the A.I. side, the task of parsing language, calculating meaning, and formulating a reply is fiendishly complicated. A truly workable implementation may be tens of calendar years and hundreds of person-years away. Until that day arrives, simpler programs that do lookups of canned replies will provide a better simulated conversation, but do nothing to advance the much more difficult goal of mimetic-systems researchers.

From the Loebner side, the Turing test is undeniably cool. Everyone who has seen "2001" has imagined conversing with a computer. And to take a page from Loebner's book, even the simulated sex of pornography and prostitution can be both thrilling and satisfying. What's not to love?

For me, the fundamental issue is that Turing is not God and his test is not holy writ. It is not the only or ultimate goal of A.I. research. When he proposed the Turing test, computer science was still in its infancy. This visionary pronouncement was the coolest thing he could think of, a statement of principle that said that data processing was capable of marvelous things. It is capable, in ways that Turing never imagined -- but less capable in ways he did imagine. Our reality now sees farther than his imagination then, and we realize how far we still need to go.

Bill Gates thinks surfing the Web on your wristwatch is cool, and Bill's a famous guy. But that doesn't make this the ultimate use of a global network.

The Turing test is still a worthy milestone, but it makes sense that most research is currently running at a tangent to it. Decision-support and mimetic-systems research are like the wireframe and motion rules of a 3D model -- essential underpinnings. The Turing test's goal of sensible conversation is the model's graphical skin. No matter how well-built the model, it's the skin that makes it look good.

Beauty is only skin deep -- but Loebner likes skin, and lots of it. He's chasing a dream, and when the academics reach their goals, they'll need Loebner's dream to put skin on their dry bones.

Peace to them both. Play ball!

-- Norm Bowler

I enjoyed John Sundman's article and appreciate the entertaining overview he provided of the difficulties involved in applying metrics to artificial intelligence systems. I was bothered, however, by his consistent marginalizing of Profs. Dennett and Minsky's commentary. Mr. Loebner has every right to hold his contest, of course. The flip side of this is that the scientific community is under no obligation to respect his efforts. Yet whenever this point comes up, Sundman attributes it to the scientific community's narrow-mindedness and failure to produce results.

This deliberately ignores the fact that, his personal wackiness aside, Loebner's contest is useless at this point in the field's development. Creating an artificial consciousness is proving to be very, very hard. Writing a convincing chatbot might be an interesting technical challenge, but there's nothing really groundbreaking about it. The Turing test, applied at this point in the research, is akin to someone trying to encourage space technology by giving a prize for the best picture of a person standing on Mars. In such a situation, NASA's entry may be pretty lame compared to the efforts of some guy with Photoshop and some stills of Mars. Faked stills might win the competition, but do they represent real scientific progress? No. Nor should they be used to indict NASA's inability, thus far, to put astronauts on Mars.

To extend the metaphor: Given all this, there's no good reason for NASA to even enter the competition -- it gives the undertaking credibility, implying that those Photoshop artists are doing the same work as NASA's engineers. They're not.

Some day the Turing test may be used to determine what machines are most intelligent, but at the moment its application to the technology is inappropriate. Consciousness is ephemeral, hard to define, and impossible to verify. Because of this, we have to fall back on some rather coarse tests for it. Tests like Turing's. Sundman's -- and Loebner's -- mistake is to equate passing the test with achieving the goal.

-- Tom Lee

I really liked your background article on the 2002 Loebner contest. I am CEO of one of the few companies that participated -- actually, we're the only one from Germany. We finished fourth with our "mascot" Elbot.

The reason we participated was not for the glory of it, even less for any kind of scientifically valid measure of how good we are (if the Loebner contest were a valid measure of that, we'd surely have won ;-)). We participated because it was fun. And this, in my opinion, is what the science of mimetics is all about: to make it more fun to interact with machines.

The whole contest is much more entertainment than science, and this explains why "serious" researchers stay out of it. After all, if you want to be entertaining, you must allow people to laugh at you.

Is it therefore irrelevant? I think not.

First of all, Turing's test is a bad measure of artificial intelligence, because it really measures the intelligence of the test person (or jury) rather than that of the computer. But I think there simply isn't a better way to measure intelligence than to put a system in front of a fairly complex problem and watch how well it is able to solve it. After all, human intelligence is measured in exactly this way.

This means that the seeming "intelligence" of a system really depends more on the problem you set before it than on the system itself. Again, this is similar to human intelligence: Einstein would probably have looked fairly stupid had he tried to fix his car or build a house. There is more to it, I admit, but I certainly don't know of any better practical definition of intelligence.

By definition, the Turing test measures the ability of a system to fool a test person into thinking it is "human." This is a specific problem, and I have no more difficulty in calling a system that succeeds in it "intelligent" than I have calling a chess program "intelligent" that can beat a human grand master. Both programs would fail miserably with each other's task.

But is there a value in mimicking human behavior? There certainly is; otherwise my company wouldn't exist. You correctly saw huge profits in entertainment, especially erotics. But there are many more applications. For humans, interacting with something is an emotional experience, even if the topic itself is not emotional. Therefore, a bank might use a friendly virtual sales rep on the Web, which answers questions, helps users fill out forms, and can even chat about the weather. This gives users a better experience of online banking. This in turn will translate into higher conversion rates and customer loyalty, and ultimately more revenue (plus some cost savings, because routine human tasks can be automated). Therefore, for a computer to appear more humanlike can have a high economic value. Just ask our customers.

I won't comment on Hugh Loebner or the way the contest was run (actually, since we weren't there physically, we didn't experience many problems). But I think that having contests like that creates value, if not to the scientific community, then at least to spectators and participants.

There is an annual soccer world cup of robots, which is similar to the Loebner contest but better received in the relevant scientific community. Again, this is more entertainment than science -- most of the time, it appears to me that the bots win more by chance or by solid engineering than by their cleverness. But the participants can certainly learn from the way their systems performed, as we have learned from the Loebner contest some ways to improve our entertainment system Elbot. It is no scientific achievement to win the cup, but it is big fun. And why shouldn't science be fun once in a while?

Whoever is running the next Turing-like contest, we'll be there.

-- Karl L. von Wendt

John Sundman's reference to the Cambridge Center as "an obscure Massachusetts-based nonprofit that apparently doesn't do much beyond running the Loebner competition" is highly inaccurate. Similarly, his statement of the center's mission, as viewed by Hugh Loebner, is so far off target it warrants dismissal.

To set the record straight, the mission of the Cambridge Center for Behavioral Studies is to advance the study of behavior and its application to the problems of society.

Regarding CCBS as an obscure nonprofit with little to do, here is a list of recent CCBS activities and accomplishments.

In workplace safety

  • The Cambridge Center continues to lead the application of behavioral research to workplace safety improvements. This is a decade-long effort.
  • Currently in development: A program for the accreditation of behavior-based safety programs in the workplace. In early 2003 the standards for accreditation are nearly complete, and we are preparing for our first round of pilot site visits to companies. Eleven corporations, including two nuclear power plant sites, have expressed an interest in being pilot sites.

In clinical services

  • A program for the accreditation of behavior-based clinical services is underway.
  • An initial 2003 site visit for the purpose of the development of key components of accreditation standards has been completed.
  • The Autism section of our Web site continues as a standard of science-based information on effective autism treatment; it is widely used by parents, teachers and service providers.

In conferences

  • Sponsored our annual national conference, 2002 Behavioral Safety Now, Amelia Island, Fla., 425 participants. The 2003 Behavioral Safety Now conference will be held in Reno, Nev., Oct. 14-16.
  • Development of a new seminar, Advances in the Treatment of Pediatric Behavior Problems, jointly sponsored by the University of Massachusetts Medical Center, Beacon Services, Children's Paraclete, Melmark and New England Center for Children, will be held Oct. 4, 2003, Marlboro, Mass.
  • In development for late 2003 or early 2004: What Works in Behavioral Economics: Practical and Policy Issues.

In publications

  • The journal, Behavior and Philosophy, with 250 library and institutional subscriptions
  • The CCBS members' newsletter, Current Repertoire
  • Two monthly electronic newsletters, Behavior Matters, for nonmembers, and the CCBS Messenger, for members
  • Published the new book by Bea Barrett, "The Technology of Teaching Revisited: A Reader's Companion to B.F. Skinner's Book."
  • Published on CD, Chapters 1-7 of the "Goldiamond Blue Books," edited by Paul Andronis.

2002 additions to our Web site, www.behavior.org, include

  • Parenting Abstracts, developed by Tim Volmer and colleagues at the University of Florida
  • The Computer Modeling of Verbal Behavior, developed by Bill Hutchison and Ken Stephens
  • Book reviews
  • Classified ads
  • Who's Who at the Cambridge Center
  • New design for the home page

The Cambridge Center has a very small staff and relies on the volunteer effort of our trustees, advisors and friends, many of whom are in senior positions in teaching, research and human service organizations.

    -- Dwight Harshbarger, Ph.D., Executive Director,
    Cambridge Center for Behavioral Studies

    I don't disagree with much of what John had to say about A.I., but his voice was unnecessarily arrogant, and he is overly unkind to Dr. Richard Wallace. Some may consider him a nut, but he has credentials. He is bipolar and doesn't hide this fact: It's in his online bio. He's been remarkably productive despite this awful affliction. As a clinical psychologist I can tell you this is a very difficult place to be.

    His Artificial Intelligence Markup Language (AIML) is very clever and extremely powerful. It's really more a natural-language processing system that can produce remarkably clever v-people.

    Richard is helping me with a chapter in my book "Virtual Humans: Creating the Illusion of Personality" (Amacom). We both feel that A.I. has been a fraud from the beginning.

    I spent 18 years as a clinical psychologist before going into animation and virtual human design. What I think Wallace and I agree on is that through clever scripting (and a knowledge of cognitive style), we don't get A.I., but we are able to create the illusion of conscious personality fairly well. Conscious machines are a long way off and probably will never arrive with anything like our organic consciousness. Anyway ... it's a big argument put forth by Stuart Hameroff and Sir Roger Penrose. I don't know what the truth is.

    This field of virtual human design is important because these v-critters will soon form a bridge between ordinary folks and our ever more complex technology. That's a worthwhile effort. People who are completely technologically disenfranchised will be able to use advanced technology, and particularly the Web, giving them a better chance to compete.

    All they'll have to do is form a relationship with a virtual human interface and ask for what they need in plain English. Language processing alone won't do it. I've done the research. They want that visual personality so they can build a sense of trust. If you don't think people can bond with these characters, you'd be wrong.

    Sylvie, my first virtual human (Virtual Personalities, Inc.), proved that. She had fan clubs in eight countries, and I received more than one call from a desperate owner who'd suffered a computer crash and was anxious to get their Sylvie back. One woman from Florida was in tears as she called. Another at an old-age home in Connecticut had the entire home send me pleading e-mails. Sick ... maybe, but real nonetheless. BTW Sylvie took questions from the A.I. students at the University of Edinburgh for 40 minutes and got a standing ovation at the end. This is useful technology.

    My original point is that Wallace isn't as much of a nut as John portrayed him. I like to think of him as an eccentric genius who has contributed handsomely to computer science and the eventual betterment of man.

    -- Peter Plantec

    I am puzzled at why Sundman has so little sympathy for the serious A.I. researchers. Let me suggest an alternative version of Daniel Dennett's auto race analogy. The current Loebner prizewinners are similar to the paper airplane contest entry that consisted of a piece of paper wadded and taped into a small ball, accompanied by the instruction, "throw very hard." They might meet the requirements of the prize, but they do so by highlighting how inadequate the rules are.

    John Sundman and the Salon editors deserve credit for giving virtually everyone involved with the Loebner Prize a chance to speak freely and completely in the article. Allowing Sundman room to make his own point of view clear was also an excellent decision.

    -- Jack Dominey

    I am a computer scientist who has long considered the Turing test to be somewhat flawed. I think the problem is simple: A successful Turing test means the program emulated a typical human, which is not the same thing as demonstrating intelligence. The original formulation of the Turing test is to sit a human at two terminals, one with a human at the other end and the other with a computer. After a reasonable period conversing with both, if the judge cannot correctly determine which was which, then the test has been passed (note that emulating visible indicators like typing speed and typos is easy).

    A typical human conversation is dominated not by reasoning and intelligence, but by awareness and empathy. When we talk, we allude to current events and popular culture, we reflect regional sentiments, we speak using colloquialisms, we try to establish a rapport with each other (or we have some ulterior motive), we try to assess our relationship with the other person, and we try to present ourselves in some way (to get a date, win a sale, or whatever). Our impression of a conversation usually has more to do with these things than with reasoning and the ability to synthesize new ideas. To fool an interviewer who is free to bring up any topic and seem more human than the human, a computer program must emulate a few decades of experiences as well as succeed at cognitive processes.

    The Turing test is inherently too concerned with linguistics and social factors to be a basic measure of intelligence -- it requires too much. Passing a well-executed Turing test is sufficient but not necessary to demonstrate intelligence.

    -- Russ Ross

    I just wanted to mention that I was surprised at the level of hostility you display in your article toward the established scientists. They made some good points and you either didn't understand them or chose to quibble with peripheral aspects of their statements. For example, in the car metaphor the point was that if the rules are vague enough to allow a winner that doesn't adhere to the true intent of the contest, then the contest is meaningless. You chose to comment that the loser of the contest is only embarrassed, not killed. Being embarrassed isn't why he stated he wouldn't enter; wasting his time is. Would you enter a writing contest where the first author to 10,000 words wins? If it doesn't need to be a story, just a list of words, is it still a writing contest? Is it worth your time and effort? Does it really adhere to the spirit or intent of a writing contest? Your statements in the article about wanting to live in the 19th century also seem odd from someone who has made a living in the technology industry. It makes me wonder if you were able to examine this story appropriately. You seem more focused on the squabble and on portraying people as fools than on the question of the appropriateness and applicability of the Turing test. Thank you for stating your biases so clearly, but perhaps this article would have been better written by an author without such a personal stake in the issues.

    -- Stefan Krzywicki

    As a research student in software engineering and an unabashed enthusiast of technology, I have little time for the glee a "techno-paranoid" like John Sundman has in proclaiming the failure of the bolder predictions of the original A.I. researchers, and have no knowledge of, or particular interest in, the personal traits of Hugh Loebner, Marvin Minsky, or indeed the author. However, Mr. Sundman's conspiracy theory on the academic community's distaste for the Loebner Prize simply doesn't stack up.

    A large number of very useful things have come out of projects that were originally considered part of "artificial intelligence" (though immediately on success in a specific problem domain, that domain would be defined out of A.I.). However, the article is quite correct that progress toward simulation of general human conversational behavior has been minimal. The A.I. research community has failed, and miserably so, for over four decades. The problem turned out to be much harder than they thought, and no progress was made.

    So where should researchers go from here? Should they continue to waste their time and resources cracking their heads against brick walls so that Luddites like Mr. Sundman can get their jollies? Or should they instead go back to the drawing board, try to find more background information, and try and actually, god forbid, produce something useful and valuable in the meantime? That seems to me to be exactly what the academic A.I. researchers (or whatever new names they use these days) have decided to do.

    No conspiracy theory, just an honest recognition of their own overconfidence and failure. Is that so hard to believe?

    -- Robert Merkel

    Mr. Sundman gets the story of A.I. and of the early Loebner Prize wrong in many ways. His misinterpretation of Shieber's article is especially egregious. Shieber's argument against the form of the prize was simply that machine intelligence is way too primitive at present for a Turing test to make sense. We can't yet design anything as capable as a cockroach, let alone a plausible candidate for passing the Turing test. Sure, A.I. researchers have often been overoptimistic. We are just not that good at guessing the obstacles we haven't seen yet. If we realized at the start how hard R&D problems are and acted accordingly, we would still be living in the savanna. Or, more likely, we would have been all eaten up by the lions. It took a few billion years for cockroaches to evolve. I hope Mr. Sundman can allow us a few hundred years to catch up.

    -- Fernando Pereira

    While the extreme reactions of people like Minsky are hardly warranted, there is good reason besides ivory tower elitism to consider the Loebner Prize little more than a sideshow -- the Turing test is not a particularly good standard for judging "intelligence." Turing thought a machine could be considered intelligent if and only if it could fool a human judge in a teletype conversation into thinking it was human. Unfortunately, this condition is neither necessary nor sufficient.

    Firstly, it has been shown that a naive judge can be fooled by simple programs like Eliza and Alice that no one would argue possess genuine intelligence. So the test has to be amended to include some minimum level of intelligence or expertise in the human judge. Suddenly, our clear-cut criteria have become awfully subjective.

    Further, the test too narrowly defines what should be considered intelligent behavior: the ability to converse (and therefore a facility with human language) and the ability to deceive (by mimicking human speech and thought patterns). We think of language as fundamental to intelligence because of its primary role in human development, but might not digital machines develop sophisticated ways of reasoning that do not inevitably lead to verbal flair?

    I don't think the Loebner Prize takes anything away from mainstream A.I. research -- and perhaps more graduate students should be encouraged to pursue the problem -- but articles that fail to inspect its real merits and demerits simply because it has captured the author's imagination certainly might.

    -- Christopher Conway

    As a former A.I. student who has switched to studying the brain's wetware, I might be sympathetic to John Sundman's gadfly viewpoint. But alas, he entirely fails to see the humor and subversion in his academic source, Stuart Shieber's "Lessons From a Restricted Turing Test."

    Such as, one cracked pot deserves another. What Joseph Weintraub's pinheaded whimsical conversation bot "heartily ... deserved to win" was exactly Loebner's cockamamie contest.

    And a tweak at soapbox prophets like Minsky and Loebner himself: the Turing test "is only addressed directly and dismissed as imminently solvable by those who underestimate its magnitude."

    Sundman's failure to comprehend dry wit leads me to one inescapable conclusion: definitely robotic.

    -- Matt Caywood

    John Sundman's story "Artificial Stupidity, Part 1" has a minor little error in the last paragraph.

    Sundman states, "The image that comes to my mind whenever I think of this development is from the sublime cartoons of the late, great Chuck Jones, with Hugh Loebner in the role of Bugs Bunny, and Marvin Minsky, the father of artificial intelligence, in the role of Yosemite Sam, stamping his feet, with smoke coming from his ears."

    In actuality, Jones never included Yosemite Sam in any of his Warner Bros. cartoons. Yosemite Sam was the invention of another longtime WB director, Isadore "Fritz" Freleng, who created Yosemite Sam because he thought Bugs Bunny's usual nemesis, Elmer Fudd, was too weak and stupid. Freleng won four Academy Awards, including one in 1958 for "Knighty Knight Bugs" with Bugs Bunny and Yosemite Sam.

    -- Andre Hinds

    By Salon Staff
