Essay questions

How well can computers judge prose -- and would you want one grading your exam?

Topics: Education

Forget No. 2 pencils and Scantron ovals: Some educators are beginning to use computers to grade essays. Already a system called E-rater evaluates every essay
written as part of the Graduate Management Admission Test (GMAT), or about 800,000
compositions crafted by 400,000 business school applicants this year.

And some professors, bogged down by the volume of student papers they must
read, eagerly anticipate computerized readers that can help them slog
through the words that come across their desks each semester.

“It is becoming increasingly difficult to manage the load associated with
essay grading, and lecturers are gradually shifting the focus of their
assessment to multiple-choice questions,” says Chris Janeke, a senior psychology lecturer at
the University of South Africa, a 120,000-student university experimenting
with a computerized grading system called the Intelligent Essay Assessor. Software “offers the possibility of automatizing at least some aspects of essay grading and may present a technological solution to our logistic problems.”

But hold on: If a student writes an essay that is graded by a computer, has it really been “read” at all? Well, sort of. A machine obviously can’t
comprehend a student’s argument — but it can determine whether a
composition addresses a specific question, and it can judge an essay’s structure. Electronic grading systems analyze hundreds of sample answers to a specific question (something
like “Should a government be able to censor the media?”), then compare the content and semantic structure of the students’ answers to the sample essays.
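To make that comparison concrete, here is a minimal sketch in Python. It is not the method used by E-rater or the Intelligent Essay Assessor, whose internals this article does not describe; it swaps in a much simpler technique, comparing word counts by cosine similarity and averaging the human scores of the closest samples. The function names, the `k` parameter and the four sample essays are all hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count how often each word appears."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_score(essay, graded_samples, k=2):
    """Average the human scores of the k samples most similar to the essay."""
    essay_vec = word_counts(essay)
    ranked = sorted(
        graded_samples,
        key=lambda sample: cosine_similarity(essay_vec, word_counts(sample[0])),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(score for _, score in nearest) / len(nearest)


# Hypothetical, hand-made samples: (text, score assigned by a human reader).
SAMPLES = [
    ("A free press checks government power, so censorship harms citizens.", 6),
    ("Censorship of the press lets government hide mistakes from citizens.", 5),
    ("I like pizza and my dog is very nice.", 1),
    ("My dog likes pizza too.", 1),
]

print(predict_score("Government censorship of the press harms citizens.", SAMPLES))
```

A real system would train on hundreds of samples per question and weigh structure as well as vocabulary; the sketch only shows the basic move of scoring a new essay by its resemblance to essays people have already graded.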

If this sounds like a lifeless way to examine a student’s thoughtful writing, it is. But it’s actually little different from the decades-old system that depends on people to grade the essay portion of standardized tests. Human graders, too, are required to read sample essays and judge student responses based on qualities prescribed by the testing service.

“The procedures are actually identical,” says Fred McHale, vice president for assessment and research at the Educational Testing Service (ETS), which
developed E-rater. “Once the scoring rubrics are created by expert readers
from the sample responses, those samples are used to train human readers –
or programmed into E-rater.” (GMAT essays have been submitted electronically since 1997, so neither people nor software has to read

Every electronically graded essay still gets a second read by a real live human. Still, the notion that computers play any part in evaluating student
essays hasn’t gone down well with everyone in the academic community. “I think it’s silly,” says Dennis Baron, head of the English Department at the
University of Illinois at Urbana-Champaign, with an edge of derision. Computerized grading undermines the very purpose of essays, he adds. “Like
the teacher says, ‘I’m not just talking to hear myself talk.’ We don’t ask students to write just to have them jump through a hoop.”

Writing for a computerized audience is, to some critics’ thinking, an absurd waste of time that can only warp the educational process. (ETS is also considering making E-rater available to score practice essays for students preparing to take the GMAT.) “Even before this, the pressure was there to teach to the test,” says Baron. If students know their essays are being graded by a machine that can parse
semantics and syntax, they “will learn to write for the formula.”

Critics have long lodged similar complaints against all standardized testing, arguing that the tests measure students’ ability to take tests, not their ability to learn and produce ideas. Some also maintain that the tests often include subtle racial, class or gender biases that benefit students who are white, middle-class and male.

But could we use technology to eliminate bias from the grading process, and to promote fairness and consistency? “In essence [the technology] is doing what a person is trained to do when they’re doing holistic grading,” says Darrell Laham, chief scientist for Knowledge Analysis Technologies, which developed the Intelligent Essay Assessor. “You see samples of what an excellent essay is supposed to look like, or a medium essay, or a very bad essay. With a person, their criteria may shift a little bit.” The software, on the other hand, is 100 percent consistent: “You give it the same set of parameters, and it will always give the same results.”

But Monty Neill, executive director of FairTest, an advocacy group that fights for fairness in standardized testing, says the software’s lack of bias doesn’t mean electronic grading will be free of prejudice: It all depends on how the software is programmed. “If you’re looking for things that are not really relevant but are associated with a particular demographic group, then certainly that would reinforce a bias,” he says.

A question assuming knowledge of stock dividends, for example, could penalize test-takers whose
families never owned securities. But Neill agrees that a computerized grading system, properly programmed, could eliminate other forms of bias. “You might have someone who identifies black writing as automatically bad,
whereas a machine might not,” he says.

Laham says the best way to escape grading bias is to choose the model essays with care: “The underlying comparison set of essays should represent the population that the grades are meant to represent.” To be fair to the test-takers, he says, his Intelligent Essay Assessor is designed to know
its limits and not give a student a poor mark when the software can’t “read” an essay, for stylistic or other reasons. “What the technology will do when it sees an essay that is completely unlike what it has seen before is to flag it and tell a teacher to look at it … It won’t be able to grade it, but it will know it can’t grade it.”
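The "know its limits" behavior Laham describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Intelligent Essay Assessor's actual check: it assumes an essay is "unlike what it has seen before" when fewer than half of its distinct words appear anywhere in the sample essays. The function names and the `min_overlap` threshold are hypothetical.

```python
import re


def words(text):
    """Return the set of distinct lower-cased words in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def needs_human_review(essay, sample_essays, min_overlap=0.5):
    """Return True when too few of the essay's words appear in the samples.

    Assumption: vocabulary overlap stands in for "similar to what the
    software has seen before". A flagged essay gets no machine score.
    """
    essay_words = words(essay)
    if not essay_words:
        return True  # nothing to compare, so defer to a person
    known = set()
    for sample in sample_essays:
        known |= words(sample)
    overlap = len(essay_words & known) / len(essay_words)
    return overlap < min_overlap


# Hypothetical sample essays for one question.
SAMPLE_ESSAYS = [
    "Censorship of the press lets government hide mistakes.",
    "A free press checks government power.",
]

print(needs_human_review("Purple elephants juggle quietly at midnight.", SAMPLE_ESSAYS))
```

The design choice worth noting is that the function returns a flag rather than a low score, so an unusual essay is routed to a teacher instead of being marked down.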

E-rater examines 50 linguistic features, including transitional phrases,
vocabulary and the ratio of complement clauses to the total number of
sentences. “For each essay, about eight to 12 of the features turn out to be
particularly predictive and explain why an essay should get a certain
score,” says Jill Burstein, a developmental scientist who invented the
E-rater prototype and led the ETS development team.

E-rater is surprisingly consistent with human graders. The E-rater scores
agree with scores given by a human grader about 90 percent of the time –
or as often as a second human reader would, according to ETS statistics.
And when a second human grader does score a disputed essay, he or she agrees with
E-rater about 97 percent of the time. In other words, the electronic
graders seem to do the job about as well as their human counterparts.
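An agreement figure like the 90 percent above is straightforward to compute once two sets of scores exist for the same essays. The Python sketch below shows one way to do it. The article does not say whether ETS counts only identical scores or also scores within a point of each other, so the `tolerance` parameter is an assumption, and the scores are invented for illustration.

```python
def agreement_rate(scores_a, scores_b, tolerance=0):
    """Fraction of essays on which two graders' scores agree.

    tolerance=0 counts only identical scores; tolerance=1 also counts
    scores one point apart. Which definition ETS uses is not stated in
    the article, so this is left as a parameter.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both graders must score the same essays")
    if not scores_a:
        raise ValueError("no scores to compare")
    matches = sum(
        1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= tolerance
    )
    return matches / len(scores_a)


# Invented scores for ten essays.
human_scores = [4, 5, 3, 6, 2, 4, 5, 3, 4, 6]
machine_scores = [4, 5, 3, 6, 2, 4, 5, 3, 5, 2]

print(agreement_rate(human_scores, machine_scores))
print(agreement_rate(human_scores, machine_scores, tolerance=1))
```

The same function applied to two human readers gives the baseline against which the software is being judged.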

Computerized grading could cut student fees by $5 to $10 per test, according to ETS; readers who score the GMATs currently earn $23.75 per hour. And at Knowledge Analysis Technologies, Laham argues that essay-grading software can improve education by helping to eliminate multiple-choice
testing. His company’s Web site says: “Students need many more
opportunities to put their knowledge into words and find out how well
they’ve done and how to do better”; and Laham asserts that student writing,
even when written for a computerized reader, demonstrates “a much deeper level of learning” than multiple-choice exams do.

But he is conscious of his product’s limitations. “When you start getting
into the creativity types of things, that’s not really our focus,” says
Laham. “This technology is not appropriate for looking at term papers where
every student is writing on a unique topic. We see it as a way to provide
students with the opportunity to write and revise their writing and to get
immediate feedback that they simply can’t have right now. A person can’t
always look at what a student produces.”

University of Illinois professor Baron still criticizes the system, however, saying he’s gotten surprisingly good grades after
submitting essays that were completely off-topic to a demonstration of the
Intelligent Essay Assessor that is available online. “If you don’t care
about what might be in the text that doesn’t match your template, then I
suppose you can go ahead and use it,” he says. “But it seems to me that
it’s also an insult to the writer. You’re asking these test-takers to write
connected prose, but you’re having it graded by an entity that has no sense
of what’s good about connected prose and how to evaluate it.” (Laham defended the product, saying that the version of IEA currently online does not yet have the system’s full battery of validity checks.)

Meanwhile, won’t students rebel against computerized readers?

Test-takers haven’t been troubled by the electronic grading of GMATs, says
McHale. “We were expecting more negative reaction, but we’ve had minimal
complaints, and just a single response of ‘I don’t want a computer grading
my essay,’ which someone wrote in one of their essays.” Part of the reason
for the subdued response may be that a person still reads each submission
– a procedure that McHale expects to continue. “For the large-scale,
high-stakes kind of testing that we do, I don’t see a human reader being
taken out of the loop,” he says. “The small discrepancies that we do see
could be very creative responses that we really do want to allow in the

So far, there’s no plan to employ E-rater as a judge of literary merit or
creative writing, but ETS is researching the possibility of computerized
grading for the Test of English as a Foreign Language and the
Graduate Record Examinations. The GMAT was the first to employ the
software because the test had already phased out handwritten essays in
favor of keyboarded essays.

While it’s unlikely that computerized grading will ever replace the careful
eye of a teacher, technology proponents like Laham say it can be a great
addition to the current academic system. “The reality is that teachers can’t
read enough to provide the student with enough feedback,” says Laham.

So instead of comparing the software to a human reader — where it can’t help
but appear a poor substitute — Laham argues critics should view electronic grading as a great benefit to students who want to write more than their teachers can read.
Dismissing the technology’s detractors, Laham says, “There aren’t as
many of the critics as there are teachers who want this system.”

Christopher Ott is a writer in Madison, Wis.
