Monday, Jun 14, 1999 4:00 PM UTC

The Web's plagiarism police

An online service claims it can identify purloined papers. So why'd it nail my thesis?

I am a plagiarist.

At least, that’s what an online plagiarism-testing service report says. After analyzing my senior thesis, it said flatly that my
30-page paper was “plagiarized,” and said that it had found a source on the
Internet that matched my document. At first, I panicked. I hadn’t copied anyone
else’s work, so what was going on? Was it unconscious, a phrase I’d once read and
kept hidden in my memory? Had I been careless in paraphrasing or quoting? I
didn’t know; all I did know was that the report said I was guilty of ripping off
my senior thesis from some source on the Web.

Baffled, I went back to the report, and there, I found less-than-intuitive links
to a more detailed analysis. Clicking through, I found the section that listed
the URL of the source I was accused of plagiarizing from. I clicked to find …

To find that Plagiarism.org had just discovered a copy of my own thesis online.
Instead of realizing that it was my work and ignoring it, the service had accused
me of plagiarism. It seemed an odd thing to overlook, and an odd way of doing
business to announce the crime, and let the recipient of the report figure out
whether it was justified or not. I took the time to investigate the report’s
charges; what if a professor hadn’t?

These are key issues, it turns out, in the brave new world of plagiarism
detection online. Like other things on the Web, it’s a prospect alluring in its
simplicity, but devilishly difficult to accomplish in reality. It remains to be
seen how many people might be unjustly accused before the kinks are worked out.

The purveyors of the new service, however, say that Plagiarism.org actually allows people to
“harness the Internet to solve the problem the Internet is creating,” according
to founder John Barrie. With the availability of online sources, including
electronic “term-paper mills” like SchoolSucks and The Paper Store, students can
easily “borrow” — or even buy — papers online. The Plagiarism.org site refers
to studies that suggest as many as 66 percent of university students have cheated
and 36 percent have plagiarized written material.

And while cheating in school is nothing new, some professors think the Web is
making things worse. Harold J. Noah, an emeritus professor at the City University
of New York, co-authored a study on plagiarism that found technology to be partly
responsible for “ubiquitous” cheating. The trouble, he told the Chronicle of
Higher Education, is that “it’s often difficult to detect plagiarism from
Internet sources.”

Not surprisingly, enterprising programmers have spotted this market and are
offering universities weapons to combat the practice. There are now programs that
search for “borrowed” code in computer science projects, and services like Plagiarism.org, the
Essay Verification Engine and IntegriGuard that comb through essays and student
reports in search of copied passages.

Plagiarism.org, for example, analyzes the structure and content of a paper by
comparing it to the contents of a centralized database, which includes papers
posted online, material from academic Web sites, documents indexed by major
search engines and other student papers that have been submitted to
Plagiarism.org for analysis. It then prepares a report pointing out possible
instances of plagiarism.
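
The service's actual matching method is not described in detail. As a rough illustration only, one common way to compare a paper against a database is to look for shared runs of words. The sketch below is an assumption about the general technique, not Plagiarism.org's implementation; the function names, the five-word window and the 0.5 threshold are all hypothetical.

```python
import re


def word_ngrams(text, n=5):
    """Return the set of n-word runs in text, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(paper, source, n=5):
    """Fraction of the paper's n-word runs that also appear in the source."""
    paper_grams = word_ngrams(paper, n)
    if not paper_grams:
        return 0.0
    return len(paper_grams & word_ngrams(source, n)) / len(paper_grams)


def flag_matches(paper, database, threshold=0.5, n=5):
    """Return (name, score) pairs for documents whose overlap reaches threshold."""
    hits = []
    for name, source in database.items():
        score = overlap_score(paper, source, n)
        if score >= threshold:
            hits.append((name, score))
    # Highest-scoring documents first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Under this scheme a paper compared against a copy of itself scores 1.0; nothing in the comparison knows who wrote the matching document.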

To test the service, I took advantage of a free five-paper trial run and uploaded
my senior thesis. (I should note that the paper uses Salon’s Table Talk as a case
study for an examination of online community.) A day or so
after I submitted my work, I received an e-mail message pointing me to an online
report.

It was this that I had to click through before I discovered that an error had
been made. Plagiarism.org, in fact, had found only one matching phrase in my
essay — but it was 8,367 words long. It was my own paper: within the archives
searched by Plagiarism.org was the copy of my thesis that I had posted on the
Web.

While obviously an anomaly, such a false reading or a misinterpretation of the
results could have some pretty ugly consequences. I wouldn’t want to be thrown
out of school for cheating — and expulsion is the penalty at some
schools, like the University of Virginia.

Plagiarism.org’s site insists that “only cases of gross plagiarism are flagged.
This means that papers using some identical quotes or papers written on similar
topics will NEVER be flagged as unoriginal.” But that wasn’t exactly my
experience. I put a friend’s research paper in the system as well, and it found
five phrases that matched other sources found on the Net. The report said the
“paper probably contains plagiarized material from the given manuscript.” But a
quick check showed that the indicted sentences were all legitimate excerpts,
appearing within quotation marks and citing sources. Again, the service came
across like a hanging judge.

Plagiarism.org’s Barrie, a neurobiology graduate student at UC-Berkeley,
acknowledges that the service fails to properly differentiate between quoted
materials and original writing. He argues that the analysis can still be useful
for professors who want to know how much of a paper was quoted or who want to
verify that quotation marks were properly placed.
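
One way a tool could separate quoted material from original writing, sketched here as an assumption and not as anything Plagiarism.org is known to do, is to set aside text inside quotation marks before matching. The helper name below is hypothetical, and the approach is crude: it handles only paired straight or curly double quotes and cannot tell whether a quotation is actually cited.

```python
import re

# Matches a passage enclosed in straight or curly double quotes.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')


def split_quoted(text):
    """Return (unquoted_text, quoted_passages) for a block of text."""
    # Keep each quoted passage without its surrounding quotation marks.
    quoted = [m.group(0)[1:-1] for m in QUOTED.finditer(text)]
    unquoted = QUOTED.sub(" ", text)
    # Collapse the whitespace left behind by removed quotations.
    unquoted = re.sub(r"\s+", " ", unquoted).strip()
    return unquoted, quoted
```

The unquoted remainder could then be checked for copied passages, while the quoted list could be reported separately so a professor can see how much of a paper was quoted.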

“It is just an informational tool,” says Christian Storm, a biophysics graduate
student who helped develop Plagiarism.org. But it’s an invaluable tool, he says,
because “it’d take a decent amount of effort to associate or correlate the two
documents” by hand.

But then why is the program so quick to throw the word “plagiarism” around?

Barrie, Storm and other UC-Berkeley students and alumni created Plagiarism.org
after papers submitted via a Web-based peer-review system began to resurface;
students were downloading their peers’ work and turning it in as their own, in
different classes and during different semesters. They realized it was time to
develop a “technological solution to the problem the Internet was breeding,”
Barrie says.

Though anyone can run a few papers through
Plagiarism.org free of charge, the business model, of course, requires
universities to pay for the service. Professors at small colleges can analyze a
class’s worth of papers for $20, while larger universities will pay $1 per
student, plus $1 per paper. To date, Plagiarism.org has been used at Berkeley and
by hundreds of individual instructors around the globe, Barrie says.

The Office of Student Conduct at UC-Berkeley tried the service while
investigating a handful of plagiarism cases this year. But the university isn’t
paying while it tests the system’s capabilities. “We want to try it for at least
a full year before committing,” says Doug Zuidema, manager of the student conduct
office at Berkeley.

Zuidema says he has found Plagiarism.org to be a “very effective tool” for
proving or disproving an allegation of cheating. “What tends to happen is that
once we show students the capability — if in fact they’ve pulled something from
the Web — they pretty much confess to it,” he says. And as faculty members learn
of this automated search for instances of plagiarism, they become “more likely to
report a case,” adds Zuidema. “It saves a lot of time” and makes it possible to
pursue some cases that would have been prohibitively time-consuming using
traditional searches through old papers and other sources.

While Plagiarism.org bills itself as “the only automated Web site cataloging and
academic paper originality checker in existence,” there are several similar
services.

The Essay Verification Engine (EVE) uses a downloadable program (free for 15 days, and
then $34.95) that searches the Internet for matching phrases in the text. The
makers of EVE boast that it “has been developed to be powerful enough to find
plagiarized material while not overwhelming the professor with false links” — a
promising assertion. And IntegriGuard promises an overall “passed” or “failed”
mark along with sentence-by-sentence
analysis of the paper.

To test these automated plagiarism-detectors, I constructed a mini-essay with
randomly selected sentences from works by four major authors: Karl Marx, Oscar
Wilde, Bram Stoker and Ralph Waldo Emerson. I also included a slightly revised
version of the sentences from Marx’s “The Communist Manifesto,” changing words
and punctuation to see if it could be identified.

Plagiarism.org found the revised Marx paragraph, but completely missed the direct
quotes from Wilde, Stoker and Emerson. Overall, Plagiarism.org found the paper,
composed entirely of plagiarized material, to have a high degree of
originality.

EVE, however, found everything — the sentence from Wilde’s “Birthday of the
Infanta,” the portion of Bram Stoker’s “Dracula” that I copied from the text, and
the sentence from Emerson’s “The Transcendentalist.” But what EVE offers in
searching power, it lacks in interface and usability. Rather than providing you
with a side-by-side comparison of the paper you’re analyzing and the matching
phrases it finds from other sources, the software simply generates a list of URLs
for sites where it has located matching phrases — it’s up to the user to
determine what was really plagiarized.

With quirks like these and the difficulties the different programs have
differentiating between quotations and plagiarized text, each service offers a
cautious disclaimer explaining that the analyses aren’t definitive.
IntegriGuard’s sample reports say, “Results provided by IntegriGuard should be
researched before concluding that plagiarism has been committed.”

UC-Berkeley’s Zuidema says he’s well aware of Plagiarism.org’s flaws. “You have
to really be careful what you look at,” he says, because even if a quote is
properly attributed, the passage can be identified as plagiarized material. It’s
clear that while the program can be a helpful tool for detecting potential
plagiarism, it is not an absolute test of originality.

“A human being must take that report and interpret it,” says Plagiarism.org’s
Barrie, to “make sure that what we’re saying jibes with reality.” In the case of
my thesis, there sure wasn’t much jibing going on. Had a professor inexperienced
in the ways of plagiarism detectors consulted the service about my work, I could
have been branded a plagiarist and maybe even expelled moments before I walked
across the stage to collect my $80,000 diploma. As educators begin to rely more
on technology, hopefully they’ll realize that — at least for now — nothing can
completely replace the watchful eyes of human beings.