Spam vs. spam

The only way to stem the flood of unwanted e-mail may be to harness a million eyeballs and an army of open-source hackers.

Published June 24, 2002 9:11PM (EDT)

The World Birthday Web is back!

Last August, I wrote a melancholy column bemoaning the death of the World Birthday Web, programmer Thomas Boutell's effort to spread a little birthday cheer via the Internet. For six years, on the date of my birthday, I had been receiving a passel of e-mailed greetings from strangers simply because my birth date was included in the World Birthday Web. But last year the birthday greetings dried up. Spammer abuse of the Birthday Web database forced Boutell to shut down the site.

In May of this year, Boutell e-mailed former participants in the World Birthday Web and invited them to sign back up. He had devised a combination of encryption and HTML trickery that he hoped would defeat the spammers.

It remains to be seen whether Boutell's fix will work permanently -- as of mid-June, he reported 2,000 returnees, a far cry from the 200,000 participants the site boasted at the height of its glory. Spammers are nothing if not resourceful. Indeed, the amount of spam generated by abusers of the World Birthday Web, at its worst, would now amount to a hardly noticeable trickle in the flood that sweeps across the Net every day. Every measurement I've been able to find tells the same sad story: Not only has the volume of spam steadily risen every year since the Internet became a mainstream phenomenon, but the rate of growth also appears to be accelerating. All the year-by-year graphs show pretty much the same upward curve -- and sometime during the year 2000, the line starts pointing nearly straight up.

But Boutell's happy news came at the same time that system administrators in my office brought a new, and successful, weapon to bear in the fight against spam on our own servers at Salon and the Well: SpamAssassin, an open-source filtering engine that cleverly differentiates between legitimate and junk e-mail. You can't keep a good technologist down -- if spammers are indefatigable, so are anti-spam geeks.

For those of us who get hundreds of spam e-mails a day, SpamAssassin is heaven-sent. SpamAssassin labels the e-mail it thinks is spam, and individual users can then shunt the garbage off into its own quarantined mailbox. Once or twice a day, I check to see that no "false positives" have been misdirected. Life, post-SpamAssassin, is definitely better.

Salon isn't alone in appreciating SpamAssassin. Although currently targeted at Unix users, SpamAssassin has boomed in popularity over the past four months. Craig Hughes is a significant code contributor to the project and the founder of Deersoft, a start-up that will offer a commercial version of SpamAssassin that Windows users can hook into their Outlook programs. He says that in March, SpamAssassin registered a total of 2,000 downloads. In April and May, the number jumped to 30,000.

As the Wall Street Journal reported last Wednesday, an increasing number of start-ups are rushing to market with various spam-fighting technologies. Where there's a plague, there's a market opportunity. The Journal didn't mention Deersoft or SpamAssassin. That may turn out to be a mistake. SpamAssassin, by virtue of its nature as an open-source project, may have a leg up on its competitors. Spam is like a hydra: It needs a multiheaded opponent to give it a serious battle.

Open-source software is software whose code is publicly available and open to modification by anyone who cares to take a hack at it. For a problem like spam-fighting, these factors may turn out to be a huge advantage.

The key to SpamAssassin is its "rules." It looks at an e-mail and applies various rules to figure out whether it is likely to be spam. For example, there might be a rule that says "If the text 'Make Money Fast' appears, score ten points." (A total score of five points or higher earns an e-mail the spam label.) Rules range from simple ones, like the presence of specific text, to more complicated, geeky principles that analyze how and from where the message was sent.
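To make the scoring idea concrete, here is a minimal sketch in Python of a rule-based scorer. It is not SpamAssassin's actual code (the real project is written in Perl and its rule set is far richer). The rule names, the patterns and every score except the article's hypothetical ten points for "Make Money Fast" are assumptions invented for illustration; only the five-point threshold comes from the description above.

```python
import re
from dataclasses import dataclass

# From the description above: five points or more earns the spam label.
SPAM_THRESHOLD = 5.0


@dataclass(frozen=True)
class Rule:
    name: str            # hypothetical rule name, for illustration only
    pattern: re.Pattern  # text pattern the rule looks for
    score: float         # points added when the pattern matches


# Hypothetical rules; only the "Make Money Fast" example comes from the article.
RULES = [
    Rule("MAKE_MONEY_FAST", re.compile(r"make money fast", re.IGNORECASE), 10.0),
    Rule("ALL_CAPS_SUBJECT", re.compile(r"^Subject: [^a-z\n]{10,}$", re.MULTILINE), 2.5),
    Rule("CLICK_HERE", re.compile(r"click here", re.IGNORECASE), 1.5),
]


def score_message(text):
    """Return (total score, names of the rules that matched)."""
    hits = [rule for rule in RULES if rule.pattern.search(text)]
    return sum(rule.score for rule in hits), [rule.name for rule in hits]


def is_spam(text):
    """Label a message as spam when its total reaches the threshold."""
    total, _ = score_message(text)
    return total >= SPAM_THRESHOLD
```

In this sketch a message that trips only the two weaker rules scores four points and slips under the threshold, while a single "Make Money Fast" is enough to earn the label. Adding a rule means appending one more entry to the list.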

New kinds of spam are being created all the time, of course, and coming up with new rules is a never-ending business. That's where having a large, distributed base of potential rule-makers is crucial. Say a hacker starts getting a lot of spam from Korean addresses; he writes a rule that identifies Korean spam, and adds it to SpamAssassin. Presto!

"The great thing about SpamAssassin is that it's so open," says Matt Sergeant, a key participant in the project. "Anyone can help by contributing rules, or spams that got missed, or helping us out with the main core bits of code. This gives us a huge advantage over some commercial alternatives such as Brightmail. The best part is that the people who work with SpamAssassin -- sysadmins -- are the ones who hate spam the most, and so are adamant to stop it. These people are tigers!"

"The open-source system really does help to generate a wide variety of rules," says Hughes. "It's the so-called million eyeballs principle ... lots and lots of people working together to solve a common problem."

Hughes notes that it's not just the rule-making that benefits from the process. SpamAssassin is designed to be modular: It's very easy to add a piece with specific functionality that takes advantage of whatever spam-fighting mechanism has been devised anywhere on the Net.

"Most other spam filters," says Hughes, "will do one thing. They'll search for text strings, or they will go look up things in the Realtime Blackhole List, or they'll calculate some kind of cryptographic checksum to see if it matches anything already known to be spam. But SpamAssassin tries to do everything. It's very easy to create a module that will connect to any kind of spam identification method that exists."
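The modular design Hughes describes can be sketched roughly as a list of interchangeable checks, each contributing to a shared score. The Python below is an illustration under stated assumptions, not SpamAssassin's real plug-in interface: the function names, the scores, and the in-memory sets standing in for a blackhole list and a checksum database are all hypothetical.

```python
import hashlib
from typing import Callable, List

# A check takes the message body and the sender's IP and returns a score.
Check = Callable[[str, str], float]

# Hypothetical stand-ins for external services (assumptions, not real data).
BLOCKLISTED_IPS = {"192.0.2.10"}
KNOWN_SPAM_DIGESTS = {hashlib.sha256(b"Buy now!").hexdigest()}


def text_check(body, sender_ip):
    """Search for a telltale text string."""
    return 3.0 if "buy now" in body.lower() else 0.0


def blocklist_check(body, sender_ip):
    """Look the sender up in a (here, in-memory) blackhole list."""
    return 4.0 if sender_ip in BLOCKLISTED_IPS else 0.0


def checksum_check(body, sender_ip):
    """Compare a cryptographic digest of the body with known spam."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return 5.0 if digest in KNOWN_SPAM_DIGESTS else 0.0


CHECKS: List[Check] = [text_check, blocklist_check, checksum_check]


def total_score(body, sender_ip, checks=CHECKS):
    """Run every registered check and add up the scores."""
    return sum(check(body, sender_ip) for check in checks)
```

Supporting a new identification method in this sketch means writing one more function with the same signature and adding it to the list; nothing else has to change.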

The Internet economy is littered with the corpses of companies that attempted to make a profit from open-source or free software. In 2002, it feels a little self-delusional to start enthusing about the commercial possibilities of code that anyone can hack on. But SpamAssassin may flourish because existing companies want to take advantage of its filtering engine for their own proprietary products, and in the meantime are happy to return the favor by giving back some (though not all) improvements.

Matt Sergeant works for a company called MessageLabs that specializes in e-mail security. He says he was asked to develop an anti-spam detection engine to fit into the overall product.

"[In Oct. 2001] I searched around to see what was out there," says Sergeant, "and already back then SpamAssassin was the best open-source anti-spam solution bar none. So I took the code, and integrated it with our e-mail engine ... We're now seeing really great results with the combination of SpamAssassin and my extensions."

Likewise, Hughes says he plans to keep plowing effort back into improving the open-source version of SpamAssassin, while holding onto the extensions that make it work with, say, Outlook, as the proprietary property of Deersoft. (SpamAssassin is licensed under the "Artistic License" devised by Larry Wall for the Perl scripting language, which essentially means that anybody can do pretty much anything they want with the code.)

It's a classic model for open-source software development: a group of parties coming together to collaborate on a common code-base that is of benefit to all -- not just to the individuals who want to fight spam, but also to companies with specific products that incorporate some aspect of spam-fighting.

But will it work in the long run? With the return of the World Birthday Web and the emergence of SpamAssassin, is hope finally on the horizon for those afflicted by junk e-mail? Or will the spammers just find a way around the latest technology, as they have found their way around every previous obstacle? Already, I've observed that the amount of spam escaping SpamAssassin's clutches is rising. Sure, there's a newer version of the software that we need to install, but a sysadmin only has so many hours in the day.

Justin Mason, one of the original leaders of the SpamAssassin project, believes that the problem of spam is "never going to be addressable by pure technology. The spammers are human too, and will always put plenty of effort into defeating whatever filters are out there. As a result, it'll always be an 'arms race' between spammers and the filter developers and users, requiring frequent updating of filters ... With enough work from the anti-spam community (and sysadmins using anti-spam tools!) -- which seems to be forthcoming enough -- we can keep ahead and make their lives a whole lot harder. Fundamentally, though, no matter how hard we make it, I don't think we can 'defeat' spam."

Craig Hughes disagrees.

"I think it is [a winnable battle]," says Hughes, "and here's why. Ultimately spammers need to get a commercial message through that you will respond to. If they don't, there is no point in sending it. They could send you random gibberish, but they won't get any benefit. As long as there is a requirement that they need to make money or get a response I think we can construct ways of catching those messages, and distinguishing those messages.

"It's definitely an arms race -- our filters will get better, and spammers will get smarter. But the best spammer in the world has to beat us, and we only have to beat the average spammer. It's only a problem if too many messages get through; as long as we stay ahead of the average spammer, we're OK."

I want to believe Hughes, although I have a sneaking suspicion that a gallows or a guillotine might be the only technology that really has a hope of deterring spammers. And even the best filtering engine in the world does nothing to address the load that spam puts on the Internet's infrastructure -- the processing and bandwidth resources that it consumes. Dan Quinlan, a hacker hard at work combatting the rising flood of spam emanating from Korea, believes that only a three-pronged attack will work -- one that utilizes filtering, legislation and wholly new e-mail protocols that make spam more difficult.

But until Congress gets its act together, we're going to have to depend on the best the geeks can do. That's nothing to sniff at. Judging by SpamAssassin -- and the kind of will that says, by gum, I'm going to send out birthday greetings no matter how hard the spammers try to stop me -- the geeks aren't going down easy.

By Andrew Leonard

Andrew Leonard is a staff writer at Salon. On Twitter, @koxinga21.
