Invasion of the spambots

From blog spam to pornbots, new strains of computer programs aimed at pumping up Google page ranks just keep on coming.

Published June 8, 2004 7:30PM (EDT)

For Lawrence Kestenbaum, the realization that a new species of intelligent agent -- or "bot" -- was prowling the Internet first dawned about two years ago.

It was about that time, Kestenbaum says, that a series of "fluke" addresses started popping up in the HTTP referrer log of his personal Web site, the historical cemetery database Political Graveyard.

"If you're at all concerned with how your Web site is being received, you're almost compulsively checking the logs to see who's coming in and from where," says Kestenbaum, laying the scene. "You get to know what sites are linking to you. Anything new gets your attention."

Even more attention-grabbing, Kestenbaum adds, was the fact that the fluke referrals came in bunches. Curious, Kestenbaum pasted in the URL and went to look. His disappointment was immediate. Expecting something interesting, he instead found a page filled with nothing but banner and pop-up ads.

For a moment, Kestenbaum says, he suspected a glitch. How else could one explain a dozen or so Internet browsers flipping directly from a site boasting zero unpaid content to one documenting historical graveyards? It didn't make sense.

"That's when I had this 'Aha' moment," says Kestenbaum. "I'd visited the site because of the very technique they'd used to advertise it. Somebody had taken the trouble to write a program that would plant strange links in referrer logs knowing that the people curious enough to check those logs would also be curious enough to follow the link."
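
The kind of log check Kestenbaum describes can be approximated in a few lines. The Python sketch below is only an illustration, not his actual tooling: it assumes the server writes Apache-style "combined" log lines, and the host names, sample entries and the function name `referrer_hosts` are all hypothetical. It tallies outside referrers so that a host that suddenly shows up in a bunch stands out.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed log layout: Apache "combined" format, where the referrer is the
# first quoted field after the status code and response size.
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) \S+ [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"'
)

def referrer_hosts(log_lines, own_host):
    """Count how often each outside host appears as a referrer."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        host = urlparse(match.group("referrer")).netloc
        if host and host != own_host:  # ignore direct hits ("-") and internal links
            counts[host] += 1
    return counts

# Hypothetical sample entries; every address and host here is made up.
OWN_HOST = "cemeteries.example.org"
sample_log = [
    '203.0.113.5 - - [08/Jun/2002:10:00:01 -0400] "GET / HTTP/1.0" 200 5120 "http://ads.example.net/deals.html" "Mozilla/4.0"',
    '203.0.113.5 - - [08/Jun/2002:10:00:02 -0400] "GET / HTTP/1.0" 200 5120 "http://ads.example.net/deals.html" "Mozilla/4.0"',
    '203.0.113.5 - - [08/Jun/2002:10:00:03 -0400] "GET / HTTP/1.0" 200 5120 "http://ads.example.net/deals.html" "Mozilla/4.0"',
    '198.51.100.7 - - [08/Jun/2002:10:05:00 -0400] "GET /a.html HTTP/1.0" 200 900 "http://links.example.org/history" "Mozilla/4.0"',
    '198.51.100.9 - - [08/Jun/2002:10:06:00 -0400] "GET /b.html HTTP/1.0" 200 900 "-" "Mozilla/4.0"',
    '198.51.100.9 - - [08/Jun/2002:10:07:00 -0400] "GET /c.html HTTP/1.0" 200 900 "http://cemeteries.example.org/b.html" "Mozilla/4.0"',
]

counts = referrer_hosts(sample_log, OWN_HOST)
for host, hits in counts.most_common():
    print(host, hits)
```

A tally like this cannot tell a spam referrer from a real one. It only surfaces the unfamiliar hosts, which is exactly the curiosity the spammers were counting on.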

Scary as it may seem, spam is evolving. The automated, Web-spidering technology that delivers bulk c1alis and vi@gra ads to your daily e-mail inbox has mutated into a dozen variants, targeting everything from cellphones to blogs to instant messenger accounts. Feeding off the two divergent trends in online publishing -- increased specialization of content and increased generalization in the use of basic software tools such as Google, AIM and Movable Type -- many of these mutations no longer even demand your attention. In some cases, a place to hide in a chat room or forum is the only thing they need.

"There are tons of ways to monetize any type of traffic you can get," notes Aaron Wall, author of "The SEO Book," a newly published treatise on the art of "search-engine optimization" and other traffic-boosting techniques. "The indirect technique isn't as noticed yet, because so many people are still fighting off the direct stuff," Wall says.

So-called indirect techniques vary. Aside from referrer-log spam -- the general term for what happened to Kestenbaum's site in 2002 -- there's "blog spam" (using bots to post unsolicited HTTP links in the "comment" sections of blog listings), and chat-room spam. Recently, marketers have even resorted to targeting wiki sites such as Wikipedia, taking advantage of their anyone-can-edit policies.

"We've only been noticing it for six months," says Tim Starling, an Australian Wikipedia contributor who has taken a leadership role in the site's attempts to ward off the bot menace. "The bots will go through a site and spam every page. They'll start with the smaller [non-English] language versions, which aren't watched as closely. So it takes longer to pick them up."

In each case, the goal isn't so much to solicit a purchase or confirm receipt -- the tactic of most e-mail spam campaigns -- as to boost visibility. With more than a third of all Internet search queries now running through Google, site marketers have crafted their automated campaigns with an eye to Google's PageRank algorithm, which factors the total number of incoming links to a site as a sign of relevance.
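
To see why planted links pay off, consider a simplified version of the link-counting idea. The Python sketch below is the textbook power-iteration form of PageRank, not Google's production system; the damping factor of 0.85, the toy link graph and the page names are assumptions chosen purely for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical toy web: the same four pages before and after a bot
# plants links to "spam-site" on pages "a" and "b".
honest = {"a": ["b"], "b": ["c"], "c": ["a"], "spam-site": ["a"]}
spammed = {
    "a": ["b", "spam-site"],
    "b": ["c", "spam-site"],
    "c": ["a"],
    "spam-site": ["a"],
}

before = pagerank(honest)["spam-site"]
after = pagerank(spammed)["spam-site"]
print(f"spam-site rank before: {before:.3f}, after: {after:.3f}")
```

In this toy graph the spam site's score rises once other pages link to it, even though nothing about the site itself has changed.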

Although Google publishes clearly stated policies forbidding the use of "link farms" -- sites that manipulate link totals as a way to boost (and rent out) page ranks -- the percentage of offenders dropped entirely from Google search listings is microscopically small.

That, says British SEO specialist Phil Craven, leaves plenty of room for other people to push the envelope.

"If a search engine like Google can make link text so important, then people are going to go out of their way to get link text," says Craven. "So-called spamming is perfectly valid, if necessary."

Such words are tempered by Craven's own experience as a target of exotic spam. As manager of the SEO forum Web Workshop, Craven says he recently had to upgrade his site-registration system to ward off bots that had been masquerading as human guests in an effort to deposit links in the open forum and profile sections.

"Basically, the bot would come along and register five names at a time," says Craven. "The names always began with a non-alphanumeric character and ended with a non-alphanumeric character, like a percentage symbol or an exclamation point."
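
A naming pattern that regular is easy to test for mechanically. The Python sketch below is a hypothetical filter written only to illustrate the pattern Craven describes; it is not his fix, and the function name and sample sign-ups are invented.

```python
def looks_like_bot_name(username: str) -> bool:
    """Return True if the name begins and ends with a non-alphanumeric character."""
    if not username:
        return False
    return not username[0].isalnum() and not username[-1].isalnum()

# Hypothetical sign-up queue.
signups = ["%cheapmeds!", "lkestenbaum", "!x%", "phil_c", "_lurker"]
flagged = [name for name in signups if looks_like_bot_name(name)]
print(flagged)  # ['%cheapmeds!', '!x%']
```

A rule this narrow stops working as soon as the bot's author changes the naming scheme, so it is at best a stopgap.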

To stop the bot, Craven simply modified the registration process, forcing registrants to confirm their chosen username before getting the usual welcome e-mail. The trick worked only because the bot's author, knowing that most forum operators run their software in its default security mode, didn't bother accounting for such a variation.

"I can do that because I'm a programmer," Craven says. "A lot of forums don't have programmers operating them and they simply wouldn't be able to do it."

Such modifications are similar in their simplicity to the now-common anti-spam technique of spelling out e-mail addresses using "at" and "dotcom." The only thing keeping bot writers from anticipating the trick, Wall says, is the level of effort. Currently, bot writers and copiers find that there are enough newbie operators out there to serve as unwilling page-rank boosters.
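
Both the address-spelling trick and its fragility are easy to sketch. In the Python example below, the harvester pattern is an assumption about what a simple address-collecting bot might search for, the function names and sample address are hypothetical, and the spelled-out form uses " at " and " dot " as one common variant of the trick.

```python
import re

# A naive pattern of the sort a simple harvesting bot might use (assumed).
NAIVE_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def obfuscate(address: str) -> str:
    """Spell out '@' and '.' so the naive pattern no longer matches."""
    return address.replace("@", " at ").replace(".", " dot ")

def deobfuscate(text: str) -> str:
    """What a bot writer willing to spend the effort could do in return."""
    return text.replace(" at ", "@").replace(" dot ", ".")

plain = "editor@example.com"
hidden = obfuscate(plain)  # "editor at example dot com"

print(bool(NAIVE_EMAIL.search(plain)))                 # True: harvested
print(bool(NAIVE_EMAIL.search(hidden)))                # False: missed
print(bool(NAIVE_EMAIL.search(deobfuscate(hidden))))   # True: harvested again
```

Undoing the disguise takes one extra line, which is Wall's point: the defense holds only as long as nobody finds it worth that line.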

"The main thing that's driving specialization is whatever's exploitable and easy," Wall says. "Once it's no longer exploitable and easy, people move on to something else."

To get a glimpse of innovation in the bot world, the best place to look, as usual, is in the realm of adult entertainment.

"The adult industry will likely be married to spam and its attendant distribution methods long past the evolution of man into beings of pure energy," jokes Domenic Merenda, vice president of business development for Edge Productions, a company that operates adult-media properties.

Merenda says his company doesn't resort to spam but admits to having "rubbed elbows with the kingpins." The experience has given him a chance to divide so-called porn bots into three major categories: lead-generation bots, URL-proliferator bots and address-harvesting bots.

Of the three categories, lead-generation programs tend to be the most sophisticated and most expensive. Unleashed on X- and R-rated chat-room logs, they run through transcripts, seeking out the names and addresses of the most active participants. Once acquired, these contacts become fodder for third-party vendors eager to advertise webcams, escort services and other variations on the adult-entertainment theme.

Aside from the obvious legal issues, such programs face a growing hurdle: Many of the most active participants in public chat rooms nowadays are other bots masquerading as human users, often for commercial purposes.

To cut down on this practice, many chat rooms now use CAPTCHA, an automated tool developed by computer scientists at Carnegie Mellon University. Short for "completely automated public Turing test to tell computers and humans apart," CAPTCHA is the chat-room equivalent of an immune system T cell. It asks registrants to prove their non-bot status by identifying a randomly generated word. Instead of displaying the word as normal text, however, it displays it as a distorted image, usually with a patterned background, a format that can befuddle even the most sophisticated optical character recognition systems.

"We settled on something humans could do, but machines can't," says Luis von Ahn, a Carnegie Mellon grad student and CAPTCHA project member.

Like the helper T cell, however, CAPTCHA is far from perfect. In 2002, less than a year after the Carnegie Mellon group delivered a working prototype of the CAPTCHA system, programmers at the University of California were already claiming the ability to crack CAPTCHA-generated images in Yahoo's e-mail account-registration system. Porn marketers, meanwhile, have recruited eager users to beat the system. To gain entry or special privileges on many sites, users identify CAPTCHA images piped in by bots currently attempting to register fresh accounts.

If such ploys seem slightly Darwinian, maybe that's because the people charged with designing them see the Internet in survival-of-the-fittest terms.

When the referrer-log spam phenomenon first attracted attention two years ago, Francois Lane, owner of the Canadian marketing firm Mastodonte Communication, took credit for the outbreak while at the same time disavowing any sense of guilt.

"I'm not too worried about my reputation," Lane wrote in response to blogger complaints. "Marketing is all about being innovative, different, adaptive, taking risks and knowing how to use the technology. I'm trying to be all that."

By Sam Williams

Sam Williams is a freelance reporter who covers software and software-development culture. He is also the author of "Free as in Freedom: Richard Stallman's Crusade for Free Software."
