Sam Williams

A unified theory of software evolution

Meir Lehman has been studying the life cycles of computer programs since he was a researcher at IBM 30 years ago. One of these days he's going to get it all figured out.

The office of Meir “Manny” Lehman is a cozy one. Located on the outer edge of the Imperial College of Science, Technology and Medicine campus in South Kensington, London, it offers room for all the basic amenities: a desk, two chairs, a Macintosh G4 and a telephone. Still, for a computer scientist nearing the end of a circuitous 50-year career, the coziness can be a bit confining.

“You’ll have to forgive me,” apologizes Lehman at one point, sifting through a pile of research papers on a nearby shelf. “Since I lost my secretary, I can’t seem to find anything.”

The pile, a collection of recently published papers investigating the topic of software evolution, a topic Lehman helped inaugurate back in the 1970s, is something of a taunting tribute. Written by professional colleagues at other universities, each paper cites Lehman’s original 1969 IBM report documenting the evolutionary characteristics of the mainframe operating system, OS/360, or his later 1985 book “Program Evolution: Processes of Software Change,” which expands the study to other programs. While the pile’s growing size offers proof that Lehman and his ideas are finally catching on, it also documents the growing number of researchers with whom Lehman, a man with dwindling office space and even less in the way of support, must now compete.

“And to think,” says Lehman, letting out a dry laugh. “When I first wrote about this topic, nobody took a blind bit of notice.”

Software evolution, i.e. the process by which programs change shape, adapt to the marketplace and inherit characteristics from preexisting programs, has become a subject of serious academic study in recent years. Partial thanks for this goes to Lehman and other pioneering researchers. Major thanks, however, goes to the increasing strategic value of software itself. As large-scale programs such as Windows and Solaris expand well into the range of 30 to 50 million lines of code, successful project managers have learned to devote as much time to combing the tangles out of legacy code as to adding new code. Simply put, in a decade that saw the average PC microchip performance increase a hundredfold, software’s inability to scale at even linear rates has gone from dirty little secret to industry-wide embarrassment.

“Software has not followed a curve like Moore’s Law,” says University of Michigan computer scientist John Holland, noting the struggles of most large-scale software programs during a 2000 conference on the future of technology. “In order to make progress here it is not simply a matter of brute force. It is a matter of getting some kind of relevant theory that tells us where to look.”

For Lehman, the place to look is within the software development process itself, a system Lehman views as feedback-driven and biased toward increasing complexity. Figure out how to control the various feedback loops — i.e. market demand, internal debugging and individual developer whim — and you can stave off crippling over-complexity for longer periods of time. What’s more, you might even get a sense of the underlying dynamics driving the system.

Lehman dates his first research on the topic of software evolution back to 1968. That was the year Lehman, then working as a researcher at IBM’s Yorktown Heights facility, received an assignment to investigate IBM’s internal software development process. Managers at rival Bell Labs had been crowing about per-developer productivity, and IBM managers, feeling competitive, wanted proof that IBM developers were generating just as many lines of code per man-year as their AT&T counterparts.

Lehman looked at the development of OS/360, IBM’s flagship operating system at the time. Although the performance audit showed that IBM researchers were churning out code at a steady rate, Lehman found the level of debugging activity per individual software module to be decreasing at an equal rate; in other words, programmers were spending less and less time fixing problems in the code. Since it was unlikely that IBM programmers had suddenly figured out a way to write error-free code, Lehman made a dire prediction: OS/360 was heading over a cliff. IBM, in stressing growth over source-code maintenance, would soon be in need of a successor operating system.

Although IBM executives largely ignored the report, Lehman’s prediction was soon borne out. By 1971, developers had encountered complexity problems while attempting to install virtual memory into the operating system, problems which eventually forced the company to split the OS/360 code base into two, more easily manageable offshoots. The linear growth curve that seemed so steady in the 1960s suddenly looked like the trail of a test missile spiraling earthward.

Lehman’s report would eventually earn a small measure of fame when University of North Carolina professor and former OS/360 project manager Frederick P. Brooks excoriated the IBM approach to software management in his 1975 book “The Mythical Man-Month.” Using Lehman’s observations as a foundation for his own “Brooks’s Law” tenet (“adding manpower to a late software project makes it later”), Brooks argued that all software programs are ultimately doomed to succumb to their own internal inertia.

“Less and less effort is spent on fixing original design flaws; more and more is spent on fixing flaws introduced by earlier fixes,” wrote Brooks. “As time passes, the system becomes less and less well-ordered. Sooner or later the fixing ceases to gain any ground. Each forward step is matched by a backward one. Although in principle usable forever, the system has worn out as a base for progress.”

By 1975, Lehman, with the help of fellow researcher Laszlo Belady, was well on the way to formulating his own set of laws. A quarter century after their creation, the laws read like a mixture of old developer wisdom and common textbook physics. Take, for example, Lehman’s “Second Law” of software evolution, a software reworking of the Second Law of Thermodynamics.

“The entropy of a system increases with time unless specific work is executed to maintain or reduce it.”

Such statements put Lehman, who would leave IBM to take a professorship at Imperial College, into uncharted waters as a computer scientist. Halfway between the formalists, old-line academics who saw all programs as mathematical proofs in disguise, and the realists, professional programmers who saw software as a form of intellectual duct tape, Lehman would spend the ’70s and ’80s arguing for a hybrid point of view: Software development could be predictable if researchers were willing to approach it at a systems level.

“As I like to say, software evolution is the fruit fly of artificial systems evolution,” Lehman says. “The things we learn here we can reapply to other studies: weapon systems evolution, growth of cities, that sort of thing.”

That Lehman conspicuously leaves out biological systems is just one reason why his profile has slipped over the last decade. At a time when lay authors and fellow researchers feel comfortable invoking the name of Charles Darwin when discussing software technology, Lehman holds back. “The gap between biological evolution and artificial systems evolution is just too enormous to expect to link the two,” he says.

Nevertheless, Lehman aspires to the same level of intellectual impact. While he was in retirement during the early 1990s, his early ideas jelled into one big idea: What if somebody were to formulate a central theory of software evolution akin to Darwin’s theory of natural selection? In 1993, Lehman took an emeritus position at Imperial College and began work on the FEAST Hypothesis. Short for Feedback, Evolution and Software Technology, FEAST fine-tunes the definition of evolvable software programs, differentiating between “S-type” and “E-type”: S-type or specification-based programs and algorithms being built to handle an immutable task, and “E-type” programs being built to handle evolving tasks. Focusing his theory on the larger realm of E-type programs, Lehman has since expanded his original three software laws to eight.

Included within the new set of laws are the Law of Continuing Growth (“The functional capability of E-type systems must be continually increased to maintain user satisfaction over the system lifetime”) and the Law of Declining Quality (“The quality of E-type systems will appear to be declining unless they are rigorously adapted, as required, to take into account changes in the operational environment”). For added measure, Lehman has also thrown in the Principle of Software Uncertainty, which states, “The real world outcome of any E-type software execution is inherently uncertain with the precise area of uncertainty also unknowable.”

While the new statements still read like glossed-over truisms, Lehman says the goal is to get the universal ideas on paper in the hopes that they might lead researchers to a deeper truth. After all, saying “objects fall down instead of up” was a truism until Sir Isaac Newton explained why.

“Whenever I talk, people start off with blank faces,” Lehman admits. “They say, ‘But you haven’t told us anything we didn’t already know.’ To that I say, there’s nothing to be ashamed of in coming up with the obvious, especially when nobody else is coming up with it.”

For extra ammo, Lehman also has expanded the graphs and data from his original studies in the 1970s. Taken together, they show most large software programs growing at an inverse square rate — think of your typical Moore’s Law growth curve rotated 180 degrees — before succumbing to over-complexity.
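The shape of such a curve can be illustrated with a short simulation. The sketch below assumes one commonly cited form of the inverse square model, in which each release adds an increment proportional to one over the square of the current size. The function name, the starting size and the `effort` constant are illustrative assumptions, not values taken from Lehman’s data.

```python
def inverse_square_growth(initial_size, effort, releases):
    """Return system sizes over successive releases.

    Assumes each release adds effort / size**2, so growth slows
    as the system gets larger. All parameters are hypothetical.
    """
    sizes = [float(initial_size)]
    for _ in range(releases):
        current = sizes[-1]
        sizes.append(current + effort / current ** 2)
    return sizes


sizes = inverse_square_growth(initial_size=100.0, effort=1_000_000.0, releases=20)

# Growth added by each release; under this model it shrinks every time.
increments = [later - earlier for earlier, later in zip(sizes, sizes[1:])]

for release, size in enumerate(sizes):
    print(f"release {release:2d}: size {size:8.1f}")
```

Under these assumptions the system keeps growing, but every release adds less than the one before it, which is the flattening trajectory the graphs are said to show.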

Whether the curves serve as anything more than a conversation-starter is still up for debate. Chris Landauer, a computer scientist at the Aerospace Corporation and a fellow guest speaker with Lehman at a February conference on software evolution at the University of Hertfordshire, was impressed by the Lehman pitch.

“He has real data from real projects, and they show real phenomena,” Landauer says. “I’ve seen other sets of numbers, but these guys have something that might actually work.”

At the same time, however, Landauer wonders if the explanation for similar growth trajectories across different systems isn’t “sociological.” In other words, do programmers, by nature, prefer to add new code rather than substitute or repair existing code? Landauer also worries about whether the use of any statistic in an environment as creative as software development leads to automatic red herrings. “I mean, how long does it take a person to come up with a good idea?” Landauer asks. “The answer is we just don’t know.”

Michael Godfrey, a University of Waterloo scientist, is equally hesitant but still finds the Lehman approach useful. In 2000, Godfrey and a fellow Waterloo researcher, Qiang Tu, released a study showing that several open-source software programs, including the Linux kernel and fetchmail, were growing at geometric rates, breaking the inverse squared barrier constraining most traditionally built programs. Although the discovery validated arguments within the software development community that large system development is best handled in an open-source manner, Godfrey says he is currently looking for ways to refine the quantitative approach to make it more meaningful.

“It’s as if you’re trying to talk about the architecture of a building by talking about the number of screws and two-by-fours used to build it,” he says. “We don’t have any idea of what measurement means in terms of software.”

Godfrey cites the work of another Waterloo colleague, Rick Holt, as promising. Holt has come up with a browser tool for studying the degree of variation and relationship between separate offshoots of the original body of source code. Dubbed Beagle, the tool is named after the ship upon which Charles Darwin served as a naturalist from 1831 to 1836.

Like Landauer, Godfrey expresses concern that a full theory of software evolution might be too “fuzzy” for most engineering-minded programmers. Still, he credits Lehman for opening the software field to newer, more intriguing lines of inquiry. “It’s the gestalt ‘Aha’ of his work that I find more interesting than the numbers,” Godfrey says.

For Lehman, the lack of a scientific foundation to the software-engineering field is all the more reason to keep digging. Fellow researchers can quibble over the value of judging software in terms of total lines of code, but until they come up with better metrics or better theories to explain the data, software engineering will always be one down in the funding and credibility department. A former department head, Lehman recalls the budgetary battles and still chafes over the slights incurred. Now, as he sits in a cramped office, trying to recruit new corporate benefactors and a new research staff, he must deal once again with those who label software development a modern day form of alchemy — i.e. all experiment but no predictable result.

“In software engineering there is no theory,” says Lehman, echoing Holland. “It’s all arm flapping and intuition. I believe that a theory of software evolution could eventually translate into a theory of software engineering. Either that or it will come very close. It will lay the foundation for a wider theory of software evolution.”

When that day comes, Lehman says, software engineers will finally be able to muscle aside their civil, mechanical and electrical engineering counterparts and take a place at the grown-ups’ table. As for getting bigger offices, well, he sees that as a function of showing the large-scale corporations that fund university research how to better control software feedback cycles so their programs stay healthier longer. Until then, the search for a theory has rendered Lehman less of a Darwin and more of an Ahab — a man in search of both fulfillment and a little revenge.

Firefox — the flag bearer of free software

Mozilla's browser is taking market share away from Microsoft. Sometimes, slow and steady really does win the race.

To misquote F. Scott Fitzgerald, there are no second acts in the lives of software projects.

Oh sure, the developers sometimes move on to bigger and better things. When it comes to the created works, however, the trajectory is depressingly consistent: Functional simplicity gives way to feature bloat, followed by brittleness, unreliability and, barring certain monopoly-friendly market conditions, oblivion.

For the bulk of its six-year existence, the Mozilla project has been the unwitting victim and symbol of this truism. Like Jacob Marley’s ghost in “A Christmas Carol,” the open-source browser seemed doomed to bear the sinful weight of its earlier, proprietary incarnation — Netscape Communicator — for eternity.

A funny thing happened on the way to oblivion, however. With no employer to guide them and no market to punish them, Mozilla developers stubbornly kept plugging. After delivering a stable 1.0 release of its Mozilla suite of applications (including a browser and a mail client) in 2002, four years after the project’s launch and about two years beyond initial estimates, they proposed an even more ambitious, ground-up overhaul of the underlying source code. Given the steady half-decade flameout of the original Netscape user population, developers went with the obvious code name: Phoenix.

“Team members wanted to do a reset,” says Mozilla engineering director Chris Hofmann, looking back.

The end result has been arguably the biggest comeback story in software development since Steve Jobs retook the helm at Apple. Trademark issues have forced the Mozilla team to redesignate the project Firefox, but the browser itself has met few obstacles. The 0.9 version, released over the summer, registered more than 5 million downloads. WebSideStory, a Web analytics company, puts the combined October Mozilla-Firefox market share at 6 percent, a 71 percent jump over June market share. To cap it all off, the Mozilla Foundation, official overseer of the project since its spinout from Netscape last year, officially released the 1.0 version on Tuesday, Nov. 9, and has set itself a 10 percent market share target by the end of the year.

“This is a first,” says WebSideStory analyst Geoff Johnston. “Until July, Microsoft had never lost market share. They’d had spikes, sure, but it never trended down. The bigger news now is that the trend has continued.”

Granted, Microsoft’s commanding portion of the browser market — Johnston puts Internet Explorer’s current market share at 92.4 percent — is in no immediate danger of collapsing. What is in danger, however, is the trusted wisdom that open-source developers, whether through cultural prejudice or isolation from market forces, don’t know how to deliver simple, consumer-friendly software tools. Cut loose from the corporate world, Mozilla’s developers have hit their target: a thriving, user-friendly open-source browser. The question everyone should be asking now is: Where Mozilla has trodden, will other open-source projects follow?

The Mozilla Foundation’s Hofmann says the first move in launching the Firefox redesign was soliciting feedback from dedicated users in the hopes of gleaning something that Microsoft developers might have missed.

“We wanted to gather all the different things we learned about building browsers over the last 10 years and combine that with a strong look at the way people used browsers,” Hofmann says.

One thing Mozilla developers quickly learned was that most traditional browser elements are extraneous to the everyday Web-surfing experience. Using minimalism as a design cue, developers whittled down the Firefox tool bar. They also stole a trick from Internet Explorer 5.0 and Opera, a browser created by a Norwegian company, by integrating a Google search form into the browser frame. Most important, they scrapped support for anything outside the W3C rule book, which attempts to set standards for Web development.

This latter decision, which meant that Firefox does not support Microsoft’s ActiveX extensions or any party’s VBScript add-ons, proved fortuitous. In June, just after the 0.9 version of Firefox became available for download, a Trojan horse known as Download.Ject began to harass Microsoft Windows users en masse. A JavaScript-based Trojan horse of Russian origin, Download.Ject exploits the tight coupling of Internet Explorer and Microsoft Windows. Users who visit a propagating page automatically download the invisible JavaScript applet. The applet then installs backdoor access and a keystroke logger on the unwitting recipient’s machine, thus giving third-party hackers a chance to break in at a later date.

One recent convert is Frank Scheelen, manager of the porn-specific search engine Ask Jolene. Based in the Netherlands, Scheelen’s site has a blacklist policy for thumbnail galleries and other porn sites that try to slip JavaScript applets into the downloaded bitstream. To minimize user headaches, the site has also taken to endorsing Firefox, offering a direct link to the Mozilla Foundation download page.

“Firefox is inherently safer, because it allows you to turn off the things that make Internet Explorer dangerous — popups, JavaScript, ActiveX,” says Scheelen.

The reason, says Hofmann, boils down to marketing savvy, or lack thereof. Internet Explorer currently enjoys its dominant market share not because of Microsoft’s celebrated marketing muscle, but because of Microsoft developers’ undercelebrated flexibility. In essence, they’ve made it accessible to both sides of the browsing experience: the ordinary user who wants to take advantage of the Web’s abundant content and the commercial marketers who dangle free content as a lure for sideline promotions. Firefox developers, in contrast, don’t have to worry about the content-provider side and can thus focus on a few elemental details: security, downloading speed, and ease of use.

“We’ve been able to focus, saying, ‘Let’s just do the right thing for the user. If there’s a good search engine out there, let’s integrate it into the product,’” Hofmann says. “We don’t have to worry about business arrangements. We don’t have to worry about how to make money off it. Let’s just go out and make quality software.”

Hofmann isn’t the only one enjoying that freedom. Much of the Mozilla project’s success stems from the fact that individual components have been outsourced to teams obeying their own “let’s just make quality software” imperative. For example, Gecko, the layout engine that determines how Firefox displays HTML, is its own independent project under the Mozilla aegis. The same goes for Netscape Portable Runtime (NSPR), a library to ensure that applications interact with Firefox across a variety of platforms, and Thunderbird, an e-mail client still in development.

This sort of feudal distribution of authority seems like an ideal recipe for chaos. In fact, it’s exactly the sort of thing that has kept both Mozilla in general and Firefox in particular moving forward, even without a major corporate benefactor.

“Our original manifesto for Phoenix set out a few key principles: make a product that just browses, and browses well (and) keep the team small and focused,” writes Blake Ross, a Firefox team co-founder and current Stanford University sophomore, celebrating the 1.0 release on his personal Web site. “I’m proud to say we have delivered on that today.”

Such focus in the midst of complexity is a large reason many open-source projects, despite the waning of late-1990s media hype, have not lost momentum. Apart from Firefox and the ongoing SCO-IBM lawsuit, the most noted open-source story of the last two years has been the Salt Lake City software company Novell’s 2003 decision to purchase Ximian, a Linux desktop company founded by developers of the free software GNOME graphical user interface.

Noting the countercyclical timing of the purchase — IBM, Hewlett-Packard and Sun Microsystems had each invested in GNOME’s success as early as 1999 — Jeff Hawkins, vice president of Novell’s Linux Business Office, says it was the GNOME team’s sustained progress in the subsequent downturn that proved more compelling.

“Remember the phrase ‘Internet time?’” Hawkins asks, pointedly. “I think during the late 1990s there was this fallacy that somehow software could be developed faster. The truth is that software takes people writing it. It takes time.”

Hawkins credits open-source developers for adopting a “steady march of progress” mind-set in the face of shifting market and media conditions. In the case of Mozilla, that mind-set has proved especially useful given the quick die-off in excitement when the 1998 Netscape source code failed to save that company from losing the remainder of its market to Microsoft.

“They kept plugging away,” Hawkins says, of Mozilla. “People ignored them, until they got their break from the security problems in I.E.”

The Mozilla second act, in other words, is a misnomer. While the rebirth imagery works well for those of us with short attention spans, the truth is, Mozilla never really went away. If anything, its delivery comes right on time. Most successful software projects, notably Linux and Windows, take between a half-decade and a full decade to reach full maturity, and most software project managers worth their salt will tell you that a good team, like a good winery, delivers no code before its time.

Instead of the fiery phoenix or the speedy firefox, technology watchers would be well served to think of the microscopic yeast cell — a humble organism that delivers its best work when the lights are off and the oxygen supply is low — the next time they read about reignited browser wars.

“That’s one of the best strengths of open-source [development],” says Hawkins, noting the anaerobic analogy. “There’s no way to kill it in the classic sense. Even the failed companies of open source contribute to its success.”

The Wal-Mart supremacy

The giant retailer's introduction of RFID technology is forcing other supermarket chains to catch up. But fiddling with data may not be the best survival strategy in the Wal-Mart future.

What do you call it when a company announces a multibillion-dollar technology initiative with no preexisting infrastructure, no software code and an 18-month deadline to delivery?

In most cases you’d call it a recipe for disaster. In the case of Wal-Mart, a company with the power to force others to follow its technology agenda, you’d simply call it “tough love.”

That two-word description, according to a January article in Computerworld Magazine, is exactly how Wal-Mart CEO H. Lee Scott summed up his company’s philosophy on radio frequency identification (RFID) in a speech to suppliers last winter. For those who missed it, the company sent out letters to top suppliers last June requesting that all pallets and boxes come equipped with RFID tags by Jan. 1, 2005, a request designed to facilitate better warehouse tracking. Suppliers so far seem to have gotten the message. This June, a year after the initial letter campaign requesting 100 participants, Wal-Mart reported that 137 companies had climbed aboard.

“We see this as beneficial to the entire supply chain,” says Procter & Gamble spokesperson Jeannie Tharrington, summarizing her company’s eager participation in the so-called “mandate.” “Right now our out-of-stock levels are higher than we’d like and certainly higher than the consumer would like, and we think this technology can help us to keep the products on the shelf more often.”

Such comments, of course, reinforce a growing theme in the business and technology press: Those worried about Wal-Mart’s deleterious effect on mom and pop retailers need to put down Nirvana’s “Nevermind” album and catch up on present-day reality. Nowadays, even billion-dollar behemoths face the awkward choice of doing things the Wal-Mart way or watching a major portion of their customer base wave goodbye.

Not surprisingly, most are choosing to set their strategic clocks to Arkansas time. This summer, just before Wal-Mart launched a pilot RFID rollout in a handful of Texas stores, the Worldwide Retail Exchange, an industry consortium launched by supermarkets and other large retailers to improve back-end efficiencies, announced that it, too, had seen a dramatic increase in members willing to participate in its “global data synchronization” effort. The effort’s focus is to make sure that the code a supplier uses to describe a consumer product in its own databases matches the code in retailer databases, a simple concept in theory but a fiendishly complex task in reality. The reason: Most retailers and suppliers rely on proprietary software code and standards to define current bar code data. Adopt a common standard, says WWRE’s chief marketing officer Nick Parnaby, and a package of toilet paper or can of tuna suddenly becomes trackable across all portions of the so-called “supply chain” — factory, truck, retail shelf and checkout line.
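The matching problem described above can be sketched in a few lines. The example below is only an illustration: the product codes, the `COMMON-` identifiers and the `synchronize` function are all hypothetical, and no real industry standard or consortium format is implied.

```python
# Hypothetical example: each party keys the same products by its own codes.
# Under an assumed shared standard, both sides map their private codes
# to one common identifier.
supplier_to_common = {
    "SUP-00017": "COMMON-0001",  # can of tuna
    "SUP-00042": "COMMON-0002",  # package of toilet paper
}
retailer_to_common = {
    "R-77C0": "COMMON-0001",
    "R-9A31": "COMMON-0002",
}


def synchronize(supplier_map, retailer_map):
    """Pair supplier and retailer codes that share a common identifier.

    Returns (matched, unmatched): a dict of supplier code -> retailer code,
    and a list of supplier codes the retailer does not recognize.
    """
    retailer_by_common = {common: code for code, common in retailer_map.items()}
    matched, unmatched = {}, []
    for supplier_code, common in supplier_map.items():
        retailer_code = retailer_by_common.get(common)
        if retailer_code is None:
            unmatched.append(supplier_code)
        else:
            matched[supplier_code] = retailer_code
    return matched, unmatched


matched, unmatched = synchronize(supplier_to_common, retailer_to_common)
print(matched)
print(unmatched)
```

The lookup itself is trivial once a common identifier exists. The hard part, as the article suggests, is getting every party to agree on that identifier and keep its own records aligned with it.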

Wal-Mart’s decision to unilaterally impose RFID on its suppliers effectively settled the case for “global data synchronization” in the rest of the industry, whether or not its members understood what they were doing.

“[Before RFID] you couldn’t describe it to your chief executive in less than 25 words,” says Parnaby. “With RFID, you suddenly have people’s attention.”

Granted, investment levels in technology among supermarkets, a retail sector that has given up 21 percent of its North American market share to Wal-Mart over the last two decades, remain modest. Of the 20 companies that have participated in the data synchronization program, Parnaby estimates the average investment to be $250,000 per company. Still, he sees it as an ante on what has become an increasingly high-stakes poker table. It’s a sign that volume-dependent chains like Kroger, Albertson’s and Safeway are willing to gamble on Wal-Mart’s ability to make RFID an industry-wide product tracking standard.

“Wal-Mart has created this herd moving in the right direction,” Parnaby says.

But is it really the right direction for anyone besides Wal-Mart? Some industry observers suggest that supermarket chains that are attempting to survive in a Wal-Mart world may find that no matter how many technological “efficiencies” they introduce, they will never be able to challenge Wal-Mart in the area where it remains supreme — price. If they really want to differentiate themselves, they may have to look elsewhere. Instead of searching their databases for answers, they might just have to ask a simple question:

“Can I help you to your car, Ma’am?”

Joshua Greenbaum, principal at Enterprise Applications Consulting, has counseled clients to hang back a bit when it comes to game-changing strategies like RFID and data synchronization. After all, he says, it was only five years ago that most supermarket chains were still racing to catch up with electronic data interchange, or EDI, a 1980s-era data-sharing standard designed to reduce the vast amount of paperwork supermarkets generated with each product order.

“Wal-Mart has proven that under intense margin pressure, a little I.T. [information technology] can go a long way,” says Greenbaum, noting the company’s mid-1990s decision to move past the EDI standard. “On the flip side, the mechanics and physics that go into making RFID work are still a long way off. With RFID, there’s still something in the hookah that smells a little funny.”

Such comments echo concerns within the supermarket industry. Known for its punishing margins (according to the Food Marketing Institute, the average supermarket earned 95 cents for every $100 spent inside its doors), the industry can afford little in the way of technology experimentation. That Wal-Mart has proved itself more aggressive in this arena owes more to size than boldness. Last year, Wal-Mart stores generated $256 billion in sales, $67 billion if you count only the food, candies and tobacco products. That’s more than Kroger ($54 billion), Albertson’s ($35 billion) and Safeway ($35 billion).

In other words, assuming a similar 1 percent technology reinvestment rate, Wal-Mart could match each of its three grocery competitors dollar for dollar and still have another dollar left over for pilot programs like RFID. Never mind the additional savings from nonunion labor and having the best bulk-buying leverage in the business.
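The arithmetic behind that comparison can be checked directly. The sketch below uses the sales figures quoted above and assumes the 1 percent rate is applied to Wal-Mart’s total sales of $256 billion rather than to its $67 billion in food sales; that reading is an interpretation, since the article does not say which base it uses.

```python
# Sales figures in billions of dollars, as quoted in the article.
WALMART_TOTAL_SALES = 256
COMPETITOR_SALES = {"Kroger": 54, "Albertson's": 35, "Safeway": 35}
REINVESTMENT_RATE = 0.01  # the article's assumed 1 percent rate

walmart_budget = WALMART_TOTAL_SALES * REINVESTMENT_RATE
competitor_budget = sum(COMPETITOR_SALES.values()) * REINVESTMENT_RATE
left_over = walmart_budget - competitor_budget

print(f"Wal-Mart technology budget:  ${walmart_budget:.2f} billion")
print(f"Three competitors combined:  ${competitor_budget:.2f} billion")
print(f"Left after matching them:    ${left_over:.2f} billion")
```

On these assumptions, what remains after matching all three competitors is slightly more than their combined budgets, which is consistent with the “another dollar left over” claim. The claim would not hold if only Wal-Mart’s food sales were counted.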

“When we’re talking about Wal-Mart, the best that other retailers are going to be able to compete with Wal-Mart is to do what Wal-Mart can’t do,” says Lee Hollman, vice president of product development for Nashville-based IHL Consulting Group, a company that tracks I.T. spending within the retail industry. “Wal-Mart will beat everybody on price, beat ’em like a drum.”

Surprisingly, few companies take that advice. Asked for an example, Hollman skips over the top tier of companies and points to Publix, a fast-growing Florida supermarket chain that earned $661 million on $16.8 billion in sales, an eye-popping 3.9 percent margin. How did the company do it? By focusing on store-specific services like on-site bakeries and baggers who walk the groceries out to customers’ cars.
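The margin figure is easy to verify and to set against the industry average quoted earlier. The snippet below uses only numbers from the article; the variable names are illustrative.

```python
# Figures quoted in the article.
publix_profit_millions = 661
publix_sales_millions = 16_800
industry_profit_per_100_dollars = 0.95  # Food Marketing Institute figure

publix_margin_pct = publix_profit_millions / publix_sales_millions * 100
# Dollars earned per $100 of sales is already a percentage.
industry_margin_pct = industry_profit_per_100_dollars
ratio = publix_margin_pct / industry_margin_pct

print(f"Publix margin:   {publix_margin_pct:.1f} percent")
print(f"Industry margin: {industry_margin_pct:.2f} percent")
print(f"Publix earns about {ratio:.1f} times the industry average per dollar of sales")
```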

To pay for the extra labor, the company charges higher prices than its discount competitors, of course. Then again, the company has also forgone the now-standard “loyalty card” programs that impose hidden software licensing and database grooming costs on the back end.

“I have to get my biases up front, however,” says Hollman. “Both my wife and I do our shopping at Publix.”

The reason, says Hollman, boils down to service. Because the customers who prefer lower prices to better service already have the option of driving to Wal-Mart and Costco in most corners of the U.S. southeast, the company can instead focus on the minority of shoppers willing to pay a few cents more to have their groceries walked out to the car. When the company does make a decision to invest in technology, such investments generally focus on portions of the business visible to the customer, such as a recent 16,000-unit order for Hewlett-Packard checkout terminals.

“They do customer stuff really well,” Hollman says. “They understand that going head-to-head with Wal-Mart is probably not in anybody’s best interest.”

Loyalty card programs, like RFID, have been a favorite target of consumer groups concerned about retailer use of customer-specific information. A more damning complaint, however, comes from Gary Hawkins, president of Green Hills Market, a single store operation in Syracuse, N.Y.: Most companies have yet to see anything close to the expected profits such systems originally promised.

“The big guys deal directly with the [consumer products group] companies,” says Hawkins, noting his neighboring competitors. “My experience has been that the large retail companies collect data more for the benefit of suppliers or to make sure suppliers’ discounts only go to the best customers. I would venture to say, I don’t think that’s the best strategy.”

In contrast, Hawkins says, his own company’s loyalty card program, run with the help of a Windows PC system and an SQL database, is devoted more to identifying the best customers and making sure those customers get additional benefits beyond low prices. Each Thanksgiving, Green Hills’ top spenders receive a free turkey, and each Christmas they get a free tree. Additional incentives throughout the year are designed to keep the profitable customers coming back while at the same time encouraging unprofitable customers to find another place to ferret out bargains. Like Publix, the company is willing to trade higher volume for higher income.

Not every store manager or president has the luxury of knowing his best customers by name and face. Still, Hawkins says, customer service is as much a matter of philosophy as strategy. Executives who see Wal-Mart as a paragon of retail are, in essence, espousing the philosophy that companies can squeeze more profits out of internal operations than they can draw out of customers at the checkout aisle.

“If you sit back and look at the whole supply chain, the consumer is not in that supply chain,” says Hawkins. “The products on that flier are not there because that customer is interested in those products. Those products are there because the manufacturer has paid for them to be there. What the customer wants doesn’t enter into the equation. In my mind that’s a bit backwards.”

I.T. experts like Greenbaum hesitate to assail Wal-Mart’s competitors for embracing Wal-Mart methods. Greenbaum points to continued comments by Federal Reserve Chairman Alan Greenspan crediting corporate investments in information technology for improving both market efficiency and per-employee productivity.

At the same time, however, Greenbaum has noted the paradox of Wal-Mart capitalism: That in order for it to succeed, a company practically has to give itself over to Soviet-style principles. Whether that means management centralization, tighter information control, or adopting a “tough love” approach to suppliers and employees, most large-scale corporations are too far down the garden path to consider a detour.

“This is command capitalism,” Greenbaum says. “It almost has to be. We’re talking about the tightest, most dreadful business to be in: All your capital is sitting on the shelf going out of date. Your competition is 100 times bigger than you’ll ever be and can sell products at retail cheaper than you can buy them at wholesale. We’re talking about a business that’s heavily unionized and where one of the biggest unions is the Teamsters. It doesn’t get much more grim than that.”

When machines breed

Evolvable hardware -- gadgets that design themselves -- can get the job done, even if humans have no idea how they do it.

Paul Layzell is a specialist in the budding field of evolvable hardware. Simply put, he helps machines design themselves, using principles borrowed directly from biological evolution.

It’s a job with strange and unexpected twists. Take the time three years back when he and fellow University of Sussex researcher Jon Bird attempted to build an oscillator circuit using genetic algorithms and a handful of transistors. While a few circuits came out fitting the functional profile — steady output, steady frequency — one circuit took a strange path to get there. Instead of building internal feedback loops to reach the desired frequency, it had simply wired itself in a way that the radiated hum of a nearby computer went straight through the circuit and into the attached oscilloscope.

In other words, it cheated. The circuit had hacked the system by becoming a radio.

“The best way I can think to describe it is a mixture of respect and humor,” says Layzell, summing up his reaction. “A bit like when a child solves a common problem in an original way: It always makes you smile.”

Using evolutionary processes to optimize machine performance is nothing new. Since the 1960s, artificial intelligence researchers have exploited the dynamics of Darwinian evolution to solve software problems in fields as diverse as financial investment, manufacturing and biochemistry.

What is new, however, is the application of evolutionary processes in the hardware realm. Thanks to reconfigurable devices such as the field programmable gate array (FPGA) — the microchip designer’s equivalent of an Etch A Sketch — and increasing computational power, researchers who once performed simulations of new circuits with an eye on the clock are suddenly free to let their designs evolve for a while just to see what happens. One might not be sure that one understands how a given circuit achieves what it is supposed to, but if it works, is that really a problem?

For many engineers, the question is already the first major litmus test of the 21st century. Those who answer yes see evolutionary engineering as barely a half step above tinkering. Those who answer no, however, see it as a useful method to break through the complexity barriers limiting both software and hardware innovation.

“I see the evolved radio as an intuition pump,” says Bird — borrowing a phrase from Daniel Dennett, a Tufts University philosopher with a sizable fan base in the world of artificial intelligence research — “a vivid thought experiment that can structure the way we think about a problem.”

Derek Linden, chief science officer for JEM Engineering, an antenna-design firm based in Laurel, Md., has used evolutionary design processes to build antennas for military contractors and NASA’s Jet Propulsion Laboratory. He credits the relatively sudden interest in evolutionary hardware design to economic factors.

“When I started doing this, I was running my simulations on a single Pentium 66 [MHz] PC,” Linden says. “That meant I had to be real careful with how large my problems were and how long it took things to run. Now, you can brute-force things a lot more easily.”

Applying “brute force” in the case of evolutionary design means breaking problems down into smaller, simpler tasks. Just as the human genotype can be rendered as a 3 billion base-pair genome, so can silicon circuits and wire antennas be boiled down to an even simpler, binary numeric form. Split this “genotype” into randomly determined halves, and you have something that can be “mated” with another design, with the resulting offspring farmed out for testing on a separate software simulator. The results aren’t always pretty, but when you filter out the weak designs and let the breeding process run for a thousand or so generations, you get something like the seedless watermelon — all features and no drawbacks.
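The breeding loop described above can be sketched in a few dozen lines. This is a minimal illustration, not the method any of the researchers quoted here actually use: the genotype length, population size and generation count are made-up values, the bit-counting `fitness` function is a toy stand-in for a real circuit or antenna simulator, and the small mutation step is a common addition to genetic algorithms that the description above does not mention.

```python
import random

GENOME_BITS = 16       # hypothetical genotype length
POPULATION = 30        # hypothetical population size
GENERATIONS = 200      # hypothetical number of breeding rounds
MUTATION_RATE = 0.02   # assumed chance of flipping any one bit


def fitness(genome):
    """Toy stand-in for a simulator: score a design by counting its 1 bits."""
    return sum(genome)


def mate(parent_a, parent_b, rng):
    """Split both genotypes at one random point and splice the halves."""
    cut = rng.randrange(1, GENOME_BITS)
    return parent_a[:cut] + parent_b[cut:]


def mutate(genome, rng):
    """Flip each bit with a small probability."""
    return [bit ^ 1 if rng.random() < MUTATION_RATE else bit for bit in genome]


def evolve(seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENOME_BITS)]
                  for _ in range(POPULATION)]
    for _ in range(GENERATIONS):
        # Filter out the weak designs: keep the fitter half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:POPULATION // 2]
        # Breed replacements for the discarded half.
        children = [
            mutate(mate(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(POPULATION - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Because the fitter half survives each round unchanged, the best score never gets worse from one generation to the next; everything interesting in a real system lives in the fitness function, which here is deliberately trivial.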

“Some have called genetic algorithms ‘embarrassingly parallelizable,’” says Linden, who uses a seven-CPU Linux Beowulf cluster to test and optimize antenna designs. Linden and fellow engineers run the recommended genotypes through a second, higher-fidelity simulator. If the results remain promising, they fabricate the antenna for real-world testing.

In the case of circuit designs, especially those employing FPGA chips, the evolutionary design program can try out its own designs on a reconfigurable base of silicon transistors. Dubbed “online evolution,” this form of design is much more complex and tends to generate the most bizarre, and intriguing, results.

So far, the process has worked best in the antenna-design realm. Next year, NASA’s Space Technology 5 mission will deploy the first satellite employing an antenna designed by evolutionary processes. Developed at the NASA Ames Research Center, the antenna looks like an ordinary paper clip after a hard day’s work. Built to fit within a cubic inch of space, it features five sharp bends and one gradual bend. All told, the entire design process took 10 hours, using 35 Linux servers and minimal human intervention.

“We try to give as little antenna knowledge as possible to our software and let evolution be free to design the antenna as it sees fit,” says Jason Lohn, head of the Evolvable Systems Group at the NASA Ames Research Center.

Such comments highlight the dividing line between traditional engineering and evolutionary engineering. Where traditional engineers find comfort in rigid specifications, trusting the computer for number-crunching and testing, evolutionary designers must trust the entire design process.

This can lead to sticky situations. Lohn notes how a number of algorithms have come up with designs patented by other inventors. Novel designs, meanwhile, have a tendency to leave engineers scratching their heads. In a 2002 paper summing up their oscillator experiment, Bird and Layzell note that even the designs that played within the rules were often impenetrable to later analysis.

“It has proved difficult to clarify exactly how these circuits work,” Bird and Layzell write. “Probing a typical one with an oscilloscope has shown that it does not use beat frequencies to achieve the target frequency. If the transistors are swapped for nominally identical ones, then the output frequency changes by as much as 30 percent.”

Lohn, for one, sees the electrical engineering world falling into two schools of thought. “One school of thought says you need a black box that does X, Y and Z. If I use evolution to get something that does X, Y and Z, I don’t care what’s in it as long as it works.”

And the other school? “That one says, ‘I need to understand what’s in there,’” Lohn says. “Those are the people we can’t really help, because a lot of times, we don’t know what’s in there.”

Lohn seems comfortable working in the “black box” camp. He describes antenna design as a “black art” and distances himself from those who, following in the footsteps of British 19th century physicist James Clerk Maxwell, prefer elegant, textbook-worthy solutions to actual working solutions.

“Maxwell wrote down the four equations which govern all of wireless communication,” he says. “They describe the physics, but the weird thing is, you never use them. In practice, this field is so squirrely, the only way to learn is through trial and error. It’s the school of hard knocks.”

Not that the field doesn’t offer the occasional tantalizing glimpse of a deeper, more theoretical insight waiting to be discovered. Lohn recalls episodes in which he and fellow researchers accidentally cut out sizable portions of a circuit with minimal effect. Lohn likens the lost portion to a vestigial organ, like the human appendix, useless now but useful at one time. Hence the decision to keep it around.

Equally intriguing are the occasional experiments where researchers make a mistake, using a faulty wire to power the circuit, for example. In such instances, fixing the wire often kills off many of the most promising circuits, proof positive that even circuits grow accustomed to a shoddy environment.

“Evolution doesn’t want to see that wire fixed,” Lohn says. “Evolution is sneaky. It’ll exploit anything you put within its reach.”

Computer, heal thyself

Why should humans have to do all the work? It's high time machines learned how to take care of themselves.

In his 1992 book “To Engineer Is Human: The Role of Failure in Successful Design,” Duke civil engineering professor Henry Petroski tosses out a little-known statistic from the history of bridge design: During the latter half of the 19th century, a period that introduced the locomotive train to most corners of the industrial world, roughly a quarter of all iron truss bridges failed.

The simplified reason: Bridge designers, unused to iron as a structural material and railroad trains as a service load, had yet to grasp the full impact of a minor miscalculation anywhere within their plans. It wasn’t until designers started introducing a conservative fudge factor, now known as the factor of safety, that bridge designs developed enough redundancy and robustness to account for the occasional errant crossbeam or overloaded rail car.

“Basically civil engineers made bridges safe by recognizing that humans would be involved in every step of the bridge-building process,” says David Patterson, a Berkeley computer science professor who has cited Petroski’s statistic in numerous papers. “With human involvement comes the risk of human failure.”

For Patterson, the iron-truss story is more than just a quick attention grabber; it’s a hint that today’s software programmers, oft derided for their failure to deliver bug-free code, have yet to grasp the full weight of their own discipline.

Coauthor of the landmark 1987 paper that laid out the low-cost storage strategy now known as RAID (the acronym stands for “redundant array of inexpensive disks”), Patterson has long been a proponent of hardware architectures that treat component failure as a given yet still find a way to get the job done. Since 2002, he’s been putting forward the same strategy in the realm of software systems, banding together with Stanford counterpart Armando Fox, head of that university’s Software Infrastructures Group, to launch the Recovery Oriented Computing project.

In a June 2003 article for Scientific American, Fox and Patterson cited Petroski’s observation and laid out their own project’s philosophy and goals. “As digital systems have grown in complexity, their operation has become brittle and unreliable,” they wrote. “Rather than trying to eliminate computer crashes — probably an impossible task — our team concentrates on designing systems that recover rapidly when mishaps do occur.”

While somewhat fatalistic on the surface, treating failure as inevitable just might be the key to pushing software development out of its current malaise. From Berkeley to MIT and points in between, software engineers are buzzing over the prospect of “autonomic computing” — systems built to recognize and recover from their own flaws without tying down a human administrator in the process. Such systems remain a few years over the current commercial horizon, of course, but the sense of collective mission, something akin to the mammoth World War II science projects that spawned computer science in the first place, is growing.

“We’re running into a complexity barrier in computing,” says Steve White, senior manager for autonomic computing at IBM Research. “Computer scientists have done a great job of making software faster and cheaper. But we haven’t paid as much attention to the people costs.”

Maybe that’s because, until recently, counting up the “people costs” was an inexact science itself. “Total cost of ownership” studies vary from platform to platform and often fall prey to vendor bias. Still, over the last decade, one common statistic has emerged: When it comes to running enterprise-level software, most companies spend twice as much on human talent as they do on licensing and acquisition.

While companies strive to reduce this two-thirds tax through lower labor costs (read: outsourcing), researchers are looking further down the road. One problem with hiring any human to fix or tune a system is the assumption that the system is fixable at the human level and that once fixed, it stays fixed. A quick review of recent software history, however, proves otherwise. For at least three decades now, programmers have joked of “heisenbugs” — software errors that surface at seemingly random intervals and whose root causes consistently evade detection. The name is a takeoff on Werner Heisenberg, the German physicist whose famous uncertainty principle posited that no amount of observation or experimentation could pinpoint both the position and momentum of an electron.

“A lot of the bugs we’re seeing in modern systems have been plaguing programmers from the beginning of time,” says Fox, the head of Stanford’s Software Infrastructures Group. “The only difference now is machines just crash faster.”

One remedy to this situation is a strategy so simple every user has relied on it at least once or twice: Reboot the machine and start from scratch. Fox and Stanford University doctoral student George Candea have collaborated on a series of papers investigating a tactic originally known as partial rebooting but which Candea now calls “micro-rebooting.” Instead of digging through the source code to fix errors, their strategy calls upon system managers to simply reboot the offending components while leaving the rest of the network operationally intact.

“In a lot of cases, rebooting cures the problem much faster than fixing the root cause,” Candea says. “We see this all the time with PCs. Rebooting takes 30 seconds to a minute, enough time for a bathroom break. When you come back, the problem is usually gone and you can go back to work.”

Rebooting the components of a computer network is, of course, more challenging than rebooting an individual PC. Network administrators have to guard against the lost data and whatever performance loss such outages might incur. Still, thanks to clustering, a strategy that bundles low-cost hardware resources in a way that makes it easy for one machine to pick up another machine’s workload in the event of a failure or shutdown, most e-commerce networks already have that built-in safeguard. Fox and Candea have worked together to develop a process they call recursive restartability, in which an automated network manager systematically goes through a network’s node tree, rebooting each branch as a form of preventive maintenance.
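The idea of walking a node tree and rebooting each branch can be sketched as a simple recursive traversal. This is only an illustration under stated assumptions: the `Node` class, the component names and the children-before-parent ordering are all hypothetical, and `reboot` merely records a name where a real network manager would stop and restart a service.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One restartable unit in a hypothetical node tree."""
    name: str
    children: list = field(default_factory=list)

    def reboot(self, log):
        # Placeholder: a real system would stop and restart the component here.
        log.append(self.name)


def restart_tree(node, log):
    """Reboot every branch depth-first, children before their parent."""
    for child in node.children:
        restart_tree(child, log)
    node.reboot(log)
    return log


cluster = Node("cluster", [
    Node("web", [Node("web-1"), Node("web-2")]),
    Node("db", [Node("db-1")]),
])
print(restart_tree(cluster, []))
```

Restarting leaves first keeps each outage as small as possible; a production scheduler would also need to stagger the reboots so that a clustered peer is always available to pick up the workload.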

Lately, however, Candea has been looking at an even more sophisticated approach, one that gives a system its own ability to target and correct failing components. He calls it crash-only computing, and the strategy is to marry micro-rebooting with the increasingly popular diagnostic tactic known as fault injection. Candea has built a Java application server divided into two main components: management and monitoring. The monitoring side periodically sends queries into the software system and watches for any sign of bad data.

If the messages trigger an erroneous response, the monitors’ own components compare notes on the error path, generate a statistical estimate of the faulty component, and send a signal to the management component to perform a micro-reboot. According to a paper released last year, Candea’s self-monitoring Java server was able to increase system dependability by 78 percent while reducing service outages from 12 per hour to zero.
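One simple way to turn error paths into a "statistical estimate of the faulty component" is to ask which component appears most consistently in failed requests. The sketch below is an assumption-laden toy, not Candea's implementation: the component names, the probe data and the failure-ratio scoring rule are all invented for illustration, and `micro_reboot` only records which component it would restart.

```python
from collections import Counter


def suspect_component(observations):
    """Return the component with the highest failure ratio, or None.

    observations: list of (path, ok) pairs, where path is the sequence of
    component names a probe request passed through.
    """
    seen, failed = Counter(), Counter()
    for path, ok in observations:
        for component in set(path):
            seen[component] += 1
            if not ok:
                failed[component] += 1
    if not failed:
        return None
    # Rank by share of failed requests, breaking ties by raw failure count.
    return max(failed, key=lambda c: (failed[c] / seen[c], failed[c]))


def micro_reboot(component, restarted):
    # Placeholder for the management side: restart only the named component.
    restarted.append(component)


probes = [
    (["frontend", "cart", "db"], False),
    (["frontend", "search", "db"], True),
    (["frontend", "cart", "db"], False),
    (["frontend", "search"], True),
]
restarted = []
culprit = suspect_component(probes)
if culprit is not None:
    micro_reboot(culprit, restarted)
print(culprit, restarted)
```

In this sample the hypothetical `cart` component fails every time it is touched, while `frontend` and `db` also serve successful requests, so `cart` alone is restarted and the rest of the system stays up.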

It’s at this point that a technology journalist must fight the urge to evoke biological metaphors, an urge all the more compelling because many programmers, IBM’s White included, consider experiments like Candea’s a first step toward autonomic computing systems that manage internal resources the same way the human body’s own autonomic nervous system regulates heart rate and breathing.

“First of all, I’m a real fan of ROC,” says White, referring to recovery-oriented computing. “It’s that notion of self that I think is the key idea of autonomic computing and the most revolutionary part.”

Candea, for one, is hesitant to invoke biological metaphors but notes that, for discussing overly complex systems, sometimes they are the only parallels available. Like the body’s own autonomic system, which operates independently of the conscious brain, his Java server works best when the monitoring component is strictly isolated from the management component. The same goes for all components. Without rigid functional boundaries, the software equivalent of cell membranes, it is almost impossible to tell which component is in need of a restart.

“It’s all about having isolation of what we in computer-speak call the fault domain,” says Candea.

Across the country, University of Virginia computer scientist David Evans has taken this notion of cellular segregation one step further. Three years ago, he and his colleagues developed a program that shows how a software network might function if limited to the same rules governing cellular interaction. In other words, modules communicate not by direct electronic query but in a fashion modeled on the physics of chemical diffusion. Signals move outward in a slow-moving spherical field, delivering information in variable doses.

While significantly slower than standard electronic communications, this diffusion strategy has one sizable advantage: When healthy components fail, the “signal” remains, leaving a distributed memory of its position and function, a memory the overall network can use to replace the damaged component.

To demonstrate survivability, Evans and his colleagues have taken a cue from biological evolution and programmed the individual modules to build and maintain an arbitrary three-dimensional superstructure — a sphere, for example. Once it is built, various modules are subjected to damaging data and flushed out of the system when they fail. The question then becomes whether the superstructure can rebuild the same shape with a fraction of its original components.

So far, Evans says, diffused signaling works like a charm: “We can survive damage to nearly all the cells as long as the structure is maintained through these types of interactions.”

Building cartoon spheres might seem a little frivolous, but Evans says the experiment has solid business-world roots. A security specialist, he says it was the creativity of Internet hackers that forced him to consider a more creative approach to network defense.

“The attackers have really taken advantage of the interconnectedness of the Internet,” he says. “Defenders haven’t.”

With self-healing software at the blastula stage of software evolution, it seems a bit premature to speak of full-scale autonomic computing. Even so, NASA, DARPA, IBM (which boasts a 3-year-old Autonomic Computing Division) and a growing number of research underwriters have taken an active interest in seeing what’s next. Evans’ sphere project is already supported by the National Science Foundation. This summer, Evans and university colleagues John Knight, Jack Davidson, Anh Nguyen-Tuong and Chenzi Wang will start a new project backed by DARPA’s Self-Regenerative Systems program. “[We'll] study approaches to system security inspired by biological diversity,” he says.

Whether that inspiration leads to outright mimicry remains to be seen. For the moment, says IBM’s White, terms like “self-healing software” and “autonomic computing” offer a convenient reference point for scientists eager to explore the next level of software complexity. Just as the sound barrier forced aircraft designers to radically revise aircraft and engine designs, so today’s complexity barrier is forcing computer scientists to rethink systems design or, at the very least, to seek out new sources of inspiration.

“Today’s systems have too many dials to watch; people can spend their whole lives figuring out how to make a database run well,” White says. “We want to stand this notion of systems management on its head. The system has to be able to set itself up. It has to optimize itself. It has to repair itself, and if something goes wrong, it has to know how to respond to external threats. If I can think about the system at that level, I’m using humans for what they’re good at, and I’m using the machines for what they’re good at. That’s the idea here.”

Invasion of the spambots

From blog spam to pornbots, new strains of computer programs aimed at pumping up Google page ranks just keep on coming.

For Lawrence Kestenbaum, the realization that a new species of intelligent agent — or “bot” — was prowling the Internet first dawned about two years ago.

It was about that time, Kestenbaum says, that a series of “fluke” addresses started popping up in the HTTP referrer log of his personal Web site, the historical cemetery database Political Graveyard.

“If you’re at all concerned with how your Web site is being received, you’re almost compulsively checking the logs to see who’s coming in and from where,” says Kestenbaum, laying the scene. “You get to know what sites are linking to you. Anything new gets your attention.”

Even more attention-grabbing, Kestenbaum adds, was the fact that the fluke referrals came in bunches. Curious, Kestenbaum pasted in the URL and went to look. His disappointment was immediate. Expecting something interesting, he instead found a page filled with nothing but banner and pop-up ads.

For a moment, Kestenbaum says, he suspected a glitch. How else could one explain a dozen or so Internet browsers flipping directly from a site boasting zero unpaid content to one documenting historical graveyards? It didn’t make sense.

“That’s when I had this ‘Aha’ moment,” says Kestenbaum. “I’d visited the site because of the very technique they’d used to advertise it. Somebody had taken the trouble to write a program that would plant strange links in referrer logs knowing that the people curious enough to check those logs would also be curious enough to follow the link.”
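A site owner who wanted to spot this pattern automatically, rather than by eyeballing the logs, might start with something like the sketch below. It is a hypothetical illustration only: the referrer URLs use placeholder `example` domains, the list of known referrers is invented, and the threshold of three hits is an arbitrary choice.

```python
from collections import Counter

# Hypothetical referrer URLs, one per logged hit (placeholder domains).
log_referrers = [
    "http://links.example.org/politics",
    "http://ads.example.com/promo",
    "http://ads.example.com/promo",
    "http://history.example.net/cemeteries",
    "http://ads.example.com/promo",
    "http://links.example.org/politics",
]

# Referrers the site owner already recognizes (also hypothetical).
known_referrers = {
    "http://links.example.org/politics",
    "http://history.example.net/cemeteries",
}


def unfamiliar_bursts(referrers, known, threshold=3):
    """Return unfamiliar referrers that show up at least `threshold` times."""
    counts = Counter(r for r in referrers if r not in known)
    return {url: hits for url, hits in counts.items() if hits >= threshold}


suspects = unfamiliar_bursts(log_referrers, known_referrers)
print(suspects)
```

A filter like this flags candidates without anyone having to click through, which is exactly the curiosity the spammer is counting on.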

Scary as it may seem, spam is evolving. The automated, Web-spidering technology that delivers bulk c1alis and vi@gra ads to your daily e-mail in box has mutated into a dozen variants, targeting everything from cellphones to blogs to instant messenger accounts. Feeding off the two divergent trends in online publishing — increased specialization of content and increased generalization in the use of basic software tools such as Google, AIM and Movable Type — many of these mutations no longer even demand your attention. In some cases, a place to hide in a chat room or forum is the only thing they need.

“There are tons of ways to monetize any type of traffic you can get,” notes Aaron Wall, author of “The SEO Book,” a newly published treatise on the art of “search-engine optimization” and other traffic-boosting techniques. “The indirect technique isn’t as noticed yet, because so many people are still fighting off the direct stuff,” Wall says.

So-called indirect techniques vary. Aside from referrer-log spam — the general term for what happened to Kestenbaum’s site in 2002 — there’s “blog spam” (using bots to post unsolicited HTTP links in the “comment” sections of blog listings), and chat-room spam. Recently, marketers have even resorted to targeting wiki sites such as Wikipedia, taking advantage of their anyone-can-edit policies.

“We’ve only been noticing it for six months,” says Tim Starling, an Australian Wikipedia contributor who has taken a leadership role in the site’s attempts to ward off the bot menace. “The bots will go through a site and spam every page. They’ll start with the smaller [non-English] language versions, which aren’t watched as closely. So it takes longer to pick them up.”

In each case, the goal isn’t so much to solicit a purchase or confirm receipt (the tactic of most e-mail spam campaigns) as to boost visibility. With more than a third of all Internet search queries now running through Google, site marketers have crafted their automated campaigns with an eye to Google’s PageRank algorithm, which treats the total number of incoming links to a site as a sign of relevance.
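To see why planted links are worth the trouble, it helps to run a textbook version of the link-counting idea. The sketch below is the simplified power-iteration form of PageRank found in introductory material, not Google's production ranking; the page names, the damping value of 0.85 and the tiny link graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / len(pages)
        rank = new_rank
    return rank


# A small hypothetical web in which nobody links to "shop".
honest = {"a": ["b"], "b": ["a"], "c": ["a"], "shop": ["a"]}

# The same web after a bot plants five pages that all link to "shop".
spammed = dict(honest)
for i in range(5):
    spammed[f"spam-{i}"] = ["shop"]

before = pagerank(honest)["shop"]
after = pagerank(spammed)["shop"]
print(f"shop score before: {before:.4f}, after: {after:.4f}")
```

In this toy graph the unlinked shop's score rises from about 0.04 to about 0.09 once the planted pages appear, even though the total score is now shared among more than twice as many pages.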

Although Google publishes clearly stated policies forbidding the use of “link farms” (sites that manipulate link totals as a way to boost, and rent out, page ranks), the percentage of offenders dropped entirely from Google search listings is microscopically small.

That, says British SEO specialist Phil Craven, leaves plenty of room for other people to push the envelope.

“If a search engine like Google can make link text so important, then people are going to go out of their way to get link text,” says Craven. “So-called spamming is perfectly valid, if necessary.”

Such words are tempered by Craven’s own experience as a target of exotic spam. As manager of the SEO forum Web Workshop, Craven says he recently had to upgrade his site-registration system to ward off bots that had been masquerading as human guests in an effort to deposit links in the open forum and profile sections.

“Basically, the bot would come along and register five names at a time,” says Craven. “The names always began with a non-alphanumeric character and ended with a non-alphanumeric character, like a percentage symbol or an exclamation point.”
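The naming pattern Craven describes is regular enough to test for mechanically. The sketch below is a hypothetical filter written for illustration, not the fix Craven applied to his own forum; the function name and sample usernames are invented, and "alphanumeric" is assumed to mean ASCII letters and digits.

```python
import re

# Matches names that both begin and end with a non-alphanumeric character.
SUSPICIOUS_NAME = re.compile(r"^[^A-Za-z0-9].*[^A-Za-z0-9]$")


def looks_like_bot(username):
    """Return True if the username fits the pattern described above."""
    return bool(SUSPICIOUS_NAME.match(username))


for name in ["%cheapdeals!", "alice", "!promo%", "bob99"]:
    print(name, looks_like_bot(name))
```

A check like this only works until the bot's author changes the naming scheme, which is why a change to the registration flow itself tends to be the sturdier defense.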

To stop the bot, Craven simply modified the registration process, forcing registrants to confirm their chosen username before getting the usual welcome e-mail. The trick worked only because the bot’s author, knowing that most forum operators run their software in default security mode, didn’t bother accounting for such a variation.

“I can do that because I’m a programmer,” Craven says. “A lot of forums don’t have programmers operating them and they simply wouldn’t be able to do it.”

Such modifications are similar in their simplicity to the now-common anti-spam technique of spelling out e-mail addresses using “at” and “dotcom.” The only thing keeping bot writers from anticipating the trick, Wall says, is the level of effort. Currently, bot writers and copiers find that there are enough newbie operators out there to serve as unwilling page-rank boosters.

“The main thing that’s driving specialization is whatever’s exploitable and easy,” Wall says. “Once it’s no longer exploitable and easy, people move on to something else.”
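The "at" and "dotcom" trick mentioned above shows how thin such defenses are. The sketch below is illustrative only: the function names and the sample address are made up, and it handles just the `.com` case. It spells an address out, then shows how little extra code a harvester would need to reverse the disguise.

```python
import re


def obfuscate(address):
    """Spell out an address: user@example.com becomes 'user at example dotcom'."""
    user, _, domain = address.partition("@")
    if domain.endswith(".com"):
        domain = domain[:-len(".com")] + " dotcom"
    return f"{user} at {domain}"


def deobfuscate(text):
    """Reverse the disguise, as a slightly more determined harvester might."""
    text = re.sub(r"\s+dotcom\b", ".com", text)
    return re.sub(r"\s+at\s+", "@", text, count=1)


hidden = obfuscate("editor@example.com")
print(hidden)
print(deobfuscate(hidden))
```

Two regular expressions undo the obfuscation, which supports Wall's point: the trick survives only as long as easier targets remain plentiful.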

To get a glimpse of innovation in the bot world, the best place to look, as usual, is in the realm of adult entertainment.

“The adult industry will likely be married to spam and its attendant distribution methods long past the evolution of man into beings of pure energy,” jokes Domenic Merenda, vice president of business development for Edge Productions, a company that operates adult-media properties.

Merenda says his company doesn’t resort to spam but admits to having “rubbed elbows with the kingpins.” The experience has given him a chance to divide so-called porn bots into three major categories: lead-generation bots, URL-proliferator bots and address-harvesting bots.

Of the three categories, lead-generation programs tend to be the most sophisticated and most expensive. Unleashed on X- and R-rated chat-room logs, they run through transcripts, seeking out the names and addresses of the most active participants. Once acquired, these contacts become fodder for third-party vendors eager to advertise webcams, escort services and other variations on the adult-entertainment theme.

Aside from the obvious legal issues, such programs face a growing hurdle: Many of the most active participants in public chat-rooms nowadays are other bots masquerading as human users, often for commercial purposes.

To cut down on this practice, many chat-rooms now use CAPTCHA, an automated tool developed by computer scientists at Carnegie Mellon University. Short for “completely automated public Turing test to tell computers and humans apart,” CAPTCHA is the chat-room equivalent of an immune system T cell. It asks registrants to prove their non-bot status by identifying a randomly generated word. Instead of displaying the word as normal text, however, it displays it as a distorted image, usually with a patterned background, a format that can befuddle even the most sophisticated optical character recognition systems.

“We settled on something humans could do, but machines can’t,” says Luis von Ahn, a Carnegie Mellon grad student and CAPTCHA project member.

Like the helper T cell, however, CAPTCHA is far from perfect. In 2002, less than a year after the Carnegie Mellon group delivered a working prototype of the CAPTCHA system, programmers at the University of California were already claiming the ability to crack CAPTCHA-generated images in Yahoo’s e-mail account-registration system. Porn marketers, meanwhile, have recruited eager users to beat the system. To gain entry or special privileges on many sites, users identify CAPTCHA images piped in by bots currently attempting to register fresh accounts.

If such ploys seem slightly Darwinian, maybe that’s because the people charged with designing them see the Internet in survival-of-the-fittest terms.

When the referrer-log spam phenomenon first attracted attention two years ago, Francois Lane, owner of the Canadian marketing firm Mastodonte Communication, took credit for the outbreak while at the same time disavowing any sense of guilt.

“I’m not too worried about my reputation,” Lane wrote in response to blogger complaints. “Marketing is all about being innovative, different, adaptive, taking risks and knowing how to use the technology. I’m trying to be all that.”
