Sam Williams

When machines breed

Evolvable hardware -- gadgets that design themselves -- can get the job done, even if humans have no idea how they do it.

Paul Layzell is a specialist in the budding field of evolvable hardware. Simply put, he helps machines design themselves, using principles borrowed directly from biological evolution.

It’s a job with strange and unexpected twists. Take the time three years back when he and fellow University of Sussex researcher Jon Bird attempted to build an oscillator circuit using genetic algorithms and a handful of transistors. While a few circuits came out fitting the functional profile — steady output, steady frequency — one circuit took a strange path to get there. Instead of building internal feedback loops to reach the desired frequency, it had simply wired itself in a way that the radiated hum of a nearby computer went straight through the circuit and into the attached oscilloscope.

In other words, it cheated. The circuit had hacked the system by becoming a radio.

“The best way I can think to describe it is a mixture of respect and humor,” says Layzell, summing up his reaction. “A bit like when a child solves a common problem in an original way: It always makes you smile.”

Using evolutionary processes to optimize machine performance is nothing new. Since the 1960s, artificial intelligence researchers have exploited the dynamics of Darwinian evolution to solve software problems in fields as diverse as financial investment, manufacturing and biochemistry.

What is new, however, is the application of evolutionary processes in the hardware realm. Thanks to reconfigurable devices such as the field programmable gate array (FPGA) — the microchip designer’s equivalent of an Etch A Sketch — and increasing computational power, researchers who once performed simulations of new circuits with an eye on the clock are suddenly free to let their designs evolve for a while just to see what happens. One might not be sure that one understands how a given circuit achieves what it is supposed to, but if it works, is that really a problem?

For many engineers, the question is already the first major litmus test of the 21st century. Those who answer yes see evolutionary engineering as barely a half step above tinkering. Those who answer no, however, see it as a useful method to break through the complexity barriers limiting both software and hardware innovation.

“I see the evolved radio as an intuition pump,” says Bird — borrowing a phrase from Daniel Dennett, a Tufts University philosopher with a sizable fan base in the world of artificial intelligence research — “a vivid thought experiment that can structure the way we think about a problem.”

Derek Linden, chief science officer for JEM Engineering, an antenna-design firm based in Laurel, Md., has used evolutionary design processes to build antennas for military contractors and NASA’s Jet Propulsion Laboratory. He credits the relatively sudden interest in evolutionary hardware design to economic factors.

“When I started doing this, I was running my simulations on a single Pentium 66 [MHz] PC,” Linden says. “That meant I had to be real careful with how large my problems were and how long it took things to run. Now, you can brute-force things a lot more easily.”

Applying “brute force” in the case of evolutionary design means breaking problems down into smaller, simpler tasks. Just as the human genotype can be rendered as a 3 billion base-pair genome, so can silicon circuits and wire antennas be boiled down to an even simpler, binary numeric form. Split this “genotype” into randomly determined halves, and you have something that can be “mated” with another design, with the resulting offspring farmed out for testing on a separate software simulator. The results aren’t always pretty, but when you filter out the weak designs and let the breeding process run for a thousand or so generations, you get something like the seedless watermelon — all features and no drawbacks.
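The breeding loop described above can be sketched in a few lines of Python. This is a toy illustration, not the software Linden or anyone else in this story uses: the 32-bit genotype length, the population size, the small mutation step and the `simulate` scoring function (which simply counts 1 bits in place of a real antenna or circuit simulator) are all assumptions made for the example.

```python
import random

GENOME_BITS = 32  # assumed genotype length, purely illustrative


def simulate(genome):
    """Stand-in for the software simulator: the score is the number of 1 bits."""
    return sum(genome)


def mate(parent_a, parent_b, rng):
    """Split both genotypes at one random point and join the two halves."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(genome, rng, rate=0.01):
    """Flip each bit with a small probability (an assumed extra step)."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]


def evolve(generations=1000, population_size=40, seed=0):
    """Breed a random population and return the best-scoring genotype."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENOME_BITS)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Filter out the weak designs: keep the better-scoring half.
        population.sort(key=simulate, reverse=True)
        survivors = population[:population_size // 2]
        # Breed replacements from randomly chosen pairs of survivors.
        children = []
        while len(survivors) + len(children) < population_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            children.append(mutate(mate(parent_a, parent_b, rng), rng))
        population = survivors + children
    return max(population, key=simulate)
```

Because the surviving half is carried over unchanged, the best score in the population can never drop from one generation to the next. In a real project the expensive part is `simulate`, and each call is independent of the others, which is why the work spreads so easily across a cluster.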

“Some have called genetic algorithms ‘embarrassingly parallelizable,’” says Linden, who uses a seven-CPU Linux Beowulf cluster to test and optimize antenna designs. Linden and fellow engineers run the recommended genotypes through a second, higher-fidelity simulator. If the results remain promising, they fabricate the antenna for real-world testing.

In the case of circuit designs, especially those employing FPGA chips, the evolutionary design program can try out its own designs on a reconfigurable base of silicon transistors. Dubbed “online evolution,” this form of design is much more complex and tends to generate the most bizarre, and intriguing, results.

So far, the process has worked best in the antenna-design realm. Next year, NASA’s Space Technology 5 mission will deploy the first satellite employing an antenna designed by evolutionary processes. Developed at the NASA Ames Research Center, the antenna looks like an ordinary paper clip after a hard day’s work. Built to fit within a cubic inch of space, it features five sharp bends and one gradual bend. All told, the entire design process took 10 hours, using 35 Linux servers and minimal human intervention.

“We try to give as little antenna knowledge as possible to our software and let evolution be free to design the antenna as it sees fit,” says Jason Lohn, head of the Evolvable Systems Group at the NASA Ames Research Center.

Such comments highlight the dividing line between traditional engineering and evolutionary engineering. Where traditional engineers find comfort in rigid specifications, trusting the computer for number-crunching and testing, evolutionary designers must trust the entire design process.

This can lead to sticky situations. Lohn notes how a number of algorithms have come up with designs patented by other inventors. Novel designs, meanwhile, have a tendency to leave engineers scratching their heads. In a 2002 paper summing up their oscillator experiment, Bird and Layzell note that even the designs that play within the rules were often impenetrable to later analysis.

“It has proved difficult to clarify exactly how these circuits work,” Bird and Layzell write. “Probing a typical one with an oscilloscope has shown that it does not use beat frequencies to achieve the target frequency. If the transistors are swapped for nominally identical ones, then the output frequency changes by as much as 30 percent.”

Lohn, for one, sees the electrical engineering world falling into two schools of thought. “One school of thought says you need a black box that does X, Y and Z. If I use evolution to get something that does X, Y and Z, I don’t care what’s in it as long as it works.”

And the other school? “That one says, ‘I need to understand what’s in there,’” Lohn says. “Those are the people we can’t really help, because a lot of times, we don’t know what’s in there.”

Lohn seems comfortable working in the “black box” camp. He describes antenna design as a “black art” and distances himself from those who, following in the footsteps of British 19th century physicist James Clerk Maxwell, prefer elegant, textbook-worthy solutions to actual working solutions.

“Maxwell wrote down the four equations which govern all of wireless communication,” he says. “They describe the physics, but the weird thing is, you never use them. In practice, this field is so squirrely, the only way to learn is through trial and error. It’s the school of hard knocks.”

Not that the field doesn’t offer the occasional tantalizing glimpse of a deeper, more theoretical insight waiting to be discovered. Lohn recalls episodes in which he and fellow researchers accidentally cut out sizable portions of a circuit with minimal effect. Lohn likens the lost portion to a vestigial organ, like the human appendix, useless now but useful at one time. Hence the decision to keep it around.

Equally intriguing are the occasional experiments where researchers make a mistake, using a faulty wire to power the circuit, for example. In such instances, fixing the wire often kills off many of the most promising circuits, proof positive that even circuits grow accustomed to a shoddy environment.

“Evolution doesn’t want to see that wire fixed,” Lohn says. “Evolution is sneaky. It’ll exploit anything you put within its reach.”

Firefox — the flag bearer of free software

Mozilla's browser is taking market share away from Microsoft. Sometimes, slow and steady really does win the race.

To misquote F. Scott Fitzgerald, there are no second acts in the lives of software projects.

Oh sure, the developers sometimes move on to bigger and better things. When it comes to the created works, however, the trajectory is depressingly consistent: Functional simplicity gives way to feature bloat, followed by brittleness, unreliability and, barring certain monopoly-friendly market conditions, oblivion.

For the bulk of its six-year existence, the Mozilla project has been the unwitting victim and symbol of this truism. Like Jacob Marley’s ghost in “A Christmas Carol,” the open-source browser seemed doomed to bear the sinful weight of its earlier, proprietary incarnation — Netscape Communicator — for eternity.

A funny thing happened on the way to oblivion, however. With no employer to guide them and no market to punish them, Mozilla developers stubbornly kept plugging. After delivering a stable 1.0 release of its Mozilla suite of applications (including a browser and a mail client) in 2002, four years after the project’s launch and about two years beyond initial estimates, they proposed an even more ambitious, ground-up overhaul of the underlying source code. Given the steady half-decade flameout of the original Netscape user population, developers went with the obvious code name: Phoenix.

“Team members wanted to do a reset,” says Mozilla engineering director Chris Hofmann, looking back.

The end result has been arguably the biggest comeback story in software development since Steve Jobs retook the helm at Apple. Trademark issues have forced the Mozilla team to redesignate the project Firefox, but the browser itself has met few obstacles. The 0.9 version, released over the summer, registered more than 5 million downloads. WebSideStory, a Web analytics company, puts the combined October Mozilla-Firefox market share at 6 percent, a 71 percent jump over June market share. To cap it all off, the Mozilla Foundation, official overseer of the project since its spinout from Netscape last year, officially released the 1.0 version on Tuesday, Nov. 9, and has set itself a 10 percent market share target by the end of the year.

“This is a first,” says WebSideStory analyst Geoff Johnston. “Until July, Microsoft had never lost market share. They’d had spikes, sure, but it never trended down. The bigger news now is that the trend has continued.”

Granted, Microsoft’s commanding portion of the browser market — Johnston puts Internet Explorer’s current market share at 92.4 percent — is in no immediate danger of collapsing. What is in danger, however, is the trusted wisdom that open-source developers, whether through cultural prejudice or isolation from market forces, don’t know how to deliver simple, consumer-friendly software tools. Cut loose from the corporate world, Mozilla’s developers have hit their target: a thriving, user-friendly open-source browser. The question everyone should be asking now is: Where Mozilla has trodden, will other open-source projects follow?

The Mozilla Foundation’s Hofmann says the first move in launching the Firefox redesign was soliciting feedback from dedicated users in the hopes of gleaning something that Microsoft developers might have missed.

“We wanted to gather all the different things we learned about building browsers over the last 10 years and combine that with a strong look at the way people used browsers,” Hofmann says.

One thing Mozilla developers quickly learned was that most traditional browser elements are extraneous to the everyday Web-surfing experience. Using minimalism as a design cue, developers whittled down the Firefox tool bar. They also stole a trick from Internet Explorer 5.0 and Opera, a browser created by a Norwegian company, by integrating a Google search form into the browser frame. Most important, they scrapped support for anything outside the W3C rule book, which attempts to set standards for Web development.

This latter decision, which meant that Firefox does not support Microsoft’s ActiveX extensions or any party’s VBScript add-ons, proved fortuitous. In June, just after the 0.9 version of Firefox became available for download, a Trojan horse known as Download.Ject began to harass Microsoft Windows users en masse. A JavaScript-based Trojan horse of Russian origin, Download.Ject exploits the tight coupling of Internet Explorer and Microsoft Windows. Users who visit a propagating page automatically download the invisible JavaScript applet. The applet then installs backdoor access and a keystroke logger on the unwitting recipient’s machine, thus giving third-party hackers a chance to break in at a later date.

One recent convert is Frank Scheelen, manager of the porn-specific search engine Ask Jolene. Based in the Netherlands, Scheelen’s site has a blacklist policy for thumbnail galleries and other porn sites that try to slip JavaScript applets into the downloaded bitstream. To minimize user headaches, the site has also taken to endorsing Firefox, offering a direct link to the Mozilla Foundation download page.

“Firefox is inherently safer, because it allows you to turn off the things that make Internet Explorer dangerous — popups, JavaScript, ActiveX,” says Scheelen.

The reason, says Hofmann, boils down to marketing savvy, or lack thereof. Internet Explorer currently enjoys its dominant market share not because of Microsoft’s celebrated marketing muscle, but because of Microsoft developers’ undercelebrated flexibility. In essence, they’ve made it accessible to both sides of the browsing experience: the ordinary user who wants to take advantage of the Web’s abundant content and the commercial marketers who dangle free content as a lure for sideline promotions. Firefox developers, in contrast, don’t have to worry about the content-provider side and can thus focus on a few elemental details: security, downloading speed, and ease of use.

“We’ve been able to focus, saying, ‘Let’s just do the right thing for the user. If there’s a good search engine out there, let’s integrate it into the product,’” Hofmann says. “We don’t have to worry about business arrangements. We don’t have to worry about how to make money off it. Let’s just go out and make quality software.”

Hofmann isn’t the only one enjoying that freedom. Much of the Mozilla project’s success stems from the fact that individual components have been outsourced to teams obeying their own “let’s just make quality software” imperative. For example, Gecko, the layout engine that determines how Firefox displays HTML, is its own independent project under the Mozilla aegis. The same goes for Netscape Portable Runtime (NSPR), a library to ensure that applications interact with Firefox across a variety of platforms, and Thunderbird, an e-mail client still in development.

This sort of feudal distribution of authority seems like an ideal recipe for chaos. In fact, it’s exactly the sort of thing that has kept both Mozilla in general and Firefox in particular moving forward, even without a major corporate benefactor.

“Our original manifesto for Phoenix set out a few key principles: make a product that just browses, and browses well (and) keep the team small and focused,” writes Blake Ross, a Firefox team co-founder and current Stanford University sophomore, celebrating the 1.0 release on his personal Web site. “I’m proud to say we have delivered on that today.”

Such focus in the midst of complexity is a large reason many open-source projects, despite the waning of late-1990s media hype, have not lost momentum. Apart from Firefox and the ongoing SCO-IBM lawsuit, the most noted open-source story of the last two years has been the Salt Lake City software company Novell’s 2003 decision to purchase Ximian, a Linux desktop company founded by developers of the free software GNOME graphic user interface.

Noting the countercyclical timing of the purchase — IBM, Hewlett-Packard and Sun Microsystems had each invested in GNOME’s success as early as 1999 — Jeff Hawkins, vice president of Novell’s Linux Business Office, says it was the GNOME team’s sustained progress in the subsequent downturn that proved more compelling.

“Remember the phrase ‘Internet time?’” Hawkins asks, pointedly. “I think during the late 1990s there was this fallacy that somehow software could be developed faster. The truth is that software takes people writing it. It takes time.”

Hawkins credits open-source developers for adopting a “steady march of progress” mind-set in the face of shifting market and media conditions. In the case of Mozilla, that mind-set has proved especially useful given the quick die-off in excitement when the 1998 Netscape source code failed to save that company from losing the remainder of its market to Microsoft.

“They kept plugging away,” Hawkins says, of Mozilla. “People ignored them, until they got their break from the security problems in I.E.”

The Mozilla second act, in other words, is a misnomer. While the rebirth imagery works well for those of us with short attention spans, the truth is, Mozilla never really went away. If anything, its delivery comes right on time. Most successful software projects, notably Linux and Windows, take between a half-decade and a full decade to reach full maturity, and most software project managers worth their salt will tell you that a good team, like a good winery, delivers no code before its time.

Instead of the fiery phoenix or the speedy firefox, technology watchers would be well served to think of the microscopic yeast cell — a humble organism that delivers its best work when the lights are off and the oxygen supply is low — the next time they read about reignited browser wars.

“That’s one of the best strengths of open-source [development],” says Hawkins, noting the anaerobic analogy. “There’s no way to kill it in the classic sense. Even the failed companies of open source contribute to its success.”

The Wal-Mart supremacy

The giant retailer's introduction of RFID technology is forcing other supermarket chains to catch up. But fiddling with data may not be the best survival strategy in the Wal-Mart future.

What do you call it when a company announces a multibillion-dollar technology initiative with no preexisting infrastructure, no software code and an 18-month deadline for delivery?

In most cases you’d call it a recipe for disaster. In the case of Wal-Mart, a company with the power to force others to follow its technology agenda, you’d simply call it “tough love.”

That two-word description, according to a January article in Computerworld Magazine, is exactly how Wal-Mart CEO H. Lee Scott summed up his company’s philosophy on radio frequency identification (RFID) in a speech to suppliers last winter. For those who missed it, the company sent out letters to top suppliers last June requesting that all pallets and boxes come equipped with RFID tags by Jan. 1, 2005, a request designed to facilitate better warehouse tracking. Suppliers so far seem to have gotten the message. This June, a year after the initial letter campaign requesting 100 participants, Wal-Mart reported that 137 companies had climbed aboard.

“We see this as beneficial to the entire supply chain,” says Procter & Gamble spokesperson Jeannie Tharrington, summarizing her company’s eager participation in the so-called “mandate.” “Right now our out-of-stock levels are higher than we’d like and certainly higher than the consumer would like, and we think this technology can help us to keep the products on the shelf more often.”

Such comments, of course, reinforce a growing theme in the business and technology press: Those worried about Wal-Mart’s deleterious effect on mom and pop retailers need to put down Nirvana’s “Nevermind” album and catch up on present-day reality. Nowadays, even billion-dollar behemoths face the awkward choice of doing things the Wal-Mart way or watching a major portion of their customer base wave goodbye.

Not surprisingly, most are choosing to set their strategic clocks to Arkansas time. This summer, just before Wal-Mart launched a pilot RFID rollout in a handful of Texas stores, the Worldwide Retail Exchange, an industry consortium launched by supermarkets and other large retailers to improve back-end efficiencies, announced that it, too, had seen a dramatic increase in members willing to participate in its “global data synchronization” effort. The effort’s focus is to make sure that the code a supplier uses to describe a consumer product in its own databases matches the code in retailer databases, a simple concept in theory but a fiendishly complex task in reality. The reason: Most retailers and suppliers rely on proprietary software code and standards to define current bar code data. Adopt a common standard, says WWRE’s chief marketing officer Nick Parnaby, and a package of toilet paper or can of tuna suddenly becomes trackable across all portions of the so-called “supply chain” — factory, truck, retail shelf and checkout line.
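The matching problem at the heart of data synchronization can be shown with a small sketch. Every code below is made up for illustration; none comes from the WWRE, Wal-Mart or any real catalog. The assumption is simply that each side can translate its own proprietary code into one shared product identifier.

```python
# Hypothetical proprietary codes, each mapped to a hypothetical shared product ID.
SUPPLIER_TO_SHARED = {
    "TP-4PK-001": "00012345678905",  # toilet paper, in the supplier's database
    "TUNA-6OZ": "00098765432109",    # canned tuna, in the supplier's database
}
RETAILER_TO_SHARED = {
    "A17-0042": "00012345678905",
    "C03-0911": "00098765432109",
    "Z99-0001": "00055555555555",    # no supplier record for this item
}


def synchronize(supplier_map, retailer_map):
    """Pair retailer codes with supplier codes that share the same product ID."""
    shared_to_supplier = {shared: code for code, shared in supplier_map.items()}
    matched = {}
    unmatched = []
    for retailer_code, shared in retailer_map.items():
        supplier_code = shared_to_supplier.get(shared)
        if supplier_code is None:
            unmatched.append(retailer_code)
        else:
            matched[retailer_code] = supplier_code
    return matched, unmatched


matched, unmatched = synchronize(SUPPLIER_TO_SHARED, RETAILER_TO_SHARED)
```

With a common identifier the join is trivial. The fiendish part in practice is agreeing on that identifier and cleaning decades of proprietary records so the two lookup tables can be built at all.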

Wal-Mart’s decision to unilaterally impose RFID on its suppliers effectively made the case for “global data synchronization” to the rest of the industry, whether or not they understood what they were doing.

“[Before RFID] you couldn’t describe it to your chief executive in less than 25 words,” says Parnaby. “With RFID, you suddenly have people’s attention.”

Granted, investment levels in technology among supermarkets, a retail sector that has given up 21 percent of its North American market share to Wal-Mart over the last two decades, remain modest. Of the 20 companies that have participated in the data synchronization program, Parnaby estimates the average investment to be $250,000 per company. Still, he sees it as an ante on what has become an increasingly high-stakes poker table. It’s a sign that volume-dependent chains like Kroger, Albertson’s and Safeway are willing to gamble on Wal-Mart’s ability to make RFID an industry-wide product tracking standard.

“Wal-Mart has created this herd moving in the right direction,” Parnaby says.

But is it really the right direction for anyone besides Wal-Mart? Some industry observers suggest that supermarket chains that are attempting to survive in a Wal-Mart world may find that no matter how many technological “efficiencies” they introduce, they will never be able to challenge Wal-Mart in the area where it remains supreme — price. If they really want to differentiate themselves, they may have to look elsewhere. Instead of searching their databases for answers, they might just have to ask a simple question:

“Can I help you to your car, Ma’am?”

Joshua Greenbaum, principal at Enterprise Applications Consulting, has counseled clients to hang back a bit when it comes to game-changing strategies like RFID and data synchronization. After all, he says, it was only five years ago that most supermarket chains were still racing to catch up with electronic data interchange, or EDI, a 1980s-era data-sharing standard designed to reduce the vast amount of paperwork supermarkets generated with each product order.

“Wal-Mart has proven that under intense margin pressure, a little I.T. [information technology] can go a long way,” says Greenbaum, noting the company’s mid-1990s decision to move past the EDI standard. “On the flip side, the mechanics and physics that go into making RFID work are still a long way off. With RFID, there’s still something in the hookah that smells a little funny.”

Such comments echo concerns within the supermarket industry. Known for its punishing margins (according to the Food Marketing Institute, the average supermarket earned 95 cents for every $100 spent inside its doors), the industry can afford little in the way of technology experimentation. That Wal-Mart has proved itself more aggressive in this arena owes more to size than boldness. Last year, Wal-Mart stores generated $256 billion in sales, $67 billion if you count only the food, candies and tobacco products. That’s more than Kroger ($54 billion), Albertson’s ($35 billion) and Safeway ($35 billion).

In other words, assuming a similar 1 percent technology reinvestment rate, Wal-Mart could match each of its three grocery competitors dollar for dollar and still have another dollar left over for pilot programs like RFID. Never mind the additional savings from nonunion labor and having the best bulk-buying leverage in the business.
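The arithmetic behind that claim is easy to check. The sketch below uses the sales figures quoted above and the assumed 1 percent reinvestment rate; the variable and function names are illustrative only, and amounts are kept in whole millions of dollars to avoid rounding noise.

```python
# Annual sales in millions of dollars, as quoted in the text.
SALES_MILLIONS = {
    "Wal-Mart": 256_000,
    "Kroger": 54_000,
    "Albertson's": 35_000,
    "Safeway": 35_000,
}
REINVESTMENT_PERCENT = 1  # the assumed technology reinvestment rate


def tech_budget(sales_millions, percent=REINVESTMENT_PERCENT):
    """Technology budget in millions of dollars at the given rate."""
    return sales_millions * percent // 100


walmart_budget = tech_budget(SALES_MILLIONS["Wal-Mart"])
rivals_budget = sum(tech_budget(sales)
                    for name, sales in SALES_MILLIONS.items()
                    if name != "Wal-Mart")
left_over = walmart_budget - rivals_budget
left_over_per_rival_dollar = left_over / rivals_budget
```

On these numbers Wal-Mart's budget comes to $2.56 billion against a combined $1.24 billion for the three grocers, leaving $1.32 billion, or a little more than one extra dollar for every dollar the competitors spend together.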

“When we’re talking about Wal-Mart, the best that other retailers are going to be able to compete with Wal-Mart is to do what Wal-Mart can’t do,” says Lee Hollman, vice president of product development for Nashville-based IHL Consulting Group, a company that tracks I.T. spending within the retail industry. “Wal-Mart will beat everybody on price, beat ’em like a drum.”

Surprisingly, few companies take that advice. Asked for an example, Hollman skips over the top tier of companies and points to Publix, a fast-growing Florida supermarket chain that earned $661 million on $16.8 billion in sales, an eye-popping 3.9 percent margin. How did the company do it? By focusing on store-specific services like on-site bakeries and baggers who walk the groceries out to customers’ cars.
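For a sense of scale, the Publix margin can be set beside the industry average cited earlier, 95 cents earned per $100 spent. This is a back-of-the-envelope sketch using only figures from the text; the function name is illustrative.

```python
def net_margin_percent(earnings, sales):
    """Earnings as a percentage of sales (both in the same units)."""
    return 100 * earnings / sales


publix_margin = net_margin_percent(661, 16_800)  # millions of dollars
industry_margin = net_margin_percent(0.95, 100)  # dollars per $100 spent
ratio = publix_margin / industry_margin
```

The result puts Publix at roughly four times the average supermarket's margin.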

To pay for the extra labor, the company charges higher prices than its discount competitors, of course. Then again, the company has also forgone the now-standard “loyalty card” programs that impose hidden software licensing and database grooming costs on the back end.

“I have to get my biases up front, however,” says Hollman. “Both my wife and I do our shopping at Publix.”

The reason, says Hollman, boils down to service. Because the customers who prefer lower prices to better service already have the option of driving to Wal-Mart and Costco in most corners of the U.S. southeast, the company can instead focus on the minority of shoppers willing to pay a few cents more to have their groceries walked out to the car. When the company does make a decision to invest in technology, such investments generally focus on portions of the business visible to the customer, such as a recent 16,000-unit order for Hewlett-Packard checkout terminals.

“They do customer stuff really well,” Hollman says. “They understand that going head-to-head with Wal-Mart is probably not in anybody’s best interest.”

Loyalty card programs, like RFID, have been a favorite target of consumer groups concerned about retailer use of customer-specific information. A more damning complaint, however, comes from Gary Hawkins, president of Green Hills Market, a single store operation in Syracuse, N.Y.: Most companies have yet to see anything close to the expected profits such systems originally promised.

“The big guys deal directly with the [consumer products group] companies,” says Hawkins, noting his neighboring competitors. “My experience has been that the large retail companies collect data more for the benefit of suppliers or to make sure suppliers’ discounts only go to the best customers. I would venture to say, I don’t think that’s the best strategy.”

In contrast, Hawkins says, his own company’s loyalty card program, run with the help of a Windows PC system and an SQL database, is devoted more to identifying the best customers and making sure those customers get additional benefits beyond low prices. Each Thanksgiving, Green Hills’ top spenders receive a free turkey, and each Christmas they get a free tree. Additional incentives throughout the year are designed to keep the profitable customers coming back while at the same time encouraging unprofitable customers to find another place to ferret out bargains. Like Publix, the company is willing to trade higher volume for higher income.

Not every store manager or president has the luxury of knowing his best customers by name and face. Still, Hawkins says, customer service is as much a matter of philosophy as strategy. Executives who see Wal-Mart as a paragon of retail are, in essence, espousing the philosophy that companies can squeeze more profits out of internal operations than they can draw out of customers at the checkout aisle.

“If you sit back and look at the whole supply chain, the consumer is not in that supply chain,” says Hawkins. “The products on that flier are not there because that customer is interested in those products. Those products are there because the manufacturer has paid for them to be there. What the customer wants doesn’t enter into the equation. In my mind that’s a bit backwards.”

I.T. experts like Greenbaum hesitate to assail Wal-Mart’s competitors for embracing Wal-Mart methods. He points to continued comments by Federal Reserve Chairman Alan Greenspan, crediting corporate investments in information technology for improving both market efficiency and per-employee productivity.

At the same time, however, Greenbaum has noted the paradox of Wal-Mart capitalism: that in order for it to succeed, a company practically has to give itself over to Soviet-style principles. Whether that means management centralization, tighter information control, or adopting a “tough love” approach to suppliers and employees, most large-scale corporations are too far down the garden path to consider a detour.

“This is command capitalism,” Greenbaum says. “It almost has to be. We’re talking about the tightest, most dreadful business to be in: All your capital is sitting on the shelf going out of date. Your competition is 100 times bigger than you’ll ever be and can sell products at retail cheaper than you can buy them at wholesale. We’re talking about a business that’s heavily unionized and where one of the biggest unions is the Teamsters. It doesn’t get much more grim than that.”

Computer, heal thyself

Why should humans have to do all the work? It's high time machines learned how to take care of themselves.

In his 1992 book “To Engineer Is Human: The Role of Failure in Successful Design,” Duke civil engineering professor Henry Petroski tosses out a little-known statistic from the history of bridge design: During the latter half of the 19th century, a period that introduced the locomotive train to most corners of the industrial world, roughly a quarter of all iron truss bridges failed.

The simplified reason: Bridge designers, unused to iron as a structural material and railroad trains as a service load, had yet to grasp the full impact of a minor miscalculation anywhere within their plans. It wasn’t until designers started introducing a conservative fudge factor, now known as the margin of error, that bridge designs developed enough redundancy and robustness to account for the occasional errant crossbeam or overloaded rail car.

“Basically civil engineers made bridges safe by recognizing that humans would be involved in every step of the bridge-building process,” says David Patterson, a Berkeley computer science professor who has cited Petroski’s statistic in numerous papers. “With human involvement comes the risk of human failure.”

For Patterson, the iron-truss story is more than just a quick attention grabber; it’s a hint that today’s software programmers, oft derided for their failure to deliver bug-free code, have yet to grasp the full weight of their own discipline.

Coauthor of the landmark 1987 paper that laid out the low-cost memory strategy now known as RAID (the acronym stands for “redundant array of inexpensive disks”), Patterson has long been a proponent of hardware architectures that treat component failure as a given yet still find a way to get the job done. Since 2002, he’s been putting forward the same strategy in the realm of software systems, banding together with Stanford counterpart Armando Fox, head of that university’s Software Infrastructures Group, to launch the Recovery Oriented Computing project.

In a June 2003 article for Scientific American, Fox and Patterson cited Petroski’s observation and laid out their own project’s philosophy and goals. “As digital systems have grown in complexity, their operation has become brittle and unreliable,” they wrote. “Rather than trying to eliminate computer crashes — probably an impossible task — our team concentrates on designing systems that recover rapidly when mishaps do occur.”

While somewhat fatalistic on the surface, treating failure as inevitable just might be the key to pushing software development out of its current malaise. From Berkeley to MIT and points in between, software engineers are buzzing over the prospect of “autonomic computing” — systems built to recognize and recover from their own flaws without tying down a human administrator in the process. Such systems remain a few years over the current commercial horizon, of course, but the sense of collective mission, something akin to the mammoth World War II science projects that spawned computer science in the first place, is growing.

“We’re running into a complexity barrier in computing,” says Steve White, senior manager for autonomic computing at IBM Research. “Computer scientists have done a great job of making software faster and cheaper. But we haven’t paid as much attention to the people costs.”

Maybe that’s because, until recently, counting up the “people costs” was an inexact science itself. “Total cost of ownership” studies vary from platform to platform and often fall prey to vendor bias. Still, over the last decade, one common statistic has emerged: When it comes to running enterprise-level software, most companies spend twice as much on human talent as they do on licensing and acquisition.

While companies strive to reduce this two-thirds tax through lower labor costs (read: outsourcing), researchers are looking further down the road. One problem with hiring any human to fix or tune a system is the assumption that the system is fixable at the human level and that once fixed, it stays fixed. A quick review of recent software history, however, proves otherwise. For at least three decades now, programmers have joked of “heisenbugs” — software errors that surface at seemingly random intervals and whose root causes consistently evade detection. The name is a takeoff on Werner Heisenberg, the German physicist whose famous uncertainty principle posited that no amount of observation or experimentation could pinpoint both the position and momentum of an electron.

“A lot of the bugs we’re seeing in modern systems have been plaguing programmers from the beginning of time,” says Fox, the head of Stanford’s Software Infrastructures Group. “The only difference now is machines just crash faster.”

One remedy to this situation is a strategy so simple every user has relied on it at least once or twice: Reboot the machine and start from scratch. Fox and Stanford University doctoral student George Candea have collaborated on a series of papers investigating a tactic originally known as partial rebooting but which Candea now calls “micro-rebooting.” Instead of digging through the source code to fix errors, their strategy calls upon system managers to simply reboot the offending components while leaving the rest of the network operationally intact.

“In a lot of cases, rebooting cures the problem much faster than fixing the root cause,” Candea says. “We see this all the time with PCs. Rebooting takes 30 seconds to a minute, enough time for a bathroom break. When you come back, the problem is usually gone and you can go back to work.”

Rebooting the components of a computer network is, of course, more challenging than rebooting an individual PC. Network administrators have to guard against lost data and whatever performance loss such outages might incur. Still, thanks to clustering, a strategy that bundles low-cost hardware resources in a way that makes it easy for one machine to pick up another machine’s workload in the event of a failure or shutdown, most e-commerce networks already have that built-in safeguard. Fox and Candea have worked together to develop a process they call recursive restartability, in which an automated network manager systematically goes through a network’s node tree, rebooting each branch as a form of preventive maintenance.
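For readers who think in code, the restart-tree idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the component names and the `micro_reboot` helper are all hypothetical, and nothing here is taken from Fox and Candea's software. It shows a targeted restart of one branch next to a full preventive sweep of the tree.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """One restartable component in a hypothetical restart tree."""
    name: str
    children: list[Node] = field(default_factory=list)
    restarts: int = 0

    def restart(self, log: list[str]) -> None:
        # Restart everything beneath this node first, then the node itself,
        # so a parent never comes back up on top of stale children.
        for child in self.children:
            child.restart(log)
        self.restarts += 1
        log.append(self.name)


def find(node: Node, name: str) -> Node | None:
    """Depth-first lookup of a component by name."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def micro_reboot(root: Node, name: str) -> list[str]:
    """Restart only the named subtree; the rest of the tree is untouched."""
    log: list[str] = []
    target = find(root, name)
    if target is not None:
        target.restart(log)
    return log


# Hypothetical e-commerce site: names are made up for the example.
tree = Node("site", [
    Node("web", [Node("cart"), Node("search")]),
    Node("db"),
])
print(micro_reboot(tree, "web"))   # restarts the web branch only
print(micro_reboot(tree, "site"))  # full sweep, as in preventive maintenance
```

The smaller the subtree handed to `micro_reboot`, the less of the running system is disturbed, which is the whole appeal of the approach.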

Lately, however, Candea has been looking at an even more sophisticated approach, one that gives a system its own ability to target and correct failing components. He calls it crash-only computing, and the strategy is to marry micro-rebooting with the increasingly popular diagnostic tactic known as fault injection. Candea has built a Java application server divided into two main components: management and monitoring. The monitoring side periodically sends queries into the software system and watches for any sign of bad data.

If the messages trigger an erroneous response, the monitors’ own components compare notes on the error path, generate a statistical estimate of the faulty component, and send a signal to the management component to perform a micro-reboot. According to a paper released last year, Candea’s self-monitoring Java server was able to increase system dependability by 78 percent while reducing service outages from 12 per hour to zero.
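The monitor-then-reboot loop described above can be approximated with a toy example. The sketch below is not Candea's server: the probe format, the `suspect` scoring rule (highest share of failed probes) and the `Manager` class are assumptions chosen to keep the idea visible.

```python
from __future__ import annotations
from collections import Counter


def suspect(probes):
    """Return the component with the highest failure rate, or None.

    Each probe is (path, ok): the components a test request touched and
    whether the response looked correct. This scoring rule is an
    assumption made for illustration.
    """
    seen = Counter()
    failed = Counter()
    for path, ok in probes:
        for component in dict.fromkeys(path):  # de-duplicate, keep order
            seen[component] += 1
            if not ok:
                failed[component] += 1
    if not failed:
        return None
    return max(failed, key=lambda c: (failed[c] / seen[c], failed[c]))


class Manager:
    """Management side: the only part allowed to restart components."""

    def __init__(self):
        self.rebooted = []

    def micro_reboot(self, component):
        self.rebooted.append(component)


# Hypothetical probe results gathered by the monitoring side.
probes = [
    (["frontend", "cart", "db"], False),
    (["frontend", "search"], True),
    (["frontend", "cart"], False),
    (["frontend", "db"], True),
]

manager = Manager()
culprit = suspect(probes)
if culprit is not None:
    manager.micro_reboot(culprit)
print(manager.rebooted)  # ['cart']
```

Here `cart` fails every probe that touches it, while `frontend` and `db` fail only half of theirs, so the estimate points at `cart`.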

It’s at this point that a technology journalist must fight the urge to evoke biological metaphors, an urge all the more compelling because many programmers, IBM’s White included, consider experiments like Candea’s a first step toward autonomic computing systems that manage internal resources the same way the human body’s own autonomic nervous system regulates heart rate and breathing.

“First of all, I’m a real fan of ROC,” says White, referring to recovery-oriented computing. “It’s that notion of self that I think is the key idea of autonomic computing and the most revolutionary part.”

Candea, for one, is hesitant to invoke biological metaphors but notes that, for discussing overly complex systems, sometimes they are the only parallels available. Like the body’s own autonomic system, which operates independently of the conscious brain, his Java server works best when the monitoring component is strictly isolated from the management component. The same goes for all components. Without rigid functional boundaries, the software equivalent of cell membranes, it is almost impossible to tell which component is in need of a restart.

“It’s all about having isolation of what we in computer-speak call the fault domain,” says Candea.

Across the country, University of Virginia computer scientist David Evans has taken this notion of cellular segregation one step further. Three years ago, he and his colleagues developed a program that shows how a software network might function if limited to the same rules governing cellular interaction. In other words, modules communicate not by direct electronic query but in a fashion modeled on the physics of chemical diffusion. Signals move outward in a slow-moving spherical field, delivering information in variable doses.

While significantly slower than standard electronic communications, this diffusion strategy has one sizable advantage: When healthy components fail, the “signal” remains, leaving a distributed memory of its position and function, a memory the overall network can use to replace the damaged component.
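A one-dimensional toy model gives a feel for why a diffused signal outlives its sender. The code below is a generic diffusion sketch, not Evans' program: the line of nine cells, the `diffuse` function and the rate of 0.25 are all assumptions. A cell emits for a while and then fails, yet the leftover field still peaks where the cell used to be.

```python
def diffuse(field, rate=0.25):
    """One explicit diffusion step on a line with closed (no-flux) ends."""
    n = len(field)
    nxt = field[:]
    for i in range(n):
        left = field[i - 1] if i > 0 else field[i]
        right = field[i + 1] if i < n - 1 else field[i]
        nxt[i] = field[i] + rate * (left + right - 2 * field[i])
    return nxt


field = [0.0] * 9
source = 4                      # position of the healthy cell

for _ in range(20):             # the cell keeps emitting its signal
    field[source] += 1.0
    field = diffuse(field)

for _ in range(5):              # the cell has failed: no more emission
    field = diffuse(field)

# Neighbors can still locate the gap from the shape of the residual field.
peak = max(range(len(field)), key=field.__getitem__)
print(peak)  # 4
```

In a fuller model, a neighboring cell that reads the residual peak would know both where a replacement belongs and, from the signal type, what function it should take on.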

To demonstrate survivability, Evans and his colleagues have taken a cue from biological evolution and programmed the individual modules to build and maintain an arbitrary three-dimensional superstructure — a sphere, for example. Once it is built, various modules are subjected to damaging data and flushed out of the system when they fail. The question then becomes whether the superstructure can rebuild the same shape with a fraction of its original components.

So far, Evans says, diffused signaling works like a charm: “We can survive damage to nearly all the cells as long as the structure is maintained through these types of interactions.”

Building cartoon spheres might seem a little frivolous, but Evans says the experiment has solid business-world roots. A security specialist, he says it was the creativity of Internet hackers that forced him to consider a more creative approach to network defense.

“The attackers have really taken advantage of the interconnectedness of the Internet,” he says. “Defenders haven’t.”

With self-healing software at the blastula stage of software evolution, it seems a bit premature to speak of full-scale autonomic computing. Even so, NASA, DARPA, IBM (which boasts a 3-year-old Autonomic Computing Division) and a growing number of research underwriters have taken an active interest in seeing what’s next. Evans’ sphere project is already supported by the National Science Foundation. This summer, Evans and university colleagues John Knight, Jack Davidson, Anh Nguyen-Tuong and Chenzi Wang will start a new project backed by DARPA’s Self-Regenerative Systems program. “[We'll] study approaches to system security inspired by biological diversity,” he says.

Whether that inspiration leads to outright mimicry remains to be seen. For the moment, says IBM’s White, terms like “self-healing software” and “autonomic computing” offer a convenient reference point for scientists eager to explore the next level of software complexity. Just as the sound barrier forced aircraft designers to radically revise aircraft and engine designs, so today’s complexity barrier is forcing computer scientists to rethink systems design or, at the very least, to seek out new sources of inspiration.

“Today’s systems have too many dials to watch; people can spend their whole lives figuring out how to make a database run well,” White says. “We want to stand this notion of systems management on its head. The system has to be able to set itself up. It has to optimize itself. It has to repair itself, and if something goes wrong, it has to know how to respond to external threats. If I can think about the system at that level, I’m using humans for what they’re good at, and I’m using the machines for what they’re good at. That’s the idea here.”

Invasion of the spambots

From blog spam to pornbots, new strains of computer programs aimed at pumping up Google page ranks just keep on coming.

For Lawrence Kestenbaum, the realization that a new species of intelligent agent — or “bot” — was prowling the Internet first dawned about two years ago.

It was about that time, Kestenbaum says, that a series of “fluke” addresses started popping up in the HTTP referrer log of his personal Web site, the historical cemetery database Political Graveyard.

“If you’re at all concerned with how your Web site is being received, you’re almost compulsively checking the logs to see who’s coming in and from where,” says Kestenbaum, laying the scene. “You get to know what sites are linking to you. Anything new gets your attention.”

Even more attention-grabbing, Kestenbaum adds, was the fact that the fluke referrals came in bunches. Curious, Kestenbaum pasted in the URL and went to look. His disappointment was immediate. Expecting something interesting, he instead found a page filled with nothing but banner and pop-up ads.

For a moment, Kestenbaum says, he suspected a glitch. How else could one explain a dozen or so Internet browsers flipping directly from a site boasting zero unpaid content to one documenting historical graveyards? It didn’t make sense.

“That’s when I had this ‘Aha’ moment,” says Kestenbaum. “I’d visited the site because of the very technique they’d used to advertise it. Somebody had taken the trouble to write a program that would plant strange links in referrer logs knowing that the people curious enough to check those logs would also be curious enough to follow the link.”
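The telltale sign Kestenbaum describes, unfamiliar referrers arriving in bunches, is easy to look for in code. The following sketch assumes a simplified log of `(unix_seconds, referrer_url)` pairs and made-up `.example` hosts; the thresholds and the `bursty_referrers` function are illustrative choices, not a description of any real log analyzer.

```python
from collections import defaultdict
from urllib.parse import urlparse


def bursty_referrers(hits, known, min_hits=5, window=60):
    """Flag unfamiliar referrer hosts whose hits all land in one short burst.

    hits  -- iterable of (unix_seconds, referrer_url) pairs (assumed format)
    known -- set of hosts the site owner already recognizes
    """
    times = defaultdict(list)
    for ts, ref in hits:
        host = urlparse(ref).hostname
        if host and host not in known:
            times[host].append(ts)
    return sorted(
        host for host, ts in times.items()
        if len(ts) >= min_hits and max(ts) - min(ts) <= window
    )


# Hypothetical log: one burst from an ad page, a couple of real visits.
hits = [(1000 + i, "http://ads.example/win") for i in range(12)]
hits += [(500, "http://blog.example/post"), (90000, "http://blog.example/post")]
hits += [(2000 + 3600 * i, "http://search.example/?q=graveyard") for i in range(8)]

print(bursty_referrers(hits, known={"search.example"}))  # ['ads.example']
```

A bot forging the Referer header costs its author almost nothing per request, which is why the hits tend to arrive in tight clusters rather than trickling in like human traffic.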

Scary as it may seem, spam is evolving. The automated, Web-spidering technology that delivers bulk c1alis and vi@gra ads to your daily e-mail inbox has mutated into a dozen variants, targeting everything from cellphones to blogs to instant messenger accounts. Feeding off the two divergent trends in online publishing (increased specialization of content and increased generalization in the use of basic software tools such as Google, AIM and Movable Type), many of these mutations no longer even demand your attention. In some cases, a place to hide in a chat room or forum is the only thing they need.

“There are tons of ways to monetize any type of traffic you can get,” notes Aaron Wall, author of “The SEO Book,” a newly published treatise on the art of “search-engine optimization” and other traffic-boosting techniques. “The indirect technique isn’t as noticed yet, because so many people are still fighting off the direct stuff,” Wall says.

So-called indirect techniques vary. Aside from referrer-log spam — the general term for what happened to Kestenbaum’s site in 2002 — there’s “blog spam” (using bots to post unsolicited HTTP links in the “comment” sections of blog listings), and chat-room spam. Recently, marketers have even resorted to targeting wiki sites such as Wikipedia, taking advantage of their anyone-can-edit policies.

“We’ve only been noticing it for six months,” says Tim Starling, an Australian Wikipedia contributor who has taken a leadership role in the site’s attempts to ward off the bot menace. “The bots will go through a site and spam every page. They’ll start with the smaller [non-English] language versions, which aren’t watched as closely. So it takes longer to pick them up.”

In each case, the goal isn’t so much to solicit a purchase or confirm receipt (the tactic of most e-mail spam campaigns) as to boost visibility. With more than a third of all Internet search queries now running through Google, site marketers have crafted their automated campaigns with an eye to Google’s PageRank algorithm, which factors the total number of incoming links to a site as a sign of relevance.

Although Google publishes clearly stated policies forbidding the use of “link farms,” sites that manipulate link totals as a way to boost (and rent out) page ranks, the percentage of offenders dropped entirely from Google search listings is microscopically small.
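To see why incoming links are worth farming, it helps to run the numbers on a tiny web. The sketch below uses the textbook power-iteration form of PageRank with a damping factor of 0.85; it is a simplification for illustration, not Google's production ranking, and the page names are invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            # A page with no outgoing links spreads its rank evenly.
            targets = links.get(page) or list(pages)
            share = damping * rank[page] / len(targets)
            for t in targets:
                nxt[t] += share
        rank = nxt
    return rank


# Hypothetical three-page web: nobody links to "shop".
web = {"a": ["b"], "b": ["a"], "shop": ["a"]}
before = pagerank(web)["shop"]

# The same web after 20 farm pages start linking to "shop".
farmed = dict(web)
farmed.update({f"farm{i}": ["shop"] for i in range(20)})
after = pagerank(farmed)["shop"]

print(round(before, 3), round(after, 3))
```

Even though each farm page carries almost no rank of its own, twenty of them together lift the target well above where it started, which is the effect bot-planted links are chasing.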

That, says British SEO specialist Phil Craven, leaves plenty of room for other people to push the envelope.

“If a search engine like Google can make link text so important, then people are going to go out of their way to get link text,” says Craven. “So-called spamming is perfectly valid, if necessary.”

Such words are tempered by Craven’s own experience as a target of exotic spam. As manager of the SEO forum Web Workshop, Craven says he recently had to upgrade his site-registration system to ward off bots that had been masquerading as human guests in an effort to deposit links in the open forum and profile sections.

“Basically, the bot would come along and register five names at a time,” says Craven. “The names always began with a non-alphanumeric character and ended with a non-alphanumeric character, like a percentage symbol or an exclamation point.”

To stop the bot, Craven simply modified the registration process, forcing registrants to confirm their chosen username before getting the usual welcome e-mail. The trick worked only because the bot’s author, knowing that most users will run the program in default security mode, didn’t bother accounting for such a variation.

“I can do that because I’m a programmer,” Craven says. “A lot of forums don’t have programmers operating them and they simply wouldn’t be able to do it.”
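The naming pattern Craven noticed could also be caught with a simple filter. To be clear, this is not what Craven did (he added a confirmation step); the regular expression and the `looks_like_bot` helper below are a hypothetical alternative, assuming “non-alphanumeric” means anything other than ASCII letters and digits.

```python
import re

# Assumption: a suspicious name both starts and ends with a character
# that is not an ASCII letter or digit, e.g. "%cheapmeds!".
SUSPICIOUS = re.compile(r"^[^A-Za-z0-9].*[^A-Za-z0-9]$", re.DOTALL)


def looks_like_bot(username: str) -> bool:
    """True if the username matches the pattern seen in the bot sign-ups."""
    return bool(SUSPICIOUS.match(username))


for name in ["%cheapmeds!", "craven", "_sam", "!!"]:
    print(name, looks_like_bot(name))
```

Like the confirmation step, a filter of this kind only works until the bot's author bothers to change the naming scheme, so it buys time rather than a permanent fix.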

Such modifications are similar in their simplicity to the now-common anti-spam technique of spelling out e-mail addresses using “at” and “dotcom.” The only thing keeping bot writers from anticipating the trick, Wall says, is the level of effort. Currently, bot writers and copiers find that there are enough newbie operators out there to serve as unwilling page-rank boosters.

“The main thing that’s driving specialization is whatever’s exploitable and easy,” Wall says. “Once it’s no longer exploitable and easy, people move on to something else.”

To get a glimpse of innovation in the bot world, the best place to look, as usual, is in the realm of adult entertainment.

“The adult industry will likely be married to spam and its attendant distribution methods long past the evolution of man into beings of pure energy,” jokes Domenic Merenda, vice president of business development for Edge Productions, a company that operates adult-media properties.

Merenda says his company doesn’t resort to spam but admits to having “rubbed elbows with the kingpins.” The experience has given him a chance to divide so-called porn bots into three major categories: lead-generation bots, URL-proliferator bots and address-harvesting bots.

Of the three categories, lead-generation programs tend to be the most sophisticated and most expensive. Unleashed on X- and R-rated chat-room logs, they run through transcripts, seeking out the names and addresses of the most active participants. Once acquired, these contacts become fodder for third-party vendors eager to advertise webcams, escort services and other variations on the adult-entertainment theme.

Aside from the obvious legal issues, such programs face a growing hurdle: Many of the most active participants in public chat rooms nowadays are other bots masquerading as human users, often for commercial purposes.

To cut down on this practice, many chat rooms now use CAPTCHA, an automated tool developed by computer scientists at Carnegie Mellon University. Short for “completely automated public Turing test to tell computers and humans apart,” CAPTCHA is the chat-room equivalent of an immune system T cell. It asks registrants to prove their non-bot status by identifying a randomly generated word. Instead of displaying the word as normal text, however, it displays it as a distorted image, usually with a patterned background, a format that can befuddle even the most sophisticated optical character recognition systems.
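Stripped of the image distortion, which is the hard part, the exchange behind such a test is a small challenge-and-response protocol. The sketch below is an assumption-laden outline rather than the Carnegie Mellon code: the `CaptchaDesk` class is hypothetical, and rendering the word as a warped image is left out entirely.

```python
import secrets
import string


class CaptchaDesk:
    """Issues one-time word challenges and checks the typed answers."""

    def __init__(self):
        self._pending = {}

    def issue(self, length=6):
        """Create a random word and a token that identifies the challenge.

        In a real deployment the word would reach the user only as a
        distorted image; that rendering step is omitted here.
        """
        word = "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))
        token = secrets.token_hex(8)
        self._pending[token] = word
        return token, word

    def check(self, token, answer):
        """Accept the answer once; a token cannot be reused."""
        expected = self._pending.pop(token, None)
        if expected is None:
            return False
        typed = answer.strip().lower()
        return secrets.compare_digest(expected.encode(), typed.encode())


desk = CaptchaDesk()
token, word = desk.issue()
print(desk.check(token, word))  # True: first correct answer passes
print(desk.check(token, word))  # False: the challenge is already spent
```

Making each challenge single-use matters because a bot that sees one solved word should not be able to replay it for a second registration.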

“We settled on something humans could do, but machines can’t,” says Luis von Ahn, a Carnegie Mellon grad student and CAPTCHA project member.

Like the helper T cell, however, CAPTCHA is far from perfect. In 2002, less than a year after the Carnegie Mellon group delivered a working prototype of the CAPTCHA system, programmers at the University of California were already claiming the ability to crack CAPTCHA-generated images in Yahoo’s e-mail account-registration system. Porn marketers, meanwhile, have recruited eager users to beat the system. To gain entry or special privileges on many sites, users identify CAPTCHA images piped in by bots currently attempting to register fresh accounts.

If such ploys seem slightly Darwinian, maybe that’s because the people charged with designing them see the Internet in survival-of-the-fittest terms.

When the referrer-log spam phenomenon first attracted attention two years ago, Francois Lane, owner of the Canadian marketing firm Mastodonte Communication, took credit for the outbreak while at the same time disavowing any sense of guilt.

“I’m not too worried about my reputation,” Lane wrote in response to blogger complaints. “Marketing is all about being innovative, different, adaptive, taking risks and knowing how to use the technology. I’m trying to be all that.”

Everyone is an editor

In the wacky wiki world, a Web browser is all you need to start contributing. But when the goal is to create an encyclopedia, such democracy has some pitfalls.

Like most frontier sheriffs, Wikipedia Arbitration Committee member Martin Harper wears his badge with a mixture of pride and caution.

A 24-year-old software engineer from Worcester, England, Harper knows what it’s like to be new. It was only two years ago, after all, that Harper, an immigrant fresh in from the Douglas Adams “Hitchhiker’s Guide to the Galaxy” online encyclopedia project, H2G2, encountered the scary freedom of wiki publishing — where pretty much anyone can add his or her own thoughts to a Web site, even if that means overwriting or “correcting” what someone has already written.

“I think, like most people, I came across the idea and thought, ‘This is madness,’” says Harper, looking back. “On [H2G2] you could have maybe five people editing an article. On Wikipedia you could have 50 people editing at once with no one person in control.”

Today, Harper is one of a select few working to impose a civilized order on what has become one of the Internet’s fastest growing boomtowns. Launched in January 2001 with barely a dozen articles, Wikipedia crossed the 500,000-article mark in February, with posters contributing content in more than 30 languages and, by last measure, at a rate of 300,000 articles per year.

Needless to say, so much activity generates plenty of controversy and plenty of work for Harper and the nine other members of the Arbitration Committee. Whether that means throwing cold water on recurring editorial battles over Israel and Iraq or deciding whether to ban offensive user names such as “Mr. Throbbing Monster Cock,” the disputes can vary from the mundane to the humorous to the truly informative, all within the space of a single day.

“The hardest problems are always at the lowest level,” he says. “People being rude, people refusing to compromise. We have a guy whose skill is copy editing. However, unlike most copy editors, he’s quite stubborn and adamant about what’s proper for articles. He won’t budge and people have been complaining. After far too much discussion amongst the community, it was referred to us the second time. We’re trying to ease it. We can’t get rid of it.”

Such problems, Harper notes, are common to any site that embraces the wiki model. First coined in the mid-1990s by Portland, Ore., programmer Ward Cunningham, “wiki” is the technical name for a site that lets readers edit the published content in real time. Borrowed from a Hawaiian term for “very fast” (wiki wiki), the term dates back to Cunningham’s Wiki Wiki Web, an experimental offshoot of the Portland Pattern Repository that first offered readers an “edit this page” link in 1995.

“It was something that needed to exist,” says Cunningham, recalling his decision to invite a few dozen fellow programmers to test out the wiki feedback model. “I thought if [WikiWikiWeb] lasted six months, it would still be worth it.”

Nine years later, the wiki model is flourishing, mostly in venues where publishers put a value on feedback and informational utility. The Apache Ant Project, for example, uses wikis to make sure readers can correct or improve user guides related to the open-source Apache Web server. Even Microsoft, a company for which Cunningham now works, has gotten in on the act, embedding a wiki page within its recently unveiled Channel 9 external weblog.

Of all the variants out there, however, few have attracted as much attention as Wikipedia. Originally a free-range alternative to Nupedia, a commercial online encyclopedia project of the late 1990s, the project has since become the world’s largest wiki with more than 1,200 regular contributors posting and revising content in more than 30 languages.

Jimmy Wales, co-founder of both Nupedia and the Wikipedia project, credits “The Cathedral and the Bazaar,” Eric Raymond’s online essay on the merits of decentralized software design, for prompting the experiment. Like Cunningham before him, Wales says he saw his venture into wiki as a temporary thing, an experiment that needed to be tried if only to satisfy his own “what if?” curiosity.

“Nupedia was very top down,” says Wales. “We recruited academics to write the articles. We had a peer review board. After nearly two years of work and an enormous amount of money, I think we had 12 articles to show for it. Wikipedia was totally different. With the wiki software, bam, things just took off. It very quickly became the project.”

Inviting the community to participate in such a project has its risks, of course. Thanks to developments outside the Wikipedia community, the project has seen its media profile surge in recent weeks. The episode started when anti-Semitic Internet users pulled off the agitprop technique known as “Google-bombing” — repeatedly linking the word “Jew” on Web pages to the Web site JewWatch.com, a site that bills itself as “Keeping a Close Watch on Jewish Communities & Organizations Worldwide.” When the tactic propelled JewWatch to the top of Google search rankings, outraged bloggers, led by the site Remove JewWatch.com, responded by linking the word “Jew” to the Wikipedia entry.

The resulting flood of visitors has been both a positive and a negative. Izak, a pseudonymous Wikipedia contributor on topics related to Jewish history, has more than once seen 5,000 words of his editorial work replaced by a single one-paragraph anti-Semitic screed.

“Every few days somebody comes in and vandalizes the site,” he says. “So many people are watching the page, though, that it doesn’t take long before some admin comes in to fix the page.”

As one of those admins, Harper describes Wikipedia’s vandalism policy as fairly easy to enforce. Most vandals get a two-strikes allowance. On the third offense, administrators block the offending poster’s I.P. address, preventing them from accessing the site. Though some find a new way back in, taunting the admins as they do so, most casual vandals get bored and find other places to ply their hatred.
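The bookkeeping behind such a policy is simple enough to fit in a few lines. The sketch below is a hypothetical illustration, not Wikipedia's software: the `StrikeBook` class and the documentation-range IP addresses are invented for the example.

```python
from collections import Counter


class StrikeBook:
    """Counts vandalism reports per address and blocks on the third."""

    def __init__(self, limit=3):
        self.limit = limit
        self.strikes = Counter()
        self.blocked = set()

    def record_vandalism(self, ip):
        """Log one offense; return True once the address is blocked."""
        self.strikes[ip] += 1
        if self.strikes[ip] >= self.limit:
            self.blocked.add(ip)
        return ip in self.blocked

    def allowed(self, ip):
        """Whether requests from this address are still let through."""
        return ip not in self.blocked


book = StrikeBook()
for _ in range(3):
    print(book.record_vandalism("192.0.2.7"))  # False, False, True
print(book.allowed("192.0.2.7"))               # False
print(book.allowed("192.0.2.8"))               # True
```

Because the block is keyed to an address rather than a person, a determined vandal can come back from somewhere else, which matches the admins' experience described above.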

Wales, who inaugurated this “three strikes” policy during the days when his role as Wikipedia’s co-creator put him in the self-described role of “god king,” sees it as a cornerstone of the site’s overall “soft security” policy. The policy is, in many ways, a Darwinian response to the pressures that undermine most open Internet communities. Instead of courting controversy, Wikipedia’s culture has evolved an almost religious aversion to it.

“We talk about ‘wiki love,’” says Wales. “We say, hey, if you think this is Usenet and you’re supposed to flame people you’re really out of line. We really don’t approve of that as a community.”

A key tenet of “wiki love” is a devotion to NPOV, Wikipedian for “neutral point of view.” Articles don’t have to be perfect, but they should be free of bias. As an example, Wales cites the 2000 U.S. presidential election. “Two people who disagree vehemently about whether or not it was a fair outcome can at least agree with the description that there was a controversy.”

All wikis run the risk of vandalism. Not all wikis have been bold enough to adopt a neutral content policy. Such distinctions, notes Sunir Shah, a University of Toronto computer scientist who contributes to both Wikipedia and his own wiki project, MeatballWiki, make Wikipedia something of a rogue variant in the wiki world.

“They’re not interested in having discussions and learning in a dialectic kind of way,” argues Shah. “Their goal is to build an encyclopedia, and that changes everything. They don’t want to have opinions and they want everything to look appropriate, which means they have to spend a lot of extra time going after vandalism and trolling.”

Offering MeatballWiki as a counterexample, Shah says most traditional wikis evolve along the lines of a dialectic or Talmudic discussion. Readers respond to but rarely overwrite previous authors’ comments, leaving room for future readers to follow the conversational evolution. In such a scenario, opinion is more than valued: It’s practically necessary to keep the conversation moving.

“At MeatballWiki we are kind of happy dealing with the social problems,” Shah says. “We have this saying that Meatball will be around in 50 years, so why worry. We can come to a better answer over time.”

Harper, who also contributes to MeatballWiki, shares the rogue variant view. Because of its encyclopedic ambitions, Wikipedia has had to adopt new levels of management and security — log-in names, I.P. address blocks, arbitration and deletion committees — that most wikis never have to worry about.

“If anything I would say the wiki is more suited to those smaller-scale projects,” he says. “As wikis get larger you run into the problem of troublesome users. You can’t manage it like the small group where you say, ‘We’re not going to invite you down to the pub anymore.’”

Wales, on the other hand, sees that level of familiarity operating at the editorial level, where most people who groom the site and have taken on voluntary management tasks have been around long enough to know the major players. Like other scalable open development projects, Linux most notably, Wikipedia has succeeded in passing on its internal cultural values to newcomers encouraged by the project’s overall ambitions. To further fuel that ambition and underwrite costs, Wales says he is already talking with some of the larger search engine players about licensing specific portions of the Wikipedia knowledge base and is talking with a publisher about putting out an official 1.0 version.

A few kinks have to be worked out between now and then, of course. With no formal quality-assurance mechanism, Wikipedia would have to ship its 1.0 version free of guarantees. Readers hoping to catch up on the history of World War I might stumble onto a porn star biography or vice versa. Supposing project leaders did take the time to download and vet Wikipedia content, releasing it in a static format such as CD-ROM, a new question emerges: Is a static version of Wikipedia still Wikipedia? In the Schrödinger’s cat paradox of wiki publishing, where the only way to verify an article’s quality is to keep checking it, never knowing what you’re going to find is half the fun.

Despite such complicating factors, Wales is optimistic. A fundraising campaign on the project’s third anniversary drew $50,000, more than double the $20,000 target, and Wales says he is currently saving the reserve funds for servers and other future project needs.

“From the beginning, we’ve never known how it was going to scale,” he says, acknowledging the doubts. “There was always that concern as to, if things got too much bigger, would it all just degenerate into garbage?”

Three years later, the concerns are still there. The only difference, of course, is that 1,200 people instead of a dozen people now have a stake in seeing Wikipedia succeed. Reflecting on the project’s continued growth, Wales laughs.

“I can’t believe that it works, but it works,” he says.

Editor’s note: This story has been corrected since its original publication.
