Google: We're down with ODP

Will the streamlined search engine's decision to mix in the 20,000 editors of the Open Directory Project mess with its mojo?

Published March 24, 2000 5:00PM (EST)

If you've never heard of Google, check your monitor: You may be in sleep mode. Having racked up a devoted user base, a pile of money and a fistful of industry awards, Google has emerged as the search engine of choice for the results-oriented and portal-intolerant. Now, working in tandem with the Open Directory Project, the company is moving to broaden its base by introducing a hybrid search strategy -- mixing smart-missile accuracy with the ODP's massive team of human editors.

Google co-founders Larry Page and Sergey Brin, both in their mid-20s, started the company in 1998 after three years of graduate research at Stanford University. Page and Brin quickly pulled in Sun Microsystems co-founder Andy Bechtolsheim as an investor. Stanford has also put money into Google, and last year's $25 million round of equity funding, led by venture capital powerhouses Sequoia Capital and Kleiner Perkins Caufield & Byers, has deepened the company's rosy glow.

One reason for this torrent of cash is that Google works. It's fast and accurate, with an uncanny ability to put the thing you most wanted to find directly under your nose. The technology that makes this happen is equal parts rocket science and peer review. Google's hypertext-based system for ranking search results uses a mathematical algorithm to rate Web sites based on the number of other sites linking to them, then factors in how heavily linked those sites are. The result is a form of objectivity that springs directly from the Internet community, translating its distributed judgments into a quick, precise read of what matters and what you can do without.
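For the curious, the ranking idea described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Google's actual code: the function name rank_pages, the five-page toy_web and the 0.85 damping factor are all invented here for the example. Each page repeatedly passes a share of its own score to the pages it links to, so a link from a heavily linked page counts for more than a link from an obscure one.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Toy link-based ranking by repeated score passing.

    links maps each page to the list of pages it links to.
    The damping value is an assumption chosen for this sketch.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline score regardless of its links.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical five-page Web, made up for illustration.
toy_web = {
    "hub": ["a", "b"],
    "a": ["popular"],
    "b": ["popular"],
    "c": ["popular", "hub"],
    "popular": ["hub"],
}
scores = rank_pages(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy Web, nothing links to "c," so it lands at the bottom of the ranking, while "popular" outscores "a" and "b" because more pages point to it.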

Google's latest initiative is the integration of its own search technology with the Open Directory Project, the Web's largest human-edited directory. Melding such disparate tools into a single search service is hardly a no-brainer -- particularly when, like Google, you enjoy the kind of devoted following that recoils at the slightest interface tweak. Add to that the sometimes uneven editorial quality of the all-volunteer ODP, toss in the chance that both search results and user experience might suffer in the wake of the integration, and you could anticipate an upset stomach or two at the company's Mountain View offices.

But while Georges Harik, the software engineer (and "director of directories") who led the integration project, had been up until 7 a.m. the night before the March 16 launch, he was all smiles -- and with good reason. As aficionados of Google's no-nonsense interface were relieved to discover, the company has managed to roll in the new functionality with hardly a ripple. It's still as fast as ever. The difference is that there are now targeted directory entries among the search results, providing both intelligible context and lateral, topic-based browsing, with your results as point of departure. If you search on "Eric Raymond," for example, you get links to sites associated with the open-source advocate, plus a selection of relevant directory categories, including "Computers > Open Source > Advocacy."

As Harik explains it, the decision to go with Open Directory -- just one player in a space that includes Yahoo, LookSmart and Excite, among others -- had two main drivers. One was the licensing -- it's pretty hard to argue with "free," whether you're talking software, beer or content. The ODP's evolving editorial culture was also a plus, said Harik. "We like the way submissions are made to the Open Directory, and we think it has the potential to be more accurate and more timely than other directories. The people who contribute to it care about what they're doing." The truly decisive point, though, was the Open Directory's potential to scale in parallel to the Web's hypercharged expansion.

That potential scalability derives from the "open content" aspect of the ODP. The premise is familiar to anyone who has followed the phenomenal rise of the Linux operating system: A highly motivated, globally distributed community of contributors, each with particular talents and expertise, can out-code and out-debug any corporate engineering team on Earth. The trick, though, is to evaluate and integrate those parallel, semi-anarchic efforts.

Different "open" projects take different approaches, from the benign despotism of Linus Torvalds (who, as he puts it, personally "sprinkles holy penguin pee" over each new Linux release) to the barely moderated laissez faire of the discussion site Slashdot. While it's clear that "to many eyes, all bugs are shallow," it's not so obvious how to keep new bugs out of each new mix. And if that's true of software, where "Does it work?" provides an unbending benchmark, it's doubly true of the Open Directory's attempt to map and rank the online universe.

But if Harik was concerned that adding the Open Directory to the mix might disperse Google's mojo somehow, he didn't show it. He neatly elided any unspoken reservations about the ODP's editorial limitations, focusing on synergy instead: "We have confidence in the combination of their editorial process and things we can do with PageRank and automated methods over the long term to make the directory better."

In fact, the biggest obstacle Harik sees for the company has nothing to do with the ODP at all. One of Google's most powerful advantages -- and a pillar of its claim to objectivity -- is its link-based page ranking system, PageRank. But a user study demonstrated how poorly some users understand how Google's search engine works -- and gave an inkling of how entrenched online cynicism has become. "We asked people, 'Why do you think we're giving you this site first?' And they said, 'Well, they probably pay you more than the other sites do.' This was really shocking to us, that people would assume we were getting paid to display certain search results."

Shocking or not, the popular assumption that search engines are agents of the marketing devil is now canonical. The ability to act instantly on online information has inspired many search portals to treat their results pages like slot machines, with every search button clicked presenting a new opportunity to dangle "buying decisions" in front of their short-fused, impulse-driven users. And why not? The Internet Gold Rush is upon us, and if a search engine is at least as honest as a Las Vegas casino, we consider ourselves lucky. If the roulette wheel turns out to be rigged, well, what did we expect?

For a company like Google, this world view presents a problem. The company is well funded, but for the most part that money goes into research and engineering, not marketing, so opportunities to counter e-commerce FUD are few and far between. Besides, the reflexive belief (first formulated by Freud) that "denial is avowal" makes "I am not a crook" (first formulated by Nixon) a bad marketing tack. What's an honest search engine to do?

The answer: Emphasize reality, and hope people notice. The reality is that Google, while clearly looking to make a buck, or even several, has its roots in the world of academic research. Besides its genealogical link to Stanford Research, the company boasts a research group of its own, despite numbering fewer than a hundred employees. "The research group develops the core technologies that we'll be using a year from now," says Google president Brin. "It's three people, going on four. Aside from that, we have about 15 Ph.D.s on our engineering staff."

The research group is all the more vital because of the company's focus on search technology, as distinct from the marketing-driven portal plays that now pepper the Web. "It's critical to start developing the next generation technologies. AltaVista would never have existed if it weren't for DEC having SRC [Systems Research Center] and WRL [Western Research Laboratory], the two research centers in Palo Alto. And IBM has certainly been doing well based on their long-term research. So for us, providing such a core technology-based service, research is critical."

How does this research and technology focus square with the Open Directory initiative? Brin sees Google's technology and the ODP's directory as complementary approaches that allow people to leverage each system's respective strengths. He describes a scenario in which the user targets the initial search as much as possible, then uses the directory links included in the results to navigate outward from there, like a paratrooper dropped behind enemy lines to run reconnaissance.

For the same reason, Brin isn't overly concerned that the Open Directory -- begun as a grass-roots project staffed entirely by volunteers with a passion for some corner of the world's knowledge -- be globally comprehensive: The point isn't to navigate down through the hierarchy from the top. "If you look locally on one of these maps, you'll find it's usefully accurate. That's the important thing for a directory: how things work locally. Because there are other ways" (read: search) "to get close to where you want to go."

ODP: The power of people

Chris Tolles may be a marketer, but he's no hired flack. As one of the co-founders of the Open Directory Project, the ODP's marketing director has been around long enough to remember the June 1998 Slashdot posting that described "GnuHoo," the directory's original moniker, as "an interesting experiment which might work." Unfortunately, the name didn't work for Slashdotters, who found the allusion to the Free Software Foundation's GNU Public License galling, especially since GnuHoo's code and content were then proprietary. The "GnuHoo Booboo" -- and the ensuing flamefest -- led to a homophonous rechristening as "NewHoo." The new name lasted until the directory's December 1998 acquisition by Netscape, which dubbed the non-commercial, all-volunteer project (which had since shed its proprietary shackles) the Netscape Open Directory, soon to be known to open content enthusiasts by the initials ODP.

The ODP has come a long way since the flame wars of yore. The project's Web site proudly proclaims it is "the largest human-edited directory of the Web" -- not too shabby, given that the competition includes semi-namesake Yahoo. The ODP's 20,000-plus editors have covered more than 1.5 million sites in a quarter-million distinct categories, and show no sign of slowing down.

That rapid pace -- and the enormous team's potential to scale as the Web scales -- is one of the main reasons Google chose to work with the Open Directory. But when you talk to Tolles about the directory's future, he speaks less about the imperative to ramp up and more about the need to preserve the character of the editorial community that has grown up around the ODP. He's glad to talk about the technology the team uses: editing tools have recently been localized for Italian, French, German and Japanese as well as English, and other infrastructure improvements have made the editors' job much easier. But his voice shows real passion when he speaks of the need to defend his editors' autonomy and the community they've created.

Tolles takes the project's self-description as a "self-regulating republic" very seriously. "We've turned over control of the community to the editors themselves," he says, including everything from editorial decisions to the admission (or exclusion) of new editors. They'll be "more selective going forward," says Tolles, partly to preserve the working relationships that now exist. "When you sign up to be an editor, you're applying for citizenship. You may get it, you may not." He cites the role of precedent in guiding the group's self-governance, describing the directory's administrative model as "more open yet controlled than any other."

Underscoring the need for control, Tolles points to "the signal-to-noise ratio on Usenet" as an example of how bad ungated mass participation can be. "We believe that keeping the voice of our editors ... is one of our most important responsibilities. We have a goose that lays golden eggs here. The goose eggs are great, but the goose is the real miracle."

But as the ODP becomes more influential -- it's used by such major players as America Online (duh -- AOL owns Netscape), Netscape (duh redux), Lycos and now Google -- it's hard not to wonder whether this concentration of influence is altogether consistent with the notion of "open content." How does the ODP solicit broader input as to what will appear in what's fast becoming the Web's default directory? Unfazed by the question, Tolles notes the small, closed teams at competitors like Yahoo and LookSmart, then points to his 20,000-plus editors: "How open is the ODP? Roughly 220 times more open than anyone else."

The Open Directory's influence isn't lost on Tolles, however -- including its power to ignore. "Implicit in our directory is that we exclude sites as well. You're also creating a list of things that aren't on your list." And there are two groups other than the editors whose views do command attention: big licensees ("We take their input seriously") and the press. In fact, he considers those licensees a benchmark of the project's editorial success: "As long as we're getting substantial players who choose us, I think we're headed in the right direction."

Despite the more selective approach, Tolles views the ODP as a model of inclusion. "We've made this a more participatory project than any other open source effort. How many people participate in the actual building of Linux? At most, it's a thousand people. We have 5,000 people who regularly participate, and 22,000 who are signed up and eligible. Look at Linux, look at Apache, and I will guarantee you that less people worked on the major distributions -- the thing that you can easily download -- than on the Open Directory. Can anybody come and demand to be an editor? No. But it's more open than anything else out there."

While understanding why people think search sites are all about filthy lucre, Tolles sees directories and relevance technologies as an antidote. "All the major players in the search space are going to a similar model: You have paid listings; then there's a directory, either LookSmart, Yahoo or ourselves; below that, you have Web crawler listings, usually from Inktomi, Google or Direct Hit.

"The move now is toward things like Google and Direct Hit, to carve away that top layer of crust. A whole industry has grown up doing search engine placement, tweaking sites to get a better ranking. Google is able to route around that based on their technology. Likewise, a directory is a way of avoiding those play-for-pay things -- at least if you don't play for pay in your directory. Using relevance technology is very important, and will be increasingly important down the line. With the rise of directories, and with things like Google, you're going to have much better search results."

Tolles is clearly used to taking the long view: Phrases like "50 years from now" roll off his tongue without hesitation. "We're conscious that we're building something that will outlast us," he says. Eventually, he envisions the ODP taxonomy "as a platform for other products and services" -- including technologies like Google's. For now, though, the key goals are simple: to scale with the Web, to be useful and to be as impartial as possible. "Will we create something that's of substantial use and scalable?" he asks. "If so, we'll have done the right thing."


By Mark Durham

Mark Durham is editor in chief of sendmail.net.
