Visiting Google's Mountain View, Calif., headquarters is like taking a time machine to happier, fatter times. Think Silicon Valley in 1999, at the height of the dot-com boom.
The time warp starts before you enter the building. Part of the office's parking lot is cordoned off for twice-weekly rollerblade hockey games, complete with Google-supplied team jerseys. The lobby is decorated with a merry array of lava lamps and a piano at which stressed-out coders can knock off a few bars to decompress. At the company "cafe" -- never called a cafeteria -- employees eat free gourmet lunches styled by the Grateful Dead's former chef. And let's not forget the on-site masseuses.
There's a pool table, a Ping-Pong table, a Foosball table and a gym. And behind a love-bead curtain in the women's locker room is a sauna. It's a workaholic Californian approach to luxury -- bring the spa to work! -- although a company spokeswoman confides that she's never seen anyone actually use the sauna.
A goofy poster in the lobby has snapshots of employees using their entire bodies to spell out the words "Google World's Best Search Engine." They're all smiling, and it's not hard to see why. While others in Silicon Valley are grousing bitterly as they sip pink-colored drinks at increasingly pathetic pink-slip parties and while other search engines are desperately selling out their links to the highest bidder, Google is in the plum spot of providing search capabilities to Yahoo. All told, Google currently gets 100 million search queries a day.
Monika Henzinger, 35, director of research at Google, is driving the technical research to make those searches better. With her team of 10 computer scientists -- all men -- the German-born Ph.D. works on improving Google's search functionality and moving Google into new areas such as mobile phone and voice-activated searching. Over lunch at the Google cafe, she told us where the science of Web searching is headed.
How does Google search now?
Google goes out to the Web and collects Web pages. Then we build an internal representation of them, and when a user types in a query this internal representation is used to quickly find the documents that contain these words. We have more than 1.3 billion [pages] in our index, which we completely update every 28 days.
For each word, we store all the documents that contain it. So, when you type in the query terms, we can just go to the words, and do an intersection of the lists -- find the documents that contain all these words. We've pre-done the search for each original term alone and stored all the answers. So if you type in the word "car" and the word "repair," we search the list for the word "repair" and the word "car." And we will output documents that contain both of them. Then we have to order them. But we don't only return those documents, we also return documents where other people point -- have a hyperlink -- to this page.
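[The word-to-documents lookup Henzinger describes is commonly called an inverted index. What follows is a minimal sketch of the idea, not Google's code: the function names `build_index` and `search` and the three sample documents are made up for illustration, and it only handles the "documents that contain all these words" step, not ordering or link text.]

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Fetch the stored list for each word; unknown words give an empty set.
    postings = [index.get(word, set()) for word in words]
    # Start from the shortest list so the intersection shrinks quickly.
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical sample documents, keyed by an arbitrary id.
docs = {
    1: "car repair in Palo Alto",
    2: "used car prices",
    3: "bicycle repair shop",
}
index = build_index(docs)
print(sorted(search(index, "car repair")))  # [1]
```

[Because the list for each word is built ahead of time, answering a query only requires intersecting a few stored lists rather than scanning every page.]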
How does Google figure out the order?
A very important criterion is how many people link to you and how many people link to them. It's a recursive definition where your "quality" depends on the quality of links that point to you.
To order them, we look at the link structure. When we build the internal representation, we assign a number to each document that depends on its link structure. We then use this information to order the documents. The page-rank measure is based on the whole Web structure.
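[The recursive definition, in which a page's quality depends on the quality of the pages linking to it, can be approximated by repeating a simple update until the numbers settle. The sketch below follows the simplified textbook form of PageRank, not Google's production system. The damping value of 0.85, the iteration count, the function name `page_rank` and the four-page link graph are all assumptions chosen for illustration.]

```python
def page_rank(links, damping=0.85, iterations=50):
    """Estimate a quality score per page from a dict of page -> linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = page_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

[In this toy graph, page c ranks highest because three pages link to it, and page a ranks above b and d because its single incoming link comes from the highly ranked c.]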
The results are ordered by a combination of what we think the quality is and also the query terms. Basically, do we think by looking at the document that it's on this topic? That's also what lots of our future work concentrates on: trying to understand better what documents are about, and also trying to understand better what the user queries are about. The problem is that most user queries are very short -- two or three words -- so it's hard to figure out what they mean, even if you're a human being. Did you see the queries in the lobby?
Yes. [In Google's lobby, a constantly scrolling list of queries projected on the wall behind the front desk shows what visitors to Google are searching for. A random sample when I visited: Chadwicks of Boston, Olympics archery, Ph.D. salary survey, upright scaffolds, World Cup luge.]
That's a filtered version, except that the filter doesn't work well in other languages. So we had people here from BMW, and they told me that there were some German queries that got through that shouldn't have.
[Note to self: Curse on Google only in foreign tongues.]
What can you do beyond just using the keywords to give the users what they want?
You can look at the distribution of keywords in the document. You can look at the distribution of other words on the page. You can look at words on similar topics on the page. You can look at words that other people use to point to this page, and how related they are to the keywords -- things like that.
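[One of these signals, the words other people use when linking to a page, can be illustrated with a very rough measure: what fraction of the query words show up in the link text pointing at the page. This is only a sketch under that assumption. The function name `anchor_overlap` and the sample link texts are hypothetical, and the interview does not say how Google actually measures relatedness.]

```python
def anchor_overlap(query, anchor_texts):
    """Fraction of query words that appear in at least one link text."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    anchor_words = set()
    for text in anchor_texts:
        anchor_words.update(text.lower().split())
    return len(query_words & anchor_words) / len(query_words)


# Hypothetical link texts that other sites use to point to one page.
anchors = ["Palo Alto car repair", "great mechanic"]
print(anchor_overlap("car repair", anchors))  # 1.0
print(anchor_overlap("car rental", anchors))  # 0.5
```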
What's the toughest part of improving searching?
I think the hardest issue is determining what the user really wants, figuring out, when someone types in "car," whether he wants used cars. Does he want the Kelley Blue Book? Or does he want to buy a car? Understanding better what users want -- that's the hardest challenge.
When a query is a little bit more specific -- take, for example, "car repair Palo Alto" -- then we can say, OK now, we sort of understand. But we're still not 100 percent sure. Does he just want different car repair places? Or does he want the one closest to his house?
We do know that we should make sure not to return a page that's a report about a trip to California and then they had to have their car repaired in Palo Alto.
You can try to return documents that are specifically on this topic. We're developing more sophisticated techniques to return documents that might not mention the query words, but are [still relevant to] the topic. We're getting away from just pure word matches and getting more into topics.
But one also has to be a little careful there because the more sophisticated users like having complete control, while the more naive users like having the system help as much as possible.
We can't completely rewrite the query into something that we think is more appropriate, because, you know, people like my husband would get crazy. He just wants to find pages that have his words.
So you have to strike a balance.
People have been trained for a long time now by search engines [to expect] that if they type in search terms they'll get documents that contain those search terms. Now, if you start doing something better for them or that's different to them, then you better explain [why] it's worth it.
What other kinds of search are you developing?
We have a voice-search project with BMW -- BMW wants to put voice search into their 7 Series cars. They want to put microphones in the cars -- you can just speak whatever your search is and then it gives you answers back on a display. Then you just say the result number and the search jumps to that result.
And then you crash?
It might only do the search when the car is stopped or something like that. They don't know yet when they would enable it; right now it's just, Can we do it at all?
And the other push is languages. The vision is that no matter what language your query is in, and no matter what language the document is in, we should [be able to] find the document for you and translate it for you. We have a translation service that just started, but it only translates between English and another language. So currently you can translate German into English or English into German, but you cannot translate German into French.
When I do German searches, it really amazes me how limited the German Web is. If you search for medical information in German, you get barely anything, whereas in English you get a wealth of information.
First of all there are fewer [German] people, and the scientific language is English. And then there are not as many Web sites in Germany. Like here, for pregnancy, there is ParentSoup and whatever else, but you just don't have that in Germany -- there might be one Web site if you're lucky. So this is actually more of a service for our international users, so that they can understand the English pages, and get much more information. Of our 100 million queries a day, half are in English and half are in other languages.
I think for Americans, the most interesting use will be translating news articles. Because I think it will be interesting to see what other countries write about the U.S. But the service still has to be improved.
How did you integrate DejaNews, the archive of Usenet postings?
We had Google engineers working very, very hard to get the service out by the time that Deja said they would shut down, in February.
At first it was not clear when Deja actually would shut down. Suddenly it was like, "Now it's going to happen next week," so we had to go live with whatever we had. We did not want the service to be down at all, so we decided to go live and then gradually improve it. You couldn't post initially, but now people can post again. We had to rewrite all the code.
Will Google ever be able to search for things like video and audio on the Web, or is that too hard?
If people know the name of the picture or of the video, then, yes, we can find it. But if people just say, "There was this video of a woman on the beach running into the water," that's hard. The image-understanding technology is not so far [along] yet.
This is far-fetched, but could you combine face-recognition technology with search technology and then search for images of a person based on the measurements of his or her face?
If the research could [be done], we could plug something like that into Google. There are no fundamental restrictions, except that technology is not there yet; the research community can't reliably do that yet.
If I give the system five pictures with your name, and then I get a different picture that shows you from a different angle, it's just very hard to say that it is you as opposed to this other person. So detecting a face can be done, but recognizing that something is a specific face is very hard.
Are you aware that, at the Super Bowl, pictures were taken of everyone's face as they walked in and then compared to a database of known criminals?
Probably what the image-matching technology did there was pull out a few that were [match] candidates for the pictures of the people they took, and then have a human being compare the faces. On the Web we're talking about completely automated, right?
You're right. They had a human check them. How do you feel about the attention you get for being a woman in a field that's mostly men? All the people you manage are men, and so on.
In general, I don't think it's a big deal. Silicon Valley, especially, is very performance driven. It doesn't matter what country you're from, it doesn't matter what gender you are; as long as you perform, it's fine. And if you don't perform, then there's a problem. I think everybody gets evaluated according to the same standard.
When you read about all these Internet companies going up in flames, it must be odd to be working for one of the few that's still going forward.
Being at Google is great. It's kind of sad that there was first this euphoria and now there's the complete opposite -- doom and gloom.
The irony is that there are more Web pages.
And the usage just goes up.