To listen to the raves, Fred's Coffee Shop serves a mean weekend breakfast. The omelets at this little joint just a skip across the Golden Gate Bridge from San Francisco are said to be "fluffy beyond belief," the bacon "thicker" and "tastier" than at anyplace else, and the French toast -- oh, the French toast, cooked up soft and then deep-fried and slathered in sugar -- will help you find Jesus. Fred's, according to reviews posted on the popular local-ratings site Yelp, makes the best breakfast in Sausalito. Thirty-eight reviewers give it an average score of 4.5 stars -- a number that really stuck in my craw as I gulped down limp slabs of two-star French toast, sipped at one-star coffee, and took in the ordinary two-star ambience.
We live in an age under constant review. We rate -- and consult other ratings of -- local businesses on Yelp, movies on Netflix, music on iTunes, hotels on Tripadvisor, and books and everything else on Amazon, the whole world surveyed and sliced up on a five-star scale. The ratings are often enormously helpful; you might even argue they've revolutionized the very nature of shopping and the culture itself, bringing something like empirical precision to a marketplace otherwise suffused in P.R. and spin. Yelp will point you to the small produce shop with better and cheaper apples than the giant supermarket nearby, and Amazon shows you that a $21 chef's knife is every bit as good as a $125 model.
But sometimes, as my experience at Fred's shows, reviews and star ratings fail to capture and convey something essential -- something essentially bad -- about a product or place. Why? Researchers looking into the question have lately come upon some intriguing explanations. Their findings suggest that the five-star scale may be an outmoded way of representing consumer sentiment; and now a few Web companies are designing ways to make ratings much more useful.
Online ratings are beset by one main flaw, something pollsters call "response bias." Because people are more likely to rate products that have moved them in some way -- either positively or negatively -- ratings for most items brim with extreme opinions. On Yelp everyone is above average; company CEO Jeremy Stoppelman told me that 85 percent of local businesses on the site get a three-star or better average rating. Stoppelman's explanation for this -- truly bad businesses don't stay open very long -- makes sense, but it turns out that Amazon, too, attracts a lot of happy people. Last year, three information systems analysts, Nan Hu, Paul Pavlou and Jennifer Zhang, studied the ratings of 230,000 books, almost 300,000 DVDs and 60,000 videos for sale at the site. The average in each category was over four stars -- which makes sense only if you believe Hollywood is making mostly great movies. When the researchers mapped the reviews on a graph, they found J-shaped curves -- there were a lot of one-star ratings, very few twos and threes, a whole lot of fours, and a king's ransom of fives.
To see how an Amazon star-rating compares to society's "true" opinion, Hu, Pavlou and Zhang conducted their own survey of one product, singer-songwriter Jason Mraz's 2005 album, "Mr. A-Z." In a survey of 66 college students, about two-thirds gave the album three or four stars. There were also a bunch of twos, some ones, and very few fives. On Amazon the picture is completely different. More than half of reviewers judge "Mr. A-Z" a five-star CD, while there are only a small number of threes, twos and ones.
Pavlou explains the lovefest by citing a specific kind of response bias, what he calls "purchasing bias." In order to review something, you must have already purchased it. But people buy stuff they think they're going to like -- that's why they buy stuff. If you cringe every time you hear the overplayed Jason Mraz track "The Remedy," you won't buy "Mr. A-Z," and so your probably negative opinion will go unrecorded by Amazon. Purchasing bias, Pavlou points out, is related to the price of a product; a higher price reduces the probability that someone who is unlikely to enjoy a product will buy it and review it anyway. Think about it this way: If the Jason Mraz album were $200 rather than $11, then only die-hard fans would buy it and rate it, skewing its average review higher. Purchasing bias thus suggests some helpful advice when you're looking at online ratings: The more expensive a product, Pavlou says, the more you should discount its high reviews.
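For readers who like to see the arithmetic, here is a minimal sketch of the purchasing-bias argument in Python. Everything in it is an assumption made for illustration: the listeners, their star ratings, and the most each would pay are invented numbers, not data from the Hu, Pavlou and Zhang study, and the function names are hypothetical. The only rule it encodes is the one described above: you review only what you buy, and you buy only if the price is within what you would pay.

```python
from statistics import mean

# Hypothetical listeners: (true star rating, most they would pay in dollars).
# These numbers are invented for illustration, not taken from the study.
LISTENERS = [
    (1, 0), (1, 2), (2, 5), (2, 8), (3, 10), (3, 12),
    (3, 15), (4, 15), (4, 20), (4, 30), (5, 40), (5, 250),
]


def observed_reviews(price):
    """Return the ratings left by buyers; only listeners willing to pay `price` buy."""
    return [stars for stars, max_price in LISTENERS if max_price >= price]


def average_review(price):
    """Average of the observed ratings at `price`, or None if nobody buys."""
    reviews = observed_reviews(price)
    return mean(reviews) if reviews else None


if __name__ == "__main__":
    true_average = mean(stars for stars, _ in LISTENERS)
    print(f"everyone's opinion: {true_average:.2f} stars")
    for price in (11, 200):
        reviews = observed_reviews(price)
        print(f"at ${price}: {len(reviews)} reviews, "
              f"average {average_review(price):.2f} stars")
```

In this toy population the average opinion sits near three stars, but the reviews posted at $11 average four, and at $200 the lone remaining buyer gives five. The exact figures mean nothing; the direction of the skew is the point.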
The market for reviews is thriving, with a host of start-ups pinning their fortunes on showing customers what people really think about a product. One of the most innovative is a firm called Summize, which has developed a clever way to highlight how response bias affects any product's reviews. Summize crawls the Web for product ratings -- currently, it scans only Amazon, but soon it will be looking at other sites and blogs -- and then it translates those ratings into a "heat map" rather than a five-star scale. The heat map for "Mr. A-Z" is a small colored bar that goes from red on the left to green on the right, showing the full breakdown of one- to five-star ratings. It tells you whether only true fans are rating the product, or if you might like it too.
The heat map doesn't eliminate response bias, but it lets you take it into account when making decisions, according to Summize's founders. Say you're looking for a KVM switch -- a little doohickey that lets you use one keyboard, mouse and monitor for multiple computers. If you search on Amazon for "KVM switch," you see, right up there at the top, the IOGEAR MiniView Micro. It's just 23 bucks, and 113 customers give it an average rating of four stars. Seems like something you don't even have to think about -- cheap, good, sold.
Look up the same product on Summize, though, and you see the clear split in the reviews. There are lots of fives and fours, but also a large number of ones. And when you dig down into these one-star ratings, you find something interesting. Several people complain that the switch doesn't seem to work with IBM ThinkPads, Toshibas or Macs -- a flaw completely obscured by the four-star rating. In this way, says Greg Pass, Summize's CTO, the site helps you find "the wisdom in the crowd," rather than the collective "wisdom of the crowd."
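A short Python sketch shows how a single average can hide that kind of split. The star counts below are an assumption: they are invented so that they total 113 ratings and average roughly four stars, because the actual breakdown isn't given here. The text histogram is only a stand-in for a distribution display, not Summize's heat map or its method, and the function names are hypothetical.

```python
# Hypothetical star counts for a product with 113 ratings. Only the total and
# the roughly four-star average are known; this breakdown is invented.
COUNTS = {1: 18, 2: 3, 3: 5, 4: 17, 5: 70}


def average_stars(counts):
    """Weighted mean of the star ratings."""
    total = sum(counts.values())
    return sum(stars * n for stars, n in counts.items()) / total


def share_of(counts, stars):
    """Fraction of all ratings that gave exactly `stars` stars."""
    return counts.get(stars, 0) / sum(counts.values())


def text_histogram(counts, width=40):
    """One line per star level, five stars first, bars scaled to the largest count."""
    peak = max(counts.values())
    lines = []
    for stars in sorted(counts, reverse=True):
        bar = "#" * round(width * counts[stars] / peak)
        lines.append(f"{stars} stars | {bar} {counts[stars]}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(f"average: {average_stars(COUNTS):.2f} stars")
    print(f"one-star share: {share_of(COUNTS, 1):.0%}")
    print(text_histogram(COUNTS))
```

With these made-up counts the average still rounds to four stars, yet roughly one rating in six is a one-star review, which is exactly the cluster worth reading before you buy.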
Amazon displays an average rating at the top of each of its product pages, but it has recently added a small distribution map low on the page, near the reviews of each product; the map shows how many ratings of each star-category a product has received. Jeremy Stoppelman says that Yelp has also considered adding this feature. But Stoppelman and others at Yelp also have another bit of advice about star ratings -- that it's wise to look past them and to judge a product or a place according to the people reviewing it, not how many stars it gets. It's the people, not the stars, who shine on Yelp; you can find some wonderful writers here, sharp and funny and full of firm opinions about the little diners, hideaway bodegas, fun candy stores, and mean shopkeepers who populate urban life. The best part is that you can follow them just like on any other social-networking site -- see reviewers' links to other people, their opinions on other places, a trove of personal data that gives you some sense of whether someone else's opinion will match yours.
Yelp has completely altered the fortunes of some local businesses. I called up Pat Ryan, the owner of San Francisco's Pat Ryan Moving and Storage, which has a five-star Yelp rating, better than just about any other business in the city. He told me that Yelp doubled his business during the past year -- he's had more than 200 customers come to him through the site. Yelp, he added, has kept him vigilant about customer service; because every one of his customers can review his company, Ryan has gone to great lengths to keep people happy. He gets enough calls to put out six moving trucks on the weekend, but he usually sends out three -- he turns people away, he says, because if he gets stretched too thin, his service, and thus his Yelp reviews, might fall.
Other businesses, though, report a less salutary experience. Fior D'italia Restaurant, in North Beach, is the oldest Italian restaurant in the country and a pretty popular place besides. On Yelp, though, it rates 2.5 stars. Chris Ritchie, the manager, told me that he's looked at the rating, and he shakes his head at how disconnected Yelpers are from his customers. People on Yelp think his food is "bland," it's "disgusting," it just plain "sucks." (One person, though, calls it "the best Italian food I've ever had.") But Ritchie says that his best customers are tourists, and though they love Fior D'italia Restaurant, they're not taking the time to post their reviews because Yelp isn't important to them. In other words, he blames response bias.
I've never been to Fior D'italia Restaurant, so I don't know what to think. The thing is, though, the Yelp rating put me off. And I'm put off for the same reason I was attracted to Fred's. In a world of dizzying choice, where there are more movies, books, songs, shops -- more blasted things of all kinds -- than ever, we need other people to help us out. You may understand, intuitively, that a five-star rating doesn't guarantee happiness; but what other guidance do you have? Last year, inspired by Yelp, I hired Pat Ryan when moving from my apartment. His men did a terrific job. Sometimes, five stars is really that.