Let's Get This Straight: Reach for the hits

Why is it so hard to find a valid yardstick for measuring Web traffic?

By Scott Rosenberg

Published February 5, 1999 8:00PM (EST)

When Yahoo bought GeoCities last week in a stock swap valued at $3.5 billion or so (depending on when you checked the companies’ gyrating stock prices), one big force driving the deal was Yahoo’s yen to top “the ratings.” Yes, like its idiot-box forebear, the Web now has ratings — and most often today, Web ratings means Media Metrix’s monthly “reach” stats.

Thus Yahoo’s press release boasted, “With the addition of GeoCities to the Yahoo! network, the companies expect their combined unduplicated home/work reach to exceed 58 percent, which would make it the second largest network of properties on the Web.” (AOL/Netscape is No. 1.) News reports dutifully repeated this claim.

But what is this thing called reach? It’s a measure of the percentage of the Web-surfing population that alights on a company’s pages at least once in a given time period (usually a month). All sorts of questions and problems surround reach and the specifics of Media Metrix’s approach to measuring Web traffic. But that hasn’t stopped the media and industry analysts from beginning to adopt it as a standard.

Of course, in the novel, hype-driven world of Web publishing, we need standards. Vast sums of money — both stock-generated “play money” and real cash — are changing hands based on perceptions of how much traffic sites are garnering. So it’s no wonder people have begun to grab the nearest yardstick, however bent it may be.

Still, to judge the success of Web businesses by reach is to continue to view the Web by standards more suited to TV or radio. Not only is it strategically unwise for businesspeople and investors; it also risks crippling the still-nascent Web by creating incentives for site operators to favor one-time visitors and stray traffic over loyal customers.

To understand why, let’s review the history of Web traffic measurement from the medium’s genesis.

In the beginning was the hit. And the hit was with the server, and the hit was … whatever you wanted it to be.

You probably remember ebullient Web hucksters touting astronomical “hit
counts” circa 1994 and 1995, as in, “We’re getting millions of hits a day!”
Gullible TV and newspaper reporters who knew nothing about the Web would
say, “What do you mean by ‘hits’?” and, since the answer was too
technical, they’d often just equate hits with visitors — contributing to
enormously inflated public notions of Web usage in those early days.

As most Web users and all honest Web businesspeople now understand, a
hit is a number recorded in the log files generated by Web server
programs (which send pages to your computer). A hit gets generated for
every file a Web server sends out — and the typical Web page
includes anywhere from a handful to dozens of separate files (generally,
each image on the page is another file, and often what looks like one image
on a page is actually a mosaic of several files). So hit counts, though
valuable in setting technical benchmarks for server performance, are
useless as any kind of realistic measure of Web traffic.

From hits we graduated to the far more useful “page views” — a count
that discarded all those image files and simply tracked the number of HTML
files a site sent out (which correlates quite closely to the number of Web
pages read). Total page view counts for a day, week or month do tell you
something about how much traffic a site has. But they don’t tell you much
about how many visitors it has: 100,000 page views in a week could
be 10 people each reading 10,000 pages, or 100,000 people each reading one
page, or any variation in between.
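
To make the difference between hits and page views concrete, here is a minimal Python sketch that tallies both from a handful of requests. The log entries, the file names and the rule that a page view is any request ending in ".html" are assumptions made up for illustration; real server logs carry more detail and real sites use other file types too.

```python
# Hypothetical request log: (visitor IP number, file requested).
# Every entry here is invented for illustration.
REQUESTS = [
    ("10.0.0.1", "/index.html"),
    ("10.0.0.1", "/logo.gif"),
    ("10.0.0.1", "/nav_left.gif"),
    ("10.0.0.1", "/nav_right.gif"),
    ("10.0.0.2", "/index.html"),
    ("10.0.0.2", "/logo.gif"),
    ("10.0.0.2", "/story.html"),
]


def count_hits(requests):
    """Every file the server sends out counts as one hit."""
    return len(requests)


def count_page_views(requests):
    """Count only HTML files, discarding images and other embedded files.

    Assumption: a page is any path ending in ".html".
    """
    return sum(1 for _ip, path in requests if path.endswith(".html"))


if __name__ == "__main__":
    print("hits:", count_hits(REQUESTS))              # 7
    print("page views:", count_page_views(REQUESTS))  # 3
```

Three pages were read, yet the server logged seven hits, and neither number says whether those pages went to one reader or several.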

So sites began tracking “unique visitors,” or the similar “unique IP
numbers.” Even though Web sites don’t know who you are by name, they know
the computer your browser is on by a unique IP number, and by tallying
these numbers they can get a pretty decent sense of how many individual
people visited their site during a given time period.

Page views and unique visitors are valuable statistics, and auditing
companies like I/Pro emerged to validate them, but they both have serious
drawbacks. If you hook up to
the Net by dialing a modem into a service provider, odds are good you are
assigned a “dynamic IP number”: The number changes each time you dial up,
so you might show up as 30 different “unique visitors” to a site you
visited daily for a month. And the practice of “caching,” which different
kinds of networks (most notably AOL) adopt to speed the delivery of Web
pages, reduces the amount of traffic (both page views and IP numbers) the
originating site can accurately count.
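
The dynamic-IP problem is easy to see in a small Python sketch. It counts "unique visitors" the way described above, by tallying distinct IP numbers, for one person who visits a site daily for a month. The addresses are hypothetical, and the assumption that the dial-up user gets a different number on every single day is a deliberate worst case.

```python
def unique_visitors(log):
    """Estimate visitors by counting distinct IP numbers in the log."""
    return len({ip for ip, _day in log})


# One dial-up user visiting every day for 30 days.
# Assumption: the provider hands out a new address on each call.
dialup_log = [(f"192.0.2.{day}", day) for day in range(1, 31)]

# The same 30 visits from a user whose address never changes.
fixed_log = [("198.51.100.7", day) for day in range(1, 31)]

if __name__ == "__main__":
    print("dial-up user counted as:", unique_visitors(dialup_log))  # 30
    print("fixed-IP user counted as:", unique_visitors(fixed_log))  # 1
```

Caching pushes the error the other way: requests answered from a network’s cache never reach the originating server’s log at all, so no amount of counting on the site’s side will recover them.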

The new online medium had promised marketers and advertisers that it
would offer them a wealth of detailed information about usage. But because
of these complexities — and because Web users resisted early experiments
requiring site registration (HotWired was an important litmus test) — it
began to seem that the Web actually offered less reliable
information about who saw what than even those dinosaurs, TV and radio.

So over the last two years a wave of new companies — led by Media Metrix but also
including its competitors Relevant Knowledge (which Media Metrix purchased last October) and NetRatings (now allied with Nielsen)
— set out to measure the Web the old-fashioned way, following the “Nielsen
family” model. Each of these companies finds a random, statistically valid
sampling of users, plants tracking software on their computers and follows
them on their online travels. Then it extrapolates those usage patterns to
get a full picture of Web use: If 40 percent of participants visit any page
on a particular site during a month, then that site has a 40 percent reach.
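
The arithmetic behind reach can be sketched in a few lines of Python. This is only an illustration of the definition given above, not Media Metrix’s actual method: the panel, the panelist names and the site names are all invented, and a real service would also weight its sample to match the wider population.

```python
def reach(panel_visits, site):
    """Percentage of panel members who visited `site` at least once.

    `panel_visits` maps each panelist to the set of sites they visited
    during the period. One visit counts the same as a hundred.
    """
    if not panel_visits:
        raise ValueError("panel is empty")
    visitors = sum(1 for sites in panel_visits.values() if site in sites)
    return 100.0 * visitors / len(panel_visits)


# Hypothetical five-person panel and one month of tracked visits.
panel = {
    "panelist_1": {"portal.example", "news.example"},
    "panelist_2": {"portal.example"},
    "panelist_3": {"news.example"},
    "panelist_4": set(),
    "panelist_5": {"hobby.example"},
}

if __name__ == "__main__":
    print(reach(panel, "portal.example"))  # 40.0
    print(reach(panel, "hobby.example"))   # 20.0
```

Note what the function ignores: how many pages each panelist read and how often they came back. That omission is the heart of the complaint that follows.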

In itself, sampling is a proven approach — ask any pollster. But it’s
alarming that the competing sampling services seem to come up with wildly
differing rankings. And Media Metrix has
problems building a reliable sample: To date, its Achilles’ heel has
been its troubles getting its tracking software installed on workplace
computers. Many commercial Web sites see their highest traffic during work
hours and believe that their users are visiting from their office desks;
these sites feel they’re being short-changed by reach numbers. And though
Media Metrix says it’s working hard to improve its workplace coverage, many
businesses are reluctant to install the software — possibly for technical
reasons, possibly because they don’t want to learn how many of their
employees are sneaking peeks at Penthouse on the job.

Even if Media Metrix could guarantee that its sample properly captured
both workplace and home use, though, I’d have problems with the concept of
reach. As a measure of Web use, reach is weighted toward the superficial:
It favors sprawling sites with vast collections of largely unrelated pages
(like, for instance, GeoCities) over well-focused sites that collect
specific groups of users with shared interests.

Say one site has 200,000 loyal users who visit regularly; another has
little regular traffic, but its wide variety of pages turn up in enough
disparate search-engine results to attract brief visits during the course
of a month from, say, 5 million visitors. The two sites may have
identical page-view counts. The former site may actually have a more
valuable franchise to sell to advertisers or to hand over to e-commerce
partners — but the latter site wins the reach contest by a landslide.
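
A quick back-of-the-envelope calculation in Python shows how lopsided that contest is. The visitor counts come from the example above; the total Web population of 50 million and the shared figure of 10 million monthly page views are assumptions picked only to make the arithmetic easy, not measured numbers.

```python
WEB_POPULATION = 50_000_000      # assumed size of the Web audience
MONTHLY_PAGE_VIEWS = 10_000_000  # assumed, identical for both sites


def summarize(visitors):
    """Return reach (percent) and average pages read per visitor."""
    return {
        "reach_pct": 100 * visitors / WEB_POPULATION,
        "pages_per_visitor": MONTHLY_PAGE_VIEWS / visitors,
    }


loyal_site = summarize(200_000)       # regular, returning users
drive_by_site = summarize(5_000_000)  # brief search-engine visits

if __name__ == "__main__":
    print("loyal site:", loyal_site)
    print("drive-by site:", drive_by_site)
```

Under these assumptions the loyal site reaches 0.4 percent of the audience and the drive-by site 10 percent, even though the loyal site’s visitors each read 50 pages to the other’s two.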

Reach has its place in the arsenal of Web metrics, and I think the folks
at Media Metrix and similar companies are honestly trying to
build useful gauges of Web traffic. But their choice of which
measurements to promote — and the media’s choice of which ones to
adopt as standards — will have a direct impact on the shape of the Web to
come, and on what kinds of sites ultimately thrive.

Reach is built upon an assumption that the Web is pretty much like TV
and radio, and can be measured in a similar way. The more we adopt reach
as the measure of Web success, the more we will be encouraging Web sites to
attract and hold audiences using broadcast media’s time-honored and
depressing techniques of least-common-denominator persuasion, and the less
we will be taking advantage of the traits that make the Web unique:
interactivity, personalization and diversity of voices.

The point of the Web industry had better be to evolve into something
radically different from the broadcast world. Otherwise, hell, TV is a lot
easier to use — and the video quality’s better.
