Alex Wright

From ivory tower to academic sweatshop

After a few dot-com-era bumps, online education is back and bigger than ever. But so is corporate influence and bottom-line pressure.

As he walked into the gloomy, windowless auditorium inside Denver’s Colorado Convention Center, Geoff Hunt remembers thinking, “God, there are a huge number of people here.”

Hunt, a history professor at the nearby Community College of Aurora, had accepted a friend’s invitation to attend the University of Phoenix graduation ceremony for its Denver-area students. Hunt was keen to take a closer look at Phoenix, the for-profit juggernaut whose booming distance-learning programs were changing the calculus of higher education at schools nationwide, including his own. Outside the Aurora faculty lounge, dark rumors were swirling of state bureaucrats talking up a troubling notion: the “professor-less classroom.”

Hunt listened intently as the commencement speaker, a Phoenix professor who had recently been named Faculty of the Year, gave a speech describing how Phoenix had transformed her role as a professor. “She defined her job,” he remembers, as “delivery of chapters.”

That phrase, Hunt says, “just sent chills down my back.”

Hunt isn’t the only faculty member feeling the chill. As distance learning grows into a $5 billion-a-year market (up 38 percent in 2004 alone), virtual classrooms are no longer the sole province of dot-coms and for-profit schools like DeVry and Phoenix. Top universities such as Harvard, Stanford and Duke now offer full credit for online courses. On campuses nationwide, distance learning is moving out of the pedagogical fringe and into the institutional mainstream.

While faculty continue to debate the educational merits of online teaching (a recent national survey found their opinions roughly divided), most agree that distance learning is here to stay. To some optimists this is an unqualified good thing — a chance to increase access to educational opportunities and to break down the hierarchies of traditional university bureaucracies. For every worried Geoff Hunt, another teacher is happily working at home, content never to see the inside of a lecture hall. But others are more alarmed and are beginning to wonder whether their jobs will ever be the same.

Just as the Internet brought wrenching operational changes to many corporations, so online learning is triggering a seismic shift in the academic power structure. Those changes stretch far deeper than the visible presentation layer of courseware, online discussions and multimedia presentations. Distance learning is changing not only teaching methods but also the shape of the curriculum itself. As schools reach out to a market composed largely of professional, career-minded students, they face growing pressure to cater to employers’ agendas, in some cases even wiring themselves into the corporate information technology (IT) infrastructure. If a company like Lucent underwrites online courses at a business school, it expects a direct return on its investment.

“Universities are not simply undergoing a technological transformation,” writes York University professor David F. Noble, a vocal critic of distance learning. “Beneath that change, and camouflaged by it, lies another: the commercialization of higher education.”

When a cat named Colby earned an MBA online from Trinity Southern University in Plano, Texas, last year, distance-learning critics found a ready caricature for a popular stereotype: distance-learning schools as glorified diploma mills, doling out easy credentials to anyone with a Web browser and a credit card.

Indeed, plug the words “distance learning” into Google and you’ll see ads in the right-hand column of the Web page for dubious alma maters like Almeda University, promising your choice of associate’s, bachelor’s or master’s degree with “No Books! No Courses! No Studying!” But if distance learning were so easily dismissed, one might expect a little less enthusiasm from the 97 percent of public universities that now offer online courses. Last year, an estimated 3 million students took at least one class online and 600,000 students completed all of their coursework online.

While many educators continue to insist on the irreplaceable quality of in-person teaching, numerous studies show that under the right circumstances, and with certain subjects, online students achieve learning outcomes similar to those in physical classrooms.

Even critics acknowledge that distance learning opens doors for working professionals and residents of remote areas who would otherwise have limited access to higher education. But these students differ significantly from on-campus students, who often take years off to immerse themselves in a particular discipline. Distance learning students are typically older, mid-career, and careful about managing their time. They favor practical, skill-building courses like those in business, nursing, accounting, computer science and other marketable trades.

“Hitting the sweet spot in online education today means going after the working professional who wants to advance their career by taking courses,” says Philip DiSalvio, program director of Seton Hall University’s SetonWorldWide program.

While many schools also endeavor to offer “soft” subjects in the humanities online, the market overwhelmingly favors professional education. “There is strong pressure to make education more technical, more like training,” says Andrew Feenberg, research chair in philosophy of technology at Vancouver’s Simon Fraser University. “That pressure comes both from the corporate world, and from students themselves, who are very career oriented.” The result: a growing commoditization of the curriculum and a tendency for schools to market education as a “product.”

At some schools, the boundaries between physical and virtual classrooms are dissolving into so-called blended learning environments that incorporate the Internet as an adjunct to the traditional lecture hall. Many faculty now routinely take advantage of courseware like Blackboard or WebCT to publish their lesson plans and lecture notes and to moderate online discussions as an extension of the classroom experience.

Noah Butter is working on his master’s degree in library and information science at San Jose State University, a blended program that incorporates online and offline courses. Of the 11 classes he has taken so far, four have met exclusively online, including his two current semester classes in online searching and information technology. All of his courses involve some form of online component, some meeting in person as infrequently as twice a semester.

Butter has discovered that online courses are no cakewalk. “Online courses are a lot more work,” he says, pointing out that classes require students to participate actively in online discussions and to stay on top of a constant stream of e-mail. Indeed, Butter feels that he has gotten more for his money from online classes than from some of his in-person classes. “It depends on the teacher,” he says. “When teachers don’t use the technology, and you only meet a few times during the semester, you end up feeling a little ripped off.”

But while Butter knows he is acquiring the professional skills he needs to pursue his chosen career, he sometimes longs for a more traditional campus education. “I have missed having more student and teacher face-to-face interactions,” he says. “In the courses where I have met students in class, I wished we could have spent more time together.”

Given the demonstrated effectiveness and broader outreach made possible by distance learning, only the most strident Luddite would argue that distance learning has no place in the arsenal of modern instruction. But the larger effect of distance learning technology extends beyond student-teacher dialectics and into the realm of institutional power relationships.

In addition to external market pressures, corporate influence also manifests itself in the expanding role of commercial software vendors, administrators and information technology professionals, who not only wield a growing influence over teaching methods, but who also bring to bear corporate values like teamwork, accountability and an overarching emphasis on “the customer.”

Arlene Hiss is a former Indy race-car driver, now the owner of a commercial recording studio and an occasional washboard player in a bluegrass band. She lives in a geodesic dome in Lake Elsinore, Calif., where she logs on each week to conduct an undergraduate class in critical thinking at the University of Phoenix Online. A Phoenix professor since 1991, Hiss loves teaching in the distance-learning program. “They give you everything: the syllabus, the textbook, weekly assignments,” she says. “They put the lectures on the Web.” By “lectures,” she means the written documents furnished to her, and her students, by the Phoenix courseware servers.

With a Ph.D., MBA and 30 years of teaching experience, Hiss is perfectly qualified to create her own course materials. But Phoenix has built its business through economies of scale, developing a course once and then replicating it, so that many teachers can administer the same course to the school’s vast 200,000-plus student body.

That model of replicable courseware is taking hold at other schools as well. When she’s not teaching at Phoenix, Hiss leverages her Phoenix experience to develop courseware for the University of Liverpool, where she works as a so-called module manager, creating class syllabuses and assignments for online business classes. After she develops the course, Hiss then oversees a network of lower-paid instructors who teach the class using her materials. The other instructors are welcome to make suggestions, but as the module manager, Hiss has the final say, ensuring that teachers won’t make idiosyncratic changes to the curriculum.

When she’s not teaching at Phoenix or Liverpool, Hiss also finds time to teach online courses at Capella University, Southern New Hampshire University and Upper Iowa University.

Hiss may have her hands full, but she’s happy. “As long as my eyes work, as long as my fingers work, and as long as my computer works, I can’t even imagine going back to the ground.” Teaching at Phoenix gives her time to juggle other teaching jobs, manage her recording studio, play with her bluegrass band, and enjoy the freedoms of the contractor lifestyle. But personal freedom is one thing, academic freedom quite another. Like the other 8,000 faculty members who teach at Phoenix Online, Hiss will never have tenure.

Computer-based distance learning has been around in one form or another since the 1970s. But most of those efforts remained confined to academic computing labs until the Internet boom of the 1990s. The explosion of Web access, coupled with advances in educational software, set the stage for an expansion that quickly mushroomed into a dot-com-era boom.

Amid the contagious optimism of the IPO era, universities began investing aggressively in online learning initiatives. Starting around 1998, big schools like UCLA, NYU, Temple, Columbia and Cornell all kicked off heavily funded virtual-campus initiatives. Other schools hedged their bets by joining online consortia like UNext (funded by Larry Ellison and Michael Milken, among others) and the Western Governors’ University.

In many cases, these dot-edu projects took shape as for-profit subsidiaries, owned by the parent institutions but operating with a clear mandate to generate profits. In some cases, universities launched their dot-edus as joint ventures with commercial software companies. In 2000, four companies — Kaplan Ventures, Knowledge Universe, Pearson and Sylvan Ventures — invested $3.6 billion in online initiatives.

To the MBAs and university administrators who led the charge, the dot-edu business looked like an unbeatable proposition: a proven product, new markets unbounded by geographic constraints, economies of scale in the form of “write-once, run-anywhere” courseware, and potentially higher operating margins than all those labor-intensive physical classrooms.

“The dream was to transform colleges into record companies, selling CDs and ‘colleges in a box’ for $49.95,” says Feenberg. “But the people who made these predictions had never themselves used the technology for education and knew almost nothing about it.”

Amid a flurry of press releases and mostly breathless media coverage, the dot-edus built their businesses in a hurry, only to find themselves staring down a stark reality: the students never showed up. “University presidents and administrators were talked into this by computer companies and journalists,” says Feenberg. But like many other would-be Internet entrepreneurs, the dot-edus discovered that building an Internet business turned out to be considerably more complicated than buying a few million dollars’ worth of hardware and software, hiring pricey consultants, and waiting for the money to pour in.

Worse, faculty members were getting restless.

The UCLA faculty threatened to walk out when the administration issued a dictum requiring the submission of lesson plans to the for-profit subsidiary (without offering the faculty a dime in extra compensation). More galling yet, the administration wanted to invite corporate sponsors to paste their logos across the professors’ syllabuses, in exchange for a $10,000 “curriculum development” fee. Similar protests erupted at other schools, as the faculty rose up to defend the curriculum against what they perceived as shameless profiteering.

By 2001, the dot-edu bubble was bursting fast. NYUOnline closed its doors after burning through $25 million of the school’s money; Temple shut down its dot-edu before it even opened; Wharton’s online business school — in no small irony — filed for bankruptcy; UNext laid off half its staff; and Harcourt Higher Education, an ambitious online venture that had launched with much fanfare and a plan to enroll 50,000 students by 2005, shut down in 2001 after enrolling a grand total of 32 students.

Other schools managed to keep their dot-edus afloat, but with drastically lowered expectations. “The overselling was so enormous that it was self-defeating,” says Feenberg. The result: a boom-and-bust cycle familiar to anyone who bought Internet stocks in those days.

“E-learning was massively misconstrued early on,” says Matthew Pittinsky, the chairman and co-founder of Blackboard, “with predictions of the transformation of higher education — where everyone would go to the elite schools online — that just proved to be plain false.”

With millions of dollars’ worth of software and infrastructure sitting on the shelf, however, administrators and university information technology departments weren’t about to just pack up and admit defeat. After all, distance learning was hardly a failed business model. DeVry and Phoenix were flourishing; and the corporate education market was going like gangbusters. The business was still out there; they had just gotten the formula wrong.

For many faculty members in the late 1990s, the dot-edu bubble may have seemed a distant rumble: an emblem of that era’s speculative excesses and of the vainglory of administrators and dubious Internet visionaries. Now, fast-forward to 2005. Just as many companies spun out their Web operations as dot-com subsidiaries in the late 1990s, only to bring them back into the fold after the IPO market evaporated, so have many of the dot-edu initiatives found new life back on campus.

SetonWorldWide launched in 1998, at the height of the dot-edu boom. Growing slowly and deliberately, Seton today enrolls about 300 online students. DiSalvio, the program’s director, expects the online school to fold itself back into the university mother ship over the next few years. “We started out as an entrepreneurial unit, but as online education has become mainstream within the school, as it’s become more prevalent and accepted, we see the logic of decentralizing the program and putting it into the respective schools and colleges, under the management of the deans.”

Although many schools made the mistake of approaching distance learning as an entirely new product during the dot-edu boom, they are now beginning to recognize its potential as a new channel in the supply chain. And just as the Web has enabled many companies to reengineer their supply chains to integrate more closely with partners and customers, so some schools are beginning to integrate their distance-learning programs more deeply with corporate agendas.

At Arizona State University, students can now earn a fully accredited MBA online from the W.P. Carey School of Business, and many of them do so under the auspices of the school’s Corporate Program, in which local employers like Lucent and ChevronTexaco partner directly with the business school to create tailor-made MBA programs for their employees.

When a Lucent employee enrolls in a managerial economics class online, the course Web site comes pre-populated with a set of Lucent financial data, which provides the fodder for most of the class exercises. To earn the MBA, the student must undertake an applied project that produces a measurable business outcome for the employer. “The goal is to realize a cost saving for the corporation,” says Steve Salik, the manager of delivery systems and strategic development for the business school. “By having the students achieve that cost savings, [the corporation] can recoup the entire cost of the program.”

Corporations aren’t the only customers looking for that kind of deep integration. In 1999, the U.S. Army launched eArmyU, a distance-learning network that ties together 29 accredited universities into an online learning consortium. The network offers degree programs through a centralized portal developed under contract by IBM. To date, 30,000 active-duty soldiers have enrolled in eArmyU; program administrators hope to have 80,000 student-soldiers enrolled by the end of 2005. The Army program has proved a great boon for schools like Excelsior College in Albany, where active military make up more than 25 percent of the student body.

While soldiers in Iraq and Afghanistan undoubtedly benefit from access to educational opportunities, their academic freedom is hardly unbound. The Army will reimburse students only for classes taken within strict degree requirements, and it won’t reimburse for elective classes that fall outside those requirements; you won’t find Uncle Sam footing the bill for Renaissance poetry seminars. “The military wants courses that are relevant to what they’re doing,” says Susan Nash, the associate dean of liberal arts at Excelsior and a longtime distance-learning professor. Recently, she has worked with the Army to develop practical course offerings with titles like “Leadership in Difficult Times.” “I can understand their reasoning, but I think it’s bad for education,” Nash says. “If we’re not careful we’re going to lose the ability to think spontaneously. We’re being programmed.”

As schools react to growing institutional pressures, faculty are discovering that those influences extend beyond the contents of the course catalog. “Institutions have put in place a production process that hadn’t existed before,” Pittinsky says. Just as the Web transformed the role of the information technology staff in many corporations — bringing them out of the back office and into the front line of marketing and sales operations — so online learning technologies are changing the makeup of academic organizations.

The most dramatic change for most academic departments has been the emergence of IT professionals from the administrative back office to the forefront of curriculum development. “Seven or eight years ago, the only systems administrators [on campus] would be managing things like e-mail systems, systems that really didn’t touch teaching and learning at their core,” Pittinsky says. “They were this kind of back-office priesthood. Now, you see an entire group of professionals who have the tech savvy to manage systems at a large scale, but they are also consultative to faculty on instructional design.”

The ascendancy of information technology staff is changing the way courses get produced and is introducing a corporate organizational model into the traditionally benign dictatorship of the lecture hall. For faculty brought up in the old school, amid the Byzantine hierarchies of academic departments, the new model of integrated teamwork may take some getting used to.

“There are some faculty who get it, and some faculty who don’t,” DiSalvio says. “We have found there are some faculty who may be charismatic in person, but they are terrible online.”

Those faculty who do participate in online-course development often have to adjust to the unfamiliar dynamics of team-based course design. In many cases, that means faculty members work as part of interdisciplinary curriculum-development teams, alongside other skilled “knowledge workers” like instructional designers, systems administrators and media specialists.

“If you look at how a lot of [courseware] is really being produced, they’re sweatshops,” Nash says. “You have these busy people creating these objects — like multiple-choice tests, or little games, or learning objects — these are people who are paid nothing, whereas other people are paid a lot for overseeing it, like factory owners.”

“Our professors are content experts. That’s all they are,” says ASU’s Salik, voicing a not-uncommon administration view of the professor’s role in online-course development. For institutions, the reduction of faculty to “content experts” does yield clear economies of scale. That sentiment also echoes an old dot-com ethos: separating content from delivery. Says Salik: “If the executive education director calls me up and says, ‘This guy from Honeywell is here, and they want a one-day executive education seminar, but they want one piece from course A, one piece from course B, one piece from course C,’ we can roll that together and send it out the door in about 20 minutes.”

The reuse of online courseware will likely extend not just between courses in a single school, but between institutions as well. “Once universities start learning how to cooperate with each other through productive associations,” Nash says, “I think we’ll see a lot more sharing of learning objects, a lot more sharing of strategies and even revenue.”

The prospect of assembly-line course production and the repurposing of courses between schools seems to confirm some of the critics’ worst fears. “Faculty have much more in common with the historic plight of other skilled workers than they care to acknowledge,” Noble writes. “As in other industries, the technology is being deployed by management primarily to discipline, de-skill and displace labor.” And while breaking instruction into modules may yield tangible benefits to students and employers, faculty find themselves in an increasingly reactive posture to institutional pressures on the curriculum.

The trend, Feenberg says, leads toward “deprofessionalization,” which he describes as “taking highly respected and reasonably well-paid professionals and substituting them with part-time people who would have no regular employment, sub-contractors, and so forth.”

Whether online learning spells a new age or a dark age for higher education, even its most strident critics agree that distance learning will be part of the educational firmament for a long time to come.

But if the Internet has taught us anything, it is this: Open networks have a way of undermining institutional agendas, and putting power back in the hands of individuals.

While corporate software vendors and university administrators seem to be steering the distance-learning agenda today, there are signs of a nascent open-source movement on the horizon that just might upset the balance.

In 2002, MIT announced an ambitious initiative to publish all of its course materials online, free of charge, through the MIT OpenCourseWare project. By 2007, the school hopes to have the full contents of all 2,000 of its courses available on the Web. By making its course materials freely available, the school hopes to encourage academics at other institutions to do likewise and to spark a broad resource-sharing movement among universities.

Already, many professors are contributing their materials to public repositories of open learning objects, freely available on the Web and easily assembled into ad hoc courseware using personal publishing tools like blogs or HTML editors.

It’s too early to say whether these experiments will ever pose a threat to the corporate distance-learning economy, but they hold out at least the possibility of a new model of courseware development. “I think there’s a strong force back to the individual,” Nash says. “I think that eventually stuff won’t be so locked away. I think we’ll see more porous borders.”

That kind of porousness might someday even call into question the structure of educational institutions themselves. No less a futurist than Peter Drucker has predicted that by 2020, “the universities of America, as we have traditionally known them, will be barren wastelands.”

Whether or not such a dire scenario comes to pass, a more open model of distance learning does seem to hold out at least the possibility that institutional pressures might give way to a renewal of personal bonds between teachers and students. “There’s an old saying that the ideal college is Mark Hopkins on one end of a log, and a student on the other,” Pittinsky says. “When you break the classroom out of the limitations of time and place, that becomes a lot more achievable.”

But that Arcadian ideal seems a long way away from the commercial reality of today’s distance-learning market. “The reduction of education to a kind of simplified training violates one of the most basic features of all human societies: the personal transmission of culture,” says Feenberg, who wonders just how far we have come from the deeper origins of teaching, when an elder would gather children around the fire on some ancient evening and say: “‘This is the story my father told me, and I’m going to tell it to you, and you will tell it to your children.’ And then he tells them a story about plants and animals, and the gods.”

In search of the deep Web

The next generation of Web search engines will do more than give you a longer list of search results. They will disrupt the information economy.

When Yahoo announced its Content Acquisition Program on March 2, press coverage zeroed in on its controversial paid inclusion program, whereby customers can pony up in exchange for enhanced search coverage and a vaunted “trusted feed” status. But lost amid the inevitable search-wars storyline was another, more intriguing development: the unlocking of the deep Web.

Those of us who place our faith in the Googlebot may be surprised to learn that the big search engines crawl less than 1 percent of the known Web. Beneath the surface layer of company sites, blogs and porn lies another, hidden Web. The “deep Web” is the great lode of databases, flight schedules, library catalogs, classified ads, patent filings, genetic research data and another 90-odd terabytes of data that never find their way onto a typical search results page.

Today, the deep Web remains invisible except when we engage in a focused transaction: searching a catalog, booking a flight, looking for a job. That’s about to change. In addition to Yahoo, outfits like Google and IBM, along with a raft of startups, are developing new approaches for trawling the deep Web. And while their solutions differ, they are all pursuing the same goal: to expand the reach of search engines into our cultural, economic and civic lives.

As new search spiders penetrate the thickets of corporate databases, government documents and scholarly research databanks, they will not only help users retrieve better search results but also siphon transactions away from the organizations that traditionally mediate access to that data. As organizations commingle more of their data with the deep Web search engines, they are entering into a complex bargain, one they may not fully understand.

Case in point: In 1999, the CIA issued a revised edition of “The Chemical and Biological Warfare Threat.” It’s a public document, but you won’t find it on Google. To find a copy, you need to know your way around the U.S. Government Printing Office catalog database.

The world’s largest publisher, the U.S. federal government generates millions of documents every year: laws, economic forecasts, crop reports, press releases and milk pricing regulations. The government does maintain an ostensible government-wide search portal at FirstGov, but it performs no better than Google at locating the CIA report. Other government branches maintain thousands of other publicly accessible search engines, from the Library of Congress catalog to the U.S. Federal Fish Finder.

“The U.S. Government Printing Office has the mandate of making the documents of the democracy available to everyone for free,” says Tim Bray, CTO of Antarctica Systems. “But the poor guys have no control over the upstream data flow that lands in their laps.” The result: a sprawling pastiche of databases, unevenly tagged, independently owned and operated, with none of it searchable in a single authoritative place.

If deep Web search engines can penetrate the sprawling mass of government output, they will give the electorate a powerful lens into the public record. And in a world where we can Google our Match.com dates, why shouldn’t we expect that kind of visibility into our government?

When former Treasury Secretary Paul O’Neill gave reporter Ron Suskind 19,000 unclassified government files as background for the recently published “Price of Loyalty,” Suskind decided to conduct “an experiment in transparency,” scanning in some of the documents and posting them to his Web site. If it weren’t for the work of Suskind (or at least his intern), Yahoo Search would never find Alan Greenspan’s scathing 2002 comments about corporate-governance reform.

The CIA and Dick Cheney notwithstanding, there is no secret government conspiracy to hide public documents from view; it’s largely a matter of bureaucratic inertia. Federal information technology organizations may not solve that problem anytime soon. The deep Web search engines may just solve it for them.

For almost as long as there has been a Web, there have been Web search engines. So one might reasonably ask why the deep Web has remained out of view for so long.

Traditionally, Web search engines have grown their databases through simple brute force. All the major search engines survey the Web by dispatching legions of simple programs known as spiders, crawlers, robots or harvesters to trace their way through the endless chains of hyperlinks that tie Web pages together.

That method works well for the static HTML pages and predictable URLs that make up the upper strata of the Web. But the deep Web resides mostly in databases, shielded by a lattice of registration gateways, session cookies and dynamically generated links. Unless an organization consciously chooses to share its data, by opening up an API or Web services feed — the way Amazon books show up in a Google search — then the data will likely remain unseen to most users.
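The link-following approach described above can be illustrated with a minimal sketch. It assumes a hypothetical toy "Web" held in memory (the page names and the LINKS table are invented for illustration) and is not a description of how any real search engine's spider is built. It shows only why a record that no hyperlink points to never gets indexed.

```python
from collections import deque

# Hypothetical toy "Web": each page maps to the pages it links to.
# "catalog-record-42" stands in for a database record that no page links to.
LINKS = {
    "home": ["blog", "about"],
    "blog": ["home", "post-1"],
    "about": [],
    "post-1": ["blog"],
    "catalog-record-42": [],
}

def crawl(start, links):
    """Breadth-first traversal: visit every page reachable by following links."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

indexed = crawl("home", LINKS)
missed = sorted(set(LINKS) - set(indexed))
print(indexed)  # ['home', 'blog', 'about', 'post-1']
print(missed)   # ['catalog-record-42']
```

The crawler does exactly what it is asked to do; the catalog record stays invisible simply because nothing on the surface links to it.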

New search engines now under development are exploring methods for penetrating the database barriers. BrightPlanet has developed a formula for brokering queries across multiple deep Web data sources at once, aggregating the results and letting users compare changes to those results over time — a process known as “differencing.”
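As a rough illustration of brokering one query across several sources and then "differencing" the results, consider the sketch below. BrightPlanet's actual formula is not public in this article, so everything here is an assumption: the function names, the in-memory sources and the sample records are all hypothetical.

```python
def make_source(records):
    """Build a hypothetical data source: a function that returns matching records."""
    return lambda term: {r for r in records if term in r}

def broker_query(term, sources):
    """Send the same query to every source and merge the results into one set."""
    merged = set()
    for search in sources:
        merged.update(search(term))
    return merged

def difference(earlier, later):
    """Report what appeared and what disappeared between two result snapshots."""
    return {"added": sorted(later - earlier), "removed": sorted(earlier - later)}

# Two snapshots of the same pair of (invented) sources, taken a few days apart.
monday = [
    make_source(["tax speech A", "tax press release"]),
    make_source(["tax position paper"]),
]
friday = [
    make_source(["tax speech A"]),
    make_source(["tax position paper", "tax speech B"]),
]

changes = difference(broker_query("tax", monday), broker_query("tax", friday))
print(changes)  # {'added': ['tax speech B'], 'removed': ['tax press release']}
```

The "removed" list is the interesting part for a researcher: it flags material that was once retrievable and has since been withdrawn.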

That capability has attracted considerable interest from certain government agencies that shall remain nameless. “Some of our clients are spooky,” says BrightPlanet COO Duncan Wittes. Other BrightPlanet customers include state governments, competitive intelligence researchers, and political campaigns whose “oppo” teams may want to search not only for what a candidate has said but also for what he or she may have “unsaid” over time.

Soon-to-launch Dipsie is pursuing an alternative approach to unlocking the dynamic Web, by deploying a kind of souped-up spider that penetrates barriers like forms, drop-down lists, dynamically generated URLs and session cookies. Dipsie’s spider works by emulating a “well-formed user” that, from the Web site’s point of view, behaves just like a real flesh-and-mouse user, enabling the spider to cache the kind of data typically visible only to a human user.
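Dipsie's internals are not public, so what follows is only a rough Python sketch of one way a spider might get past a form: read the drop-down lists, then request every combination of options as a human user could. The sample page, its field names and the form_urls helper are hypothetical, and the sketch ignores session cookies, free-text fields and POST forms.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode


class FormParser(HTMLParser):
    """Record a form's action and the option values of each <select>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}      # select name -> list of option values
        self._current = None  # name of the <select> being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action", "")
        elif tag == "select":
            self._current = attrs.get("name")
            self.fields[self._current] = []
        elif tag == "option" and self._current is not None:
            self.fields[self._current].append(attrs.get("value", ""))

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


def form_urls(html):
    """Build one GET URL per combination of drop-down choices."""
    parser = FormParser()
    parser.feed(html)
    names = list(parser.fields)
    urls = []
    for combo in product(*(parser.fields[n] for n in names)):
        urls.append(parser.action + "?" + urlencode(dict(zip(names, combo))))
    return urls


# Hypothetical page with two drop-down lists.
PAGE = """
<form action="/catalog">
  <select name="year"><option value="2003"><option value="2004"></select>
  <select name="type"><option value="memo"><option value="report"></select>
</form>
"""

urls = form_urls(PAGE)
for url in urls:
    print(url)
```

Two lists of two options each yield four URLs that no hyperlink on the page points to. The number of combinations multiplies with every added field, which hints at why a practical spider would have to be selective about which ones it requests.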

Other search developers, including IBM, Google and Intelliseek, are exploring their own approaches to mining the deep Web. But in the wake of this week’s announcement, Yahoo is now the elephant in the living room.

Yahoo won’t discuss the specifics of how its search algorithms work. But the company does acknowledge that its Content Aggregation Program will give paying customers a more direct pipeline into its search database. Yahoo Search vice president Tim Cadogan says, “Ultimately we want to search the whole Web for free,” but he nonetheless sees CAP as a way of enabling “direct, structured relationships with content providers” to “deliver a higher-quality search experience for users.”

It takes a fine ear for P.R. nuance to distinguish “higher-quality search experience” from “better results.” Yahoo has issued copious disclaimers assuring non-paying customers that they will receive the same algorithmic treatment as paying ones. But the company acknowledges that paying customers will likely benefit from a “quality review” designed to help companies improve their chances of showing up in search results.

“Cadogan claims that people who send money can’t count on getting better results,” says Bray. “Do you believe that? I don’t.”

Every year, the University of California at Davis pays the publisher John Wiley about $14,000 for a subscription to the Journal of Comparative Neurology, which publishes breaking research in its field. That may sound like a steep price tag for what is essentially a magazine subscription, but it’s a tiny dollop of the $20 million the U.C. libraries spend every year on scholarly journals.

Scientific, technical and medical publishing constitutes an $11 billion industry. And like the rest of the publishing business, scholarly publishers have undergone massive consolidation in the past two decades. Once the province of small university presses and boutique academic imprints, scholarly journals now emanate from giant publishing conglomerates such as Elsevier, Thomson and Blackwell.

“The well-established subscription model that evolved around print journals is a cash cow,” says Peter Lyman, professor at the UC-Berkeley School of Information Management and Systems. “One that the publishers are terrified of damaging accidentally, through online publishing.”

But unlike trade-book publishers, who count on Amazon and Barnes & Noble to move physical units of the latest Harry Potter tome, scholarly publishers rely increasingly on electronic journal subscriptions and paid search services to fuel their revenues. Their customers — mostly academic institutions and research organizations — insist on providing Web access to journal content. To meet that demand while protecting their valuable data stores, the large publishers have responded by rolling out private permission-based search gateways to the contents of their journals, usually under highly restrictive license terms and tightly managed IP access.

But those pricey journal databases now compete for attention — and search queries — from students and faculty with ready access to Google, Yahoo and the rest. And while the public search engines may not find every article in the journal literature, a growing portion of published research also finds its way out onto the Web.

For example, when gene researchers identify a new DNA sequence, they usually submit the sequence to the National Institutes of Health’s GenBank — a public deep Web resource — before submitting it to journals for publication.

Legislation pending in Congress would ensure that all research funded by federal taxpayers be made available free of charge to the public, over the Internet. Meanwhile, new cooperative academic initiatives like the Public Library of Science and the National Science Digital Library are trying to expand access to scholarly research, opening up more indirect competition for the proprietary publishing systems.

And as more scholarship finds its way onto the Web, page-ranking algorithms are also providing an alternative quality rating system to the traditional scholarly peer review that journals have always employed.

While page ranking won’t replace the scholarly review process anytime soon, the expansion of public Web search engines will put downward pressure on the premium that publishers can command. “I don’t think [page ranking] is more reliable,” says Lyman, “but I do think it’s perceived as legitimate. The cost of creating formally quality-controlled information may drive people to consider lower-cost alternatives.”

Lyman adds, “When the public begins to use and accept non-qualified information (relying on Google or other things to perform that function, like Technorati) there are beginning to be quality mechanisms out there that are user-centric or generated by users.”

How will scholarly publishers react to the encroaching competition from deep Web search engines? “The publishing industry is not famous for being progressive, forward thinking or fast moving,” Bray says. “But if they ignore [deep Web search], they could find themselves in a situation like the record companies, where someone finds a way to subvert them.”

- – - – - – - – - – - -

The deep Web contains some 500 times more data than the surface Web; but to regard the deep Web as simply a bigger and better version of the current Web is to overlook the essential feature of databases, which is structure. Most of the deep Web is structured or semi-structured data, as opposed to the sea of flotsam HTML that bobs across the surface Web.

“Once you get into the deep Web, all of these data sources often have much more metadata available,” says Bray. “This could be a huge opportunity for companies looking at new ways of presenting search results.”

Deriving search results from structured data sets will open up new possibilities for search engines. In all likelihood, search engines will gradually abandon the flat listings-style result pattern you see on a typical 12-page Google result. (And who ever gets to the 12th page, anyway?) Not only could deep Web search engines present more useful and manipulable views into structured data but, given some basic lingua franca of structural vocabularies, they could also aggregate those results in endlessly permutable combinations.

“It’s ridiculous to think that the one-dimensional result list is going to be the universal paradigm for all imaginable searches forever,” Bray says. “If you type ‘bicycle’ into Google, you get a list of results having to do with bicycles. But that result is, in a very important way, a lie. It ignores the fact that some of these things are about bicycle racing, some are about bicycle manufacturing. It ignores things that Google might not even know about.”
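Bray's bicycle example can be sketched in a few lines of Python. This is an illustration only: it assumes each result arrives with a "topic" metadata field, which ordinary surface Web pages rarely supply, and the records and the group_by_facet function are hypothetical.

```python
from collections import defaultdict


def group_by_facet(results, facet):
    """Bucket result titles by one metadata field instead of one flat list."""
    groups = defaultdict(list)
    for result in results:
        # Results lacking the field land in an explicit "unknown" bucket.
        groups[result.get(facet, "unknown")].append(result["title"])
    return dict(groups)


# Hypothetical results for the query "bicycle".
RESULTS = [
    {"title": "Tour stage results", "topic": "racing"},
    {"title": "Frame welding guide", "topic": "manufacturing"},
    {"title": "Sprint training plan", "topic": "racing"},
    {"title": "Bicycle (untitled page)"},
]

grouped = group_by_facet(RESULTS, "topic")
for topic, titles in grouped.items():
    print(topic, titles)
```

The same records could just as easily be regrouped by date, source or any other shared field, which is the kind of permutable view that structured data makes possible.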

As deep Web search engines unearth the structures of large data sets and make those structures visible across organizations, they will create a powerful incentive for organizations to invest in more consistent, predictable structures (a trend already manifest in the growth of Web services and in Yahoo’s search quality guidelines). In exchange for the benefits of increased exposure, these organizations will yield another level of autonomy.

While government and academic institutions may generate the greatest volume of deep Web content, corporations undoubtedly generate the most monetary value in Web data: customer databases, product catalogs, technical knowledge bases and myriad other data sources with quantifiable business value.

Over the last decade, companies have invested heavily in Web infrastructure, including countless local search engines. While many companies already outsource their public Web site search functions to companies like Google, many also have developed specialized search engines for their own deep Web data, like technical support databases.

Those investments make plenty of sense when that data won’t readily show up in a public Web search. But as deep Web searchers penetrate these gateways, will companies continue to see the value of investing in their own public interfaces?

In the near term, deep Web search engines will likely dampen company expenditures on local search initiatives. But in the longer term, the changes may prove more far reaching. “The quality and ubiquity of Web search engines hides the fact that most organizations have really crappy search mechanisms,” Bray says. “I think that’s creating a tension within organizations.”

As public search engines continue to supplant the role of organizations’ own information-retrieval systems (be they search databases, call centers or sales engineers), once internal-facing systems will assume increasingly outward-facing roles. “When the ability to develop different messages for different audiences is curtailed by universal availability,” says Gartner analyst Whit Andrews, “the nature of the message, its format and associated issues become paramount.”

No one expects IT departments to go out of business, but the external pressures of deep Web search will almost certainly force long-term changes in the role, structure and autonomy of local IT organizations as they gradually lose direct control over customer transactions.

- – - – - – - – - – - -

Every search query is a unit of desire. Search companies, like all businesses, exist by transforming desire into hard currency. As deep Web search engines insinuate themselves into deeper and deeper levels of organizations, they will not only offload search traffic, they will trigger a series of massive disruptions in the information economy.

If you buy the Cluetrain maxim that “hyperlinks subvert hierarchy,” then surely deep Web search engines will amplify that subversion. As search engines extend their reach deeper into and across organizations, the boundaries between those organizations will feel more fluid — both to consumers and to the organizations themselves. The first thing most of us notice may be better search results.

Somewhere inside that complex apparatus of desire and fulfillment, a transformation is taking place, one whose effects we can barely foresee.

Editor’s note: This story has been corrected since its original publication.
