Our species created about 5 billion gigabytes of information from the dawn of time until 2003. Before long, we will create that much information many times per day, according to IBM. The problem: No one is doing enough to select and preserve the bits that really matter.
One of the great paradoxes of the digital age is that we are producing vastly more information than ever before, but we are not very good at preserving knowledge in digital form for the long haul. There’s a difference between building big server farms to store information for near-term retrieval (industry is very good at that) and actually choosing and preserving the data that matters, and being able to render it useful at some time in the future (something that, scarily, we are not nearly as good at). We are radically underinvesting in the processes and technologies that will allow us to preserve our cultural, literary and scientific records.
Consider the experience of pulling out an old shoebox from under a bed and discovering a series of floppy disks there from the 1980s. Perhaps you smile, thinking of what might be on them; perhaps you shiver. How would you find out? Most of us have not preserved a vintage Macintosh SE to be able to play them back. Data formats have changed multiple times since then. From 8-inch to 5-and-a-quarter-inch to 3-and-a-half-inch floppy disks to compact discs to thumb drives, we keep making progress in how we store our media, and we keep trapping information in lost formats in the process. Best that you put the box back under the bed and not worry too much about it.
Obsolescence of this kind may, in fact, be a blessing. It’s important that much of the information we create is ephemeral. Otherwise, the world would become far too cluttered. Our behaviors would shift, torqued by the constant surveillance to which we increasingly subject ourselves. We would have an even harder time finding the knowledge that’s important in the vast ocean of the unimportant – much less making sense of it all.
It’s fine when it’s your old term papers that are locked away in an obsolete format. And many blogs, tweets, photos and status updates don’t need to be kept for the long run. It’s not so fine, though, when the lost knowledge has historical significance.
The problem is not that it’s impossible to transfer information from one format to another; with enough effort and cost, most data can be transferred to formats that can be read today. A cloud-based world, toward which we are headed, is likely to be simpler to manage than a world of shoeboxes, floppy disks and thumb drives.
But different problems come into relief in a digital era in which we are creating information at such speed and scale. First, most of the parties holding the data are for-profit firms, whose core business is not long-term storage. Unlike universities, libraries and archives, these firms are unlikely to be around for hundreds of years. In the blog-hosting business, the industry has changed enormously in just a decade or so. Even in traditional publishing, consolidation and change have been the watchwords, not persistence of firms over the centuries. Second, the scale of what is being created is so far beyond what has been created in the past that we will need new, technologically sophisticated approaches – which can scale along with the pace of production – to curate the meaningful bits of it.
Today, librarians and archivists are not involved enough in selecting and preserving knowledge in born-digital formats, nor in developing the technologies that will be essential to ensuring interoperability over time. Librarians and archivists do not have the support or, in many cases, the skills they need to play the central role in preserving our culture in digital format.
There is added reason to worry. Our national systems have been found to be weak in information technology. This concern was confirmed in March, when the United States Government Accountability Office published a report that criticized the Library of Congress for its information technology practices. The report’s headline: “Library of Congress: Strong Leadership Needed to Address Serious Information Technology Management Weaknesses.”
The good news: the National Archives is much stronger and more advanced when it comes to digital matters, under the leadership of David Ferriero. The completely wonderful Internet Archive, the brainchild of Brewster Kahle, saves iterations of the web in a converted church in San Francisco. A group of universities has come together in a partnership called the Digital Preservation Network, founded at the University of Virginia, to address key aspects of the problem. And despite the GAO’s deep and valid concerns, the Library of Congress has projects focused on this task, including the National Audio-Visual Conservation Center, which features recording and playing devices to help render materials in now-obsolete formats.
But no small handful of institutions is up to the scale of this task. The scope of the issue is enormous, and addressing it will take a coordinated national effort for the United States alone, to say nothing of the world. Too few librarians and archivists have the technical ability to lead the way.
The deeper problem behind the problem of digital preservation is that we undervalue our libraries and archives. We tend to think of them in the wrong light, as institutions that solve an analog-era set of problems. We are underinvesting badly in them during this transition from the analog to the digital. If we fail to support libraries in developing new systems, those who follow us will have ample reason to be angry at our lack of foresight.
Drew Faust, now president of Harvard University, wrote an important book on the Civil War, titled "This Republic of Suffering." Her primary texts were gripping letters sent home from the battlefields, sometimes marked with blood and grime, the physical manifestations of war. Faust’s book draws upon these letters to bring to life vivid accounts from 150 years ago. The book represents a major contribution to our understanding of our country’s most challenging moments.
A century and a half ago, in 1865, Walt Whitman wrote of those who died in the Civil War: “Their precious precious blood” would live on “in unseen essence and odor of surface and grass, centuries hence.” So, too, do their last words, in the form of their letters. They live on, not just in Whitman’s "Leaves of Grass," but in Faust’s book and a film by Ric Burns, based on Faust’s work, that commemorate and describe the lives of these soldiers.
Will we have the emails, texts, Skype calls and voice mails sent home from the battlefields of the 21st century? Historians who seek to follow Faust’s example decades from now may not be able to repeat her methodology for a major event that occurs today.
If we don’t address our underinvestment in libraries and archives, we will have too much information we don’t need and too little of the knowledge we do.