Archiving ourselves

We need better ways to save the media we're all creating, for our kids and for the historians of tomorrow

By Dan Gillmor

Published November 5, 2010 6:10PM (EDT)

Our cultural heritage isn’t just the books, magazines and newspapers we read, nor the movies and TV we watch or the radio we listen to. More and more of our culture takes the form of digital media — and more and more of that is what we create, not just what we consume.

Heritage is about preserving what we know (or at least what we think we know) for generations yet to be born. And in the age of democratized media, as we collectively create information that has news value for communities, small and large, the people who care most about saving what we’re creating are wondering how to do it.

No archive is as comprehensive as the one at the Library of Congress, where I’ve been a participant in a two-day meeting this week about the subset of “user generated” media we sometimes call citizen journalism. As usual, at sessions like this one – this is my third visit to the library to help out with its ambitious digital-preservation project, the National Digital Information Infrastructure and Preservation Program – there are more questions than answers.

The reason for libraries and archives like the Library of Congress is simple: We need a record of who we are and what we’ve said in the public sphere. We build on what we’ve learned; without understanding the past we can’t help but screw up our future.

It was easier for these archiving institutions when media consisted of a relatively small number of publications and, more recently, broadcasts. They’ve always had to make choices, but the volume of digital material is now so enormous, and expanding at such a staggering rate, that it won’t be feasible, if it ever really was, for institutions like this to find, much less collect, all the relevant data.

Meanwhile, those of us creating our own media are wondering what will happen to it. We already know we can’t fully rely on technology companies to preserve our data when we create it on their sites. Just keeping backups of what we create can be difficult enough. Ensuring that it’ll remain in the public sphere — assuming we want it to remain there — is practically impossible.

Blogging pioneer Dave Winer, a participant in this week’s meeting, has some smart recommendations on creating what he calls “future-safe archives” — including “long-lived organizations to take part in a system we create to allow people to future-safe their content.” He lists universities, government and insurance companies as examples of such institutions. The Library of Congress knows it can’t store everything. Its archiving experts are working with a variety of partners, with a long-range goal of creating archives that are loosely connected but where researchers (and I hope regular folks) in the future will be able to easily find, retrieve and work with what’s being created today.

The technology industry isn’t an obvious candidate to provide the archiving institutions; as Dave notes, the tech companies are too likely to disappear or change in ways that make them unreliable. Even Google, for all its reach and power today, isn’t the place I want to store my work, in part because it’s a company that makes money by using our data to sell advertising. That’s not the relationship I want with my own archivist.

But the tech industry has a vital role to play in preserving the material we create ourselves, e.g. blogs, at the edges of the networks. It can work with the archiving institutions to ensure that we, the creators of media, can play a role in our own archiving.

What do I mean by this? Here’s an example. I use WordPress to create my personal website, and the website that accompanies my soon-to-arrive new book, “Mediactive.” I wish there were a plug-in for WordPress that would let me save my site to the wonderful Internet Archive, the nonprofit that is trying to archive as much online material (among other things) as possible. All blogging software vendors should have features like this, assuming the Internet Archive wants the material, which I’m fairly sure it does.

The value for future historians of what we do online comes from much more than blog posts. Among the sites that tell us most about our modern culture are such services as Craigslist and eBay. They are created entirely by their users, or at least the content is. How could they be persuaded to regularly archive what they do, for future reference?

I have little hope that Facebook would participate in such a system, because it’s Facebook’s obvious plan to itself be the repository for history. This is one reason that I don’t spend a lot of time posting things on Facebook, despite its usefulness; even though I can download what I do there, or at least some of it, no one but Facebook itself can get at the greater value of the service: the relationships among the users.

So when and if the Internet Archive (among others) makes a deal with WordPress and various content-creation platform providers, as I hope will happen someday, the information that goes into the archive needs to include more than just our blog posts. It should include the links I’ve made to other sites and reader comments, of course; but it should also include the inbound attention from people who’ve linked to what I’ve written, among the other relationships.

The complications go on and on. On my personal site I have RSS feeds from other sites where I’ve created some content, including such things as my Amazon and Yelp reviews and Twitter stream. I have no idea how to archive all of my public work in a compact way, or even if I should.

I’m hoping, sometime in the next few months, to help organize a meeting that connects technology people with archiving people so we can talk about personal archiving of this kind. One of the ideas raised at the Washington gathering was a “public commons” — a federated collection of services, I’d hope — where we could all save our creations, and if enough of the right people got together on this they could make and connect the tools to make it all work.

We need this for our children and grandchildren. They need it, as do the researchers and creators of tomorrow, to make their own world a better place — or at least to understand more clearly how their world got the way it is.
