Prowling the ruins of ancient software

Famous programs from just a generation or two ago are in danger of disappearing from human ken, forever.

Topics: Copyright, Intellectual Property

For Grady Booch, the nightmare goes something like this: Deep in the future, a team of archaeologists stumbles onto a rare cache of 20th-century art, a major assortment of works thought lost to the ravages of time.

The only problem, of course, is that they don’t know it. All the images are recorded in an obsolete digital format, JPEG, and nobody knows how to unscramble the data. As a result, the hard disk containing said artwork spends its days not in a museum but as a coffee coaster in some college professor’s crowded office.

“It might seem silly now, but put yourself 1,000 years in the future,” says Booch, chief scientist at IBM’s Rational Software subsidiary. “It’s not too hard to imagine.”

In an industry where one man’s clever C code is another man’s Linear B, Booch already knows the frustration of playing software archaeologist. As co-developer of the Unified Modeling Language (UML), a mid-1990s effort to create a common “blueprint” notation for object-oriented software programs, he’s spent the last 10 years laboring to spare future programmers the same torment.

It’s an uphill battle on a hill that is only growing steeper. With new programs replacing old and no major company or institution playing the central role of source-code archivist, the amount of software history currently circling the memory hole is scarily large. And even if there were a central institution, recent changes to the copyright code have made the transfer of source code from old media to new forms of storage a dicey prospect, legally. Add it all up, and you have the ideal makings for what some are already calling the “digital dark age.”

“Things are going to be lost not because people don’t want to save them or because the original creators don’t want to save them, but because they can’t save them,” says Brewster Kahle, founder of the Internet Archive, an institution that has lobbied for a safe harbor within the Digital Millennium Copyright Act to shield institutions looking to archive source code.

For Booch, the barriers to software preservation aren’t so much legal as educational. Most developers have come to accept the evolvable nature of software programs. What is lacking is the ability to examine static source-code snapshots with a scholarly, comparative eye. In the interest of encouraging that skill, Booch this fall will lead a seminar on software archaeology and preservation at the newly reopened Computer History Museum in Mountain View, Calif.

“Our industry has had a major effect in changing the world,” says Booch, talking over the phone from his Denver, Colo., office. “It would be great if we could preserve the artifacts and interview the architects while they’re still alive.”

Booch isn’t alone. Now that the hysteria surrounding Y2K has faded, developers are free to worry about legacy code again. One increasingly common worry: what to do with it? For every modern offshoot of DOS/Windows, Unix and Macintosh OS evolving with the marketplace, a dozen ghost programs lurk inside yellowed engineering pads, punch-card stacks and slowly degaussing magnetic memories. Even if programmers could get their hands on these programs and find a way to preserve and update their contents, a new question emerges: How do you qualitatively analyze those contents on a historical basis?

“It’s funny,” says Dave Thomas, a Dallas software consultant and co-author, with Andrew Hunt, of “The Pragmatic Programmer,” a 1999 book on software design methods. “Colleges spend a lot of time teaching people how to write code, but very few teach them how to read code. When you think about it, we programmers spend most of our time reading code, not writing code.”

To help fill the gap, Thomas served as cohost of the 2001 Software Archaeology: Understanding Large Systems workshop, hosted by Object-Oriented Programming, Systems, Languages and Applications (OOPSLA). Starting with the unifying question, “How do you come to grips with 1,000,000 lines of code right away?” conference speakers traded various tips, tools and techniques acquired through professional and personal encounters with unfamiliar systems.

“Whenever we’re faced with big problems in software, we tend to fall back on metaphors,” says Thomas. “In this case the archaeology metaphor happens to be a good one. Sometimes you do archaeology with a backhoe. Sometimes you do it with a toothbrush.”

Those partial to the backhoe approach can use Ward Cunningham’s Signature Survey program. Billed as a “method for browsing unfamiliar code,” Signature Survey scans through source code and compresses lines of text into a single punctuation symbol. Operating on the assumption that a file’s size is proportional to the number of punctuation marks separating individual elements (packages and files in Java, for example), Signature Survey offers a quick guide to programming thickets and areas of quick repetition.

“It’s a satellite system for looking over large bodies of work,” Cunningham says. “It lets you use your own human pattern recognition to see variation over the whole program. It also leads you to interesting parts of the program to read.”
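For readers who want a feel for the idea, here is a minimal sketch of a punctuation survey. It is not Cunningham’s program: the set of characters kept, the function names `signature` and `survey`, and the default `.java` suffix are all assumptions made for illustration.

```python
from pathlib import Path

# Characters kept in each file's signature. This particular set is an
# assumption for the sketch, not taken from Cunningham's tool.
SIGNATURE_CHARS = frozenset('{};"')


def signature(source, keep=SIGNATURE_CHARS):
    """Reduce source text to the characters in `keep`, preserving order."""
    return "".join(ch for ch in source if ch in keep)


def survey(root, suffix=".java"):
    """Return (relative path, signature) pairs for matching files under `root`."""
    base = Path(root)
    rows = []
    for path in sorted(base.rglob("*" + suffix)):
        text = path.read_text(encoding="utf-8", errors="replace")
        rows.append((str(path.relative_to(base)), signature(text)))
    return rows


# A one-line example: only braces, semicolons and quote marks survive.
sample = 'class A { int x = 1; String s = "hi"; }'
print(signature(sample))  # prints {;"";}
```

Printing one signature per file, one file per line, gives the kind of overview the article describes: long signatures point to large files, and repeated runs of the same characters hint at repetitive code.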

Thomas says his own preferred technique is to import a program’s contents into Microsoft Word and reduce the zoom factor as far as it will go. The resulting 50-page image leaves little for the eye to make out other than jagged patterns of text and blank page. Still, even these patterns can reveal peculiar anomalies in developer mood or style. “Sometimes the structure is easier to see at that level than if you’re digging around line-by-line,” he says.
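A rough stand-in for the zoomed-out view can be produced without a word processor. The sketch below is an assumption-laden substitute for Thomas’ Word trick, not his method: it shrinks every line to a short bar that keeps only the indentation and length, and the function name `silhouette` and its `scale` and `tab_width` parameters are hypothetical.

```python
def silhouette(source, scale=4, tab_width=4):
    """Render each line as leading spaces plus a bar, shrunk by `scale`."""
    rows = []
    for line in source.expandtabs(tab_width).splitlines():
        stripped = line.strip()
        if not stripped:
            rows.append("")  # blank lines stay blank
            continue
        indent = len(line) - len(line.lstrip())
        # Ceiling division, so any non-empty line shows at least one mark.
        width = -(-len(stripped) // scale)
        rows.append(" " * (indent // scale) + "#" * width)
    return rows


code = "def f():\n    return 1\n\nx = 2\n"
print("\n".join(silhouette(code)))
```

Raising `scale` squeezes the picture further, which trades detail for the jagged outline of text and blank space that Thomas describes.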

Both Thomas and Cunningham liken their techniques to the aerial surveys some archaeologists use to spot the overall structure of burial mound networks, neolithic cairn patterns, etc.

“It shows the most interesting places to dig,” says Cunningham.

It also provides a quick way to track the flow of ideas and source code from one program to the next. Cunningham, best known on the Web as the creator of the wiki, the collaborative online authoring system, has loaned out his forensic talents to companies embroiled in legal disputes over intellectual property and prior art. He’s also used Signature Survey to refactor, or streamline, his own programs, stripping out redundant sections and commands.

When it comes to the toothbrush level, forensic tools and techniques are still in development. Booch says the fall workshop will discuss ways to analyze the fine structure of programs and to detect the emergence of novel techniques. One potential benefit of such knowledge would be a steep reduction in the number of frivolous patent claims filed by software companies.

“IBM believes in patents. I believe in them, too, but there are a lot that look suspicious,” Booch says. “What better way to check for prior art than to have the source code ready and available for inspection?”

Herein lies the final goal of the fall Computer History Museum conference: to provide a foundation for a future exhibit on classic software programs and to provide a “vocabulary” for the intellectual dissection and discussion of these programs.

“Maybe I’m horribly geeky,” says Booch, “but I find tremendous beauty in looking at well-written software programs. There’s an elegance, a brilliance that we’re only now developing the critical means to describe. We have literary critics. We have art critics. We don’t have any software critics, yet. We need software critics, too.”

Booch and his allies will need to overcome a number of obstacles, first. The largest obstacle at the moment is the lack of a central source code repository. In an online article, Elisabeth Kaplan, an archivist at the University of Minnesota’s Charles Babbage Institute, lays out the frustrating history of software preservation. In 1986 the Computer Museum, a Boston forerunner of the current Computer History Museum, commissioned a report on how to archive software programs. That report identified many of the challenges but left the solutions to future reports. In 1988, the Library of Congress created a Machine Readable Collections Reading Room, essentially a repository of old machines capable of reading out-of-date programs. The project was phased out a few years later, however.

Since then, the topic of preservation has resurfaced every three years or so, a rate that, incidentally, roughly matches the upgrade cycle of most commercial software programs.

“The issue comes up again and again,” says Kaplan. “From an archival perspective, though, it’s just not worth it to put resources into preserving software. There’s just not enough projected use. The fact is, when you add up the amount of people who can use these programs, there are like five of them.”

One institution willing to take up the burden is Kahle’s Internet Archive. The Internet Archive already stores screen shots of Web sites and other artifacts of the digital age. Adding source code to the mix would be easy enough, says staff software preservationist Simon Carless. Unfortunately, legal issues and aging copy-protection mechanisms make it difficult to provide a decent record of historic programs.

Carless says the Digital Millennium Copyright Act clouds the current preservation landscape. Although the 1998 law lets archives make copies of copyright-protected works for preservation purposes, it imposes harsh criminal penalties for any circumvention of copy-protection mechanisms. Rather than risk legal blowback, Carless and the Internet Archive are currently petitioning Congress to clarify that archival organizations are exempt from such penalties.

“Even if you’re an institution that’s allowed to archive stuff, there’s still a possible DMCA problem,” Carless says. “If there’s a physical hardware dongle that restricts copying, are you allowed to emulate that dongle to get the software running or does that qualify as a circumvention? We don’t know.”

Carless and the Internet Archive have recently requested that Congress expand its list of exemptions to Sec. 1201 of the DMCA, the portion that prohibits the circumvention of copy-protection mechanisms, to include software source-code preservation efforts. While waiting for a response, the Internet Archive has built a page displaying famous programs currently on the brink of software extinction.

In a similar attempt to rally the public, the Computer History Museum’s Booch has sent out surveys asking programmers to nominate “classic” programs for a potential source-code exhibit. The list, originally intended to be a Top 50, already includes more than 150 games, applications, tools and programming languages. He hopes to devote the upcoming seminar to discussing how to present such programs to the public in a way that encourages further study and preservation.

“There’s a great difference between walking up and showing somebody the Illiac and showing them the original source code for Lotus 1-2-3,” Booch admits.

Booch hopes to ally the preservation movement with two powerful forces: the World Wide Web and the open-source software community. Both have already proven invaluable in the preservation and publication of coding techniques, he says. He also plans to lobby companies with a stake in seeing their early works preserved.

Though Booch is hesitant to predict a donation of the original DOS source code from Microsoft, he has spoken with archivists inside the Redmond-based company wrestling with the same ideas. He also holds out hope that, with a little schmoozing and a little ego massage, the Computer History Museum might be able to encourage a more direct form of participation.

“Imagine somebody 100 years from now watching Bill Gates explaining the structures of his first program,” says Booch, throwing out yet another hypothetical scenario. “Just think: Fox could have a reality show on software programming.”

Booch punctuates his dream scenario with a quick laugh: “Actually, that’s pretty scary when you think about it.”

Sam Williams is a freelance reporter who covers software and software-development culture. He is also the author of "Free as in Freedom: Richard Stallman's Crusade for Free Software."
