When Data Disappears
By KARI KRAUS
Kari Kraus is an assistant professor in the College of Information Studies and the English department at the University of Maryland.
College Park, Md.
LAST spring, the Harry Ransom Center at the University of Texas acquired the papers of Bruce Sterling, a renowned science fiction writer and futurist. But not a single floppy disk or CD-ROM was included among his notes and manuscripts. When pressed to explain why, the prophet of high-tech said digital preservation was doomed to fail. “There are forms of media which are just inherently unstable,” he said, “and the attempt to stabilize them is like the attempt to go out and stabilize the corkboard at the laundromat.”
Mr. Sterling has a point: for all its many promises, digital storage is perishable, perhaps even more so than paper. Disks corrode, bits “rot” and hardware becomes obsolete.
But that doesn’t mean digital preservation is pointless: if we’re going to save even a fraction of the trillions of bits of data churned out every year, we can’t think of digital preservation in the same way we do paper preservation. We have to stop thinking about how to save data only after it’s no longer needed, as when an author donates her papers to an archive. Instead, we must look for ways to continuously maintain and improve it. In other words, we must stop preserving digital material and start curating it.
At first glance, digital preservation seems to promise everything: nearly unlimited storage, ease of access and virtually no cost to making copies. But the practical lessons of digital preservation contradict the notion that bits are eternal. Consider those 5 1/4-inch floppies stockpiled in your basement. When you saved that unpublished manuscript on them, you figured it would be accessible forever. But when was the last time you saw a floppy drive?
And even if you could find the right drive, there’s a good chance the disk’s magnetic properties will have decayed beyond readability. The same goes, generally speaking, for CD-ROMs, DVDs and portable drives.
Even the software needed to read the bits may prove elusive. Like Egyptian hieroglyphs, whose code was indecipherable until the rediscovery of the Rosetta Stone, the string of 1s and 0s on a floppy is meaningless in the absence of a set of computer instructions for translating them. If you don’t have a copy of WordPerfect 2 around, you’re out of luck. No wonder preservationists often wax ominous about the “digital dark ages.”
Of course, there’s always the option of migrating data from old to new media. But migration isn’t as simple as copying files — it’s more like translating from Japanese to Hungarian. Information is invariably lost; do it enough times and the result will be like the garbled message at the end of a game of telephone.
Another option is emulation, in which a software program impersonates a retro hardware environment; essentially, an emulator temporarily “downgrades” a modern computer to act like an old one. But over time, emulation becomes unwieldy: because the host systems for which emulators are designed will themselves become obsolete, emulators must eventually be moved to new computer platforms — emulators to run emulators, ad infinitum.
Nor is the problem just with the medium. We generate over 1.8 zettabytes of digital information a year. By some estimates, that’s nearly 30 million times the amount of information contained in all the books ever published. Even if we had perfectly stable storage, could we ever have enough to preserve everything?
The short answer is no — but only because we’re trying to replicate the practices used for decades to maintain paper archives. In this model, preservation begins only after a record is past its use. With data, intervention needs to happen earlier, ideally at an object’s creation. And tough decisions need to be made, early on, regarding what needs to be saved. We must replace digital preservation with digital curation.
Perhaps the most impressive effort to curate digital information is taking place in the realm of video games. In the face of neglect from the game industry, fans of “Super Mario Bros.” and “Pac-Man” have been devising homegrown solutions for collecting, documenting, reading and rendering games, building an evolving archive of game history. They coordinate efforts and share the workload, sometimes in formal groups, sometimes as loose collectives. Nor does the data just sit around. These are gamers, after all, so they are constantly engaged with the files. In the process, they update them, create duplicates and fix bugs.
Despite often operating in legal gray areas, such curatorial activism can be a model for other digital domains. A similar pattern is emerging in data-intensive fields like genetics, where published data sets are often “cleaned” by third-party curators to purge them of inaccuracies.
It might seem silly to look to video-game fans for lessons on how to save our informational heritage, but in fact complex interactive games represent the outer limit of what we can do with digital preservation. By figuring out how to keep alive a complex game, like a classic first-person shooter, we develop a better idea of how to preserve simulations of genetic evolution or the behavior of star systems.
True, not all data is worth saving. But that’s as true for bits as it is for sheets of paper. In the curatorial model, at least, the decisions on what to save are informed by a deep knowledge of the field, while the cost is shared by everyone involved.
Above all, the model allows us to see preservation as active and continuing: managing change to data rather than trying to prevent it, while viewing data as a living resource for the future rather than a relic of the past.