A tape I cannot read
When I left Bell Labs in the late 1990s, I was given a cartridge tape containing my files: papers, slides, email, and a lot of half-finished ideas. I still have it.
I also still cannot read it.
I never had a drive that could read it. I had Windows and Linux PCs, and this was an archive that was probably created on a Sun workstation. Even if I found the right hardware on eBay, I would still need to answer a second question: what exactly is on the tape?
Was it a plain tar or cpio archive? Was it a vendor-specific backup format that expects a specific program? Without that context, "having the bits" is not the same as "having the information."
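If I ever do get the bits off that tape, answering the "what is this?" question is at least partly automatable. Below is a minimal sketch, assuming the tape has already been imaged to an ordinary file; the filename and the short list of magic numbers are only illustrative.

```python
# Sketch: look for well-known archive signatures in a raw tape image.
# Assumes the tape has already been imaged to a file; "tape.img" is made up.
import tarfile

CPIO_MAGICS = {
    b"070707": "cpio (portable ASCII / odc format)",
    b"070701": "cpio (new ASCII / newc format)",
    b"\xc7\x71": "cpio (old binary, little-endian)",
    b"\x71\xc7": "cpio (old binary, big-endian)",
}

def identify(path):
    # tarfile.is_tarfile validates the tar header checksum for us.
    if tarfile.is_tarfile(path):
        return "tar archive"
    with open(path, "rb") as f:
        head = f.read(6)
    for magic, name in CPIO_MAGICS.items():
        if head.startswith(magic):
            return name
    return "unknown: vendor backup format, filesystem dump, or raw data"

print(identify("tape.img"))
```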
That tape is my reminder that digital preservation is not one problem. It is a stack of problems, and the stack tends to fail at the weakest layer.
The "digital dark age" warning
Vint Cerf, one of the architects of the Internet, has warned that we could create a "digital dark age"—a future where we have a vast amount of recorded history but no practical way to access it.
The point is not that storage is impossible. It is that long-term readability requires the entire ecosystem to survive: media, hardware, operating systems, applications, file formats, and an understanding of how they fit together.
None of that survives on its own.
1. Hardware disappears faster than you expect
To read old data, you often need old machines, or at least old interfaces.
Think about removable media alone: punched cards; various types of reel-to-reel tape drives; then 8-inch, 5.25-inch, and 3.5-inch floppy disks; tape cartridges in dozens of formats; Zip and Jaz drives; then an endless parade of memory-card shapes and standards. Even when the media still exists, the readers stop being manufactured, drivers stop being maintained, and working units eventually fail.
Beyond the media itself, the interfaces disappear. Early PCs used disk controller cards that only make sense inside the machine and OS they were designed for. Then came waves of disk interfaces: MFM/RLL, ESDI, IDE/ATA, SCSI, SATA, and now NVMe. Each transition reduces the number of people and parts that can bridge the gap, and sometimes the only practical "adapter" is to resurrect an entire old system.
My first laptop was an IBM ThinkPad 750Cs, and it had a 2.88 MB extended-density floppy drive. That format never became common. Today, "can I find a working reader?" is not a hypothetical question; it is the preservation problem in miniature. If I ever come across old floppy disks in that extended-density format from decades ago, I have no hope of reading them.
A few years ago, I came across a stash of Jaz disks in my basement: 1- and 2-GB removable hard disks produced by Iomega in the late 1990s. I had used them both as backups and as complete, bootable file systems for various Linux distributions. I even had the drive that reads them. Unfortunately, that drive has a 50-pin SCSI-2 interface, and no computer I've owned in the past 20-plus years supports it. I also had a SCSI-2 adapter card, but it was a PCMCIA card, which, again, none of my recent computers can use. To read the data, I would have to boot a vintage machine and run a version of Linux that understands the ext2 file system on those disks. The effort wasn't worth it to me, and it will only get harder with each passing year until it becomes impossible.
Unix V4: A 50-year-old tape comes back to life
In December 2025, a remarkable recovery demonstrated both the fragility of old data and the possibility of bringing it back. Unix V4, the first version of the Unix operating system whose kernel was written in C, was successfully recovered from a 1970s nine-track tape that had sat forgotten at the University of Utah for over half a century.
The software dates to 1973. That is not ancient history by any standard.
Al Kossow of the Computer History Museum used specialized equipment to read the raw magnetic flux variations from the tape and then reconstructed the digital data. Only two blocks had read errors; the rest were recoverable. The recovered system now runs in an emulator, and it can be downloaded from the Internet Archive.
But this recovery required a working nine-track tape drive, custom flux-reading software, expertise in 1970s tape formats, and emulation software to run the recovered operating system. Without that chain of knowledge and equipment, the tape would have remained an unreadable artifact.
The deeper lesson is sobering. This tape wasn't lost because anyone decided it was worthless. It was simply not considered interesting or valuable enough over the past several decades to be periodically copied into newer formats. It sat in storage while the world moved on. The recovery happened because someone cared enough to look for it ... and because the right expertise still existed.
Apollo 11: When the hardware becomes archaeology
A more historically significant example comes from Apollo-era telemetry and video, and it is precisely the kind of problem that does not appear on a neat "backup checklist."
During Apollo 11, the lunar module carried a slow-scan TV camera that transmitted an unconventional format: fewer lines and a much lower frame rate than broadcast television. The world watched the moonwalk live, but what viewers saw was a converted signal. At the ground stations, the raw feed looked better on the slow-scan monitors than what made it through the scan-conversion chain.
NASA's ground stations did not just watch the signal; they recorded it. As the downlink arrived at Goldstone, Honeysuckle Creek, and Parkes, the raw slow-scan video and other telemetry were captured on one-inch analog instrumentation tape. These were wide-band, 14-track recordings made on instrumentation recorders, running at very high tape speed, which meant reels had to be swapped constantly during the EVA.
After the mission, those one-inch telemetry tapes were boxed up, shipped to the Goddard Space Flight Center, and then transferred into long-term storage. Decades later, the tapes could not be located. The best explanation is also the least dramatic: they were treated as reusable media, degaussed, recertified, and put back into service during a tape shortage years later.
Even if the original Apollo 11 tapes magically turned up tomorrow, reading them would not be a matter of finding "a tape drive." By the early 2000s, the world's last remaining machine capable of playing the slow-scan tapes was sitting at Goddard's Data Evaluation Lab, and the center was planning to mothball the facility. The ability to recover the data depended on a specialized, aging, effectively one-of-a-kind playback and processing chain.
If that chain disappears, then the job becomes archaeology: you have to reconstruct hardware, signal formats, and calibration details, and you have to do it without the benefit of a living supply chain or a room full of engineers who have touched the equipment recently.
If your preservation plan assumes you can always find the right hardware later, you are making a bet.
2. You might have the drive, but not the filesystem
Let's say you somehow find working hardware. Great. Now you have to interpret what is stored.
Sometimes this is easy: a FAT-formatted disk or a plain tarball is friendly to future you.
Sometimes it is not: RAID layouts, proprietary NAS formats, or backup systems that spread data across incremental sets, requiring the original software logic to reconstruct a point-in-time view. Even when the underlying blocks are intact, you can still be missing the map.
At home, I have had RAID arrays in a couple of Thunderbolt-connected Promise boxes (which I haven't used in years) and in QNAP NAS servers. They excel at one job: keeping you running when a disk fails. They are not a guarantee that I will be able to reassemble the data later on different hardware.
If the enclosure fails, I will have to hope I can find compatible hardware, or else rely on a good understanding of how the array was laid out and a great deal of patience to reconstruct it. Even when the disks are fine, you can still be stuck, because the map of the RAID layout lives in metadata that assumes a particular implementation. In the best case, you move the disks to an identical unit and the array comes back. In the worst case, you have a box of healthy disks and no straightforward way to interpret what is on them.
The BBC Domesday Project: Digital irony
In 1986, the BBC created a modern "Domesday Book" to mark the 900th anniversary of the original. Over a million people, mostly schoolchildren, contributed text, photographs, video, maps, and personal testimonies, creating a digital snapshot of life in Britain.
The project was stored on LaserDiscs in a proprietary LV-ROM format, which required an Acorn BBC Master computer with custom hardware and a specialized Philips LaserDisc player. It was cutting-edge multimedia for its time.
By 2002, the content was nearly inaccessible. The hardware had become rare. The format was obsolete. The irony was hard to miss: the original 1086 Domesday Book, written on sheepskin parchment, remained perfectly readable after 900 years. The digital version was struggling after 16.
The content was eventually rescued through emulation and data extraction efforts, but it required years of specialized work. The head of the original project criticized the archivists for failing to preserve the material effectively, but the deeper problem was that no one had planned for a format that would become obsolete within a generation.
The data in its LaserDisc format could have been lost forever after only 25 years. The sheepskin parchment has lasted 900.
Even if we could preserve the knowledge of how to decode the data, it's not clear how long a LaserDisc's polycarbonate platters, or the adhesive that binds the thin layer of aluminum to them, would last. The term "laser rot" describes what happens when the adhesive causes a chemical breakdown of the aluminum layer that contains the data.
RAID helps with one kind of failure. Preservation has to plan for the other kinds, too.
This is one reason archivists push for restore tests, not just "backups exist." A backup that cannot be restored is not a backup. It is a comforting myth.
3. Formats go obsolete (or get blocked)
Even if you extract a file successfully, you still need software that can interpret its format.
Text is the lucky case. Plain text survives. That is one reason formats like LaTeX, troff, and source code age better than many binary document formats.
But many formats are fragile in practice, including some that once felt universal.
Microsoft Office is an example of a subtle failure mode: some older Word formats can no longer be opened in modern Office builds. I discovered this because many of my early lecture notes were written in Microsoft Word, and I can no longer open those files. The risk of storing in formats like Word is that I am depending on a long chain of compatibility, security policies, and vendor support. A document format can become "legacy" not only because it is technically unreadable, but also because modern tools choose not to support the format anymore.
I have the same concern in the other direction. At Bell Labs, some of my documents were written in FrameMaker, which was readily available to us and had a better interface and much better typography than Microsoft Word. I no longer have access to the program, and the files are essentially useless now.
That is one reason I mostly use Markdown for my documents these days.
Even plain text isn't plain
While text may seem universally readable, that hasn't always been the case.
- ASCII became a U.S. standard 7-bit character code in 1963, but the character set did not cover most non-English characters.
- At the same time, IBM encoded characters in its own EBCDIC format, which was not compatible with ASCII.
- Japan defined a large character set with JIS X 0208 in 1978.
- China's GB/T 2312-1980 became a foundational Simplified Chinese character set in 1980, while Taiwan created the Big5 format in 1984 for Traditional Chinese characters.
For languages like Greek and Cyrillic, legacy text often depends on an implicit code page (ISO-8859-x, Windows-125x, DOS, or Mac encodings), and if you guess wrong, the file still "opens" but turns into garbled text because the same bytes map to different letters.
In the worst case, decoding old text files can be treated as a simple exercise in cryptanalysis: know or guess the language, deduce the bit length, and try different substitution alphabets.
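That exercise is easy to automate, at least crudely. Here is a minimal sketch that tries a handful of candidate code pages on a mystery file and ranks the results by how much of the decoded text looks printable; the candidate list and the scoring rule are illustrative, not definitive.

```python
# Sketch: brute-force a few candidate encodings and rank them by plausibility.
CANDIDATES = ["utf-8", "ascii", "cp437", "cp1251", "cp1253",
              "iso-8859-5", "iso-8859-7", "mac_cyrillic", "cp500"]  # cp500 is an EBCDIC code page

def guess_encodings(path, sample_size=4096):
    raw = open(path, "rb").read(sample_size)
    ranked = []
    for enc in CANDIDATES:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this code page cannot even map the bytes
        # Crude plausibility score: fraction of decoded characters that are printable.
        score = sum(ch.isprintable() or ch in "\r\n\t" for ch in text) / max(len(text), 1)
        ranked.append((score, enc, text[:60]))
    return sorted(ranked, reverse=True)

for score, enc, preview in guess_encodings("mystery.txt"):   # "mystery.txt" is a placeholder
    print(f"{score:.2f}  {enc:12s}  {preview!r}")
```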
NASA's Viking data: Lost in translation
Data from NASA's Viking missions to Mars in the 1970s was stored on magnetic tape and microfilm: the archival formats of the day. The tapes began to dry out and crack. NASA transferred the data to CDs in the 1990s, but the software used to view the images was created specifically for the mission and is no longer supported.
Recovering just 3,000 of more than 56,000 images took two years. A scientist at Goddard's archives recalled holding the microfilm for the first time: "We did this incredible experiment, and this is it, this is all that's left. If something were to happen to it, we would lose it forever."
The bits were preserved. The ability to interpret them was not.
Photography's format fragmentation
Photography adds another twist. RAW formats are almost always vendor-specific and evolve by camera model. The file might still be intact, but future software may stop supporting it, especially if you also rely on a companion catalog or edit history.
Each camera manufacturer uses its own proprietary format for raw images, and that format is often incompatible across models by the same manufacturer. Raw images hold far more data than JPEGs, and many photographers shoot and store images in these formats. Software like Adobe Lightroom stores edits to these images, keeping the originals unmodified. What happens when future versions start dropping support for outdated models?
Even web image formats shift. WebP has become widely supported across major browsers, but "widely supported" is not the same as "assured for decades or centuries." Apple's HEIF/HEIC became a mainstream default in the iPhone era, and it already comes with compatibility caveats across devices and ecosystems.
4. Media decays quietly
Even if you solve the hardware, filesystem, and format problems, the physical media itself has a shelf life.
Magnetic media
Magnetic storage loses its data over time through gradual demagnetization. The magnetic domains that encode your bits slowly randomize. Temperature, humidity, and stray magnetic fields accelerate the process.
Beyond demagnetization, the adhesion of magnetic coatings to their substrate decays over time, making the media fragile. When old tapes are read, the magnetic coating can literally rip off and stick to the read heads—destroying the data in the act of trying to recover it.
Flash memory
NAND flash stores data as electrical charges trapped in floating gates. Those charges naturally leak over time. The cells degrade with each write cycle, and even unused flash can lose data after years of storage—especially at higher temperatures.
Optical media
When writable CDs came out, people treated them as archival media: something that will last for hundreds of years. That turned out to be optimistic.
Writable CDs and DVDs are based on organic dyes. Phthalocyanine (light-green dye) claims to offer 100+ year stability. Azo (dark blue dye) offers decades-long life. The older cyanine (blue) dyes are less stable, offering around 50 years of life. In all cases, the discs are sensitive to quality, write speeds, handling, and storage conditions.
Scratches matter, and the label side can be surprisingly fragile on some discs because that is where the reflective layer lives. Longevity also depends on how the disc was written. High-speed burns, low-quality media, and poor storage conditions can all shorten life.
Even if the disc is intact, the "reader problem" returns. Many modern computers have no optical drive at all, so playback hardware becomes yet another piece you have to hunt down.
Hardware you plan to keep will also age
Even if you try to keep old hardware around, electronics are not frozen in time. Electrolytic capacitors dry out. Fans seize. Rubber belts and pinch rollers turn to goo. Connectors oxidize. Batteries leak. Relays and switches get intermittent.
Power supplies can fail in several ways. Capacitors drift, rectifiers and regulators fail, and transformers can develop insulation problems after decades of heat cycling. On older equipment, even the mechanical parts are consumables: stepper motors, servo assemblies, bearings, and read/write heads wear out.
Tin whiskers are another slow-motion failure mode: microscopic metal filaments that can grow and short adjacent conductors. The practical result is that "I'll just keep one old machine in the closet" often turns into a restoration project.
If the plan is "store it and forget it," the physical world will eventually disagree.
5. The cloud is convenient, not eternal
Cloud storage can be a good operational choice, but it is not a guarantee of permanence.
Cloud services change business models, retire products, and sometimes shut down. Google's retirement of Picasa Web Albums is one example: users were moved toward Google Photos, and the old service was phased out on Google's timeline, not the user's. AT&T Photo Storage is shutting down in 2026 and backups stopped in 2025. In 2009, Digital Railroad ran out of money and users were given only 24 hours to retrieve their data.
Even when providers offer export tools, that still puts the burden on you to notice the change, run the export, validate it, and then maintain the exported data going forward.
Cloud can reduce day-to-day operational risk, but it does not eliminate long-term preservation work. It just changes who does what, and when.
Improving your odds in the short term
If you want your data to survive more than a laptop upgrade cycle, treat preservation as a repeating task.
Prefer open, boring formats. For text, UTF-8 plain text, Markdown, and (when you need a final form) PDF/A tend to age well. For images, keep a high-quality export in a widely supported format alongside RAW if you care about the content long-term.
Migrate on a schedule. Every few years, copy to new media, verify checksums, and make sure you can still open the files.
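For the checksum step, even a plain SHA-256 manifest goes a long way. Here is a sketch, with made-up paths, that builds a manifest for an archive directory and verifies it again after the files have been copied to new media.

```python
# Sketch: build and verify a SHA-256 manifest for an archive directory.
import hashlib, os

def sha256(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root, manifest="MANIFEST.sha256"):
    with open(manifest, "w") as out:
        for dirpath, _, files in os.walk(root):
            for name in sorted(files):
                full = os.path.join(dirpath, name)
                out.write(f"{sha256(full)}  {os.path.relpath(full, root)}\n")

def verify_manifest(root, manifest="MANIFEST.sha256"):
    ok = True
    for line in open(manifest):
        digest, rel = line.rstrip("\n").split("  ", 1)
        full = os.path.join(root, rel)
        if not os.path.exists(full) or sha256(full) != digest:
            print("MISSING or CHANGED:", rel)
            ok = False
    return ok

# Hypothetical usage: build once, verify after every copy to new media.
# build_manifest("/archive/photos"); verify_manifest("/mnt/new-disk/photos")
```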
Keep multiple copies in different places. Use administrative and geographic diversity: local storage, cloud storage, and an offline copy.
Preserve context, not just files. For important collections, keep a short README that explains what is inside, what created it, and what software versions you used.
Test restores. Pick random files quarterly or annually and actually restore and open them.
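A restore test can be as modest as pulling a random sample out of the restored copy and proving you can still read every byte. A sketch along those lines, with a made-up mount point:

```python
# Sketch: spot-check a restored backup by fully reading a random sample of files.
import os, random

def spot_check(restored_root, sample_size=25):
    paths = [os.path.join(d, name)
             for d, _, files in os.walk(restored_root) for name in files]
    for path in random.sample(paths, min(sample_size, len(paths))):
        try:
            with open(path, "rb") as f:
                # Read the whole file so media errors surface now, not years later.
                nbytes = sum(len(chunk) for chunk in iter(lambda: f.read(1 << 20), b""))
            print(f"OK    {nbytes:>12,d} bytes  {path}")
        except OSError as err:
            print(f"FAIL  {path}: {err}")

spot_check("/mnt/restore-test")   # hypothetical mount point for the restored copy
```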
However, none of these practices addresses the challenge of preserving data for centuries or millennia.
What "permanence" would really require
If you zoom out, you can summarize the preservation challenge as three questions:
- Can we keep the bits intact for a long time?
- Can we reconstruct the environment needed to interpret them?
- Can a future reader understand what the data means?
Modern digital preservation has to address barriers well beyond "store it somewhere safe." Preservation is a framework problem, not a single technology problem.
There are fascinating "durable media" ideas, including writing data into extremely stable physical substrates like quartz crystals, etching data into ceramics, or encoding data into DNA. But even perfect media does not solve the format and interpretation layers. A future archaeologist might recover a flawless binary blob and still have no idea what to do with an Excel file without the application logic, documentation, and context.
The stack keeps failing
My old tape is a tiny version of the same story. The bits might still be there, but the surrounding world moved on.
Digital preservation is not a single problem with a single solution. It is a stack comprising media, hardware, interfaces, file systems, formats, and applications. The stack fails at its weakest layer, and that layer changes over time. What was state-of-the-art storage becomes landfill. What was a universal format becomes a legacy curiosity.
The solutions are not glamorous: migrate regularly, use open formats, test restores, keep multiple copies, and document what you have. But even these only extend the timeline; they don't eliminate the problem.
If we want future generations to understand us, we have to think beyond storage. We need durable media, durable formats, durable context, and a little humility about what we cannot foresee.
For a companion discussion of the human dimensions of preservation (what gets saved, what gets lost, and why), see *What We Choose to Remember*.