Archiving for the longue durée (Tools we use)

Do you back up? Good. But not good enough.

First, let’s talk about backup. A good backup strategy should be regular, redundant, and involve multiple locations. It should be regular, so that you don’t have to worry about whether you backed up your data the day, week, or month before you accidentally spilled soup on your keyboard. It should be redundant, so that if your backup drive is shorted out by the same thunderstorm that destroyed your computer, you still have another copy. And it should involve multiple locations, so that if a fire burns down your house there is still a copy of your most important stuff at your parents’ house.

There are lots of ways to make sure you meet these basic requirements. My solution involves:

I feel pretty good about this system. It may not be perfect, but it meets the minimal requirements I listed above. However, it isn’t good enough for me, and it might not be good enough for you either…

There are two reasons why it isn’t good enough. The first is that you might have a lot of data on external drives which doesn’t get backed up this way. If you count your video files in terabytes instead of gigabytes, backing this stuff up isn’t easy. Sure, you could duplicate your drives and leave a copy at your parents’ house, but it turns out that this isn’t particularly reliable.

This brings us to the second reason: hard drives aren’t an archival storage solution. They might be good for ten to twenty years, but that assumes you store them in a climate-controlled room and maintain them properly by taking them out and spinning them up once a year. Even then, the magnetic properties of the drive will inevitably decay over time, resulting in small errors which might go unnoticed, but might just as easily corrupt important files. A decade might seem like a long time, but imagine going back to listen to an interview you conducted for your Ph.D. research and discovering that the file is corrupted. Wouldn’t you prefer to have those files stored on something a little more durable?
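One way to keep those small errors from going unnoticed is to record a checksum for every file when you archive it and re-check the archive whenever you spin the drive up. The following is only a minimal sketch in Python, using the standard library; the function names (`sha256_of`, `build_manifest`, `find_damage`) are made up for illustration, and where you keep the manifest is up to you.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damage(root, manifest):
    """Return relative paths that are missing or whose checksum changed."""
    root = Path(root)
    damaged = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel_path)
    return damaged
```

Build the manifest once, when the files are known to be good, and store a copy of it somewhere other than the drive it describes. A checksum only tells you that a file has changed; it cannot repair it, which is why you still need a second copy.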

So how do you go about storing large amounts of data without losing sleep over the rate of magnetic decay?

The best solution for long-term storage is to carve your data on a rock. Unfortunately, carving all those ones and zeros might take a bit of time, but you can do a little better by investing in a Blu-ray M-Disc burner. When properly stored, data on an M-Disc should last at least one thousand years! That’s because it is essentially the same as carving it on a rock:

While the exact properties of M-DISC are a trade secret, the patents protecting the M-DISC technology assert that the data layer is a “glassy carbon” and that the material is substantially inert to oxidation and has a melting point between 200° and 1000 °C.

M-Disc burners and media are relatively inexpensive, but capacities are currently measured in gigabytes, not terabytes, and the recording process is slow, so you aren’t going to be using M-Discs to back up your video archive.
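To get a feel for the mismatch between gigabyte discs and a terabyte archive, here is a back-of-the-envelope calculation in Python. The capacities are the nominal Blu-ray sizes (25, 50, and 100 GB) and are an assumption on my part, so check what your burner and media actually support. The helper `discs_needed` is invented for this example, and it ignores file system overhead and the fact that large files don’t split neatly across discs.

```python
import math

# Assumed nominal capacities in decimal gigabytes; check your own media.
BLU_RAY_CAPACITIES_GB = {"single-layer": 25, "dual-layer": 50, "BDXL": 100}


def discs_needed(archive_tb, disc_gb):
    """Return how many discs of disc_gb it takes to hold archive_tb terabytes."""
    archive_gb = archive_tb * 1000  # decimal units, as printed on the packaging
    return math.ceil(archive_gb / disc_gb)


if __name__ == "__main__":
    for label, capacity in BLU_RAY_CAPACITIES_GB.items():
        print(label, discs_needed(4, capacity))
```

Under these assumptions a 4 TB video archive needs 160 single-layer discs, or 40 of the 100 GB kind, each of which has to be burned and labelled by hand.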

The option I chose was to use the technology that is standard in many large corporations: tape drives. mLogic makes a Thunderbolt LTO-6 drive which works with relatively high capacity tapes. These tapes are cheaper and longer lasting than hard drives (I’ve read that they can last around 25 to 30 years). But I can’t say I’m happy with them. The first problem is that the longevity of the tapes ignores the fact that the technology is constantly being upgraded, and each new generation of tape drives will only work with a limited number of previous generations. That means that eventually you may not be able to find a machine which can read your tapes. The second problem is the software. LTO drives like mLogic’s are advertised as working just like desktop drives in LTFS mode, but my experience was that it was nearly impossible to make reliable backups of large amounts of data in LTFS mode. Instead I found specialized backup software which uses its own proprietary compression techniques. This software is expensive and difficult to use. Since we have a system going, I don’t mind using tapes for now, but I wouldn’t recommend them to anyone who doesn’t have a full-time IT department running backups for them.

So what’s left? Jon Jacobi has a pretty good rundown of the options in an article from PC Magazine earlier this year. Besides the ones I’ve listed above, he also mentions various online backup services, but points out the problems:

… there are drawbacks. First off, though the means of delivery may seem magical and your data is often referred to as being safely stored “in the cloud,” in reality, it’s stored on someone else’s hard drives or other media. It’s as safe as a given service has made it.

Then there’s the ongoing cost in the form of monthly fees, and in some cases transfer charges. Also, speed and availability are limited by your online connection (DSL often has very slow upload speeds) and when your service is down, your archive is unavailable. There are also privacy and security concerns. I consider these trivial, but just FYI—the NSA had a hand in funding just about every open-source encryption project out there.

So what does he recommend in the end? Surprisingly enough, hard drives. But he recommends maintaining them regularly by recopying the data every year. It sounds like a pain, but it actually might be easier than the tape backup solution I have.
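If you do go the recopying route, it is worth confirming that the fresh copy really matches the old one before you retire the old drive. The following Python sketch copies a folder tree and compares checksums as it goes. The names `file_digest` and `refresh_copy` are my own, and the sketch assumes the target folder sits on a different drive, outside the source folder.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source_dir, target_dir):
    """Copy every file to target_dir; return paths whose copy does not match."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    mismatches = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = target_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also tries to preserve timestamps
        if file_digest(src) != file_digest(dst):
            mismatches.append(rel.as_posix())
    return mismatches
```

An empty list means every file read back the same as its source. Because the operating system may serve a freshly written file from its cache, a more cautious routine would unmount and remount the new drive and check it again before trusting it.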

Finally, there are two other points he makes in his article which I think are worth mentioning:

  • First, “Stay away from proprietary file formats if possible.” PDFs might still be readable in a few decades, but other formats are likely to be phased out.
  • And second, “Don’t use encryption except for truly sensitive data. Passwords can be lost or forgotten. Remember we’re talking long haul here.”

I’ve mostly been talking about our own needs as researchers, but we might also want to keep in mind the needs of future data archaeologists. If you archive in a way that will work for them, it will likely work for you as well!

2 thoughts on “Archiving for the longue durée (Tools we use)”

  1. Entrusting your files to these big cloud data storage services – i.e. computers that belong to others – is probably not the best idea. The main concern is privacy (remember the Snowden revelations?), but there are also the freedom aspects ( https://www.youtube.com/watch?v=Cl6XFZH5aWU ). If you definitely need a cloud, an option is to create your own. It’s not so difficult, for example using a cheap Raspberry Pi computer and ownCloud – manuals on how to do it are easy to find. More alternatives exist: https://prism-break.org/en/categories/servers/#file-storage-sync
