Wednesday, August 26, 2015

How can we preserve our digital content for future generations?


My wife and I spent a week at the Chautauqua Institution in New York this summer. (For those who don’t know, Chautauqua is a summer-long festival of lectures, classes, music, theater and the arts in a picturesque country setting in far western New York State; it has been running since the 1870s.) One of the highlights of my time there was a lecture by Vinton Cerf, a man the media often call ‘the father of the internet’. Parent or not, his contribution to modern technology is huge: he was the co-author of the TCP/IP protocol on which the internet is based, and for years he was the man who managed ARPANET for the Defense Department. He is now a vice president at Google, and although he is past 70, his vigor, intellectual sharpness and continued involvement in technology issues belie his age.

Cerf’s topic was an interesting one, and one I had not given much thought to before: how our digital formats and vast storage capabilities are going to make it harder for us to document and preserve today’s information for future generations.

Yes, you read that right, and I know it sounds counterintuitive. After all, we know that information on the internet lives forever, and aren’t digital devices routinely recording every document, email, tweet, text, post, pulse beat, etc., ad nauseam? So how could the digital revolution prevent us from preserving all that information?

The answer lies partly in the very innovations we take for granted in the computer age. Storage formats and protocols are constantly evolving, giving way to new versions and standards that are bigger, better and faster. But they are also different, and therein lies the problem. Anything saved on floppy disks, CDs, memory sticks or even hard drives will eventually become inaccessible as the technologies needed to read these formats become obsolete and, ultimately, unavailable. You may have your whole system backed up on 3.5-inch floppies (well, I once did), but you probably no longer have a drive that can read them. The same goes for writable DVDs, and for the next storage format, and the one after that.

Cerf is also concerned that the everyday transactions of our lives are not being preserved. He cited the example of the historian Doris Kearns Goodwin, who, in her book “Team of Rivals”, was able to reconstruct so much of the story of the Lincoln Administration from the letters its principals wrote at the time. The paper those letters were written on was preserved, but what of today’s emails and tweets? Future historians may find it impossible to reconstruct our daily life when it is being preserved unevenly, poorly or not at all.

The problem is even worse with “dynamic content” – information stored in a proprietary format that requires software to reveal it. For instance, a WordPerfect document produced in, say, 1990 requires the WordPerfect program to read it. So much of our content today, from documents to web pages and emails, requires software to access it. How will future historians deal with this? This information will be like ancient hieroglyphics whose Rosetta Stone has been lost. And incidentally, ancient software will require ancient hardware to run it, too. That WordPerfect program, even if your great-grandson has it, will need Windows 3.0 and a 1990-era PC to run on. Good luck with that.

Cerf sees some hopeful signs here and there. One he cited was a project called OLIVE (the Open Library of Images for Virtual Execution), established at Carnegie Mellon University, which is setting protocols for virtual machines and virtual operating systems that can run copies of old software, so that older content can still be accessed. I very much doubt that such an effort can scale far into the future, and Cerf is no more naive about it than I am: “I want to emphasize that it is technically challenging to make this work,” he said, stressing that this problem has no easy solution.

The title of his talk was “Digital Vellum”. Vellum is the ancient writing medium of parchment made from calfskin, which (more so than paper) was used for documents that were intended to last, well, forever. We need something similar for the content of today, or else future generations will barely know us.

As a technology guy, I was thrilled to hear Dr. Cerf deliver his lecture in the Chautauqua amphitheater, and equally thrilled to meet him afterwards and chat for several minutes. I can assure you that this technology pioneer is not only brilliant and insightful on the high-tech issues of today but also warm, personable, friendly and easy to talk to. I was impressed with both him and his message. By the way, his recommendation for content that you feel absolutely must be preserved for the future? Paper.
Howard with Dr. Vinton Cerf at Chautauqua

1 comment:

NikeBlack said...

Yes, let's hear it for all that paper in libraries and elsewhere. I have heard about this problem from an organization in Chester County, one that has water data going back to the 1970s. I don't know that they have solved the problem of how to preserve that information for the future. I envision a point when we begin de-digitizing things, going from digital back to paper.