Archiving media requires devoting resources in the present to make sure that, in the future, we can access the past. The goal of implementing good archiving and storage practices is therefore to preserve our media for the long term. There has not been a comprehensive study of the preservation of public media, but the statistics that we do have are frightening.
A study conducted by the Columbia Journalism Review found that 19 out of the 21 participating news organizations were not taking any steps to preserve their web output. From interviews with 48 people, the researchers concluded that “the majority of news outlets had not given any thought to even basic strategies for preserving their digital content, and not one was properly saving a holistic record of what it produces.”
In a 2019 Preserve This Podcast survey of 556 podcast creators, 28% of respondents said that they were “unfamiliar with their organization’s backup strategy”, and another 11% said their organization had “no system in place for archiving files.”
If news is the first draft of history, then history is being erased due to a lack of institutional knowledge and resources going towards media archiving and storage. In today’s digital media environment, the act of storing and archiving media quickly gets technical, bogged down in the mind-melting trivia of products, brands, price tags and procurement processes. In this piece, I’d like to provide some macro concepts and considerations for archiving media that apply to any media, at any size.
The actual system or suite of tools that a media organization uses to create a repository for its digital assets is called a Digital Asset Management System, or DAMS. Some are called Media Asset Management Systems, or MAMS. A DAMS or MAMS can be a third-party product or homegrown; expensive or budget; one piece of software or a medley of interconnected tools. But all of them need to assist in the following:
Appraisal, Arrangement and Description, Preservation, Access: This is the life cycle of a record, using archives terminology. There are lots of DAMS to choose from, but they should all take care of these four archival functions.
Significant properties: What makes a thing a thing? This sounds philosophical, but it’s a question that archivists struggle with every day. In order to preserve something, you need to understand its essential qualities. For example, in order to preserve a podcast, we need to understand what makes a podcast a podcast. Is it just the audio file? Is it the RSS feed? Is it the artwork and the comments from listeners? In order to preserve media, it’s helpful to start by defining its significant properties.
Provenance: This is an age-old archival principle that applies more now, in the era of deepfakes, than ever before. The provenance of a record is its origin and chain of custody. Where did this record come from? Who created it? Who edited it? What was its original context? This is a crucial concept for asserting authority, truth, and verification.
Fixity and Checksums: Fixity is the property of a digital file remaining unchanged over time. A checksum is a short string that a tool or script computes from a file’s contents; if the file is altered or corrupted, recomputing the checksum produces a different value. Checksums can be generated and compared in bulk, which is useful when files are being migrated in large batches.
Evidential value: The provenance of media is so important because media have evidential value. They are evidence of what happened, when it happened, how it happened, etc.
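To make the fixity check described above concrete, here is a minimal sketch in Python. It assumes SHA-256 as the checksum algorithm and an in-memory manifest that maps file names to checksums; the function names (sha256_of, build_manifest, changed_files) are hypothetical and do not come from any particular DAMS.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the manifest that are now missing or no longer match."""
    current = build_manifest(folder)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

In practice you might build the manifest once, store it alongside the files, and run changed_files again after every migration; an empty list means every file recorded in the manifest is still intact.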
Digital Content and its Discontents
The banal complexity of digital archiving, combined with its unfortunate spot at the end of the information lifecycle, means that archiving is too often neglected, under-resourced or inadequate. This is a problem.
The ever-presence of the internet is deceptive. Content on the internet feels ubiquitous and ever-lasting, but the opposite is true. Digital media is just as much at risk as analog media, if not more. The vast majority of contemporary public media is born-digital – meaning that the media asset was created as a digital file. Digital files face unique preservation challenges.
Firstly, digital files require layers of hardware and software to render them. If one layer of technology goes obsolete, deprecates, or becomes inaccessible, then the asset is lost. From WordPerfect to Zip disks to miniDVs, recent history is littered with technologies that have been abandoned for one reason or another, leaving media files impossible to retrieve without digital forensics equipment.
Secondly, digital preservation is a game of constant change and migration, even for the tech industry itself. Storage devices, like hard drives and computers, have a median lifespan of about six years. Every five to six years, files need to be migrated onto new devices. Files stored on one device, or even two, are not safe. Digital storage media are fragile – they can be damaged by liquid, stray magnetic fields or technical failures. Keeping three or more copies of an asset is the digital preservation industry standard.
Thirdly, digital media production is awash with third-party proprietary software that requires licenses to access. This includes audio and video editing software like Adobe Creative Cloud, Pro Tools, or iZotope. It also includes cloud storage platforms like Dropbox, Google Drive and Amazon S3. Assets that are stored on proprietary platforms are always at risk because they are tied to accounts.
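The three-copy rule can be checked by a script. The sketch below assumes you already know where each copy of an asset is supposed to live; the function name check_copies and the MIN_COPIES constant are hypothetical. It confirms only that enough copies exist and that they are identical, not that they sit on separate devices.

```python
import hashlib
from pathlib import Path

MIN_COPIES = 3  # assumption: the "three or more copies" rule of thumb


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(copies: list[Path]) -> list[str]:
    """Return a list of problems; an empty list means the asset passes."""
    problems = []
    present = [p for p in copies if p.is_file()]
    if len(present) < MIN_COPIES:
        problems.append(f"only {len(present)} of {MIN_COPIES} required copies found")
    if len({sha256_of(p) for p in present}) > 1:
        problems.append("copies differ from one another")
    return problems
```

A check like this could run on a schedule, so that a dead drive or a silently corrupted copy is noticed while the other copies are still healthy.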
Based on conversations with journalists and media makers, it’s clear that devoting time and resources toward preservation can be a challenge. Digital publishing cycles are brutal. It’s easy for content creators to get consumed in the production process without taking the extra step to archive their work post-publication.
But implementing archiving and storage best practices yields increasing returns. Digital preservation can be built into the production workflow, and automated through tools and scripts. Through my work with independent podcasters, I’ve also witnessed the myriad ways that preservation helps with production.
Preservation requires file organization schemas, consistent file naming, backup storage copies, and solid metadata. All of these steps have the added benefit of making the production process smoother, especially if you’re working with a team.
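Consistent file naming is one of the easier pieces to automate. The sketch below assumes a purely hypothetical convention of show, recording date and title joined by underscores; the function names and the sample values are illustrative only, and any convention works as long as the whole team applies it the same way.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and collapse anything that is not a-z or 0-9 into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def archive_name(show: str, recorded: date, title: str, extension: str) -> str:
    """Build a name of the form show_YYYY-MM-DD_title.ext (a hypothetical convention)."""
    ext = extension.lower().lstrip(".")
    return f"{slugify(show)}_{recorded.isoformat()}_{slugify(title)}.{ext}"


print(archive_name("My Show", date(2019, 5, 1), "Interview: Final Mix!", ".WAV"))
# my-show_2019-05-01_interview-final-mix.wav
```

Putting the date in year-month-day order means that sorting by file name also sorts by date, which helps when a folder holds years of episodes.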
So take some time to think about why you are creating media before you begin any production. Why is the work that you do important? Putting an archival system in place takes time and energy, but if you remember why it’s important, it will all seem a little easier – and more urgent.
When it comes to digital media, the question is not if, but when, it will be lost. The threat of losing your life’s work is existential, but unfortunately it’s not hypothetical. I’ve heard from journalists who lost years of their life’s work when their website got shut down. I’ve talked to audio engineers whose backups were all compressed because they had accidentally checked the wrong box in the settings of some software. I’ve talked to so many people about how their hard drive died in some kind of freak accident. All journalists, editors, producers and media organizations need to put a preservation plan in place if we want to save the future of public media. The time is now.
Molly Schwartz (@mollyfication) is a Digital Media Fellow at Mother Jones. She is the host and producer of two podcasts about libraries and archives – Preserve This Podcast and Library Bytegeist. Molly did a Fulbright grant at the Aalto University Media Lab in Helsinki, was part of the inaugural cohort of National Digital Stewardship Residents in Washington, D.C., and worked at the U.S. State Department as a data analyst. She holds an MLS with a specialization in Archives, Records and Information Management from the University of Maryland at College Park and a BA/MA in History from the Johns Hopkins University.