Our latest guest post is by Stefan Elnabli, Moving Image and Sound Preservation Specialist in Northwestern University Library's Digital Collections department. This is the first part of a two-part post reflecting on the Preservation and Archiving Special Interest Group (PASIG) meeting held Jan. 11-13, 2012, in Austin, Texas.
Here's a thought: life today without online banking, streaming video sites, social networking platforms, and email… hmm, I can't imagine it. We depend so heavily on information in the digital domain that it is striking to think about how much data is created and saved every day. It's even more striking to think about how that vast amount of data is managed, made accessible, and preserved so that services keep running for the public.
What concerns me most in my job as Moving Image and Sound Preservation Specialist at Northwestern University Library is the last of these: preservation. Our library strives to present culturally significant material digitally to researchers and the public, while advocating that this material must be archived, preserved in perpetuity, and made available upon request. As a collecting institution with a highly active digitization program, we acquire and digitize materials often. In my work with archival audiovisual materials, the amount of data created in the digital reformatting process is daunting. The practical side of our digitization and preservation mission presents a dilemma: with limited resources such as funding, digital storage, and staff time (how nice would it be to have unlimited resources, eh?), we want to preserve everything, but realistically we can't. It may sound counterintuitive, especially regarding the first option, but in a practical context both outcomes are equally undesirable. To be clear, I am not saying that digitally preserving all of our collections is impossible. In fact, the technology and preservation models to achieve this exist today. However, if we were to devote all of our resources to preserving a cultural heritage collection that grows perpetually, we would have no resources left for anything else! Therein lies the dilemma.
The Preservation and Archiving Special Interest Group (PASIG) meeting, held January 11-13, 2012, in Austin, was the first of its kind I attended, and it let me place our digital dilemma in the context of a multifaceted community of librarians, digital archivists, software developers, enterprise research specialists, and vendor representatives. The purpose of the meeting was to share practical experiences with digital preservation in a variety of contexts, to familiarize attendees with the technological issues the community faces, and to discuss current trends in digital preservation practice and what the future holds. Three threads tied the three-day meeting together: “perfect is the enemy of good”; bit-level preservation is only one component of the goal of object-level preservation; and open computing solutions and best practices must be shared and supported by the community if they are going to serve the community.
Perfect is the enemy of the good. This relates to my earlier point about resources and their practical limits. Every institution has both unique and shared challenges. A challenge commonly shared in the library world is limited resources for providing preservation services to ever-growing collections. Even at an institution that claims resources are not an issue, thanks to ample funding, for example, there will always be data to preserve and infrastructure to maintain. Over an indefinite future, workflows change, infrastructure requires updating, and unforeseen setbacks occur. There will always be a drive toward improvement, toward something better and more efficient, especially in a practice constantly in flux with new file formats, standards, and technology. However we choose to build the best digital preservation system for our respective environments, striving for perfection with the resources we have must not distract us from acting. Sometimes what is good now is perfect, and what is perfect is not good now. This point was an undercurrent of William Kilbride's introductory presentation to the digital preservation boot camp, “Digital Preservation: What I Wish I Knew Before I Started.” Kilbride raised it right off the bat, asserting that we cannot wait for perfection. We must act now and evaluate our systems periodically, because systems are as much at risk of obsolescence as file formats are. In support of this, he detailed how to evaluate the components of preservation systems using a set of sustainability factors outlined by the Library of Congress. Even if we can't be perfect now, we can do something that counts with the resources available.
Bit-level preservation is only one component of the goal of object-level preservation. When I tell people what I do for a living, a common response is, “So you digitize stuff and put it on DVD or a hard drive or something…” I do and I don't. DVD is a very handy access medium: it's tangible, portable, widely playable, and a whole bunch of other “ables,” but it is by no means a preservation medium. Hard drives share some of these qualities, and unlike DVD they let us keep files as they are, without transcoding them for DVD playback, so the bits retain their original integrity. However, preservation means forever, and we know that spinning disks eventually stop spinning. We therefore need specific information about our files so that we can plan their migration to new physical carriers. Amassing terabytes of data with the intention of preserving its integrity and making it discoverable requires even more data: data about data. We call that metadata, and we love it. For files to be retained in a trusted digital repository, services must associate metadata with the files so that we can track their provenance and access both their technical details and descriptive information about them. The file and its metadata together constitute the digital object to be preserved. In “A Vision for Digital Preservation,” Michael Peterson shared important points to consider in the face of ever-growing data. Among them: preservation infrastructure needs to shift from the physical to the virtual, and toward utility-like services that support object-level retention. Dividing the resources of a digital preservation service across multiple environments frees us from the confines of a physical repository silo. In this model, access platforms are a service, infrastructure is a service, storage is a service, and so on.
This concept allows us to think about preservation at the point of creation, considering all of the object's components and how such services facilitate their preservation. Looking to the future, Peterson even claimed that this model could lead to self-healing systems that recover from corruption, and to cost reductions through “chunk-level” distribution and deduplication of data. At this point in our history, it's not hard to imagine this line of thinking becoming the basis for innovation and the standard for practitioners.
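To make the object-level idea and the notion of “chunk-level” deduplication a little more concrete, here is a minimal Python sketch. It is purely illustrative and assumes a toy model: the names `ChunkStore`, `DigitalObject`, `ingest`, and `verify`, the 4-byte chunk size, and the metadata fields are all hypothetical, and this is not how Peterson's model or any system presented at PASIG is implemented. The object record holds metadata plus references to the bits; the bits live in a content-addressed store where identical chunks are kept only once; a fixity check compares the reassembled file against a SHA-256 checksum recorded at ingest.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical, deliberately tiny chunk size so the example is easy to follow.
CHUNK_SIZE = 4


@dataclass
class ChunkStore:
    """Hypothetical content-addressed store: each unique chunk is kept once, keyed by its SHA-256 digest."""
    chunks: dict = field(default_factory=dict)

    def put(self, data: bytes) -> list:
        """Split data into fixed-size chunks, store unseen chunks, and return the ordered digest list."""
        digests = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # a chunk already stored is not stored again
            digests.append(digest)
        return digests

    def get(self, digests: list) -> bytes:
        """Reassemble a file from its ordered chunk digests."""
        return b"".join(self.chunks[d] for d in digests)


@dataclass
class DigitalObject:
    """Hypothetical object record: references to the bits plus metadata about them."""
    chunk_digests: list   # the bits, as references into the chunk store
    fixity_sha256: str    # whole-file checksum recorded at ingest
    technical: dict       # technical metadata, e.g. {"format": "WAV"}
    descriptive: dict     # descriptive metadata, e.g. {"title": "Reel 1"}
    provenance: list = field(default_factory=list)  # simple event history


def ingest(store: ChunkStore, data: bytes, technical: dict, descriptive: dict) -> DigitalObject:
    """Store the bits as deduplicated chunks and wrap them with metadata and a fixity value."""
    obj = DigitalObject(
        chunk_digests=store.put(data),
        fixity_sha256=hashlib.sha256(data).hexdigest(),
        technical=technical,
        descriptive=descriptive,
    )
    obj.provenance.append("ingested")
    return obj


def verify(store: ChunkStore, obj: DigitalObject) -> bool:
    """Bit-level fixity check: rebuild the file and compare it with the checksum recorded at ingest."""
    ok = hashlib.sha256(store.get(obj.chunk_digests)).hexdigest() == obj.fixity_sha256
    obj.provenance.append("fixity check passed" if ok else "fixity check failed")
    return ok


if __name__ == "__main__":
    store = ChunkStore()
    reel1 = ingest(store, b"AAAABBBBAAAA", {"format": "WAV"}, {"title": "Reel 1"})
    reel2 = ingest(store, b"AAAABBBBCCCC", {"format": "WAV"}, {"title": "Reel 2"})
    print(len(store.chunks))      # 3 unique chunks stored for 6 chunk references
    print(verify(store, reel1))   # True

    store.chunks[reel1.chunk_digests[0]] = b"AAAX"  # simulate corruption of a shared chunk
    print(verify(store, reel1), verify(store, reel2))  # False False
```

The last lines show the trade-off in this toy model: because deduplicated objects share chunks, a single corrupted chunk damages every object that references it, which is why deduplication only makes sense alongside replication and regular fixity checking.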
Open computing solutions and best practices must be shared and supported by the community if they are going to serve the community. This echoes a tenet of PASIG, evidenced by the vocational diversity of its attendees and by its forty-six speakers, whose backgrounds ranged from major research libraries to enterprise storage vendors. There is a wide variety of digital information types and of people who preserve them: email and email archives, contracts and business records, electronic medical records, digitized genealogical records, audiovisual material, and the list goes on. No matter what type of data we preserve, we're in this together because we share the same goal of digital preservation. We learn from each other when we are open about our experiences, our successes, and our failures, and especially when we share technology openly. Two of the sustainability factors William Kilbride discussed were transparency and external dependencies. For example, a proprietary file format may not be transparent enough to be analyzed directly with basic tools, and it may depend on external hardware or software. Formats that are locked down and dependent have proven terrible for preservation (remember all those WordStar files you can't open anymore? Thankfully I don't; I wasn't alive then). If we must adopt proprietary formats, hardware, or software, we need a way to communicate to vendors what we need. PASIG had strong representation from vendors such as Oracle, Tessella, and Microsoft, among others. They were not sequestered in a sales-oriented vendor showroom; they delivered presentations and took part in the group's discussions to learn about and respond to the needs of the preservation community at large. In my experience, this aspect of PASIG was the most profound.
It's easy to take for granted the effort businesses and cultural institutions must make to offer digital services to the public. It's even easier to take for granted the amount of data we create every day and expect to keep permanently! Because we depend so much on information management in the digital domain, it is critical that society understand both the importance of preserving digital assets and the difficulty of doing so. In an environment of limited resources, we strive to do our best despite perpetual difficulties. At my institution, this is why we establish policy: to facilitate our mission objectives while sustaining the institution. In digital preservation, the most successful policies are those grounded in experience, both within the institution and across the community. PASIG was enlightening not only because I learned from my peers, but also because I engaged with private-sector representatives in a collaborative problem-solving process rather than a sales pitch. I recommend PASIG to anyone in the field of digital preservation, whether in the public or the private sector.