
UC Libraries Digital Preservation Program

Preserving UC's Research, Teaching, and Learning

In partnership with the University of California libraries, the California Digital Library established the digital preservation program in 2002. The program ensures long-term access to digital information that supports and results from research, teaching, and learning at UC.

Services

  • UC Libraries Digital Preservation Repository: Supports the long-term retention of digital objects for the benefit of the UC libraries and their users.
  • CDL Web Archiving Program: Develops tools and standards and engages in research and collaboration to support the preservation of vital web resources. The primary services of this program are:
    • The Web Archiving Service: Enables librarians and scholars to capture and analyze web-based content, and to create publicly accessible web archives.
    • Web Archives: CDL hosts the archives created with its Web Archiving Service and provides public access to them. See The California Digital Library and Web Archiving [PDF] for an overview of the archives.
    • The Web-at-Risk: A distributed approach to preserving our political cultural heritage. Funded by a Library of Congress grant and completed in July 2009, this project developed web archiving tools now used by libraries to capture, curate, and preserve collections of web-based government and political information.

Curation Microservices

Developed with CDL programs and partners (e.g., LoC, UMich), curation microservices offer an unbundled alternative to all-in-one repositories, which can be expensive to support and modify (cf. DSpace, Fedora, LOCKSS). Using native operating system file and web services, we define minimal conventions to turn a file system into an "object system" and provide low-barrier tools for full lifecycle enrichment (identity, fixity, replication, annotation, etc.) of objects. For more background, see curation services.

The following open specifications and tools are works in progress, and we welcome feedback on them.

  • Noid (Nice Opaque Identifiers): Noid provides minting, binding, and resolving services in support of preservation-ready identifiers. Persistent identifiers may be obtained by a committed provider with help from these kinds of identity services. Software: download.
  • Dflat: Simple File-Based Object Storage: An object residence, or "digital flat". Common amenities, such as versions, metadata, annotations, administrivia, and the occupant itself (as intended by the depositor), if present, are always found under reserved names. We will likely have "Dflats" at the ends of Pairtree paths.
  • Pairtrees for Collection Storage: A filesystem convention for holding a collection of digital object directories. The directory path ending at an object is formed by taking the identifier and making a sub-directory for each next pair of characters. Conversely, one can recover every object and its identifier simply by "walking" the Pairtree. Software: download.
  • Content Access Node (CAN): A CAN holds a repository instance, which is a set of collections (Pairtrees) plus policy configuration files to govern such things as fixity, replication, indexing, and annotation, depending on the purpose of the repository.
  • CLOP: A Class-Based System for Managing Object Properties: Allows policy declarations to be attached to files, versions, objects, and entire repositories.
  • Directory Typing with Namaste Tags: Namaste (NAMe AS TExt) tags are primitive directory-level metadata exposed directly via filenames. As such, they greet visitors who request a directory listing with a glimpse of what the directory holds. Alpha software: download.
  • Reverse Directory Deltas (ReDD): ReDD is a way to represent differences between two sets of files, which permits great cost reduction when storing multiple versions. To optimize access to recent versions, a chain of ReDD "reverse deltas" stretches backward in time. We will likely use ReDD for Dflat version directories.
  • Checkm: A Checksum-Based Manifest Format: Checkm is a general-purpose text-based manifest format designed to support tools that verify the bit-level integrity of file groups for such things as content fixity, replication, import, and export.
  • JHOVE2 Architecture for Format-Aware Characterization: A next-generation framework and application for format-aware characterization, building on the success of the original JHOVE system. JHOVE2 generalizes the process of characterization to include signature-based identification, validation, feature extraction, and policy-based assessment.
  • BagIt File Package Format: A "bag" is a hierarchical file package format suitable for the exchange of generalized archival content via the network or hard disk. It has just enough structure to safely enclose its payload but does not require the receiver to have any deep knowledge of its internal semantics. Software: download.
  • N2T: Name-to-Thing Resolver: N2T is a centralized, scheme-agnostic identifier resolver to protect URL stability for organizations with web server hostnames that might change.
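
To make the Pairtree idea above more concrete, here is a minimal sketch of the two directions the description mentions: forming a directory path from an identifier, one sub-directory per pair of characters, and recovering the identifier from that path. It is not the Pairtree software itself. The function names, the `pairtree_root` directory name, and the single-character substitutions for `/`, `:`, and `.` are assumptions made for illustration. The sketch also omits any escaping of other unsafe characters, so identifiers that already contain `=`, `+`, or `,` would not round-trip.

```python
from pathlib import PurePosixPath

# Assumed substitutions so that '/', ':' and '.' never appear in a
# directory name; the reverse table undoes them when walking back.
_TO_PATH = str.maketrans({"/": "=", ":": "+", ".": ","})
_FROM_PATH = str.maketrans({"=": "/", "+": ":", ",": "."})


def id_to_ppath(identifier: str, root: str = "pairtree_root") -> PurePosixPath:
    """Map an identifier to a Pairtree-style directory path (hypothetical helper)."""
    cleaned = identifier.translate(_TO_PATH)
    # One sub-directory per pair of characters; the final segment
    # holds a single character when the length is odd.
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return PurePosixPath(root, *pairs)


def ppath_to_id(path: PurePosixPath, root: str = "pairtree_root") -> str:
    """Recover the identifier by joining the path segments below the root."""
    segments = path.relative_to(root).parts
    return "".join(segments).translate(_FROM_PATH)


if __name__ == "__main__":
    ppath = id_to_ppath("ark:/13030/xt12t3")
    print(ppath)               # pairtree_root/ar/k+/=1/30/30/=x/t1/2t/3
    print(ppath_to_id(ppath))  # ark:/13030/xt12t3
```

Because the identifier is encoded entirely in the directory names, no separate index is needed to list a collection: walking the tree and joining the segments is enough, which is the property the Pairtree description relies on.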

Best Practices and Standards

  • Archival Resource Key (ARK): a naming scheme for preservation-ready identifiers. [HTML]
  • WARC File Format (ISO 28500:2009): co-authored by CDL preservation staff, this international standard specifies a structure for storing and exchanging resources harvested from the web and elsewhere. [HTML]

Partners

The CDL's preservation partners include the UC campus libraries, and digital library and preservation researchers around the world. Grant funding has been received from the Library of Congress National Digital Information Infrastructure and Preservation Program, The Andrew W. Mellon Foundation, and the Institute of Museum and Library Services.

Reports

  • UC libraries digital preservation program: Report on aims, overview, and initial priorities. Spring 2004 [DOC]
  • Web-based government information: Evaluating solutions for capture, curation, and preservation. [PDF]
  • Preserving digital materials: Final report for the IMLS about creating a preservation repository for multi-institution use. [PDF]
  • Systemwide strategic directions for libraries and scholarly information: The UC libraries' strategic plan. June 2004 [PDF]
  • Digital preservation flier: Spring 2004 [PDF]