2003-2004 Progress Report: Functional Specifications for Tools

To meet the user scenarios and implement the collection building strategies, the American West project will develop a range of tools.

See more detail about these tools in the complete progress report submitted to the William and Flora Hewlett Foundation: [PDF]

Basic Infrastructure

A basic digital asset management infrastructure will provide the means of persistently managing and surfacing into a variety of access systems the data and metadata content that make up the American West collection. The infrastructure includes:

  • Archival Resource Key (ARK): The ARK provides for the persistent identification of digital objects. It serves as a useful index key for the location, retrieval, and manipulation of objects.
  • Digital Object Standard: Digital objects that are managed by the California Digital Library must comply with the CDL Digital Object Standard, currently under revision and expected to be released in 2005. The revised standard specifies the use of the Metadata Encoding and Transmission Standard (METS) as the wrapper for descriptive and technical metadata.
  • eXtensible Text Framework (XTF): XTF is the CDL’s newest-generation content management system. It organizes and searches collections of large documents in multiple formats, providing sophisticated query capabilities and flexible navigation, with search hits marked in context.
  • Digital preservation repository: In order to secure the longevity of the digital assets that it manages, the CDL has built a digital preservation repository. The repository system is open and extensible.
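
As an illustration of how an ARK can serve as an index key, the following minimal Python sketch splits an ARK string into its parts and uses the result to look up an object. It assumes the common ark:/NAAN/Name/Qualifier layout. The pattern, the parse_ark helper, the sample NAAN 12345, and the storage path are hypothetical and are not taken from the CDL's implementation.

```python
import re
from typing import NamedTuple, Optional


class Ark(NamedTuple):
    naan: str       # Name Assigning Authority Number
    name: str       # name assigned to the object by that authority
    qualifier: str  # optional component path; empty if absent


# Assumed shape: ark:/NAAN/Name[/Qualifier]. Variant suffixes are not split out.
ARK_PATTERN = re.compile(
    r"ark:/?(?P<naan>\d{5,9})/(?P<name>[^/?#\s]+)(?P<qualifier>/[^?#\s]*)?"
)


def parse_ark(text: str) -> Optional[Ark]:
    """Find the first ARK in a string (bare or embedded in a URL)."""
    match = ARK_PATTERN.search(text)
    if match is None:
        return None
    qualifier = (match.group("qualifier") or "").lstrip("/")
    return Ark(match.group("naan"), match.group("name"), qualifier)


# Hypothetical index: the (NAAN, name) pair keys the stored object location.
index = {("12345", "kt0000000x"): "/repository/objects/kt0000000x"}

ark = parse_ark("http://example.org/ark:/12345/kt0000000x/page1")
if ark is not None:
    print(ark, index.get((ark.naan, ark.name)))
```

Keying the index on the NAAN and name, rather than on the full URL, reflects the idea that the identifier should stay valid even if the hosting service changes.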

Collection Building Tools

Collection building tools will gather digital information from distributed online collections so that it can be integrated into uniform virtual collections. The tool suites support the following tasks:

  • Automated ingest of data and metadata content: The voro suite of ingest tools lets content-submitting institutions deliver objects in batches for processing and ingest into CDL content delivery systems. Submitted objects must conform to the standards specified for their type (such as EADs, texts, and images).
  • Automated capture of item-level metadata: The CDL has implemented the Open Archives Initiative’s Protocol for Metadata Harvesting (OAI-PMH), which specifies a method for digital repositories (also called data providers) to expose metadata about items in their collection for harvesting by aggregators (also called service providers).
  • Automated capture of static web pages: Using web-crawling tools, the CDL will bring selected pages from the surface web into its content management system, generating basic metadata at ingest for the discovery, location, and presentation of the stored web pages. The CDL has established functional specifications for the web-crawling tools and will develop them with the support of the Library of Congress under the auspices of its National Digital Information Infrastructure and Preservation Program. See the web crawler requirements: [PDF]
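
To make the harvesting step more concrete, the sketch below shows how a service provider might build an OAI-PMH ListRecords request and read item-level metadata from the response. It is a minimal illustration, not the CDL's harvester: the base URL, the sample response, and the helper names are hypothetical, and no network request is made. The verb, metadataPrefix, set, and resumptionToken arguments and the XML namespaces come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL for a data provider."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI_NS + "record"):
        identifier = rec.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        titles = [t.text for t in rec.iter(DC_NS + "title")]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(f".//{OAI_NS}resumptionToken")
    return records, (token or None)


# Hypothetical response, trimmed to the elements the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:item-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample photograph</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
print(list_records_url("http://example.org/oai"))
print(records, token)
```

A real harvester would fetch each URL, then repeat the request with the returned resumptionToken until the data provider stops supplying one.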

Curatorial Tools

Curatorial tools will select items that are gathered for inclusion in a particular collection. They will include tools that enrich item-level metadata where it is insufficient to support essential selection decisions and/or the service features that are planned for the virtual collection. Two separate suites of curatorial tools are being developed:

  • Guidelines for data creators: These detailed best practice guidelines for digital objects are for data creators who wish to make their content available for ingest by the CDL or other aggregators.
  • Curatorial tools for service providers: This suite of tools will allow the CDL and other service providers to overcome the numerous deficiencies that will remain in the metadata associated with any virtual collection they wish to build. There will be four sets of tools:
    • Analysis tools will assess the metadata associated with the collections gathered via web crawling, harvesting, or ingest.
    • Normalization tools will standardize the way in which values are recorded for metadata elements. Tools of this type are envisaged as a suite of web services against which metadata elements can be evaluated and through which they can be transformed to some normal representation.
    • Enrichment tools will provide an automated process by which metadata can be augmented with contextual and other information.
    • Subsetting tools will enable the service provider to determine what metadata to harvest based on certain criteria.
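
The analysis, normalization, and subsetting steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the required elements, the handful of date spellings recognized, and all function names are hypothetical, and the tools under development may work quite differently.

```python
import re
from collections import Counter

REQUIRED = ("title", "date", "identifier")  # hypothetical required elements


def analyze(records):
    """Analysis: count records lacking a usable value for each required element."""
    missing = Counter()
    for rec in records:
        for element in REQUIRED:
            if not str(rec.get(element, "")).strip():
                missing[element] += 1
    return dict(missing)


def normalize_date(value):
    """Normalization: map a few date spellings to YYYY or YYYY-MM-DD.

    Returns None when the value is not recognized.
    """
    value = value.strip()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) or re.fullmatch(r"\d{4}", value):
        return value
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", value)
    if m:
        # Assumes month/day/year order, which is itself a guess about the data.
        month, day, year = m.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    m = re.fullmatch(r"(?:ca\.?|circa|c\.)\s*(\d{4})", value, re.IGNORECASE)
    if m:
        return m.group(1)
    return None


def subset_by_year(records, start, end):
    """Subsetting: keep records whose normalized year falls within [start, end]."""
    selected = []
    for rec in records:
        iso = normalize_date(str(rec.get("date", "")))
        if iso and start <= int(iso[:4]) <= end:
            selected.append(rec)
    return selected


harvested = [
    {"title": "Mining camp", "date": "ca. 1890", "identifier": "item-1"},
    {"title": "", "date": "7/4/1876"},
    {"title": "Rail depot", "date": "1950-01-02", "identifier": "item-3"},
]
print(analyze(harvested))
print([normalize_date(r["date"]) for r in harvested])
print(subset_by_year(harvested, 1850, 1900))
```

Returning None for unrecognized dates, rather than guessing, leaves those records visible to the analysis step so a curator can decide how to handle them.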

Access Tools

Access tools will integrate, subset, and present collection content and provide search and retrieval functionality, as well as a number of standard browsable views. The CDL has already created these tools and maintains them.

  • Metasearch tools: Metasearch tools will integrate disparate content pools and allow users to create subsets of available content pools. The CDL is currently evaluating the MetaLib metasearch engine supplied by Ex Libris, which will likely provide the content integration and access functions required by the American West project.

Customization Tools

Customization tools will provide a high degree of interactivity for organizational (library) and individual users of the American West collection. With them, users will be able to select, annotate, and export items into locally defined collections that can be saved for later reference or exported into other software environments and presented with a high degree of control over the resulting look, feel, and functionality.

Candidate customization tools most likely to be developed:

  • Curatorial tools will enable the building of browsable views or subsets of the American West collection.
  • Interface customization tools will enable customization of the appearance of the American West collection or any subset derived from it.
  • Specialist tools will enable specialist manipulation of the collection content:
    • Citation management tools will enable users to identify, parse, capture, and export citations in a format that allows direct linking from the citation to the online version of the object.
    • Export tools will enable users to capture individual items or groups of items plus any annotations they may have supplied and export them in formats appropriate for use in other local software platforms.