The Web-at-Risk: Preserving Our Nation's Cultural Heritage

About Web-at-Risk

Web-at-Risk was a four-and-a-half-year effort led by the California Digital Library (CDL) to develop tools that enable librarians and archivists to capture, curate, preserve, and provide access to web-based government and political information. While the primary focus of Web-at-Risk was state and local government information, collections also included web content from federal and international governments as well as non-profit sources.

Web-at-Risk received one of eight grants awarded by the National Digital Information Infrastructure and Preservation Program (NDIIPP). The work was undertaken by the CDL and its partners, New York University and the University of North Texas, with additional support from Stanford University, the San Diego Supercomputer Center, and the Library of Congress. The University of California libraries also participated, with staff from the Berkeley, Davis, Los Angeles, Riverside, San Diego, San Francisco, Santa Barbara, and Santa Cruz campuses lending their domain expertise.

Web Archiving Tools - Filling a Need

The need for web archiving tools stems from the ephemeral nature of web resources, especially local government and political information. Materials once issued in print now often appear only in digital form on the web, and as web sites change, these publications can easily disappear. Librarians need a new suite of tools to fulfill their historic mission of preserving our cultural and political heritage.

The Web Archiving Service (WAS)

To address these concerns, the CDL has built the Web Archiving Service (WAS), a web application designed to capture, curate, and preserve web content. WAS offers the ability to capture sites, to search and browse captured content, to compare sites over time, and to make archives of captured content publicly available.

Project Deliverables

While the Web Archiving Service was a major focus of the Web-at-Risk grant work, the project encompassed more than software development. Below are some of the deliverables completed for the grant:

The Web at Risk: A Distributed Approach to Preserving our Nation’s Political and Cultural Heritage.  Interim Report from the California Digital Library. [PDF]
A summary of grant work between 2005 and 2008.  This report includes evaluations of each pilot release of the web archiving service, and links to detailed technical documentation.

Web-at-Risk Needs Assessment Summary Report [PDF]
A summary of the needs assessment work that took place at the outset of the grant.  This included several focus groups of librarians and archivists as well as surveys and interviews to determine what librarians need from web archiving tools.

Web Archiving Service Guide [PDF]
An in-depth guide to capturing sites, controlling capture settings, analyzing results, and building collections.

Web Archiving Service Collection Planning Guidelines [PDF]
Guidelines that help Web Archiving Service users plan their collection activity.

The Web Archives
The archives themselves represent the culmination of the grant work. While not every archive available here was developed under the auspices of the grant, the archives focused on California local government agencies, American left-wing and labor movements, and Middle Eastern political movements were all developed in the course of the Web-at-Risk grant. The content of these archives was also transferred to the Library of Congress using the BagIt specification developed for NDIIPP-funded content.

Further Information

Web-at-Risk wiki
News and reports from Web-at-Risk grant activity.

National Digital Information Infrastructure and Preservation Program (NDIIPP)
Information from the project’s funders.

Web-at-Risk Project Partners [PDF]

Web-Based Government Information: Evaluating Solutions for Capture, Curation, and Preservation. [PDF]

Web-at-Risk Collections: YouTube Video
An overview of what places web publications at risk, and a glimpse at the collections that the CDL’s work is enabling.