The Web-at-Risk, a CDL Digital Preservation Group project, achieved a major milestone with Release 1 of the Web Archiving Service (WAS) to a pilot group of project curators. Development of the WAS marks a crucial step in enabling the libraries to extend their historic collection-building role in a web-published world. The Web-at-Risk is a three-year effort led by the CDL with the goal of building tools that will enable librarians and archivists to capture, curate, and preserve web-based government and political information. The primary collection-building focus is federal, state, and local government information, but collections may also include web documents from non-profit and international government sources, as well as policy documents, campaign literature, and information surrounding local political movements.
The “at-risk” designation refers both to the ephemeral nature of web resources in general and to the particularly unstable nature of government and political resources. Critical publications that libraries once collected in print are now often only available on the web and are vulnerable to disappearing as sites are updated, as government agencies themselves are reorganized or as older file formats become unusable. As the scale of this problem expands with the growth of the web, librarians need a new suite of tools to fulfill their historic mission to preserve our cultural and political heritage.
Currently, the UC libraries are unable to continue their historic mission of collection building because many important items are only fleetingly available, and only on the web. The Web Archiving Service will fill that gap and allow libraries to continue to collect, manage, and preserve content that is crucial to research, learning, and teaching at UC. Users of the Web Archiving Service will identify a URL that they are interested in collecting, crawl the content, and put it in the Digital Preservation Repository for safekeeping.
The Web Archiving Service will be released to the pilot group of curators in several stages between July 2006 and December 2007. Each phase integrates a new area of functionality; this approach allows developers to divide an ambitious project into smaller, feasible segments. The service will be operational in December 2007.
This project is one of eight grants awarded by the National Digital Information Infrastructure and Preservation Program (NDIIPP). This work is undertaken by the CDL and its partners, New York University and the University of North Texas, with additional support from Stanford University, the San Diego Supercomputer Center, the Arizona State Library, and the Library of Congress. The UC libraries are also involved in the project, with staff contributing expertise from UC campuses at Berkeley, Davis, Irvine, Los Angeles, Riverside, San Diego, San Francisco, Santa Barbara, and Santa Cruz.
For more information about the project see:
The Web-at-Risk Wiki: http://wiki.cdlib.org/WebAtRisk/tiki-index.php
California Digital Library Digital Preservation Program: http://www.cdlib.org/services/uc3/dpr.html