Web-at-Risk is a four-and-a-half-year effort led by the California Digital Library (CDL) to develop tools that enable librarians and archivists to capture, curate, preserve, and provide access to web-based government and political information. The collection focuses primarily on state and local government information, but may also include web documents from federal and international government as well as non-profit sources.
Web-at-Risk received one of eight grants awarded by the National Digital Information Infrastructure and Preservation Program (NDIIPP). The work is undertaken by the CDL and its partners New York University and the University of North Texas, with additional support from Stanford University, the San Diego Supercomputer Center, the Arizona State Library, and the Library of Congress. The University of California libraries are also participating, with staff from the Berkeley, Davis, Los Angeles, Riverside, San Diego, San Francisco, Santa Barbara, and Santa Cruz campuses lending their domain expertise.
The "at-risk" designation refers both to the ephemeral nature of web resources in general and to the particularly unstable nature of local government information and political resources. Critical publications that libraries once collected in print now often appear in digital format on the web and are thus susceptible to disappearing as sites are updated, as government agencies themselves are reorganized, or as older file formats become unusable. As the scale of this problem expands with the growth of the web, librarians need a new suite of tools to fulfill their historic mission to preserve our cultural and political heritage.
To address these concerns, the CDL is building the Web Archiving Service (WAS), a web capture and curation service that builds upon CDL's existing Digital Preservation Repository (DPR). While the project's primary constituents are the libraries of the University of California, New York University, and the University of North Texas, the WAS is being developed as an open-source toolset that other organizations can eventually deploy. The WAS draws from existing and widely shared open-source web archiving tools, such as the Heritrix web crawler developed by the Internet Archive.
User assessment has been a key part of the Web-at-Risk project from the outset. In 2005, project partners conducted extensive surveys, focus groups, and interviews with librarians, researchers, and content providers. The resulting needs assessment data informed the WAS requirements and design. Assessment continues to play an integral role. The project's curatorial partners, a group of government information specialists from several institutions who serve as pilot users of the Web Archiving Service, provide structured feedback through surveys, interviews, and formal usability testing throughout the project.
The first phase of the Web Archiving Service is being released to the pilot curators in several stages between July 2006 and December 2007. Each stage integrates a new area of functionality; this approach allows developers to divide an ambitious project into smaller, more manageable segments and to incorporate feedback and suggestions from the pilot users at each stage. As of May 2007, Basic Capture, Search and Display, and Analysis and Reports have had successful pilot releases. Upcoming releases include Collection Building, Administration, and Preservation. A second phase, beginning in January 2008, will explore providing end-user access to curated collections.
Main project partners:
Technical partners:
Curatorial partners:
Adviser: