Completion of UC-Internet Archive on-site mass digitization projects

By Heather Christenson, CDL Mass Digitization Project Manager

In 2005, the UC Libraries entered into a ground-breaking partnership with the Internet Archive to digitize public domain book collections from the University of California Libraries. With the generous support of external partners such as Microsoft, Yahoo, and the Alfred P. Sloan Foundation, our collaboration grew to encompass two major on-site scanning centers at NRLF and SRLF and scores of dedicated staff at the UC Regional Library Facilities and elsewhere throughout UC, producing an impressive corpus of close to 200,000 public domain books that are now available worldwide to students, scholars, and the general public. Today, five years and over 64 million pages later, we announce the conclusion of this phase of our Internet Archive collaboration and celebrate the work we have accomplished together.

UC’s book digitization partnership with Internet Archive began in 2005, with UC as a founding member of the Open Content Alliance. In February 2006, the first on-site digitization center, comprising ten Scribe scanning machines, was installed at NRLF; a second ten-station scanning center was opened at SRLF later that year. In August 2008, UC’s on-site Internet Archive digitization center at NRLF was decommissioned and relocated to an Internet Archive facility in San Francisco, leaving the SRLF scanning center as our only remaining on-site facility. One year later, in August 2009, the UC-hosted Internet Archive scanning center housed at SRLF was closed and relocated to a new off-site facility in the Los Angeles area, marking the conclusion of a digitization project that has made available to the world an unparalleled digital corpus of public domain books drawn from the renowned collections of the University of California Libraries.

Although the closing of the SRLF facility is an ending of sorts, it also marks an impressive milestone in the work that we have achieved in digitizing public domain materials from UC library collections. UC books comprise the second-largest public domain corpus digitized by the Internet Archive. These books come from the collections of all ten UC campuses housed at our two RLFs, as well as selected collections from the Bancroft Library at UC Berkeley, the Charles E. Young Research Library at UCLA and its Department of Special Collections, and the UC Davis Libraries. Notable collections include Italian Comedies, the Center for Oral History Research, the Elmer Belt Florence Nightingale Collection, the Maurice N. Beigelman Collection of Ophthalmology, the Robert E. Gross Collection of Rare Books in Business and Economics, the Bulletin of the California Division of Mines and Geology, and the Bulletin of the California Department of Water Resources, among many others. UC Libraries can be particularly proud to have completed the digitization of a major corpus of English-language books published prior to 1923 housed at our two regional library facilities (excluding items rejected due to condition or other technical reasons). We were fortunate to be able to continue digitizing additional pre-1923 roman language content at SRLF in recent months with remaining funding from Microsoft, CDL, and the Internet Archive.

While this phase of our work with Internet Archive is coming to an end, we look forward to continuing our collaboration for many years to come as opportunity and resources permit.

CDL is honored to acknowledge the outstanding dedication and efforts of the many individuals involved in this project, including: Internet Archive managers Julie Lefevre, Kris Brix and their teams; Scott Miller, Jutta Wiemhoff, Shondell Beck, Jeanette Kalchik, Tom Hudgens, and Sarah Schrader at NRLF; Colleen Carlton, Matthew Smith, Carlos Mendiola, and Ryan Tanaka at SRLF; Mary Elings and David Zuckerman at UC Berkeley; and Karen Andrews and Sylvia Villa at UC Davis.

The collections created by this project will be included in the HathiTrust Digital Library for preservation and access, along with UC’s Google books. CDL is currently working with the University of Michigan to develop the process needed to add our Internet Archive-digitized books to the HathiTrust in the coming months.

Digitized collections from the University of California Libraries can currently be viewed on the Internet Archive site at the following location:
http://www.archive.org/details/university_of_california_libraries

More information on the UC Libraries’ mass digitization projects can be found on the InsideCDL web site: http://www.cdlib.org/inside/projects/massdig/