The Basics:
1. What is mass digitization?
The goal of mass digitization is not to create individual collections but to digitize the books in the world’s libraries on a grand scale - ideally, every book ever printed. Millions of books from the UC Libraries will be scanned through our participation in mass digitization projects. To do this economically and with some speed, mass digitization relies on efficiently photographing books page by page and running optical character recognition (OCR) software on those images to produce searchable text. Human intervention is kept to a minimum, so the OCR output is generally used without further revision. Also, only limited structural markup, such as page numbers, tables of contents, and indices, is included.
2. What are the University of California’s mass digitization goals?
Mass digitization projects expand the UC Libraries’ ability to give faculty, students and the public access to information and support our exploration of new service models. These projects are designed to:
Mass digitization will allow the UC Libraries to explore new questions and service models including but not limited to the following areas:
3. What mass digitization projects are currently underway at UC?
The UC Libraries are participating in three mass digitization projects: Google Book Search, Microsoft Live Search Books and the Open Content Alliance. These are non-exclusive agreements and the UC Libraries may enter into other agreements with other digitization projects as they arise.
Google Book Search:
In the Google Book Search project, books and serials (both in-copyright and in the public domain) in all languages are being scanned. The Google-University of California contract targets scanning 2.5 million volumes over a period of six years.
Microsoft Live Search Books:
The Microsoft Live Search Books project focuses on the scanning of public domain materials. The Microsoft-University of California contract currently supports the scanning of thousands of volumes annually. All UC Libraries books scanned through the Microsoft project are available on both the Microsoft Live Search Books website and the Internet Archive website.
Open Content Alliance:
The Open Content Alliance (OCA) represents the collaborative efforts of an international group of cultural, technology, nonprofit, and governmental organizations that are building a permanent archive of multilingual digitized text and multimedia content in the public domain. The UC Libraries’ earliest forays into mass digitization were through the OCA. Scanning has been project-driven, with funding provided by various organizations including the California Digital Library, Microsoft, the Sloan Foundation and Yahoo. These books are available on the Internet Archive website.
4. What books are being scanned?
Books of American literature, mathematics, and other subjects from both general and special collections have been selected by UC librarians in consultation with sponsors.
The Scanning Process:
5. Who is doing the digitizing?
Google is scanning books and serials for the Google Book Search project. The Internet Archive serves as digitization agent for the Microsoft Live Search Books project and the Open Content Alliance. Books are not destroyed during the digitization operation in any of these projects.
6. How and where is the digitizing being done?
The Google, Microsoft and OCA projects all employ non-destructive scanning technology. Books scanned through the Google Book Search project are being digitized offsite in a Google-managed facility. Books scanned through the Microsoft Live Search Books project are being digitized at two locations operated by the Internet Archive: one within the UC Libraries system at the Southern Regional Library Facility (SRLF), and the other at an Internet Archive facility. Books scanned through the Open Content Alliance are digitized at Internet Archive facilities, including the one at the SRLF within the UC Libraries system.
7. What will happen to the books after digitization?
All books are returned to their home locations after digitization. Books are generally returned to the shelf within two to three weeks.
8. Are there standards regarding the quality of the scans?
CDL and UC Berkeley Library staff have been actively engaged in deliberations with technical staff at Google, Microsoft and the Internet Archive regarding quality standards for the two scanning projects. CDL will be performing automated quality assurance (QA) checking using a tool developed at Harvard to verify that files conform to the relevant ISO format specification. Partner Libraries for the mass digitization projects have developed technical specifications for image compression to ensure efficient and high-quality long-term storage of the derived page images.
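As a rough illustration of what automated format checking involves, the sketch below tests whether a file begins with the 12-byte signature box that opens a JPEG 2000 (JP2) file. This FAQ does not name the QA tool or the image format, so the choice of JPEG 2000 and the names `looks_like_jp2` and `check_file` are assumptions made only for this example. A real validator checks far more than the opening signature.

```python
# Assumption: page images are JPEG 2000 (JP2) files; the FAQ does not say so.
# A JP2 file opens with a 12-byte signature box with these fixed contents.
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")


def looks_like_jp2(data: bytes) -> bool:
    """Return True if the data starts with the JP2 signature box."""
    return data[:12] == JP2_SIGNATURE


def check_file(path: str) -> bool:
    """Read the first 12 bytes of a file and test them against the signature."""
    with open(path, "rb") as handle:
        return looks_like_jp2(handle.read(12))
```

A check like this only catches files that are clearly the wrong type or truncated at the start; conformance to a full format specification requires parsing the rest of the file.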
9. What rights to the digitized content does UC have in the projects; will access be limited in any way?
All contracts specify that UC digital images will be available to the UC Libraries to download and manage. The UC Libraries’ digital copy is subject to certain rights and restrictions regarding use and distribution. The University of California’s use or ability to display the downloaded copies of the full text of all books is subject to the restrictions of copyright law. Full-text searching will be possible for all of the digitized books, but some scanned books will not be completely viewable due to copyright restrictions. Specifics include:
Microsoft
Open Content Alliance
10. How will our patrons be able to access these texts (for example, through Melvyl, local catalogs, a webpage, or a search engine)?
UC Libraries patrons can currently access UC Libraries scanned books via these three digitization partner websites and the Melvyl Catalog:
UC Libraries books and serials made available through Google Book Search that are in copyright are subject to limited views. Google indexes the full text, but does not serve or display the full-sized digital image, or make it available for printing or download, unless Google has permission or a license from the copyright owner to do so. Therefore, UC Libraries patrons will have limited access to in-copyright books and serials made available through Google Book Search. Books in the public domain are generally fully viewable and downloadable.
Ever since UC entered into its first mass digitization partnership, the UC Libraries have been studying the options for resource discovery through Melvyl and campus OPACs. The Bibliographic Services Task Force, as well as HOTS and HOPS, has been involved in discussions of the various options for discovery of the mass digitized content. After a great deal of analysis in this area, CDL recommended to the UC University Librarians that discovery of mass digitized content be pursued via the UC-OCLC WorldCat Local (WCL) pilot, as the quickest and most resource-efficient way of exposing this content. The WCL pilot is designed to elicit a great deal of information about end user behavior and expectations; the opportunity to gain an understanding of user behavior and needs specifically regarding our mass digitized content is an added benefit.
Following discussion and approval of this recommendation by the University Librarians at their June 2007 meeting, the UC-OCLC Implementation team charged and staffed a Mass Digitized Content task group. This task group’s charge is to work with OCLC to provide discovery and delivery of the UC mass digitized content in the Next Generation Melvyl WCL pilot.
OCLC has a dedicated program, E-content synchronization, which is focused first on loading the Google records. It is designed to expand at a later date to include the content of the other mass digitization providers, including Microsoft and the OCA. The current plan for the pilot is to expose all of UC’s Google content to each of the campus-branded WCL OPACs, as well as the campus-wide WCL OPAC. OCLC plans to create a separate online record for each print record and attach a UC-wide OCLC symbol to the online record.
It is important to note that the mass digitized bibliographic records are not yet available from OCLC; OCLC’s work plan is to have them ready by the time the WCL pilot is launched in April 2008. OCLC currently has no plans to develop a workflow to return the additional records created by OCLC to the campuses. Additionally, there is no mechanism to select a particular campus's print contributions to the project, should a campus decide it wanted only its own material exposed in its OPAC.
Depending upon assessment and feedback from the WCL pilot, any number of options for exposing content in local OPACs can be explored, if desired, including loading records locally or investigating other linking options. Any option other than discovery through the pilot would involve significant campus and CDL resources.
HOPS requests that linking to mass digitization items as part of the WorldCat Local pilot be thoroughly evaluated before considering other options to load records into Melvyl or local catalogs. HOTS members have unanimously indicated that they are not interested in receiving the mass digitization records at this time because of the high volume of records created in this project (anticipated to be more than 3 million records) and the effect they would have on campus ILSs. As of February 2008 many hundreds of thousands of books have been digitized, representing only a fraction of the total that is expected. If a campus library wishes to receive the records, new workflows will have to be thought through and worked out according to OCLC’s schedule. Additionally, each campus will have to work out a mechanism for suppressing these records from going to the existing Melvyl Catalog, which is not designed to handle this increased amount of data. Although each campus could approach OCLC to work out how to get records back locally, should the campuses desire it, CDL might be able to play a brokering role starting in the second quarter of 2008.
As of March 2008, a Google Books API was deployed so that links to Google Books could be displayed in the Melvyl Catalog. The API went live April 25. Depending on the copyright status of the book, Google will return a full text view, a limited preview, a snippet view, or a record view.
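The sketch below illustrates how a catalog might turn the four views described above into link text. It is only an illustration: the status strings (`full`, `limited`, `snippet`, `record`), the label wording, and the function name `link_label` are hypothetical and are not taken from Google's API documentation or from the Melvyl implementation.

```python
# Hypothetical status codes for the four views named in this FAQ.
# These strings are NOT taken from Google's API documentation.
LINK_LABELS = {
    "full": "Read the full text at Google Books",
    "limited": "Preview this book at Google Books",
    "snippet": "See snippets at Google Books",
    "record": "About this book at Google Books",
}


def link_label(view_status: str) -> str:
    """Choose catalog link text for a view status, with a generic fallback."""
    return LINK_LABELS.get(view_status, "Find this book at Google Books")
```

For example, `link_label("limited")` yields the preview label, while an unrecognized status falls back to the generic link text rather than raising an error.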
Availability through the pilot will be a timely opportunity to evaluate WorldCat Local as a discovery vehicle for the mass digitized materials before planning significant projects to expose this material locally.
11. Will access be different for the general public than it is for our faculty, staff and students?
The general public and UC faculty, staff and students will have the same access to digitized UC Libraries books when searching and browsing the three websites listed below as well as the Melvyl Catalog:
In the future, should the UC Libraries establish a UC website providing access to UC Libraries digital copies of books scanned by Google, Microsoft and the Open Content Alliance, the use and distribution of these digital copies will be subject to the contractual rights and restrictions outlined in question 9.
12. What can users do with the texts?
The Google Book Search website, Microsoft Live Search Books and the Internet Archive all permit users to:
Google Book Search permits users to:
Google Book Search provides several views of the books based on specific privileges:
Microsoft Live Search Books offers all of Google Book Search’s features except "Find it in a library" and "Learn about the publisher", plus the following features:
Internet Archive's site offers:
Additional services may be available from these partner websites as they are continually evolving and being enhanced.
13. How will patrons be impacted while books are being digitized?
The mass digitization projects are designed to affect patrons as little as possible. All of the digitization partners have agreed to a maximum check-out period of two weeks. In practice, the turnaround time on a given book for both projects is closer to one week.