HathiTrust Update

By Heather Christenson, Mass Digitization Project Manager and CDL HathiTrust Project Manager

The HathiTrust has released a January 2010 update.  The following are highlights of UC contributions to the HathiTrust, as well as items of special interest to our UC community.

UC’s digital volumes continue to be loaded into the HathiTrust

To date, over 1.1 million UC volumes are available in the HathiTrust. CDL and University of Michigan staff are in close communication as we progress with the ongoing loading of UC’s Google-digitized volumes. Once we are finished with the “backlog” of already-digitized items, our next step will be to plan for the ongoing incremental download of books being digitized from NRLF, UCSC and UCSD into HathiTrust. The group will also be planning for a future UCLA download, as yet unscheduled. The major barrier encountered in this work has been the completeness and formatting of bibliographic records.

Ingest of a pilot set of Internet Archive-digitized books for QA testing has begun. CDL and the University of Michigan are actively evaluating the test and planning for the first batch of Internet Archive-digitized books (from SRLF) to begin being ingested into HathiTrust within the next few weeks. The ingest of Internet Archive-digitized books will be a major achievement for the HathiTrust and especially for UC. It will enable other HathiTrust partners with Internet Archive books to follow in our footsteps, and will allow UC to unify our mass digitized content in a common preservation and access repository.

CDL staff Paul Fogel, Lynne Cameron, Andy Mardesich, and Stephanie Collett have been instrumental in these efforts.

Access to HathiTrust public domain volumes via UC-eLinks

The CDL Discovery and Delivery Team is working on a proof-of-concept project to expose HathiTrust public domain books through UC-eLinks. The team has settled on the newly released HathiTrust Bibliographic API (http://www.hathitrust.org/bib_api), which returns bibliographic, rights and volume information when queried with a standard identifier. Internal testing is underway, and the team hopes to implement access to public domain books through UC-eLinks in March this year. Margery Tibbetts (CDL) is the technical lead for this project.
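
As a rough illustration of the kind of lookup described above, the following Python sketch builds a query URL from a standard identifier and picks out the volumes that a response marks as fully viewable. It is not the UC-eLinks implementation. The base URL, the path layout, and the response field names (items, htid, itemURL, usRightsString) are assumptions to be checked against the API documentation, and the sample response is hand-written with made-up volume identifiers rather than real API output.

```python
import json
from urllib.parse import quote

# Assumed base URL and path layout; confirm against the API documentation.
BASE_URL = "https://catalog.hathitrust.org/api/volumes"


def build_query_url(id_type, id_value, detail="brief"):
    """Build a lookup URL for one standard identifier (for example oclc, isbn, lccn)."""
    return f"{BASE_URL}/{detail}/{quote(id_type)}/{quote(str(id_value))}.json"


def full_view_items(response):
    """Return (volume id, URL) pairs for items whose rights string is 'Full view'.

    The field names used here are assumptions about the response shape.
    """
    return [
        (item.get("htid"), item.get("itemURL"))
        for item in response.get("items", [])
        if item.get("usRightsString", "").lower() == "full view"
    ]


# Hand-written sample shaped like the assumed response; identifiers are made up.
SAMPLE_RESPONSE = json.loads("""
{
  "records": {"000000001": {"titles": ["An example title"]}},
  "items": [
    {"htid": "uc1.example1", "usRightsString": "Full view",
     "itemURL": "https://hdl.handle.net/2027/uc1.example1"},
    {"htid": "uc1.example2", "usRightsString": "Limited (search-only)",
     "itemURL": "https://hdl.handle.net/2027/uc1.example2"}
  ]
}
""")

if __name__ == "__main__":
    print(build_query_url("oclc", 424023))
    for volume_id, url in full_view_items(SAMPLE_RESPONSE):
        print(volume_id, url)
```

A link resolver such as UC-eLinks could use a check of this kind to decide whether to offer a link to the full text of a volume; the filtering is kept separate from the network request so that it can be tried against saved responses.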

Evaluation of potential “third instance” of the HathiTrust repository

A group including UC representatives Stephen Abrams (CDL), John Kunze (CDL), Luc Declerck (UCSD), and David Minor (UCSD) evaluated costs and benefits and made recommendations on a third instance of HathiTrust storage.  The group concluded that “…although the group has identified a number of significant benefits accruing to the Trust from a third instance, given the high level of preservation confidence in the existing two instance architecture, and absent specific favorable economic terms for acquisition and operation, there is no urgency in establishing a third repository instance at this time.  Nevertheless, the Trust should be prepared to respond quickly to opportunities to establish a third instance on favorable economic terms as they may arise in the future.”  The full report of the working group is available at (http://www.hathitrust.org/projects#wg_storage).

Working Groups

UC has been leading and actively participating in many other HathiTrust activities including the Strategic Advisory Board, Quality Ingest & Error Rate working group, Discovery Interface working group, Collaborative Development Environment working group, and Research Center working group.  More detailed information on the activities of these HathiTrust groups can be found on the HathiTrust web site: http://www.hathitrust.org/updates.

More information about our UC Mass Digitization projects, including the UC Mass Digitization FAQ and our new Where to Find Our Books page, is available at: http://www.cdlib.org/services/collections/massdig/.