UC Libraries and HathiTrust FAQ
- What is HathiTrust?
- What does the name HathiTrust mean?
- How did the project come about?
- What libraries are participating?
- What is UC's and CDL's role?
- How does this relate to the UC Curation Center?
- Where is the content stored?
- What content is in the repository?
- What are the benefits to UC in participating in this project?
- What services does HathiTrust offer users?
- What services does HathiTrust provide to users with print disabilities?
- What is the HathiTrust Research Center?
- What is Zephir?
- What is the HathiTrust User Support Working Group?
- What do I do if I find an error in a HathiTrust volume?
- What is the Copyright Review Management System?
- How do UC Libraries respond to the annual HathiTrust call for print holdings submission?
1. What is HathiTrust?
HathiTrust is an inter-institutional repository of primarily mass-digitized books and serials. The repository supports both preservation and access. Its access services range from permitted uses of books digitized by Google to full access to public domain content.
2. What does the name HathiTrust mean?
Hathi (pronounced hah-tee) is the Hindi word for elephant, an animal highly regarded for its memory, wisdom, and strength. Trust is a core value of research libraries and one of their greatest assets. In combination, the words convey the key benefits researchers can expect from a first-of-its-kind shared digital repository.
3. How did the project come about?
HathiTrust began in late 2008 as an outgrowth of the University of Michigan's local MBooks repository, which contains UM's books digitized by Google. It was initially re-conceived as a shared repository for the institutions of the Committee on Institutional Cooperation (CIC). The University of Michigan, the University of California, and the 13 CIC universities were the original founding partners.
4. What libraries are participating?
HathiTrust now has over 90 partner colleges and universities from the United States and around the world, including the original partners (the University of California, the University of Michigan, and the 13 CIC universities).
5. What is UC's and CDL's role?
Because of the size and scale of the UC libraries' mass digitization program, UC was invited to become a lead partner in the project. UC has significant representation at all levels of governance. CDL coordinates UC's participation and devotes resources to the ongoing development of access and preservation services.
In November 2013, CDL launched Zephir, the new bibliographic metadata management system for HathiTrust. A committed CDL team manages the ongoing bibliographic metadata ingests and updates to Zephir. See below for more information about Zephir.
6. How does this relate to the UC Curation Center?
The UC Curation Center is committed to preserving the digital assets that support UC's research, teaching, and learning mission (e.g., UCTV, web archived content, UC library collections, eScholarship, electronic theses and dissertations, and data sets). HathiTrust, by contrast, preserves UC's digitized books collectively. The UC Curation Center continues to work with campus partners on an evolving set of digital curation services designed to meet UC's unique needs.
7. Where is the content stored?
The current architecture includes two mirror sites: one at the University of Michigan and one at Indiana University. A third copy, on tape, is stored in Ann Arbor, MI.
8. What content is in the repository?
There are currently over 11 million volumes in the HathiTrust Repository. Books digitized by Google (including more than 3 million volumes from UC) form the backbone of the repository. Internet Archive-digitized volumes (including over 190,000 UC volumes) form the second largest group. A number of member institutions are digitizing their own volumes and contributing them to HathiTrust. UC contributes all of its mass digitization materials (well over 3 million volumes) and CDL can assist campuses with the deposit of campus-digitized volumes, helping to unify UC content into one central repository.
9. What are the benefits to UC in participating in this project?
Benefits to UC include:
- Greater service to users through combined content and access to materials digitized by other institutions. This includes content from partner libraries that is found nowhere else on the web, as well as copyright-restricted materials that copyright holders have specifically opened for access by HathiTrust users.
- Opportunity to provide deeper support for scholarly access to mass-digitized materials, including the ability to retrieve content in different formats (e.g., plain text, PDF, and page image), browse and facet search results, define full-text searches across selected bodies of content, and save items to targeted collections.
- New opportunities for scholars to conduct computational research across digitized texts within the HathiTrust Research Center.
- Reduced costs resulting from sharing access and preservation services with multiple partners.
- Opportunities to pioneer, develop, and share technologies to help in the administration and management of digital content.
10. What services does HathiTrust offer users?
Current HathiTrust services include:
- Aggregation of content digitized by Google, Internet Archive, and partner institutions for unified full-text discovery and access with a common user experience.
- Consolidated full-text search (including advanced full-text search) across all items in the repository. Full-text search can be combined with catalog search.
- Consolidated catalog search (including advanced catalog search) across the metadata of all items in the repository.
- Full access to public domain volumes, with a page-turner application for online viewing.
- Free download of public domain volumes: the public may download one page at a time; users associated with partner institutions may log in and download entire public domain volumes.
- A mobile site that supports catalog search and full-text viewing of public domain volumes. The mobile address is http://m.hathitrust.org/.
- Collection building capacity for partner institutions: users associated with partner institutions may log in and build, publish, and share individualized collections that are searchable via full-text search. Institutions wishing to create large collections across their digitized volumes may do so; please contact the Mass Digitization Team for help.
- Ability to embed a book or search widget in a website or blog: users can easily embed a public domain book (starting from any page) and/or a HathiTrust search box.
11. What services does HathiTrust provide to users with print disabilities?
Users in the United States or Canada who are affiliated with a partner institution and who have a print disability may be allowed access, via a proxy, to in-copyright works currently or previously held by their institution. Each partner institution must meet the requirements for providing this access.
12. What is the HathiTrust Research Center?
The HathiTrust Research Center (HTRC) gives scholars computational access to a portion of the works in the HathiTrust Library. HTRC was launched jointly by Indiana University, the University of Illinois, and HathiTrust in a collaborative effort to develop the tools necessary for large-scale text mining and analysis. Learn more about the HTRC on the HathiTrust website.
13. What is Zephir?
Zephir is the bibliographic metadata management system custom-developed and maintained for HathiTrust by the California Digital Library (CDL). Zephir launched in November 2013 after a team from CDL worked closely with HathiTrust staff at the University of Michigan to design the system. Zephir stores, manages, updates, and exports the bibliographic records that accompany digital items deposited into the HathiTrust digital repository. Zephir supports record loading and error reporting, as well as ingest, clustering, and versioning. Metadata from Zephir is integrated into a number of HathiTrust systems and processes, including content ingest and initial rights determinations. You can find Zephir-managed metadata in the HathiTrust online catalog, HathiTrust data feeds, and the Digital Public Library of America. Read more about Zephir on the HathiTrust website.
14. What is the HathiTrust User Support Working Group?
The HathiTrust User Support Working Group (HUSWG) is a team of 12 librarians from member institutions who are charged with responding to HathiTrust user inquiries and feedback. When users discover errors (in metadata or scanning) they may notify HathiTrust via a Feedback form. The HUSWG responds to users and notifies appropriate staff at member institutions when metadata or scanning errors need to be corrected.
15. What do I do if I find an error in a HathiTrust volume?
If you discover an error in a HathiTrust volume, please report it using the Feedback form on the HathiTrust website. A member of HUSWG will read your error report and send it to the appropriate partner contact to be corrected.
16. What is the Copyright Review Management System?
In 2008, HathiTrust was awarded a three-year grant from the Institute of Museum and Library Services (IMLS) to create a Copyright Review Management System (CRMS-US). The goal was to build on the work of Technical Services Division staff at the University of Michigan and increase the reliability of copyright status determinations for books published in the United States from 1923 to 1963 in the HathiTrust Digital Library. The resulting CRMS allows trained staff to record a work's copyright status, the corresponding level of access allowed, and the trail of investigation into that status. Processes for cross-checking and quality control were built into the system. The grant was completed in 2011, and over 300,000 U.S. books published between 1923 and 1963 were reviewed. About half were determined to be out of copyright and were made available to the public in HathiTrust.
In 2011, the IMLS awarded HathiTrust a second grant to fund the development and deployment of the CRMS-World system and the continued operation of the Copyright Review Management System (CRMS). With the new IMLS award, the University of Michigan Library and its partners continue to make copyright determinations for U.S. titles and, in addition, make reliable copyright status determinations for foreign-published titles.
17. How do UC Libraries respond to the annual HathiTrust call for print holdings submission?
Each year, typically in the spring with a summer deadline, HathiTrust issues an email call to its partner institutions requiring that they submit data about their print holdings. This data has several uses; most significantly, it is used for annual partner cost calculations. Other uses include facilitating legal use of materials (for print-disabled users and for uses described in Section 108 of US copyright law) and supporting collaborative collection development and management.
Full information on what data is requested and how it should be formatted can be found in the specifications at https://www.hathitrust.org/print_holdings.
HathiTrust's email call for print holdings data usually includes specific instructions for delivering the data. Under current practice, UC campuses can submit data using Box.com. HathiTrust has established a Box.com folder for each UC campus for this purpose, with sub-folders for each year's data.
For RLF holdings: as recommended by HOTS and CAMCIG in 2012, the campus where a print holding physically resides should be the campus that reports the holding. RLF holdings, therefore, should be submitted by the campus with which the RLF is associated: holdings from all northern campuses housed at NRLF should be reported by UC Berkeley, and holdings from all southern campuses housed at SRLF should be reported by UCLA.