The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) specifies a method for digital repositories (also called "data providers") to expose metadata about their objects for harvesting by aggregators (also called "service providers"). Metadata is exposed via "sets," which are collections of metadata records that data providers decide to make available for harvesting. Service providers harvest sets from the data providers that interest them and offer search services over the resulting collections of metadata (for a good example of a service provider, see OAIster). Data providers also decide which metadata formats to expose for harvesting, beyond the one required format, simple (unqualified) Dublin Core.
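As a rough illustration of the exchange described above, the following Python sketch builds OAI-PMH request URLs and reads the sets out of a ListSets response. The verb and argument names (ListSets, ListRecords, metadataPrefix, set) and the XML namespace come from the OAI-PMH 2.0 specification; the base URL, the sample response, and the helper names are hypothetical and are not taken from any CDL harvester.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, hand-written ListSets response used only for illustration.
SAMPLE_LIST_SETS = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>maps</setSpec><setName>Historical Maps</setName></set>
    <set><setSpec>photos</setSpec><setName>Photograph Collection</setName></set>
  </ListSets>
</OAI-PMH>
""".strip()


def build_request_url(base_url, verb, args=None):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    params.update(args or {})
    return base_url + "?" + urlencode(params)


def parse_list_sets(xml_text):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    return [
        (
            s.findtext("oai:setSpec", namespaces=OAI_NS),
            s.findtext("oai:setName", namespaces=OAI_NS),
        )
        for s in root.iterfind(".//oai:set", OAI_NS)
    ]


if __name__ == "__main__":
    base = "http://repository.example.org/oai"  # hypothetical data provider
    print(build_request_url(base, "ListSets"))
    for spec, name in parse_list_sets(SAMPLE_LIST_SETS):
        # Harvest each set in the required simple Dublin Core format.
        print(name, build_request_url(
            base, "ListRecords", {"metadataPrefix": "oai_dc", "set": spec}))
```

A real harvester would also need to fetch the URLs, follow resumptionToken values across partial responses, and handle error responses, all of which this sketch omits.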
As part of CDL's Metasearch Infrastructure Project, which is supporting the development of a number of search portals, metadata was test-harvested from a variety of repositories. Experimentation with a Prototype Harvest Search service exposed a number of problems that need to be addressed. Many of these issues, along with suggested strategies for dealing with them and a proposed infrastructure for metadata harvesting, are outlined in the paper "Bitter Harvest: Problems & Suggested Solutions for OAI-PMH Data & Service Providers". In response to the issues outlined in that paper, we are drafting Specifications for Metadata Processing Tools that the recently formed Harvesting Core Group (see below) is charged with implementing.
As a first step toward normalizing dates, and as a test case for a suite of metadata normalization and transformation tools, we created a prototype date normalization tool. We have also collected date test cases by noting variations in date encodings. This work led to the coding and release of a Date Normalization Utility, which anyone is free to download and use (without support).
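To give a sense of what date normalization involves, here is a minimal Python sketch that maps a few common date encodings to W3CDTF-style strings (YYYY, YYYY-MM, or YYYY-MM-DD). It is not the Date Normalization Utility itself: the function names, the handful of patterns it recognizes, and the choice of output format are assumptions made for illustration only.

```python
import re
from datetime import date

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]


def _month_number(token):
    """Map an English month name or abbreviation (3+ letters) to 1..12."""
    token = token.lower()
    if len(token) < 3:
        return None
    for number, name in enumerate(MONTH_NAMES, start=1):
        if name.startswith(token):
            return number
    return None


def _format(year, month=None, day=None):
    """Format validated parts, or return None if they are not a real date."""
    if month is None:
        return f"{year:04d}"
    if not 1 <= month <= 12:
        return None
    if day is None:
        return f"{year:04d}-{month:02d}"
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None


def normalize_date(raw):
    """Return a normalized date string, or None if the input is not recognized."""
    # Assumption: brackets, "?" and "circa" markers are dropped, which
    # discards the uncertainty they express.
    text = re.sub(r"[\[\]?]", "", raw).strip()
    text = re.sub(r"^(circa|ca\.?|c\.)\s*", "", text, flags=re.IGNORECASE)

    # 1999, 1999-05, 1999-05-03
    m = re.fullmatch(r"(\d{4})(?:-(\d{1,2})(?:-(\d{1,2}))?)?", text)
    if m:
        year, month, day = (int(g) if g else None for g in m.groups())
        return _format(year, month, day)

    # May 3, 1999
    m = re.fullmatch(r"([A-Za-z]+)\.?\s+(\d{1,2}),?\s+(\d{4})", text)
    if m and _month_number(m.group(1)):
        return _format(int(m.group(3)), _month_number(m.group(1)), int(m.group(2)))

    # 3 May 1999
    m = re.fullmatch(r"(\d{1,2})\s+([A-Za-z]+)\.?\s+(\d{4})", text)
    if m and _month_number(m.group(2)):
        return _format(int(m.group(3)), _month_number(m.group(2)), int(m.group(1)))

    # Sept. 1987
    m = re.fullmatch(r"([A-Za-z]+)\.?\s+(\d{4})", text)
    if m and _month_number(m.group(1)):
        return _format(int(m.group(2)), _month_number(m.group(1)))

    # Ambiguous or unknown encodings (e.g. 5/3/1999, "n.d.") are left alone.
    return None
```

Returning None for anything unrecognized, rather than guessing, keeps ambiguous values such as 5/3/1999 available for later review; a production tool would likely need many more patterns, drawn from collected test cases.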
Internally, we are exposing our harvested metadata to other applications, such as our metasearch software, via Search/Retrieve via URL (SRU).
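For readers unfamiliar with SRU, the sketch below shows how a client application might form a searchRetrieve request. The parameter names (operation, version, query, startRecord, maximumRecords) are those defined by SRU 1.1, and the query is written in CQL; the endpoint URL, the index name dc.title, and the helper function are hypothetical and do not describe CDL's internal service.

```python
from urllib.parse import urlencode


def build_sru_search_url(base_url, cql_query, start_record=1, maximum_records=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,                 # CQL query string, URL-encoded below
        "startRecord": start_record,        # 1-based position of first record
        "maximumRecords": maximum_records,  # page size requested
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint and index name, for illustration only.
    print(build_sru_search_url("http://sru.example.org/search",
                               'dc.title = "harvest"'))
```

Because the whole request is a plain URL, any application that can issue an HTTP GET and parse the XML response can search the harvested metadata.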
The Harvesting Core Group, charged with creating an OAI harvesting infrastructure for CDL, has the following members:
See also the "Bitter Harvest" paper above for pointers to additional resources.