See the complete progress report submitted to the William and Flora Hewlett Foundation: [PDF]
Subsetting is a priority for the American West collection, which means that we must have tools that enable us to select individual objects that meet certain criteria without manually reviewing every object in the virtual collection.
However, the construction of these tools is substantially complicated by the enormous variation in practice adopted by partner libraries in creating item-level metadata (descriptive information associated with the items in their digital collections).
To assess the extent of the problem, the California Digital Library conducted a thorough technical assessment of the collections on offer, examining their format characteristics and accessibility as well as their metadata formats and delivery vehicles.
The technical assessment revealed two anticipated challenges: libraries organize their materials to meet local needs, and data providers describe their collections in different ways.
Libraries organize their materials to meet local needs, which becomes a problem when local practices differ from those of the American West project. Collections at the University of Virginia, Indiana University, and the University of Michigan are particularly challenging. There, materials relevant to the American West are embedded in much broader collections and are not immediately identifiable as a logical subset.
Even where collections are wholly relevant to the American West (as they are, for example, at the CDL, the University of Washington, and the Library of Congress), they are not readily integrated into most of the browsable views that we might wish to create.
Data providers describe their collections in different ways. For example, they do not all provide descriptive information about the same attributes of a digital object (such as genre, format, and subject). Where they do describe the same attribute (such as date), they describe it differently.
Data providers also apply metadata at different levels of granularity. The combined effect of these inconsistencies and idiosyncrasies is a substantial reduction in the functionality of a service that is built from items in very different digital collections. In fact, the service can function only as well as the least effective item in the collection will allow.
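To make the date problem concrete, the following is a minimal sketch of how differently expressed dates might be mapped to a common form before subsetting. It is an illustration only, not the tooling used by the project: the function names, the record layout, and the sample date styles ("ca. 1890", "1880s", "1849-1855") are all assumptions chosen for the example, and real partner metadata would need many more cases.

```python
import re
from typing import Dict, List, Optional, Tuple


def normalize_date(raw: str) -> Optional[Tuple[int, int]]:
    """Map a free-text date to an inclusive (start_year, end_year) range.

    Hypothetical helper: handles only a few illustrative date styles and
    returns None for anything it cannot interpret.
    """
    text = raw.strip().lower()

    # A decade such as "1880s" becomes 1880 through 1889.
    match = re.fullmatch(r"(\d{4})s", text)
    if match:
        start = int(match.group(1))
        return (start, start + 9)

    # An explicit year range such as "1849-1855".
    match = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})", text)
    if match:
        return (int(match.group(1)), int(match.group(2)))

    # Exactly one four-digit year anywhere in the string,
    # e.g. "ca. 1890", "1862-07-01", or "July 4, 1876".
    years = re.findall(r"\b(\d{4})\b", text)
    if len(years) == 1:
        year = int(years[0])
        return (year, year)

    return None


def select_by_period(records: List[Dict[str, str]], start: int, end: int) -> List[Dict[str, str]]:
    """Return the records whose normalized date overlaps [start, end].

    Records with dates that cannot be interpreted are silently excluded.
    """
    selected = []
    for record in records:
        span = normalize_date(record.get("date", ""))
        if span is not None and span[0] <= end and span[1] >= start:
            selected.append(record)
    return selected
```

Note that a record whose date cannot be interpreted simply drops out of the subset, which is one way the inconsistencies described above reduce what a combined service can offer.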