Metasearch Infrastructure Project: Current Status

 

Roy Tennant, roy.tennant@ucop.edu • November 16, 2004

 

The Metasearch Infrastructure Project seeks to develop a robust set of tools for crafting tailored search interfaces to diverse information resources. For additional background information, see the project web site at <http://www.cdlib.org/inside/projects/metasearch/>. After a thorough and lengthy procurement effort, CDL licensed the ExLibris application MetaLib on behalf of the UC Libraries. The software was installed in late August, and training followed in September. Roy Tennant is the Project Manager and Mike McKenna, michael.mckenna@ucop.edu, is the Technical Lead.

 

There are two projects that are actively setting up prototype search portals using the MetaLib system:

1.        An undergraduate "core" portal, tentatively called "SmartStart," led by CDL staff in association with librarians at UCSC and UCLA (contact Roy Tennant for more information). An early prototype of this service was usability tested at UCSC at the end of October; a report on that testing is forthcoming.

2.        A search portal for information relating to the European Union, being developed by UCLA library staff in association with the UCLA Center for European & Eurasian Studies and with support from CDL staff (contact Kati Radics, kradics@library.ucla.edu, for more information). This project may also require the ability to search an index composed of a specific set of crawled web sites. UCLA staff are helping CDL to determine what other tools, templates, guidelines, best practices, etc., might be needed to make it easier for other campuses to use MetaLib for their own purposes.

 

In addition, there are two grant-funded projects managed by CDL that are preparing to use the MetaLib system to provide metasearch services:

1.        The American West Project, led by Robin Chandler (robin.chandler@ucop.edu), was funded by the William and Flora Hewlett Foundation to assemble an American West virtual collection drawing from the resources of major research institutions (see <http://www.cdlib.org/inside/projects/amwest/>). A key part of this project requires harvesting metadata from partner institutions (seven institutions outside of UC) and others using the OAI Protocol for Metadata Harvesting. CDL staff are presently working to develop a robust metadata harvesting infrastructure (see <http://www.cdlib.org/inside/projects/harvesting/>) that will enable MetaLib to search this harvested content in association with licensed resources. Experience gained from these efforts is helping to inform the work of the Digital Library Federation OAI Best Practices Working Group (see <http://oai-best.comm.nsdl.org/>), and will likely also inform the Digital Library Federation Digital Aquifer project.

2.        The NSDL project was funded by the National Science Digital Library to conduct market research and to create a prototype portal that integrates access to NSDL content and licensed resources (see <http://www.cdlib.org/inside/projects/metasearch/nsdl/> for more information). The NSDL project will also use the CDL harvesting infrastructure.
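For readers unfamiliar with the OAI Protocol for Metadata Harvesting mentioned above, the following is a minimal sketch of one harvesting step in Python. It is an illustration only and not a description of the CDL harvesting infrastructure: the function names, the repository address, and the sample response are hypothetical, while the request parameters (verb, metadataPrefix, resumptionToken) and the XML namespace come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    Per the protocol, a resumptionToken is an exclusive argument:
    follow-up requests send only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty or missing token means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Made-up response from an imaginary partner repository, for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://repository.example.org/oai")
identifiers, token = parse_page(SAMPLE)
next_url = list_records_url("http://repository.example.org/oai", resumption_token=token)
```

A harvester would repeat the request with each returned token until none is returned, then hand the collected records to an indexing step so they can be searched alongside other resources.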

 

Our goal is to have an easily customizable metasearch infrastructure that can be used to craft tailored services for particular audiences and needs. We also require a set of foundational services such as metadata harvesting and focused web crawling. Therefore, the prototype projects above are not only providing an opportunity to learn about the MetaLib application and ways in which CDL can make it easier for campus staff to use, but are also providing us with the impetus to develop additional key pieces of infrastructure.

 

CDL staff are currently determining the extent to which the existing capabilities of MetaLib will allow us to fulfill our goals. It may not yet be possible to deploy MetaLib in the way we would like, so we are exploring our options, one of which may be to use the provided application program interface (API) to gain better control over the interface. Although ExLibris staff have been generally supportive of our needs, the development cycle as presently constituted schedules the interface changes we need for sometime next calendar year.

 

Additional related activities include a number of needs assessment activities earlier in the year (managed by an outside contractor), ongoing usability testing efforts such as that identified above (managed by Felicia Poe and Jane Lee of CDL), and user experience design (managed by Steve Toub, CDL).

 

As may be apparent from the description above, we have a good deal of work to do before this infrastructure is ready for widespread campus use. Therefore, we are not currently seeking additional deployment opportunities. The timeframe for opening up to wider deployment is not yet determined, but we should have a better feel for the timeline by the beginning of the next calendar year.

 

The scope of the effort to build the metasearch infrastructure is substantial, but the potential benefits are also very significant. Once we have developed the ability to harvest metadata from OAI repositories, the ability to perform focused crawls of specific web sites, and the ability to search those resources along with our licensed databases and library catalogs, campus librarians will have a powerful set of capabilities with which to craft cutting-edge information finding tools.

 

CDLINFO Articles on the Metasearch Infrastructure Project:

 

November 13, 2003

http://www.cdlib.org/inside/news/cdlinfo/cdlinfo111303.html#2

 

March 11, 2004

http://www.cdlib.org/inside/news/cdlinfo/cdlinfo031104.html#7

 

May 13, 2004

http://www.cdlib.org/inside/news/cdlinfo/cdlinfo051304.html#3

 

October 28, 2004

http://www.cdlib.org/inside/news/cdlinfo/cdlinfo102804.html#3