Metasearch Infrastructure Project: Current Status
Roy Tennant (roy.tennant@ucop.edu), November 16, 2004
The Metasearch Infrastructure Project seeks to develop a
robust set of tools for crafting tailored search interfaces to diverse
information resources. For additional background information, see the project
web site at <http://www.cdlib.org/inside/projects/metasearch/>. After a
thorough and lengthy procurement effort, CDL licensed the ExLibris application
MetaLib on behalf of the UC Libraries. The software was installed in late
August, and training followed in September. Roy Tennant is the Project
Manager and Mike McKenna, michael.mckenna@ucop.edu,
is the Technical Lead.
Two projects are actively setting up prototype search portals using the MetaLib system:
1. An undergraduate "core" portal, tentatively called "SmartStart," led by CDL staff in association with librarians at UCSC and UCLA (contact Roy Tennant for more information). An early prototype of this service was usability tested at UCSC at the end of October; a report on the results is forthcoming.
2. A search portal for information relating to the European
Union, being developed by UCLA library staff in association with the UCLA
Center for European & Eurasian Studies and with support from CDL staff
(contact Kati Radics, kradics@library.ucla.edu,
for more information). This project may also require the ability to search an
index composed of a specific set of crawled web sites. UCLA staff are also helping
CDL determine what other tools, templates, guidelines, best practices, etc.,
might be needed to make it easier for other campuses to use MetaLib for their
own purposes.
In addition, two grant-funded projects managed by CDL are preparing to use the MetaLib system to provide metasearch services:
1. The American West Project, led by Robin Chandler (robin.chandler@ucop.edu), was funded by the William and Flora
Hewlett Foundation to assemble an American West virtual collection drawing
from the resources of major research institutions (see <http://www.cdlib.org/inside/projects/amwest/>).
A key part of this project is harvesting metadata from partner
institutions (seven institutions outside of UC) and others using the OAI
Protocol for Metadata Harvesting (OAI-PMH). CDL staff are presently working to develop a
robust metadata harvesting infrastructure (see
<http://www.cdlib.org/inside/projects/harvesting/>) that will enable
MetaLib to search this harvested content together with licensed
resources. Experience gained from these efforts is helping to inform the work
of the Digital Library Federation OAI Best Practices Working Group (see <http://oai-best.comm.nsdl.org/>), and
will likely also inform the Digital Library Federation Digital Aquifer project.
2. The NSDL project was funded by the National Science Digital
Library to conduct market research and to create a prototype portal that
integrates access to NSDL content and licensed resources (see <http://www.cdlib.org/inside/projects/metasearch/nsdl/>
for more information). The NSDL project will also use the CDL harvesting
infrastructure.
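Both grant projects depend on harvesting metadata over OAI-PMH. As a rough illustration of what a harvester does, the Python sketch below issues ListRecords requests and follows resumptionToken values until a repository signals that its list is complete. It is a minimal sketch under stated assumptions, not CDL's harvesting infrastructure: the function names, the example base URL, and the injected `fetch` callable are hypothetical, and it collects only record identifiers rather than full metadata records.

```python
"""Minimal OAI-PMH ListRecords harvesting sketch (hypothetical helper names)."""
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for the given repository base URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from one response page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
    # An empty (or absent) resumptionToken marks the final page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


def harvest(base_url, fetch):
    """Follow resumption tokens until the repository reports a complete list.

    `fetch` is a hypothetical callable that takes a URL and returns the
    response body as a string.
    """
    identifiers = []
    token = None
    while True:
        batch, token = parse_list_records(
            fetch(list_records_url(base_url, resumption_token=token))
        )
        identifiers.extend(batch)
        if token is None:
            return identifiers
```

In practice `fetch` could wrap `urllib.request.urlopen` and decode the response body; a production harvester would also need to handle OAI-PMH error responses, deleted records, and incremental harvesting by date.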
Our goal is to have an easily customizable metasearch
infrastructure that can be used to craft tailored services for particular
audiences and needs. We also require a set of foundational services such as
metadata harvesting and focused web crawling. Thus the prototype projects
above not only provide an opportunity to learn about the MetaLib
application and how CDL can make it easier for campus staff to use,
but also give us the impetus to develop additional key pieces of
infrastructure.
CDL staff are currently determining the extent to which the
existing capabilities of MetaLib will allow us to fulfill our goals. It may not
yet be possible to deploy MetaLib in the way we want, so we are exploring our
options, one of which may be to use the application programming interface (API)
that MetaLib provides to gain better control over the interface. Although ExLibris
staff have been generally supportive of our needs, under the current development
cycle the interface changes we need are not scheduled until sometime next calendar year.
Related activities have included several needs
assessment studies earlier in the year (managed by an outside contractor),
ongoing usability testing such as the UCSC test described above (managed by
Felicia Poe and Jane Lee of CDL), and user experience design (managed by Steve
Toub, CDL).
As the description above suggests, we have a good
deal of work to do before this infrastructure is ready for widespread campus
use. Therefore, we are not currently seeking additional deployment
opportunities. The timeframe for opening up to wider deployment has not yet been
determined, but we should have a better sense of the timeline by the beginning
of the new calendar year.
The scope of the effort to build the metasearch
infrastructure is substantial, but the potential benefits are also very
significant. Once we have developed the ability to harvest metadata from OAI
repositories, the ability to perform focused crawls of specific web sites, and
the ability to search those resources along with our licensed databases and
library catalogs, campus librarians will have a powerful set of capabilities
with which to craft cutting-edge information finding tools.
CDLINFO Articles on the Metasearch Infrastructure Project:
November 13, 2003: http://www.cdlib.org/inside/news/cdlinfo/cdlinfo111303.html#2
March 11, 2004: http://www.cdlib.org/inside/news/cdlinfo/cdlinfo031104.html#7
May 13, 2004: http://www.cdlib.org/inside/news/cdlinfo/cdlinfo051304.html#3
October 28, 2004: http://www.cdlib.org/inside/news/cdlinfo/cdlinfo102804.html#3