The Melvyl Recommender Project
Background
Popular commercial services such as Google, eBay, Amazon, and Netflix have evolved quickly over the last decade to help people find what they want, developing information retrieval strategies such as usefully ranked results, spelling correction, and recommendations. Library catalogs, in contrast, have changed little and are not well equipped to meet changing needs and expectations. The Melvyl Recommender Project explored methods for closing this gap and assessed their feasibility.
Project Report and Prototype
Download the narrative of the final project report (PDF)
Download the appendices (not included with the narrative):
- Appendix A: Project timeline (PDF)
- Appendix B: Project team and contributors (PDF)
- Appendix C: Description and screen shots of prototype (PDF)
- Appendix D: Descriptions of holdings and circulation data (PDF)
- Appendix E: Performance testing (PDF)
- Appendix F: Subject area groupings (PDF)
- Appendix G: Bibliography for assessment activities (PDF)
- Appendix H: Assessment plan (PDF)
- Appendix I: Human subjects approval (PDF)
- Appendix J: Assessment Instruments
- Appendix K: Screen shots from relevance ranking assessment (PDF)
- Appendix L: Screen shots from recommending assessment (PDF)
- Appendix M: Project flyer (PDF)
- Appendix N: New and modified XTF modules (PDF)
Full Text Integration Extension
The aim of this Extension to the Melvyl Recommender Project was to carry out deeper explorations into the most interesting and promising questions raised during the original project, and to add obvious missing pieces of functionality. The principal area of investigation was the impact of adding full-text objects to what had previously been a metadata-only index.
Full Text Extension Supplementary Report (PDF)
Executive Summary
Over the course of a year (June 2005 - June 2006), the project team conducted exploratory development work in five topic areas: use of a text-based discovery system, spelling correction, user interface strategies, relevance ranking, and recommending.
The use of a text-based discovery system, XTF, with its built-in relevance ranking capability, proved to be a promising approach. Performance on a series of simple load tests suggests that the system is capable of scaling to support millions of records and hundreds of concurrent users.
Experiments with index-based spelling correction were similarly positive. Starting with an existing index-based spelling correction algorithm and applying a number of optimizations, we met the goal of producing the right correction for a misspelled word (on the first try) 90% of the time.
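As a rough illustration of what index-based spelling correction means, the sketch below picks a correction from the terms already present in an index rather than from an external dictionary. It is not the algorithm or the optimizations used in the project: the names `edit_distance` and `suggest`, the `max_distance` cutoff, and the sample term frequencies are all assumptions made for this example.

```python
from collections import Counter
from typing import Mapping, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def suggest(word: str, term_freqs: Mapping[str, int],
            max_distance: int = 2) -> Optional[str]:
    """Return the single best correction for `word`, or None if nothing is close.

    Candidates come only from terms in the index. Ties on edit distance are
    broken in favor of the more frequent term (a hypothetical heuristic).
    """
    word = word.lower()
    if word in term_freqs:
        return word  # already an indexed term, nothing to correct
    candidates = [(edit_distance(word, term), -freq, term)
                  for term, freq in term_freqs.items()]
    candidates = [c for c in candidates if c[0] <= max_distance]
    if not candidates:
        return None
    return min(candidates)[2]


# Hypothetical term frequencies, standing in for a real index.
index_terms = Counter({"shakespeare": 120, "shakers": 5, "spear": 3})
first_try = suggest("shakespere", index_terms)
```

Returning one suggestion rather than a ranked list mirrors the report's "right correction on the first try" goal, but how the project scored candidates is not described in this summary.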
Although navigation strategies were not a central focus of the project, we conducted a shallow initial investigation of two approaches to improving navigation through large record sets: faceted browsing, and grouping results based on the Functional Requirements for Bibliographic Records (FRBR). In both cases, initial experiments suggest that delving more deeply into these areas will result in better service to patrons.
Our investigation of enhanced relevance ranking considered whether returning result sets using content-based relevance ranking, optionally boosted by weights based on circulation and holdings data, would improve the ability of patrons to complete typical academic tasks. A task-based user assessment showed that in general, academic users do prefer relevance ranked result sets to those that are unranked (current catalogs are typically unranked); preferences differed by level of subject area expertise. Limitations due to the design of the study prevented us from making a strong statement as to which of the three ranked methods that we tested will best serve the greatest number of patrons.
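The idea of boosting a content-based score with circulation and holdings weights can be sketched as follows. The formula here is an assumption chosen for illustration (a multiplicative boost with logarithmic damping), and the names `Record`, `boosted_score`, `circ_weight`, and `holdings_weight` are hypothetical; the report summary does not specify how the tested ranking methods combined these signals.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    title: str
    content_score: float    # text-relevance score from the search engine
    circulation_count: int  # how often the item was checked out
    holdings_count: int     # how many libraries hold the item


def boosted_score(record: Record, circ_weight: float = 0.0,
                  holdings_weight: float = 0.0) -> float:
    """Content score multiplied by an optional usage-based boost.

    log1p dampens large counts so that a very popular item cannot
    completely swamp textual relevance. With both weights at zero the
    result is the plain content score.
    """
    boost = (1.0
             + circ_weight * math.log1p(record.circulation_count)
             + holdings_weight * math.log1p(record.holdings_count))
    return record.content_score * boost


def rank(records: List[Record], circ_weight: float = 0.0,
         holdings_weight: float = 0.0) -> List[Record]:
    """Sort records from highest to lowest boosted score."""
    return sorted(records,
                  key=lambda r: boosted_score(r, circ_weight, holdings_weight),
                  reverse=True)


# Hypothetical records: "a" matches the query slightly better,
# "b" circulates far more often.
records = [
    Record("a", content_score=1.0, circulation_count=0, holdings_count=0),
    Record("b", content_score=0.9, circulation_count=100, holdings_count=10),
]
content_only = [r.title for r in rank(records)]
circ_boosted = [r.title for r in rank(records, circ_weight=0.1)]
```

With these sample values, content-only ranking puts "a" first, while a small circulation weight moves "b" ahead, which is the kind of difference a user assessment would need to evaluate.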
We explored two major strategies for generating recommendations: an approach based on the mining of circulation data (i.e., "patrons who checked this out also checked out..."), and an approach based on similarities in the content of bibliographic records ("more like this..."). A task-based user assessment of the former method showed that patrons are enthusiastic about using an online library catalog with a recommendation service; testing confirmed that recommendations were successful in supporting academic tasks. Moreover, the recommendation service was useful as a query expansion tool, suggesting alternative search strategies when users were boxed in by small or single result sets.
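The circulation-based strategy ("patrons who checked this out also checked out...") can be illustrated with a simple co-occurrence count. This is a minimal sketch under stated assumptions, not the project's mining method: the input shape (an anonymized patron ID mapped to a list of item IDs), the function names, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, List, Mapping


def build_cooccurrence(checkouts: Mapping[str, Iterable[str]]) -> Dict[str, Counter]:
    """Count, for every item, how many patrons also checked out each other item."""
    co: Dict[str, Counter] = defaultdict(Counter)
    for items in checkouts.values():
        # set() ensures repeat checkouts by one patron are counted once
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co


def recommend(item: str, co: Mapping[str, Counter], top_n: int = 3) -> List[str]:
    """Return up to top_n items most often checked out alongside `item`.

    Ties are broken alphabetically so the output is deterministic.
    """
    ranked = sorted(co.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical anonymized circulation data.
checkouts = {
    "patron1": ["A", "B", "C"],
    "patron2": ["A", "B"],
    "patron3": ["B", "D"],
}
co = build_cooccurrence(checkouts)
also_checked_out = recommend("A", co)
```

Because the recommendations are driven by borrowing behavior rather than by query terms, a list like this can surface related items even when a search returns only one record, which fits the query expansion use described above.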
Plans for future work consist of a mix of shorter- and longer-term initiatives that extend the work done to date. Shorter-term, more discrete activities include support for multi-word spelling correction; incorporating persistent personalization into the prototype as a building block for additional recommending work; and an exploratory effort to identify potential applications and stumbling blocks associated with retrieval in a mixed metadata/full text environment. Longer-term tasks include additional work on automated strategies for grouping and clustering to better support search and presentation of very large data sets; extended work on recommending techniques; and investment in user-centered design and integration of new services.
Sponsors and Partners
Funding for this project was provided by the Andrew W. Mellon Foundation. The UCLA and UC Berkeley libraries, the Research Libraries Group and the Online Computer Library Center supplied circulation and holdings data used in relevance ranking and recommending experiments.
CDL Project Team and Contributors
About a dozen CDL staff members were involved as team members in this project, participating in implementation or assessment activities or offering their expertise as advisors. This team met regularly over the course of the project.
Project Lead
Peter Brantley, Director of Technology
Implementation Team
Kirk Hastings, Text Systems Designer
Martin Haye, Programmer (Contractor)
Steve Toub, Web Design Manager
Colleen Whitney, Programmer and Project Coordinator
Assessment Team
Jane Lee, Assessment Analyst
Felicia Poe, Assessment Coordinator
Lisa Schiff, Digital Ingest Programmer
Advisors
Lynne Cameron, Manager, Ingest Services
Patricia Martin, Bibliographic Services Manager
Roy Tennant, Service Design Manager
Brian Tingle, Content Technology Liaison
Many other individuals at CDL contributed to this project significantly by facilitating or carrying out discrete tasks including data acquisition and analysis, scripting, and systems support.
Annita Auyang, Database Administrator
Rebecca Doherty, Data Integrity Coordinator
Erik Hetzner, Digital Ingest Programmer
Sean O'Hara, Systems Architect
Raymund Ramos, Systems Architect
Michael Russell, Development Programmer
Virginia Sinclair, Bibliographic Analyst
Randy Lai, Digital Ingest Programmer