
eXtensible Text Framework (XTF)

The CDL eXtensible Text Framework (XTF) is a flexible indexing and query tool that supports searching across collections of heterogeneous data and presents results in a highly configurable manner. The highlights of the XTF system are described in an online brochure [PDF].

The system is divided into four components:

  1. crossQuery: The front-end to the collection search system.
  2. dynaXML: Interface to individual documents.
  3. Text Engine: Used by crossQuery and dynaXML to perform text searches.
  4. Indexer: Full-text indexer based on Lucene.
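The division of labor among the four components can be illustrated with a toy model. This is only a sketch of the roles listed above, not XTF code: the function names (build_index, text_search, cross_query, dyna_xml), the sample documents, and the bracket-style hit marking are all hypothetical, and a plain Python dictionary stands in for the Lucene-based index.

```python
from collections import defaultdict


def build_index(documents):
    """Indexer role (hypothetical): map each lowercased word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def text_search(index, term):
    """Text Engine role (hypothetical): single-term lookup shared by both front ends."""
    return index.get(term.lower(), set())


def cross_query(index, term):
    """crossQuery role (hypothetical): search the whole collection and list matching document ids."""
    return sorted(text_search(index, term))


def dyna_xml(index, documents, doc_id, term):
    """dynaXML role (hypothetical): display one document, marking hits for the term in brackets."""
    text = documents[doc_id]
    if doc_id not in text_search(index, term):
        return text  # no hits in this document, so show it unmarked
    return " ".join(
        f"[{word}]" if word.lower() == term.lower() else word
        for word in text.split()
    )


# Made-up sample collection, for illustration only.
documents = {
    "twain": "Letters of Mark Twain",
    "oac": "Finding aids and letters",
}
index = build_index(documents)
```

In this sketch, cross_query(index, "letters") searches across the collection, while dyna_xml(index, documents, "twain", "letters") displays a single document with its hits marked. Both go through the same text_search function, mirroring how crossQuery and dynaXML share the Text Engine.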

The CDL uses XTF as a building block for new services and has used it to replace a number of systems previously used for text searching (i.e., DLXS, Greenstone, DynaWeb).

As of 2008, CDL has deployed XTF in the following ways:

  1. OAC texts and eScholarship Editions search and display (January 2005).
  2. OAC finding aids search and display (January 2005).
  3. OAC images search and display (September 2005).
  4. Calisphere search and display (2006).
  5. Mark Twain Project Online search and display (2007).

The Encyclopedia of Chicago, a collaboration between the Chicago Historical Society, Northwestern University, and the Newberry Library, is the first non-CDL project to deploy XTF in production. Other institutions exploring XTF include the Research Libraries Group, the Indiana University Digital Library Program, Duke University Press, the UC Berkeley Library, and University of Kansas Digital Initiatives.

Downloads and Documentation

System Diagrams

The following diagrams, which are somewhat outdated, give a general overview of how documents are indexed, stored, queried, retrieved, and displayed using XTF.

  • System architecture diagram: A general illustration showing the roles the XTF components play in the user experience. [GIF]
  • Collection searching diagram: A more detailed view of the collection searching process, covering query parsing and results formatting. [GIF]
  • Individual object display diagram: A more detailed view of the object display and internal search mechanisms, covering request parsing, authentication, and document formatting. [GIF]
  • Text indexing diagram: An illustration of the workflow for the creation of collection indexes. [GIF]

Support

Implementers

While CDL does not directly support XTF implementers, we do make a good-faith effort to address the needs of the XTF community through the following resources on SourceForge:

  • The xtf-user email list is for those trying to set up and use XTF. It is monitored by the principal developers of the application.
  • The Support Request Tracker is for support inquiries that go beyond a simple question. For the most part we don't have the resources for these kinds of requests, but it's possible others using XTF will.
  • The Feature Request Tracker is the place to bring our attention to missing features that we might like to consider adding to our development cycle.

Developers

The following SourceForge resources are available for XTF developers and others who are interested in contributing to the architecture:

  • The xtf-devel email list is where developers share their ideas. It also logs all CVS commits.
  • The Bug Tracker is the place to submit bug reports.
  • The Patch Tracker allows developers to submit XTF patches for our approval.
  • Access to the CVS repository is also available.