By Ellen Meltzer, Information Services Manager; Photo by Craig Thompson, Web Producer
How extraordinary to have an undergraduate senior thesis foreshadow the themes of one’s entire career! That’s the case for Stephen Abrams, CDL’s Senior Manager for Digital Preservation Technology, who arrived at CDL in February of 2008. (Stephen is a member of the University of California Curation Center (UC3), previously known as the Digital Preservation Program. Its other members are Patricia Cruse, Scott Fisher, Erik Hetzner, John Kunze, Margaret Low, David Loy, Mark Reyes, Tracy Seneca, Marisa Strong and Perry Willet.)
Stephen provides leadership in guiding UC3 in three primary areas:
First, the Digital Preservation Repository (DPR). The DPR is the primary technical infrastructure that manages long-term retention of digital objects. The DPR is moving to a new generation of software; the earlier software was originally designed nearly 6 years ago. Stephen points out that in the intervening years we’ve learned a great deal about the best way to provide preservation services, and we are now at the beginning of a major project to re-conceive and re-implement the repository. One of the main goals we’re trying to accomplish is to ensure the new repository will be more responsive to the needs of customers, especially as our customers are becoming more varied, both in the types of units that contribute to the repository and the types of content we’re preserving. Traditionally we have worked closely with campus libraries to preserve cultural heritage texts and images. More recently, we’ve expanded our scope to include new campus constituencies interested in data sets in the social and experimental sciences.
Stephen states that we need to expand our capacity to deal with new content types and an increasingly diverse set of users while continuing to support our traditional users. One way to do this is through a new conceptualization of the repository. Previously, we thought of the repository as a large monolithic system or place, managed centrally. That concept breaks down when dealing with diverse sets of content with diverse sets of requirements. CDL is now working on devolving our preservation functions into a set of independent but interoperable micro-services. Since each is small and self-contained, they are collectively easier to develop, maintain, and enhance. Although each is narrow in scope, complex behavior can nevertheless emerge through the strategic combination of the services.
Second, Stephen oversees the Web Archiving Service (WAS), keeping an eye on it to ensure that it remains consistent with our other initiatives. The Web Archiving Service, ably run by Web Archiving Coordinator Tracy Seneca, has been in operation for about a year; recently, we began providing public access to web resources (see http://cdlinfo.cdlib.org/blog/2009/07/08/public-access-to-web-archiving-service-goes-live/).
Third, Stephen serves as lead on the multi-year, multi-institutional, NDIIPP-funded JHOVE2 initiative. In this project, the CDL is collaborating with Stanford and Portico to develop a next-generation, open source, format-aware characterization system. (At this point, I needed to ask what that was.)
Stephen explained that characterization is an automated process of determining the significant properties of digital objects. Any digital object is a representation governed by rules of format that specify syntactic and semantic requirements. During characterization we examine an object and, by being cognizant of the underlying format rules, extract its significant properties. In a digital document, for example, we want to know which fonts are used so that we can continue to display the text properly in the future. For digital images, we need to understand the way in which color is represented to ensure accurate reproduction.
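For readers who would like something concrete, here is a minimal sketch of what format-aware characterization might look like in practice. It is not JHOVE or JHOVE2 code: the function name characterize and the property names it returns are assumptions made purely for illustration. The sketch recognizes two formats by their published signatures and, for a PNG image, reads the header fields that describe how color is represented.

```python
import struct

# Published 8-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type codes defined by the PNG format.
PNG_COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor",
    3: "indexed",
    4: "grayscale with alpha",
    6: "truecolor with alpha",
}


def characterize(data: bytes) -> dict:
    """Hypothetical helper: identify a format and extract a few properties."""
    # PNG: signature (8 bytes), chunk length (4), chunk type "IHDR" (4),
    # then 13 bytes of header data starting at offset 16.
    if data.startswith(PNG_SIGNATURE) and len(data) >= 29 and data[12:16] == b"IHDR":
        width, height, bit_depth, color_type, _comp, _filt, interlace = struct.unpack(
            ">IIBBBBB", data[16:29]
        )
        return {
            "format": "PNG",
            "width": width,
            "height": height,
            "bit_depth": bit_depth,
            "color": PNG_COLOR_TYPES.get(color_type, "unknown"),
            "interlaced": interlace == 1,
        }
    # PDF: files begin with "%PDF-" followed by a version such as "1.4".
    if data.startswith(b"%PDF-"):
        return {"format": "PDF", "version": data[5:8].decode("ascii", errors="replace")}
    # Anything else is left unidentified rather than guessed at.
    return {"format": "unknown"}


if __name__ == "__main__":
    # Build just the first 29 bytes of a 640x480, 8-bit truecolor PNG.
    header = (
        PNG_SIGNATURE
        + struct.pack(">I", 13)
        + b"IHDR"
        + struct.pack(">IIBBBBB", 640, 480, 8, 2, 0, 0, 0)
    )
    print(characterize(header))
    print(characterize(b"%PDF-1.4\n"))
```

A real characterization tool goes much further, validating the whole file against the format rules rather than trusting the first few bytes, but the basic idea is the same: know the format, then read out the properties that matter for preservation.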
JHOVE1, which Stephen helped create, has been widely used in the preservation community; it is now 5-6 years old and has some inadequacies. One of the goals of JHOVE2 is to remedy those and to provide new features.
Characterization becomes important when operating a preservation repository. Sometimes it’s clear what format you’re expecting to receive, because depositors can tell you in great detail; other times you don’t know what you have until it arrives. Even so, it’s useful to verify what you actually received: people and systems make mistakes, and sometimes you get things you don’t expect. Characterization also helps to categorize items so that processes can be automated efficiently. This can only be done effectively if items are properly classified into parallel workflows, and characterization is a way to decide which workflow something goes into. Audio files are different from documents; color images are different from bi-tonal ones. This is far more than you may want to know on these subjects, but Stephen is someone who is passionate about what he does, and I felt he could have continued to speak rapturously about them.
Immediately before arriving at CDL, Stephen served as Digital Library Program Manager at Harvard University Library. Prior to his work at Harvard, he spent 9 years at MIT as a research engineer in the Department of Ocean Engineering, where he worked on grant-funded software for the design and manufacture of naval vessels. His expertise was in scientific and engineering visualization, where he turned numbers into pictures. As the Cold War wound down in the late eighties, there were fewer funding sources for these projects. He began working on information retrieval problems for the Departments of Commerce and the Interior. Those information retrieval problems led Stephen to the world of digital libraries.
It was hard for me to imagine that even before this, Stephen spent 9 years at a small company in Pennsylvania: Swanson Analysis Systems, a leading developer of finite element analysis software used in structural analysis. There he also worked on the development of engineering visualization solutions.
Now, back to where we began. Stephen’s undergraduate thesis was on a problem in celestial mechanics: the three-body problem (I encourage you to look this up in Wikipedia or elsewhere). One aspect of his research was to develop a graphics display system, for which he had to program both the math involved and the visualization. With an undergraduate degree in mathematics from Boston University and a Master’s Degree in art and architecture from Harvard, Stephen went looking for work on the scientific side of the two choices. “It pays better,” he quipped. The themes that interested him in his undergraduate thesis have followed him throughout his career.
Stephen had been aware for some time of the interesting and innovative work going on at the CDL, the University of California, and partner institutions. Coming here provided Stephen with the opportunity to apply himself more deeply to the “incredibly important” problems in digital preservation. Of course, transplanted easterners are always drawn by the weather, but there were many things professionally and personally that drew him here.
The challenges are real: there is more useful work that could be done than time to do it. The main thing is trying to prioritize appropriately: putting together a multi-year road map so that we can be where we need to be at the end of the day, and approaching larger problems through small incremental steps. In addition, he finds there’s such a broad constituency at UC, with people working on amazingly innovative things. Attempting to come up with comprehensive and effective solutions for any one thing can be a great challenge; just trying to ensure our services remain responsive to users’ needs, both as they are known now and as they change, is daunting. We’re so glad Stephen is on board to help tackle these demanding issues.