A growing number of organizations are using the California Digital Library’s Web Archiving Service to build comprehensive archives of large university campuses. The web has become the primary mode of communication and publication for campus events and issues, and web archives preserve both individual publications and the overall ‘presence’ of an organization on the web. The Web Archiving Service offers tools enabling librarians and archivists to easily archive and organize the hundreds of websites that compose the web presence of any large organization, be it a university or a state government.
The archivists and librarians involved in this work are often faced with a daunting task – how does one identify all of the sites that are within the scope of such an archive? While they begin with strong expertise concerning the organization they’re documenting, there is usually no central source one can consult to comprehensively identify the many campus institute sites, student organizations, newsletters and grant project sites that may fall within the domain of a campus web archive.
Michelle Light, UC Irvine’s Head of Special Collections, Archives and Digital Scholarship, and intern Christine Kim faced just that challenge when they recently undertook a comprehensive archive of the UC Irvine web presence. UCI’s University Archives used to document campus activities broadly by collecting various units’ printed publications. However, as campus units increasingly publish information only on the web, Light believes that capturing UCI’s web presence is essential to fulfilling the University Archives’ mission to preserve campus history.
Fortunately, the Web Archiving Service provides not just tools, but help desk and consultation services to support both the harvesting phase and collection development phase of an archive. The WAS support team at CDL ran several experimental web captures and carefully analyzed the results to see how many individual UC Irvine websites could be identified automatically, without requiring someone to browse multiple directories to compile a list. This research turned up 532 unique UC Irvine “subdomains” – URLs ending in “uci.edu” that might each represent a distinct website. The WAS help team provided this list to the UC Irvine curators, who were then able to check the quality of the list, and augment it with their own research.
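The kind of analysis described above can be illustrated with a short sketch. This is not the method the WAS support team used, which is not documented here; it only assumes that a capture yields a list of URLs and shows one way to reduce that list to unique hostnames under a campus domain. The function name `unique_subdomains` and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def unique_subdomains(urls, domain="uci.edu"):
    """Return the sorted unique hostnames in `urls` that are `domain` or fall under it."""
    suffix = "." + domain
    hosts = set()
    for url in urls:
        try:
            # hostname is None when the string has no network location
            host = (urlsplit(url).hostname or "").lower()
        except ValueError:
            # Skip strings that cannot be parsed as URLs at all
            continue
        # Match the domain itself or a true subdomain; "notuci.edu" must not match
        if host == domain or host.endswith(suffix):
            hosts.add(host)
    return sorted(hosts)


# Hypothetical URLs standing in for the output of a test capture
sample_urls = [
    "http://dept-a.uci.edu/about",
    "http://DEPT-A.uci.edu/events",      # same host, different case and path
    "https://news.dept-b.uci.edu/2012/",
    "http://uci.edu/",
    "http://notuci.edu/",                # different domain, excluded
    "http://uci.edu.example.com/",       # different domain, excluded
    "not a url",                         # no hostname, excluded
]

print(unique_subdomains(sample_urls))
```

A list produced this way is only a starting point: a hostname does not always correspond to a distinct website, which is why the curators' review of the list remains necessary.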
The UC Irvine archivists retained 383 sites from the CDL list and added 111 more by conducting further research and consulting UCI’s organizational charts. The first harvest was conducted in January, and the resulting archive contains nearly 500 websites. The Irvine archivists also provided standardized descriptive metadata and careful organization to the archive, building upon a set of best practices established by University of Michigan archivists (and WAS users) Michael Shallcross and Nancy Deromedi.* In doing this work, they carefully documented the time invested in the project. According to Michelle Light, Christine spent approximately 45 hours “developing an understanding of UCI’s administrative organization, creating and defining a classification system for the tags, classifying the websites, creating standard names for each site, running test captures, and finalizing the list.” The WAS support staff provided a strong starting point for that work, undoubtedly relieving Irvine staff of many more hours of investigation.
The investment of effort in the archive has already begun to pay off. Later in January, the WAS support staff were contacted by Jim Kreuziger, the Director of Web Communications at UC Irvine, who was interested in using the service to archive UC Irvine websites. He was delighted to hear that the very archive he needed was already under construction! The UC Irvine Libraries are using WAS to provide a very valuable service to the campus, and CDL’s WAS team is glad to provide the tools and support to do so.
Irvine is not the only UC campus building extensive archives of UC web publications. UC Davis University Archivist John Skarstad, along with Collections Manager Sara Gunasekara and Head of Special Collections Daryl Morrison, built a comprehensive archive of over 300 UC Davis websites in the late fall of 2011. The WAS support staff also provided Davis curators with initial research on the “ucdavis.edu” domain, which helped to lay the groundwork for that archive. Captures of UC Davis sites were run as campus events related to the Occupy movement made national headlines, and the archive reflects the impact of those events on administrative and departmental websites.
Both the Irvine and Davis archives are currently undergoing curation and quality review, so they are not yet publicly available. We look forward to announcing public access for these archives when they are ready to be launched, and we thank the archivists at Davis and Irvine for their leadership and investment in documenting UC’s rich web presence.
*The collection practices behind the University of Michigan Web Archive were described in a Society of American Archivists Case Study: “On the Development of the University of Michigan Web Archives: Archival Principles and Strategies.”