Hurtling Toward the Finish Line: Should the Google Books Settlement Be Approved?

February 16, 2010 Author: Ivy Anderson

Ivy Anderson, Director of Collections
California Digital Library

Sagrada Familia book — (The image presented here is a mashup: for the actual book in HathiTrust, see http://hdl.handle.net/2027/uc1.b543888)

Late last week, Google and the plaintiffs filed their final briefs in defense of the Google Books Amended Settlement Agreement (ASA) that is before the New York Southern Federal District Court. As the rhetoric around the Settlement heats up to white-hot intensity in the final days before the Fairness Hearing on February 18th, I’d like to offer a few personal thoughts from my vantage point at the California Digital Library.

The University of California Context

CDL and indeed the UC Libraries as a whole bring what is perhaps a unique perspective to this dispute. The University of California Libraries are Google’s second-largest library digitization partner; we are also the second-largest book digitization partner of the Internet Archive, thanks to generous funding in the past from Microsoft, Yahoo, the Alfred P. Sloan and Kahle/Austin foundations, and other sponsors. In all, UC Libraries have now digitized 2.5 million books from their collections through these projects, both in and out of copyright.

CDL also occupies an unusual position in this debate within our own community of scholars at the University of California, where some of our closest faculty colleagues are also among the Settlement’s most prominent critics. While many assume this to be an uncomfortable position, I don’t find it so. Like any complex enterprise, the Google Books project is appropriately viewed from many perspectives. The proposed settlement is hardly perfect; as Google acknowledges in its brief, it’s a compromise among parties with differing agendas and motivations. CDL is a staunch supporter (http://www.universityofcalifornia.edu/news/article/18850) of the underlying aims of the Google Books project to make the knowledge enshrined in the world’s great libraries discoverable and accessible across the globe, and we support the public benefits that will ensue, including the benefits to libraries, if the Settlement is approved. At the same time, public criticism has been good for the Settlement, producing very real improvements in the amended version that is now before the court, improvements that would not have been made without that criticism. Long live democracy!

Digitization Partnerships: The Promise and the Peril

Like many of the objectors, participating libraries went through their own period of outrage and indignation when details of the Settlement first came to light. What! We would have to buy back access to our own books?? Why did Google let us down in abandoning its fair use defense?? Why should the parties be allowed to create an artificial revenue model for works that are long out of print, books that would no longer exist at all outside of used bookstores if the libraries themselves hadn’t purchased and maintained them at great expense over decades and indeed generations?? How can they do this without our agreement as to terms, since it is we who have made these books available to them in the first place?? Hasn’t our stewardship paid for these books many times over?? Isn’t this why copyright law contains unique exceptions for libraries, in recognition of our mission to further the public good?? Wasn’t the appropriate use of our own copies in light of fair use principles our decision to make??

The problem with this view, of course, is that libraries did not initiate this enterprise, and we are not its only beneficiaries. The Google project placed two sets of commercial interests at loggerheads, with copyright law in the middle. Admittedly, libraries took a risk in engaging in a partnership so legally entangled.

But let’s be honest: though few seem willing to admit it, revitalizing the world’s heritage of books for a digital age – a task that many considered impossible only a few short years ago – appears within reach today almost entirely due to Google’s enterprising vision. Even the Open Content Alliance, which CDL joined a year before becoming a Google partner, was in some sense a response to GBS (although it had other important antecedents as well, thanks to Brewster Kahle’s equally inspired vision (http://slate.msn.com/id/2116329/)). When Google’s competitors withdrew their support for that project, no other funders stepped in to fill the breach. The plain fact is that despite the idealistic adjurations of some, the resources required to digitize our cultural book heritage on a grand scale are not likely to be marshaled in the U.S. by libraries and the public sector alone.

At least, not in our lifetimes. At CDL, we’ve done some estimating of what it would take to convert the roughly 15 million unique books in University of California library collections to digital form absent the Google enterprise using the best alternative technology available today. The answer? Half a billion dollars, and one and a half centuries.

And that is just the University of California’s books.

I like to compare this to the building of the great Temple of the Sagrada Familia in Barcelona, a city with which my family has an ancestral connection. When my husband’s grandmother left Barcelona as a young girl in the late 19th century, the Sagrada Familia had barely erected its first stone. In 2006, more than 125 years later, her great-granddaughter traveled to Barcelona for the first time, where she was able to observe Gaudí’s monumental edifice, still under construction. At this writing, completion is projected for 2026.

Like the Sagrada Familia, without the Google Books Project we could still be building the digital library of the future 100 years from now.

The speed at which Google is converting this content is not without costs of its own. Google’s iterative approach to building large-scale services has drawn criticism from some scholars (http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1972/1847) accustomed to work that is honed and polished before it is released. This is in part an argument about means, not ends. Like those progressive JPEG images that start out blurry on the screen and become sharper as the details fill in, Google’s services are improving over time as it continually upgrades and enhances its images and metadata. Over time we will be able to replace those missing or still-blurry pages with better versions. Where the value of the content warrants it, we can selectively invest in more meticulous rendering, textual markup, and other enhancements.

Two cases are illustrative here. CDL has digitized a large number of public domain books with the Internet Archive, some of which have also been digitized in our Google partnership. Although CDL had to suspend its Internet Archive book scanning project earlier this year after Microsoft withdrew its support and additional grant funding proved elusive, we have every expectation that we will take up comparable projects with Internet Archive in future, because its technology is better suited to certain types of uses (better artifactual rendering, for example). The Early English Books Online (EEBO) database marketed by ProQuest is another example, in which, through an innovative partnership with libraries and scholars, basic scans are enhanced with detailed markup for a subset of carefully selected works.

In the meantime, a great deal of value is already being derived from the Google work as it stands today. Students and scholars report finding much formerly-hidden material, journalists and etymologists are mining its content for historical information, and even some of Google’s severest critics have said that they can no longer imagine life without GBS. This is neither an either-or proposition nor a zero-sum game. All of these services are fulfilling a niche, along with libraries, in a new information ecology that we are only beginning to understand even as we participate in its unfolding.

EEBO, by the way, is also a good example – of which there are countless others that one can point to – of the long history of successful library-vendor partnerships to make content available to a wider audience. Like the much-feared Google Institutional Subscription, most of these products have a monopoly over the particular aggregations they market; it simply makes no economic sense to digitize certain corpora over and over again, particularly when libraries themselves are the primary consumers. Somehow, libraries have survived (and even thrived) through these arrangements, and students and scholars have benefited.

Settlement Pro and Con

So in the long run, is the Google Settlement a good thing, or a bad thing? Before answering that question, let’s look at just a few of the major criticisms that have been leveled against the Settlement.

The Institutional Subscription will become too expensive because it has no meaningful competition. Well, it’s hard to know that, of course. In fact, we don’t even know today how many books it will contain, nor what the scan or OCR quality of the content will be given the variability of the overall corpus. But we do know that there are at least three checks on the institutional subscription price that should mitigate price-gouging. First, the broad distribution requirement in the Settlement’s dual objectives means that prices cannot become so high that few choose to subscribe. Second, libraries themselves are savvy evaluators and negotiators of online content who can be expected to evaluate this offering rigorously and skeptically, and to eschew a subscription unless the price is acceptable for the benefit derived. Since none of us knows how our users will engage with this material, these assessments ought to be conservative. Third, the provisions for pricing arbitration built into the agreements between Google and the participating libraries will allow them to challenge price increases that they deem unwarranted, a provision that is intended to be exercised not on the basis of narrow self-interest among a small set of contributing libraries but on behalf of all libraries.

Academic authors want to release their books, not see them locked up. Indeed, no disagreement here; and the amended Settlement now explicitly provides for this (according to Google and the plaintiffs, this was always possible, but in the ASA it is now called out). We intend to work proactively with rights holders who would like to enable broader access to their books and to develop mechanisms that can help to make this straightforward.

The Settlement will give Google a monopoly over orphan works and is anti-competitive. It’s hard for me to see how Google’s activities to date can be viewed as anti-competitive when GBS is almost single-handedly responsible for the ebook explosion that is swirling all around us, with new entrants popping up every day. That may be a controversial assertion, but ebooks and ebook readers were a languishing backwater until Google stimulated the market by putting books online through its library and publisher partner programs. If anything, Google’s entrance into the retail space is likely to engender even fiercer competition. It seems cynical at best for rival behemoths Microsoft and Amazon to decry Google’s impending monopoly over a sliver of the ebook market – much of it of uncertain commercial value – under the noble-sounding rubric of the Open Book Alliance. But then, competition makes strange bedfellows. As to orphan works, the Settlement should if anything goad us all the more toward a legislative solution. It is as irksome to me as it is to other critics that Google should be uniquely empowered to collect royalties on behalf of absent rights holders who may have long ago relinquished any economic interest in their works. Still, the ASA addresses this in a far more satisfying manner than its predecessor. Finding a better long-term solution to the orphan works problem is something we all can get behind.

When the purposes that we first envisioned when embarking on these projects – all arguably fair uses of this content – are reviewed against the Settlement impacts, it’s hard to view the Settlement as anything but a positive development. More books will be available in full view, both to libraries and to consumers. New services will be developed for print-disabled users and for large-scale computational analysis, further unlocking digitization’s transformative potential. Disclosure of rights information through a central registry (at least for U.S. books) is likely to have far-reaching impacts, facilitating the eventual orderly release of books into the public domain. Google’s competitors are likely to join the push for orphan works legislation, increasing its chances of success. And with the Settlement behind us, we can all proceed in an environment of greater certainty.

What if the Settlement is not approved?

For libraries, failure of the agreement would hardly be a crisis. The benefits that we initially envisioned – improved discovery and full text search of our vast legacy collections, and broad public availability of works that are out of copyright or otherwise released by their copyright owners – will still be realized. The fears of some Settlement objectors – of monopolistic pricing and the forced commercialization of materials that are long out-of-print – will melt away like the elusive Vancouver snow. Participating libraries may still choose to undertake novel services, without the unwelcome restrictions imposed by the Settlement. As long as Google and others continue to partner with us, we will go forward in reinvigorating our collections for a new digital age.

The Google Settlement is fundamentally about whether Google and rights holders will be allowed to implement a particular set of business models for a certain set of books. I believe the Settlement should be approved, because it will create new and valuable services for libraries as well as consumers. But many of Google’s participating libraries have their own plans for these books, plans that do not ultimately depend on the outcome of the Settlement. The greatest risk for libraries if the Settlement is not approved is that further legal setbacks might lead Google to abandon its interest in library digitization altogether. If that were to happen, a unique opportunity would be lost that is not likely to be repeated in our lifetime.

Life Beyond Google Book Search

What of our relationship to the Google Books project itself? Some of the concerns we hear from faculty have nothing to do with the Settlement per se, but rather with the long-term implications of GBS for library collections and services. Let me close with a few words about some of those concerns.

To our scholars who worry that we are about to throw our physical collections overboard in favor of digital surrogates of sometimes uneven quality, I want to say: not to worry. True, libraries everywhere find themselves having to consign more and more of their physical collections to remote storage as campus space grows increasingly scarce and user preferences migrate online. And some libraries – the UCs far less than others – are addressing the space crunch by de-accessioning low-use materials that are widely held, with the knowledge that they can borrow these items from another library if need be. (Many cooperative initiatives are now underway to share such information and ensure that enough copies are retained throughout the nation’s system of libraries to protect the integrity of the scholarly record.) That train has already left the station, and it’s happening independently of large-scale digitization. What digitization offers is a valuable complementary mitigation strategy: we can now make those remote collections eminently browsable, saving time and expense both for users and for libraries. As a library user, you can now determine whether that book is really what you’re looking for before you request it, not afterward – and in some cases, the digital surrogate may indeed be all that you need. Libraries can promote these ‘hidden’ volumes more effectively to their users, while limiting delivery costs to just those items that are truly wanted. This browsable and/or searchable digital surrogate – which is the quality level that most of the Google mass digitized scans are aimed at – is not a replacement for the original print book, and was never intended to be.

To our scholars who worry that we are outsourcing our library collections and services to Google, again I want to say: please don’t worry on this score either. Far from abrogating our mission as stewards of the cultural record, we who have opened up our collections to digitization are shouldering this role with vigor. While Google and others are making these books discoverable online to a general audience, the University of California along with other peer institutions is creating a robust shared access and preservation service for our mass digitized books, one that adheres to professional standards, through our partnership in a ground-breaking enterprise called the HathiTrust. If you haven’t heard of HathiTrust yet, you soon will. No UC library user need go to Google to search the full text of our books, or to find accurate bibliographic information, or to view and download those that are in the public domain; s/he can go to http://catalog.hathitrust.org/ and be reassured that those books will be there, in ever-improved versions, for the long term. HathiTrust now numbers 5.4 million volumes from 26 libraries and is growing at a rapid rate, all searchable, all viewable if in the public domain (or otherwise rights-cleared), and all designed to inure to the long-term benefit of the nation’s libraries and their users. The digital library of the future resides not with Google, but with us. And we are building it today.

At the same time, Google, Internet Archive, and others are providing an invaluable service in bringing the vast holdings of the great research libraries to a worldwide audience and integrating that content with general-purpose internet search services and other content. As one colleague has written, “Who among us has not benefited from a Google search?” In participating in these efforts, we are fulfilling our long-standing public service mission. The Google Settlement, if approved, will further these aims by providing more content, in more ways, to an even wider audience.

But in the end, approval of the Settlement is not a make-or-break event for libraries. Despite the claim that the Google Settlement promises to build “the greatest library in history,” libraries are not leaving the future of information to Google and these other partners alone. Nor need we wait, Godot-like, for fugitive national legislation to begin the work of serving up our cultural heritage in digital form. Through a combination of efforts, including public-private partnerships such as that of libraries with Google, we can go forward in this transformative enterprise together.