Wednesday, June 04, 2008

Darnton on Libraries

As I mentioned a week or so ago, I wanted to take some time and mull over my thoughts on Robert Darnton's NYRB essay "The Library in the New Age" before I responded to it. I think I've considered it long enough now to offer some coherent thoughts on the subject, but I guess you'll all have to be the judges of that. I also went back and re-read Anthony Grafton's New Yorker piece from last November, "Future Reading," which is very similar in tone and in message to Darnton's.

In the first section of his essay, after offering the standard capsule history of "information technologies" (in which he suggests that there have been four major shifts in human/information interaction: writing, the codex, movable type and electronic communications), Darnton makes the very good point that an element of continuity remains regardless of the information technology: what he calls the "inherent instability of texts." He argues "every age was an age of information, each in its own way, and ... information has always been unstable." He uses the very apt example of news, which, he reminds us, "is not what happened but a story about what happened." His cases in point, taken from his own experiences and a fascinating-sounding project done by one of his graduate students, are apt and useful, and his argument that "newspapers should be read for information about how contemporaries construed events, rather than for reliable knowledge of the events themselves," is right on the money (just a quick glance at some of John's excellent posts over at Boston 1775 will prove Darnton's point instantly).

Because of this instability, Darnton suggests, we should rethink our idea of 'information': "It should not be understood as if it took the form of hard facts or nuggets of reality ready to be quarried out of newspapers, archives, and libraries, but rather as messages that are constantly being reshaped in the process of transmission. Instead of firmly fixed documents, we must deal with multiple, mutable texts. By studying them skeptically on our computer screens, we can learn how to read our daily newspaper more effectively - and even how to appreciate old books." Again he offers some very important examples of pre-Internet instability, from the myriad differences between copies of Shakespeare's First Folio to the constant changes Voltaire made to his works (so frequent that booksellers complained, he reports, and "eagerly anticipated" his death, when they could finally sell editions of his works that he couldn't make changes to), to the various pirated editions of early modern "best-sellers" commonly printed by copyright-evaders.

These reflections on the inherent instability of texts are quite relevant to the second major portion of Darnton's piece, in which he examines the role of the research library as an element of today's information culture. He correctly points out that the view of research libraries as "citadels of learning" or storehouses of "everything known" (which he says was held by "students in the 1950s") is of course quite flawed, but that so too is the view (held, he suggests, by "students today") that "knowledge comes online, not in libraries ... [that] information is endless, extending everywhere on the Internet, and to find it one needs a search engine, not a card catalog." To reconcile (if that's possible) these two views, Darnton suggests, requires an examination of "the problems posed by Google Book Search," which "promised to be the ultimate stage in the democratization of knowledge set in motion by the invention of writing, the codex, movable type, and the Internet." It might just do that, but making digital copies of books available online does not prove that Google Books will "make research libraries obsolete. On the contrary, Google will make them more important than ever."

Darnton offers eight points in support of his thesis; I'll tackle each in turn, since I think some may hold more water than others (and others overlap a bit):

1. Google will never put all printed books online, and to think that they will "raises the danger of creating false consciousness, because it may lull us into neglecting our libraries." This is the "if it's not online, it doesn't exist" mindset that drives teachers, professors, and librarians (among others, I'm sure) absolutely bonkers, and it's absolutely true that no digital book-scanning project will ever include every book printed. However, a second element of this argument seems to make the exact opposite point from that which Darnton intends. He argues that "criteria of importance change ... so we cannot know what will matter to our descendants. They may learn a lot from studying our Harlequin novels or computer manuals or telephone books."

It's true: they might, but these things aren't being systematically kept by research libraries anyway (see this post from January for more on the topic of changing sensibilities and library retention), just as the ephemeral items Darnton mentions from earlier periods (chap-books, almanacs, &c.) are scarce today because they weren't saved during their own time. Digitization of ephemeral items may in fact be the optimal method for long-term retention and preservation of such items, which in these times of scarce library resources are hard to justify keeping on the shelves once they've outlived their primary usefulness. So I agree with Darnton that Google Book Search (or any other digital system) will never be complete, but I think that in some cases, for certain subsets of books, it might actually make more sense to retain reliable digital surrogates than to retain paper copies (this does not necessarily mean relying on Google to do so; see point 3 below).

2. The combined holdings of all U.S. research libraries are some 543 million volumes. Google's initial plan for digitization was 15 million volumes. Of the five major research libraries initially involved with the project (NYPL, Harvard, Michigan, Stanford, Bodleian), fully 60 percent of the books being digitized exist in only one of the locations. And none of those are from special collections. I think this second point really is a corollary to the first (building on the idea that the capacity for complete digitization simply doesn't exist), but I thought it worth mentioning just for the starkness of the raw numbers.

3. Copyright will continue to pose a problem. This we knew, but Darnton adds an element I hadn't seriously considered before: even if copyright issues can be resolved for books published in the past, how will Google keep up with future publications? "Better to increase the acquisitions of our research libraries than to trust Google to preserve future books for the benefit of future generations. Google defines its mission as the communication of information - right now, today; it does not commit itself to conserving texts indefinitely." Indeed, and one of the key takeaway points for librarians: just because Google's scanned a book and you can get to it today doesn't mean it will be there tomorrow.

4. Google may disappear. "Electronic enterprises come and go. Research libraries last for centuries. Better to fortify them than to declare them obsolete, because obsolescence is built into the electronic media." Correct, of course; but what Darnton fails to suggest is that part of fortifying research libraries may mean involving them with digitization projects (along the Internet Archive model, perhaps?) as part of their current duties.

5. "Google will make mistakes." And they do. Here Darnton's referring to image quality ("skip pages, blur images, and fail in many ways to reproduce text perfectly"). Digitization isn't perfect, and it's much less so when it's being done on as massive and industrial a scale as Google's efforts are. Having good metadata so you can track down the imperfectly-scanned book at a library or elsewhere is key, and of course as we know Google Books often fails in this regard as well.

6. Permanence. Here Darnton's roots as a book historian show: "the best preservation system ever invented was the old-fashioned, pre-modern book" - that is, one printed on good rag paper, sturdily bound. Those'll hold up indefinitely, if treated right. Digital files, not so much.

7. Bibliographic control. This, to me, is probably the most important of Darnton's criticisms. How will Google display search results which include different editions of the same book? "... [W]ill it make all of them available? If so, which one will it put at the top of its search list?" Presumably, Darnton notes, Google will develop (or has developed) an algorithm to display book search results. "But nothing suggests that it will take account of the standards prescribed by bibliographers, such as the first edition to appear in print or the edition that corresponds most closely to the expressed intention of the author. Google employs hundreds, perhaps thousands, of engineers but, as far as I know, not a single bibliographer. Its innocence of any visible concern for bibliography is particularly regrettable in that most texts, as I have just argued, were unstable throughout most of the history of printing" [emphasis mine]. Without good metadata and better search methods, casual users don't know whether the edition they've found via Google is a bibliographically-sound edition or something less, while more serious scholars will, for the near future at least, still have to turn to hard copies.

8. Digitization "will fail to capture crucial aspects of a book ... the texture of its paper, the quality of its printing, the nature of its binding." &c. Indeed. For most people, as Darnton notes, this won't matter since their interest is with the text and not the packaging - but for bibliographers, book historians, and other students of book culture, these things matter and matter a great deal.

Darnton ends his essay with a paean to the old-fashioned rare book room and with an apt argument about the traditional book's "effectiveness for ordinary readers" before concluding, as Grafton did, that traditional libraries and digitization projects can and should work in tandem, but with the library as the ultimate backup: "long live Google, but don't count on it living long enough to replace that venerable building with the Corinthian columns. As a citadel of learning and as a platform for adventure on the Internet, the research library still deserves to stand at the center of the campus, preserving the past and accumulating energy for the future."

He's right, but we all knew that going in. While I disagree slightly with some of his arguments (or at least how he worded them here), Darnton's got the fundamental message just about right.