We may never know if Shakespeare had a sister, but we can be certain he didn’t have a hard drive. What if he had? Details of his writing process and his life, currently a mystery, might be pitilessly exposed.
As scholars will tell you, there are no manuscripts of the plays surviving in the Bard’s own hand. The text of King Lear, for example, comes to us from two published quartos and the First Folio (1623), with hundreds of lines and thousands of words differing between them. In the so-called “bad quarto” of Hamlet, a certain soliloquy begins: “To be or not to be. Aye, there’s the point, /To Die, to sleep, is that all, aye all.” The speech is also placed differently, in Act II, Scene 2, rather than its accustomed place in Act III, Scene 1. Today it is typically thought that the bad quarto is a memorial reconstruction of the play by an actor or spectator, but we can’t be sure. In any case, the texts are rife with ambiguities. Which versions are right? What was closest to Shakespeare’s own original (or final) intentions?
If Shakespeare had had a hard drive, if the plays had been written with a word processor on a computer that had somehow survived, we still might not know anything definitive about Shakespeare’s original or final intentions (these are human, not technological, questions), but we might be able to know some rather different things. We might be able to know, for example, the precise date on which he began composing Hamlet, indeed the precise minute and hour, time-stamped to the second. We would be able to know how long he had spent working on it, or at least how long the file containing the play had remained open on his desktop. We would very likely have access to multiple versions and states of the file, and if Shakespeare had had “track changes” turned on while he wrote, we would be able to follow the composition of a soliloquy keystroke by keystroke, each revision also date- and time-stamped to the second. We might discover the play had originally been called GreatDane.doc instead of Hamlet.doc. We might also be able to know what else he had been working on that same day, or what Internet content he had browsed the night before (since we’ll assume Shakespeare had Web access too). While he was online, he might have updated his blog or tagged some images in his Flickr account, or perhaps edited a Wikipedia entry or two. He might even have spent some time interacting with others by performing with an avatar in Second Life, an online place where all the world is truly a shared virtual stage.
This scenario will strike some readers as naïve, and not only because of the proposition of Shakespeare surfing the Internet. More significantly, the kind of data about electronic documents I have been describing — known as “metadata” in the trade — is by no means unimpeachable. Date and time stamps can be spoofed, and Mac and Windows systems handle them differently. (On a Macintosh, when a file is copied, its original creation date remains intact, whereas on a Windows machine, the creation date becomes the date of the copy.) The simple act of renaming a file will make it look like an entirely new creation, severing its relation to any earlier version saved under a different file name. A computer’s internal clock can be wrong, and files can be erased beyond any practical hope of recovery.
But technical complications notwithstanding, while we know Shakespeare didn’t have a hard drive, almost all writers working today do. Today nearly all literature is “born digital” in the sense that at some point in its composition, probably very early, the text is entered with a word processor, saved on a hard drive, and takes its place as part of a computer operating system. Often the text is also sent by e-mail to an editor, along with ancillary correspondence. Editors edit electronically, inserting suggestions and revisions and e-mailing the file back to the author to approve. Publishers use electronic typesetting and layout tools, and only at the very end of this process (almost arbitrarily and incidentally, one might say) is the electronic text of the manuscript, by now the object of countless transmissions and transformations, made into the static material artifact that is a printed book.
This new technological fact about writing is already having an impact on everything from office work to government and academe to literature and the creative arts. Sending a file as an e-mail attachment without first accepting or rejecting tracked changes is a common workplace gaffe, since the recipient can view every step of the composition process. The Modern Language Association recently published a note in its newsletter warning readers that a Microsoft Word file records the name of the registered owner of the software on which it was created, and that this name is visible in the document’s easily accessible properties window, thereby jeopardizing blind peer review. Libraries have already faced the question of how to accession a hard drive as part of a literary estate (Emory University’s acquisition last year of Salman Rushdie’s papers, which include several of his laptops, is a good example), and an essay in The New York Times Book Review lamented the loss of literary heritage as correspondence among authors, editors, and publishers takes the form of evanescent e-mail, instant messages, or even cellphone text messages.
To grasp what is at stake, we first have to understand a little about computers themselves. Computers are universal machines, meaning that they are machines designed to imitate other machines. This is the essence of software: Open one program and your computer is a 21st-century typewriter; open another and it is a video-production studio. It’s the same physical machine, but a totally different virtual environment. There is therefore no natural state for electronic text on a computer. Often we’re attracted to its perceived fluidity and flexibility, but that is an outcome of a particular way of modeling a document, not an inherent property of the medium, as anyone who has ever tried to edit a file without the appropriate permissions knows: suddenly all that malleable text becomes maddeningly resolute and unyielding.
Likewise, we can edit and change an electronic document without leaving a trace, but we can also arrange to have our every keystroke recorded. We can save the same file over and over again, overwriting each earlier state of it, but we can also check documents in and out of a secure repository that will safeguard all versions and allow users to reconstruct the branching paths between them. (Wikipedia works in much this way, something often overlooked in discussions of its pros and cons: every entry in Wikipedia has a page history that shows how it has been edited and gives full access to every earlier version, helping users judge the stability and reliability of the information.)
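For readers curious about the mechanics, the check-in/check-out model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Wikipedia or any particular repository system is actually implemented: it assumes a simple in-memory store, and the names `VersionedDocument`, `check_in`, `check_out`, `history`, and `diff` are hypothetical. The essential design choice is that each check-in appends a new revision rather than replacing the old text, which is what makes a page history, and a comparison between any two states, possible.

```python
import difflib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class Revision:
    number: int          # 1-based revision number
    author: str
    timestamp: datetime  # UTC time of check-in
    text: str


@dataclass
class VersionedDocument:
    """Hypothetical repository entry: every check-in is kept, nothing is overwritten."""
    name: str
    revisions: List[Revision] = field(default_factory=list)

    def check_in(self, text: str, author: str) -> int:
        # Append a new revision instead of replacing the previous text.
        rev = Revision(len(self.revisions) + 1, author,
                       datetime.now(timezone.utc), text)
        self.revisions.append(rev)
        return rev.number

    def _get(self, number: int) -> Revision:
        if not 1 <= number <= len(self.revisions):
            raise LookupError(f"{self.name} has no revision {number}")
        return self.revisions[number - 1]

    def check_out(self, number: Optional[int] = None) -> str:
        # With no argument, return the latest text; earlier states stay retrievable.
        if number is None:
            number = len(self.revisions)
        return self._get(number).text

    def history(self) -> List[Tuple[int, str, str]]:
        # Comparable to a wiki page history: who checked in each revision, and when.
        return [(r.number, r.author, r.timestamp.isoformat()) for r in self.revisions]

    def diff(self, old: int, new: int) -> str:
        # Line-by-line unified diff between two stored revisions.
        a = self._get(old).text.splitlines(keepends=True)
        b = self._get(new).text.splitlines(keepends=True)
        return "".join(difflib.unified_diff(a, b, f"rev{old}", f"rev{new}"))


if __name__ == "__main__":
    # Hypothetical example echoing the essay's thought experiment.
    doc = VersionedDocument("GreatDane.doc")
    doc.check_in("To be or not to be. Aye, there's the point,\n", "W. Shakespeare")
    doc.check_in("To be, or not to be: that is the question:\n", "W. Shakespeare")
    print(doc.check_out(1))  # the earlier state survives the later check-in
    print(doc.diff(1, 2))
    for entry in doc.history():
        print(entry)
```

A real repository would persist revisions to disk and typically store differences rather than full copies, but the principle the essay describes, that nothing checked in is ever silently overwritten, is the same.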
Computers can even be programmed to display wear and tear on electronic documents, with frequently used files graphically altered to reflect their more frequent handling, in the same way the pages of a book become worn with use. Reading from a screen will never be the same as reading from the page, but that isn’t the point; what we call an electronic document is actually a complex assemblage that is the constructed outcome of a set of assumptions and desires about how we want documents and text to behave in this environment. Often when attempting to make a definitive statement about the way computers work, we are really just commenting on an inherited set of conventions that are subject to change. If we don’t like the way our electronic documents work today, in other words, we can decide to make them work otherwise tomorrow.
At the outset of the personal-computer era, authors tended to be drawn to the experimental aspect of the medium, and what emerged were the incunabula of the electronic age. In 1984, Robert Pinsky, the poet, wrote a piece of interactive fiction called Mindwheel, and a few years later, the author Michael Joyce, long fascinated by the prospect of “a story that changes every time you read it,” wrote Afternoon, a Story after first helping to design the specialized authoring software on which it runs. Nowadays, however, photographs of an author at work often include a desktop computer or laptop in the same way that medieval portraits of scribes and saints show them in their studies, at their steep-sloped writing desks, with pens, inkwells, and other writing paraphernalia scattered about. Some writers, of course, still compose in longhand, and a few no doubt maintain a Rolodex of secondhand vendors from whom they can procure spare parts for their typewriter.
But by and large there has been a massive shift in the technological foundation of our writing, literary and otherwise; in the particular realm of literature and literary scholarship, this means that a writer working today will not and cannot be studied in the future in the same way as writers of the past, since the basic material evidence of their authorial activity — manuscripts and drafts, working notes, correspondence, journals — is, like all textual production, increasingly migrating to the electronic realm.
The implications here extend beyond scholarship to a need to reformulate our understanding of what becomes part of the public cultural record. If an author donates her laptop to a library, what are the boundaries of the collection? Old e-mail messages, financial records, Web-browser history files? Overwritten or erased data that is still recoverable from the hard drive? Since computers are now ground zero for so many aspects of our daily lives, the boundaries between our creative endeavors and more mundane activities are not nearly as clear as they might once have been in a traditional set of author’s “papers.” Indeed, what are the boundaries of authorship itself in an era of blogs, wikis, instant messaging, and e-mail? Is an author’s blog part of her papers? What about a chat transcript or an instant message stored on a cellphone? What about a character or avatar the author has created for an online game? The question is analogous to Foucault’s famous provocation about whether Nietzsche’s laundry list ought to be considered part of his complete works, but the difference is not only in the extreme volume and proliferation of data but also in the relentless way in which everything on a computer operating system is indexed, stamped, quantified, and objectified.
With terabyte-scale drives due to start shipping in the next 18 months, it will soon be possible, indeed the norm, to save every version of every file by default, because locating and deleting old versions will take more attention and energy than simply leaving them on a disk whose storage capacity, at least for textual information, is for all practical purposes infinite. Future literary analysis may depend as much on data mining and visualization as on scholarly judgment and critical instinct. “To be or not to be. Aye, there’s the point” might be shown to be statistically out of step with hundreds of other megabytes of textual data by dint of computational pattern recognition. More exotically, perhaps, literary editors may need to acquaint themselves with techniques of forensic information recovery, so as to help restore fragments of deleted or overwritten files that could prove vital in reconstructing a manuscript.
The issues raised here are not merely speculative. In early 2006, I spent a week at the Harry Ransom Humanities Research Center at the University of Texas at Austin, where the Michael Joyce papers are in the process of being cataloged. The physical part of the Joyce collection resides in acid-free manila folders in turn housed within Hollinger boxes, some 50 of them, which I was able to request by exchanging handwritten paper slips with the collection staff. The first accession of virtual materials, however, has been lifted from the almost 400 diskettes that make up their original storage media and uploaded to an electronic repository system known as DSpace. They are online but can be accessed only from a dedicated laptop located in the center’s reading room. To actually work with the files, I had to download them to the desktop of the machine, where I used what means and know-how I could to get the cranky old binaries to execute on the up-to-date operating system. Sometimes I was unsuccessful. I suggested the addition of a hex editor, emulators, and other forensic tools to the utilities available on the computer. Since DSpace maintains the integrity of a master copy of every file, I could do what I pleased with the derivative I downloaded to my local desktop — hack at it, tweak it, break it. (This is not covered in the instructional video all new users of the collections are required to watch before they are admitted to the reading room.)
My experience at the Ransom center testifies to the extent to which our future understanding of our present (and future) literary imagination will depend on effective digital preservation. The challenges here are indeed momentous. Famously, in 1986, the British Broadcasting Corporation produced a digital edition of the Domesday Book on laser disc. The format was rendered unusable in just a few years, while the original has survived in legible form since the 11th century. What often goes unacknowledged in that story, however, is that the original has survived not only because of the inherent physical properties of ink and parchment and paper — it has also survived because we evolved the social practices necessary to recognize its significance and keep it safe, in a climate-controlled, limited-access vault.
There’s nothing inherent in the technology that makes e-mail or other forms of electronic writing especially susceptible to vanishing into the electronic ether. On the contrary, as Oliver North and other malefactors have learned, the stuff is remarkably pesky and pernicious. A single e-mail message may leave traces of itself on a dozen different servers as it makes its way across the Internet, a potential for proliferation that is further exacerbated by backup services at each site. While I don’t mean to minimize the very real challenges in the realm of digital preservation, in my view those challenges are best understood as at least as much social as technological. This brings me back to the fundamental nature of computers, which is that they really have no fundamental nature; to the extent that our current electronic records are fragile, unreliable, potentially misleading, and so forth, this is at least partly the result of implicit decisions made in how those records are constructed.
If we are worried that some modern-day Shakespeare isn’t keeping early electronic drafts of her work, then we should build the capability to do so into the tools she is now working with. If we are worried that popular file formats are proprietary and hopelessly corporatized, then we should educate people about the benefits of standards and open source. This point is especially crucial for individual writers and authors, since effective preservation begins on the end-user’s desktop. If the widespread perception is that electronic documents and records have no hope of surviving for posterity, then that will become a self-fulfilling prophecy as we all, individually and collectively, fail to take the steps necessary to ensure that they do survive.
The wholesale migration of literature to a born-digital state places our collective literary and cultural heritage at real risk. But for every problem that electronic documents create — problems for preservation, problems for access, problems for cataloging and classification and discovery and delivery — there are equal, and potentially enormous, opportunities. What if we could use machine-learning algorithms to sift through vast textual archives and draw our attention to a portion of a manuscript manifesting an especially rich and unusual pattern of activity, the multiple layers of revision captured in different versions of the file creating a three-dimensional portrait of the writing process? What if these revisions could in turn be correlated with the content of a Web site that someone in the author’s MySpace network had blogged?
Literary scholars are going to need to play a role in decisions about what kind of data survive and in what form, much as bibliographers and editors have long been advocates in traditional library settings, where they have opposed policies that tamper with bindings, dust jackets, and other important kinds of material evidence. To this end, the Electronic Literature Organization, based at the Maryland Institute for Technology in the Humanities, is beginning work on a preservation standard known as X-Lit, where the “X-” prefix serves to mark a tripartite relationship among electronic literature’s risk of extinction or obsolescence, the experimental or extreme nature of the material, and the family of Extensible Markup Language technologies that are the technical underpinning of the project. While our focus is on avant-garde literary productions, such literature has essentially been a test bed for a future in which an increasing proportion of documents will be born digital and will take fuller advantage of networked, digital environments. We may no longer have the equivalent of Shakespeare’s hard drive, but we do know that we wish we did, and it is therefore not too late, or too early, to begin taking steps to make sure we save the born-digital records of the literature of today.
Matthew Kirschenbaum is an associate professor of English at the University of Maryland at College Park and associate director of the Maryland Institute for Technology in the Humanities. He is also a vice president of the Electronic Literature Organization. His book, Mechanisms: New Media and the Forensic Imagination, will be published by MIT Press later this year.
Comments
Aden Evens, Monday, August 13, 2007
Kirschenbaum’s call for standards and methods to preserve and ease access to digital archives is well argued and essential. I worry only that the task is not as straightforward as this article might make it seem. For example, “track changes” in MS Word does not reveal the details of the editing process, for only the final version of a given change is ultimately reflected in the saved document; one could not reproduce from the saved document the various edits that were eventually rejected or overwritten. Likewise, while Word automatically saves the author’s name in the metadata for the file, this saved data lacks the indelibility and singularity of a signature on a handwritten MS. If I write a document on my wife’s computer, its metadata will likely be stamped with her name unless I go to the trouble of changing it explicitly (which most of us do not do). The same malleability that weighs against a “natural state” of digital artifacts also works against authentication, for even the materiality of a digital object tends to be fluid and “flickering,” as Kate Hayles points out. The threat to the preservation of digital artifacts and their histories lies not only in the enormous amount of data produced but also in the transitory character of the digital, which tends to leave forensic evidence but also to fragment, distort, and ultimately erase that evidence. Technical tools to archive artifacts and track the processes of their creation will be extremely helpful, but the digital will always also defy such tools, fading into the ephemeral that is part of its essential nature.
Dennis G. Jerz, Monday, August 13, 2007
True, “track changes” does not reveal the *same* details as a marked-up manuscript, but when we are talking about modern authors who compose and edit digitally, the comparison is moot. The underlying point that Kirschenbaum makes is not that the new-fangled horseless carriages will or should do everything that horses can do.
Fewer digital artifacts will fade irretrievably if the academic culture changes in the direction of placing greater value on the preservation of such artifacts. At the very least, we’ll learn a lot about ourselves as we discuss which works should be the first to go into the digital lifeboats.
K.G. Schneider, Monday, August 13, 2007
“While I don’t mean to minimize the very real challenges in the realm of digital preservation, in my view those challenges are best understood as at least as much social as technological.” Well said. We are talking about choices.
Carolyn Mingmei Wu, Tuesday, August 14, 2007
Remind me to have all my hard drives completely destroyed upon my death. There are some things future ‘experts’ really need not know, and an increase in our body of ‘knowledge’ about someone may actually deter scholarship by focusing on trivialities that have nothing to do with the creative process. For example, someone might mistake an idea that came into an author’s head weeks before for one ‘inspired’ by an e-mail received that morning when, in fact, that e-mail merely retriggered the earlier memory or was only coincidentally there. There is actually too much information out there about everyone, and we need to take back privacy if we are ever to regain control over our lives. Why isn’t this discussed as well in the commentaries over how wonderful it is to have this data? What would your IRB (Institutional Review Board) say about the glee with which scholars approach destroying privacy?
Matthew Kirschenbaum, Tuesday, August 14, 2007
These are important concerns. As I briefly indicated in the article, decisions will have to be made about what constitutes the boundaries of authorship on a hard drive and in a computer operating system. As with traditional paper archives, which vary enormously in their scope and content, what is well within the comfort zone of one individual will be an intolerable invasion of privacy for another.
Perhaps most distressing to me is the possibility that authors will destroy their digital records out of wholesale fear of intrusion. Authors need to understand the workings of their virtual desktops in order to make informed choices about what kinds of records they may leave for posterity, inadvertently or otherwise. This is a responsibility that authors, editors, publishers, scholars, and archivists must assume together. While I hope my article conveys some of the excitement I feel for new kinds of literary research, I would regret it if the tone came across as unbridled glee. I’m not insensitive to the issues at stake, but I do want writers to become educated about their tools and technologies so that both the preservation *and* the destruction of personal electronic records are conscious choices rather than accidents of circumstance. Thank you for the comment.
V. V. Raman, Wednesday, August 15, 2007
Computers and Literature
From paper and the printing press to mechanical and electric typewriters, every writing-related invention has had an impact on authors. This piece has some fascinating reflections on the impact of computers on literature and authorship. But now a few reactions: It seems to me that the vast majority of the people who enjoy Shakespeare are not that interested in his original drafts. Were there or were there not other versions of ? That is seldom the question. The Bard may well have named Hamlet the Prince of Scotland, making something rotten in that country. Will that make any difference? Most of us are interested in the final version, not in its failed or faltering drafts. So I am unable to grasp the importance of the concerns expressed. To authors, the value of the computer lies in the ease of editing manuscripts. Retaining rough drafts may be interesting for a future inquirer, but authors may oblige or not, irrespective of whether they use an old mechanical typewriter or a modern Dell.
The cyber-permanence of computerized documents is useful primarily in keeping records of exchanges, criminal or casual, sinful or innocuous, secretive or sinister, that could come to light if and when the long arm of the law probes into them.
V. V. Raman, August 15, 2007
Michael Scott, Thursday, August 16, 2007
In a sense, this problem is not so different from when an author’s archives consisted only of ink on paper. I’m a cataloging librarian, not an archivist, but I’m guessing that most archives do not collect *everything* that an author collected over the years (certainly large libraries, even the Library of Congress, do not try to collect everything printed). Libraries and archives will continue to select and choose as they have always done, regardless of format. However, the conservation and preservation of electronic materials is a different story, and an enormous problem for all of us.
Isabella Usher, Wednesday, August 22, 2007
The whole concept of archiving an author’s errata rests on the assumption that it adds a dimension to a literary work. A literary work should be able to wow without further comment. Commentary has this nasty habit of killing the very magic that births that wow. We know nothing about the Beowulf author, save the era in which he/she lived. I think we need to question whether knowing the sordid details of a work’s creation will diminish its ultimate future value. If the point of a written work is to stand alone and suspend disbelief, shouldn’t we curb our obsession to dissect, and leave authorial neurosis out of the mix?
Matthew Kirschenbaum, Friday, August 24, 2007
Isabella,
As a mentor of mine has written, “Scholarship is a service vocation. Not only are Sappho and Shakespeare primary, irreducible concerns for the scholar, so is any least part of our cultural inheritance that might call for attention. And to the scholarly mind, every smallest datum of that inheritance has a right to make its call. When the call is heard, the scholar is obliged to answer it accurately, meticulously, candidly, thoroughly.” Scholarship isn’t just about commentary and dissection; literary criticism is not the same as literary appreciation. There are things we want to know about the past; this is human nature. There’s no such thing as total recall. In any medium. But I believe we have to become more proactive and self-aware about the digital records we save, as well as the digital records we may--very deliberately--choose not to save.
Robert Sharp, Monday, August 27, 2007
Umberto Eco wrote a similar article a few years ago: ‘The Literary Game of Drafts’ http://books.guardian.co.uk/departments/politicsphilosophyandsociety/story/0,6000,676112,00.html
Matthew Kirschenbaum, Monday, August 27, 2007
Thanks very much for this reference, which I had not been aware of. Eco and I differ on a fundamental point: he seems content to romanticize the machine and assume traces of prior activity will usually not survive. On the contrary, I think they will, more often than not--sometimes in places the author or user is not even aware of, like caches and temp files, not to mention more esoteric portions of the operating system like the registry. Technology has also changed in the last five years, notably storage capacity (which will increasingly encourage users, to borrow Google’s phrase, to “archive, don’t delete”) and the growing use of repository software that requires users to check documents in and out of secure collections that maintain prior versions.
http://chronicle.com Section: The Chronicle Review Volume 53, Issue 50, Page B8