Consider Revision Control Methods for Documents

Both the concept and practice of revision control (also known as version control) are near and dear to my heart; a body of work as a technical writer, programmer, and project manager before moving over to academia made sure that particular personality trait was deeply ingrained. But during my time as a graduate student (when one might argue my sole purpose was to produce documents of one type or another) I completely lost touch with my roots. For some reason, it didn’t occur to me that everything I had learned and valued with regard to revision control would actually work in an academic setting. While I made it through an MA and then a PhD without (much of) a hitch, looking back I can clearly see times when implementing revision control methods would have saved me a lot of trouble: mental trouble, if not actual writing trouble.

A few months ago, when I wrote A Gentle Introduction to Version Control, my intention was merely to introduce the concept (hence the title) in anticipation of writing several pieces on using specific software in the service of a project—using GitHub to safely store and share some open source code, for example, then perhaps posts about Subversion, Mercurial, and any number of other interesting and popular systems for such processes.

But when I start talking about version control methods in general academic company (read: “non-coders”), by far the biggest request is: “Explain why I might care as someone who rarely programs. Is there a value beyond programming?” That’s the question I’m going to begin to answer in this post (the short answer is “yes”).

Note that the remainder of this post is about personal/individual/local revision control and not shared/external/decentralized repositories as would be appropriate for a team. For a great start to that discussion, please see Jeremy Boggs’s “Participating in the Bazaar: Sharing Code in the Digital Humanities”, which is a piece I’ll return to in a later post.

Revision Control is More Than Track Changes

Although it is true that revision control systems track changes to written documents, there is much more to a revision control system than the visual cue of a strikethrough in a Microsoft Word document (or OpenOffice, etc.). In those sorts of documents, once you accept (or reject) changes and save the document, the revision history is gone (reset, if you will).

An Example! Suppose you send a document off to your committee chair/editor/writing group and you receive the document back full of suggested edits. Unless you manually saved a copy of the original (such as “article_submitted.doc”) or the editor changed the name before sending it back, the new document will replace the old one of the same name. For a moment, assume that you did save a copy under a different name, so now you have “article_submitted.doc” and “article_edits.doc”. If you open “article_edits.doc” and begin to work in it (accepting changes, making edits, and so on), what happens when you simply save it and close the file? In this case, “article_edits.doc” has lost all of its original editorial comments; what you might have done instead is save a new file called “article_edited.doc” or some such name, indicating you’ve begun editing the document based on the editorial suggestions.

“Well,” you might say, “I have the original file in my e-mail, so what does it matter? I can always get whatever version I need.” I would say, “Yes, you do. You also have at least three other files, which you have probably backed up in multiple places—and what, exactly, is managing those versions besides your own memory?” Nothing, is the answer.

While this process might work very well for you, and that’s great, it doesn’t really work for me—too many copies of files, too much duplication, too much to keep track of in my own pea brain. That’s when I remembered my technical writing background and began to apply it to my academic work.

Simply Documenting Changes

Perhaps you have seen business documents such as organizational charters or software manuals (or really anything in between) in which the first page of the document (or an appendix) includes a table like the following:

Revision Number | Revision Date | Revision Notes | Owner
2.0 | 14 July 2010 | final formatting changes; added Appendix C | Jane Doe
1.9 | 10 July 2010 | added coverage of Widget X; removed paragraph about Widget Z | Jane Doe

Typically, these revision notes will go hand in hand with a revision control system employed by the writer or organization. With such a system in place, there would only ever be one document in the repository, regardless of how many revisions had been performed. And, more importantly, all of the versions would be accessible at least for viewing, if not checking out or reverting completely. By that I mean if the mythical Jane Doe’s company decided to resume production of Widget Z, Jane could simply view version 1.8 of the document through her version control software, find the chunk that had been removed (either by looking or by performing a file comparison, known as a diff), and add it back into the current version of the document without wrecking the revision history or creating duplicate files “littering” her hard drive.
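To make the file comparison idea concrete, here is a minimal sketch in Python using the standard library's difflib module. It is not what Subversion or any other version control system does internally; the two "versions" and their wording are invented for illustration, and the helper name removed_lines is my own.

```python
import difflib

# Two hypothetical revisions of the same manual, one paragraph per list item.
# The wording is invented for this example.
version_1_8 = [
    "Widget Y ships with a power adapter.",
    "Widget Z requires a separate mounting bracket.",
    "Contact support for replacement parts.",
]
version_2_0 = [
    "Widget Y ships with a power adapter.",
    "Widget X connects over USB.",
    "Contact support for replacement parts.",
    "Appendix C lists regional distributors.",
]


def removed_lines(old, new):
    """Return the paragraphs present in `old` that were deleted or replaced in `new`."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" both mean text from the old version is gone.
        if tag in ("delete", "replace"):
            removed.extend(old[i1:i2])
    return removed


# The Widget Z paragraph is the only one missing from the newer version.
print(removed_lines(version_1_8, version_2_0))
```

A real version control client does this comparison for you; the point of the sketch is only that a removed chunk can be found mechanically rather than by rereading both versions.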

But I’m an academic! I don’t need to document widgets! While perhaps true, I’ll bet you’ve revised and/or repurposed a document before. Some of you might do such things on a regular basis, especially if you’re involved in long-term projects involving groups of people and funding—there are activities, results, and plans to document on a regular basis. Perhaps you can keep all these document revisions in your head, and even know what text you’ve added or cut, and when, and know from where you can reclaim that text if something goes horribly wrong (or someone just says “let’s add that back in…”). I can’t.

With a single document and a version control process in place, you could work continuously/incrementally on, say, “Year 1 Plan” until it has been finalized; the version control software and notes made with each version would provide a snapshot of changes and a quick way to view, verify, and recover content that has been added or modified. If a team member says something like “didn’t we define that task in the plan? I don’t see it” and you have to counter with “yes, but then we took it out after a meeting in April,” the conversation doesn’t have to stop there—and you don’t have to try to recreate that task description from memories of snippets of meetings gone by. Instead, you can simply navigate through your repository, find the version that has with it a note to the effect of “removed task X description”, view it, grab the text, and either insert it back into the document or send it forward for more discussion. When it’s time to begin work on the “Year 2 Plan” you can just start with the final version of “Year 1 Plan” and start a new branch of the “project”.
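The workflow above can be sketched as a toy, in-memory revision history in Python. Everything here is hypothetical (the DocumentHistory class, the notes, the task text), and a real repository stores far more than this; the sketch only shows why a short note with each version makes removed text easy to find. Note that the text you want lives in the version just before the one whose note says it was removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    number: int
    note: str
    text: str


class DocumentHistory:
    """Toy stand-in for a repository that holds one document (illustration only)."""

    def __init__(self):
        self._revisions = []

    def commit(self, text, note):
        """Store a new version of the document along with its revision note."""
        revision = Revision(len(self._revisions) + 1, note, text)
        self._revisions.append(revision)
        return revision

    def find_by_note(self, phrase):
        """Return every revision whose note contains `phrase` (case-insensitive)."""
        phrase = phrase.lower()
        return [r for r in self._revisions if phrase in r.note.lower()]

    def previous(self, revision):
        """Return the revision just before `revision`, or None for the first one."""
        index = revision.number - 2
        return self._revisions[index] if index >= 0 else None


history = DocumentHistory()
history.commit("Task X: survey the archive.\nTask Y: draft the report.", "initial plan")
history.commit("Task Y: draft the report.", "removed task X description")
history.commit("Task Y: draft the report.\nTask Z: present findings.", "added task Z")

# Find the version whose note mentions the removal, then look one version back.
match = history.find_by_note("removed task X")[0]
before = history.previous(match)
current_lines = match.text.splitlines()
recovered = [line for line in before.text.splitlines() if line not in current_lines]
print(recovered)
```

With a real client, the search step is just reading the log of revision notes, which is why a short, honest note with every version pays off later.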

The point is this: nothing is lost—not a document, not a suggested edit to a document, and not a piece of text once added then removed. You can recover anything in a few clicks because you employed revision control methods from the start. So, when your dissertation advisor says “do you remember that section of Chapter 3 that we talked about three months ago, that we decided to remove entirely? Let’s put that back in and expand the examples,” you don’t have a heart attack and begin the process of recreating from memory something that you stopped thinking about months ago (or, you don’t begin the process of sifting through manually created versions of documents called “diss_ch3_1.doc,” “diss_ch3_5.doc,” “diss_ch3_9.doc,” and so on).

Create a Personal Repository

You can begin the process of creating a personal document repository simply by installing a Subversion client on your own machine; no external server is necessary (this also means you’re the only one who can check documents in or out, but this post has been about you and your own work anyway, so we’ll save the rest for another day). For Windows users, TortoiseSVN (free, open source) is very easy to install and use; you can create a repository in just a few minutes. For Mac users (actually, Windows and Linux too), RapidSVN (free, open source) is a good client; like TortoiseSVN, it offers a pleasant graphical user interface to the underlying version control system (read: not a command-line interface, which tends to scare people away).

I will save the “Installing and Using SVN on Your Own Machine” post for another day, as the goal here was to get you thinking about the processes and how revision control might fit into your own writing practices, but if you would like to get started and fiddle around creating repositories on your own machine, I recommend reading “Subversion for Writers” (Mac examples) and “Getting Started With Subversion—Part 1: The Basics” (Windows examples).

Version Control in OpenOffice

If you use OpenOffice, you have quick access to a rudimentary version control system just by accessing “Versions” under the File menu. In the dialog box shown below, you can see the notes I have left for myself in the revision history of a particular document. I can perform a diff (compare) between two versions, open a previous version as a read-only document, or open a previous version for writing (and thereby possibly create a fork in my document creation).

[Screenshot: OpenOffice versioning dialog]

Opening and working with revisions in this way in OpenOffice might help the concepts crystallize before moving on to installing additional software or working with a decentralized model used by a team. If you already use OpenOffice, it sure can’t hurt.

Use Revision Control For Good

In this case, I’m using “good” to mean “productive”. I recently had occasion to review some of my own writing, and I knew in the back of my mind that somewhere along the line I had cut large swaths of text yet couldn’t for the life of me remember (or find) the file called “stuff_i_cut_might_be_important_but_not_now.doc”—if ever I had made one to begin with. Had I used version control, I could simply have gone through my notes and found it in a previous version of the document. The time I wasted first looking for the content and then trying to recreate the content certainly wasn’t productive.

Before I write the next post related to version control, in which other cases can you imagine document revision control being useful to you? Related to this topic, which programs, services, and processes interest you most? Let us know in the comments. Or, just work through your own scenario and get feedback as to whether or not implementing revision control could help you navigate a roadblock or other hurdle.

[Image by Flickr user rubberpaw / Creative Commons licensed]
