A Gentle Introduction to Version Control

Here at ProfHacker we’ve written a lot about backups, but never about version control. In fact, when I recently wrote “A Few Ways to Back Up Your Website”, I specifically said “I’m not going into things like version control software.” You see, for a lot of people there’s something about the phrase “version control” that makes it sound all super high tech, possibly scary, and definitely something only software developers would need to use. Well, it is pretty high tech on the back end, and software developers do use it, but it’s not all that scary—look at the tree-eating mascot for the free and open source, distributed version control system called Git, used in this post. It’s not all that scary, is it?

So maybe it is scary and high-tech to some, but that doesn’t mean non-software developers shouldn’t or can’t use it. In fact, a lot of you (non-software developers) probably have used version control before, and with positive results, as the processes are built into many popular tools.

For example, Google Docs features revision history for all documents. When I first wrote about using Google Docs in the classroom I mentioned that the students and I used the revision history while working through drafts of their paper—sometimes to step through the drafts to visualize the changes as they occurred over time, and sometimes to compare one revision to another (side by side).

Compare the Google Docs method of version control, in which Google force-saves revisions as you work (or you save them manually), with maintaining a directory of files named doc1.doc, newdoc.doc, revision.doc, paper9.doc, and so on.

Or, suppose you don’t have any version control methods in place: you are writing a dissertation and you have a file called diss.doc in which you keep writing and writing and writing. You store multiple copies on external drives and perhaps even in the cloud. But what if your dissertation director says “I really liked that section you wrote last August, the one we decided to cut. Let’s put that back in there.” What do you do? With version control software and processes in place, you can say “oh sure, I’ll go grab that version and work it back in” instead of “[gulp]” and “[expletive]”.

If you’ve used Google Docs, you’ve used version control. If you’ve used a wiki of any type, you’ve used version control (the “history” of a page). If you’ve used WordPress or some other blogging platforms, you’ve used version control (when editing a post, you can see all the auto-saves and manually-saved versions).

There’s More to Version Control than Just Revision History, Isn’t There?

Yes, there is. But the title says “gentle introduction” after all, and I didn’t want to jump right in with terms like “branching” or “atomic operations”.

I’ve described some tools that integrate version control by maintaining backups of your digital objects such that you can retrieve versions of them at any point in their creation. But full-fledged version control software works more functionality and terminology into the mix, such as:

  • commit/checkin and checkout: when you put an object into the repository (there are nuances to this, but go with it for now), you are committing that file; when you check out a file, you are grabbing it from the repository (where all the current and historical versions are stored) and working on it until you are ready to commit or check in the file again.
  • branch: the files you have under version control can branch or fork at any point, thus creating two or more development paths. Using a non-software example, suppose you are creating an assignment for the subject that you teach. Suppose you teach a lower-division course and an upper-division course in the same subject. You might have started with one master document of information but then forked it for the lower division class and the upper division class, continuing to develop them independently. If you continued developing the master document, the one you started with, that would be working with the trunk.
  • change/diff: this is just the term (change OR diff) for a modification made under version control. You might also hear “diff” used as a verb, as in “I diffed the files,” to refer to the action of comparing two versions of an object (there is an underlying UNIX command called “diff”).
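To make “diff” a little more concrete, here is a minimal sketch using Python’s standard difflib module. The two revisions and the file names draft_v1.txt and draft_v2.txt are made up for illustration; this is not what Git or Subversion run internally, just the same idea on a small scale.

```python
import difflib

# Two hypothetical revisions of the same short document.
old_revision = [
    "Version control is scary.",
    "Only developers need it.",
]
new_revision = [
    "Version control is not that scary.",
    "Only developers need it.",
    "Writers can use it too.",
]

# unified_diff describes the change line by line: "-" marks a removed
# line, "+" marks an added line, and a leading space marks unchanged context.
diff_lines = list(
    difflib.unified_diff(
        old_revision,
        new_revision,
        fromfile="draft_v1.txt",  # hypothetical file names, used only as labels
        tofile="draft_v2.txt",
        lineterm="",
    )
)

for line in diff_lines:
    print(line)
```

Reading the output takes a little practice, but the idea is simple: the diff records only what changed between two versions, not two full copies of the document.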

There are many more terms than just these few listed above, but if you can conceptualize the repository, the (local) working copy, and the process of checking in and checking out files, then you are well on your way to implementing version control for your digital objects.
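If it helps to see those pieces in one place, the sketch below models a repository, commits, checkouts, and a branch in a few lines of Python. Everything in it (the ToyRepository class, the branch names, the assignment text) is hypothetical and invented for this illustration; real systems like Subversion and Git store history very differently, so treat it as a mental model rather than an implementation.

```python
class ToyRepository:
    """An in-memory stand-in for a repository (hypothetical, for illustration)."""

    def __init__(self):
        # Each branch name maps to its list of committed versions, oldest first.
        self.branches = {"trunk": []}

    def commit(self, branch, text, message):
        """Store a new version on a branch and return its revision number."""
        self.branches[branch].append({"text": text, "message": message})
        return len(self.branches[branch])

    def checkout(self, branch, revision=None):
        """Return the text of a revision (the latest one by default)."""
        history = self.branches[branch]
        number = len(history) if revision is None else revision
        if not 1 <= number <= len(history):
            raise ValueError(f"no revision {number} on branch {branch!r}")
        return history[number - 1]["text"]

    def branch(self, source, new_name):
        """Start a new development path from a copy of an existing history."""
        self.branches[new_name] = list(self.branches[source])


repo = ToyRepository()
repo.commit("trunk", "Assignment: write an essay.", "First draft")
repo.commit("trunk", "Assignment: write a 5-page essay.", "Add a length")

# Fork the master document for the upper-division course and keep developing it.
repo.branch("trunk", "upper-division")
repo.commit(
    "upper-division",
    "Assignment: write a 10-page essay with sources.",
    "Raise expectations",
)

latest_trunk = repo.checkout("trunk")
latest_upper = repo.checkout("upper-division")
first_draft = repo.checkout("trunk", revision=1)

print(latest_trunk)
print(latest_upper)
print(first_draft)
```

The upper-division branch carries the trunk’s history plus its own new commit, while the trunk is left untouched, and any earlier revision can still be checked out by number.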

Two Open Source Version Control Systems: Subversion and Git

Although there are several different version control systems available for use—some free and open source and some proprietary—two of the most popular systems are Subversion and Git.

If you have a web hosting service that allows you to install Subversion, then you can create your own repository and use a Subversion client to connect to it.

But an increasingly popular tool is Git, which takes a decentralized approach to version control. It also comes with numerous tools and hosting options for users who want to get started with a repository but don’t necessarily want, need, or understand all the extra installation and maintenance overhead that goes with running one.

For anyone wanting to get started with version control, I recommend Git. But first I recommend viewing the slideshow below:

You may also find these Git tutorials helpful.

As I was preparing this gentle introduction—which will lead to more specific discussions and how-to posts if there is interest—@benwbrum (Ben Brumfield) offered the following comments regarding version control:

Based on my own experience as a developer and two years of work in a university helpdesk, I’m convinced that researchers need to be comfortable with source control tools.

Think of the non-programming use cases: I can see the differences between the four revisions to my THATCamp application. And since the differences live in the repository, my letter is a text file: no worries that the eventual recipient will turn on “track changes” and see the embarrassing typos in the first rev.

Why fool with key fobs you might run through the laundry? I can get at my source code from our laptop, our desktop, AND from the server I’m deploying it on.

It’s easier to collaborate via an SCM system than by passing emails around, or the like. Systems like GitHub (which arrived after the original email) integrate social tools into the process, which help you track forking and merging.

All true! Ok, folks—now what do you want to know about integrating version control into your professional life (developer or not)?
