I have experience with four version control systems. Let’s look at the pros and cons of each.
CVS
CVS may have been good 15 or 20 years ago. But today it is fragile and has a weak feature set:
- It does not have atomic commits or even “commit sets”—commits consisting of changes to multiple files or directories—and so without meticulous coordination between committers it is easy to get a corrupted repository.
- It does not handle binary files well. By default CVS messes with newlines in the files it stores, and performs substitution for things that look like keywords. These are problems for non-text files. CVS was designed for storing text files and because of this extra steps have to be taken to allow proper treatment of binary files. Also, delta compression is not used for binary files, so repository size may balloon rapidly when tracking binary files.
- Administering a CVS repository is painful at best. pserver mode requires running a daemon or messing with xinetd, and one has to jump through extra hoops to ensure security. One can also use ssh, but this requires new user accounts and a common group for the developers.
- Using CVS is painful. One must either set an environment variable for the current repository being worked on, or must pass extra parameters to the CVS commands.
- It doesn’t handle renames well. History is tracked on a per-name basis, so when you rename, history becomes very difficult to work with (you will probably have to poke around in the “Attic”).
- Directories cannot be entirely deleted from a CVS repository without mucking in its internals.
- CVS does not offer any kind of merge tracking.
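To make the first point above concrete, here is a toy Python model of why per-file commits are fragile. It is my own illustration, not CVS or Subversion code: the repository is just a dict, and the function names and the simulated `NetworkError` are hypothetical.

```python
class NetworkError(Exception):
    """Simulated failure partway through a commit (hypothetical)."""


def commit_per_file(repo, changes, fail_on=None):
    """Write each file independently; a failure leaves earlier writes in place."""
    for path, content in changes.items():
        if path == fail_on:
            raise NetworkError(path)
        repo[path] = content


def commit_atomic(repo, changes, fail_on=None):
    """Stage all changes on a copy and publish them only if every write succeeds."""
    staged = dict(repo)
    for path, content in changes.items():
        if path == fail_on:
            raise NetworkError(path)
        staged[path] = content
    repo.clear()
    repo.update(staged)


def try_commit(commit, repo, changes, fail_on):
    """Run a commit function, swallow the simulated failure, return the repo."""
    try:
        commit(repo, changes, fail_on=fail_on)
    except NetworkError:
        pass
    return repo


changes = {"a.c": "v2", "a.h": "v2"}
per_file = try_commit(commit_per_file, {"a.c": "v1", "a.h": "v1"}, changes, "a.h")
atomic = try_commit(commit_atomic, {"a.c": "v1", "a.h": "v1"}, changes, "a.h")
print(per_file)  # a.c is at v2 while a.h is still at v1
print(atomic)    # both files are still at v1
```

In the per-file case the source file and its header no longer match, which is exactly the kind of half-applied change that takes coordination between committers to avoid.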
It’s been a very long time since I used CVS and it was also the very first version control system I used, so I don’t have more specific complaints. It is at least better than nothing at all.
Subversion
Subversion is referred to by its developers as “CVS done right”. It offers many improvements:
- Atomic commits and “commit sets” (I avoid calling them changesets because this has a more precise meaning regarding storage model, which svn does not have). Changes to multiple paths can be bundled up as one commit. If for some reason the commit fails midway through (e.g. network troubles), the changes are rolled back—the repository has much less chance of becoming corrupted.
- Revisions are stored compactly, even for binary files: instead of storing a complete copy of a versioned object for each revision, only a delta is stored (this is a simplification; for performance reasons every nth revision may be stored in its entirety, for example, but it is mostly accurate).
- Administration is slightly easier. There is an svn protocol which has most of the same issues as pserver for CVS. There is also HTTP(S) mode; svn can integrate with a webserver. It can also be set up to use SSH, but this is as much a pain in the ass as with CVS, and runs a greater risk of repository corruption than any of svn’s web-based modes.
- Using Subversion is nicer than using CVS. Instead of having to set an environment variable or specify an extra repository parameter, most commands can figure out what to do when run within a working copy of a repository.
- Renames are handled more elegantly than with CVS—`svn mv’ is implemented as a copy and delete.
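The delta idea from the list above can be sketched in a few lines of Python. This is only an illustration of the general technique (line-based deltas plus a full copy every nth revision), not Subversion's actual storage format; `DeltaStore`, `snapshot_every`, and the helper names are all made up for the example.

```python
import difflib


def make_delta(old, new):
    """Describe `new` as copies of line ranges from `old` plus literal new lines."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        else:
            delta.append(("data", new[j1:j2]))  # empty list for a pure deletion
    return delta


def apply_delta(old, delta):
    """Rebuild the newer revision from the older one and a delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class DeltaStore:
    """Stores every nth revision in full and the rest as deltas (toy model)."""

    def __init__(self, snapshot_every=4):
        self.snapshot_every = snapshot_every
        self.entries = []  # ("full", lines) or ("delta", ops), indexed by revision

    def commit(self, text):
        new = text.splitlines(keepends=True)
        rev = len(self.entries)
        if rev % self.snapshot_every == 0:
            self.entries.append(("full", new))
        else:
            old = self._lines(rev - 1)
            self.entries.append(("delta", make_delta(old, new)))
        return rev

    def _lines(self, rev):
        # Start at the nearest full copy at or before `rev`, then replay deltas.
        base = rev - rev % self.snapshot_every
        lines = list(self.entries[base][1])
        for r in range(base + 1, rev + 1):
            lines = apply_delta(lines, self.entries[r][1])
        return lines

    def checkout(self, rev):
        return "".join(self._lines(rev))
```

The periodic full copies bound how many deltas must be replayed to reconstruct any revision, which is the performance trade-off mentioned above.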
Other advantages:
- Good integration with IDEs. CVS has this too. I don’t use IDEs, so I don’t care about this very much.
- Many people are familiar with Subversion already, so if you have to work with others who are not willing to learn new tools, it may be a reasonable choice.
Subversion has many issues, however:
- No support for merge tracking. Merging is hokey and painful.
- It’s very slow! When it was young it was at least an order of magnitude slower than CVS for most operations. That has surely improved by now, but it is still comparatively quite slow. Mercurial is much faster, and git is supposedly even faster than Mercurial for many operations. Perhaps I’ll benchmark version control systems one day.
- It is difficult to identify the differences between branches. One has to resort to scripting this or performing a pretend merge.
- It is more difficult than it should be to revert to older versions. One has to either merge with the old revision (which svn doesn’t make easy), or cat the changes to each file you want to roll back and then commit.
- The supplied script for sending commit messages lacks desired features—like putting the first line of the commit message as the email subject.
- It is not too difficult to get a working copy completely screwed up. If you manage to delete the .svn directory in a versioned directory, you are toast—probably the only way to recover is to do a brand new working-copy checkout. I hope you commit your changes before you make this mistake.
- Did I mention that it is slow?
- Many commands are not available from the main svn program, and instead one must execute svn-PROGNAME or svnPROGNAME. These commands also do not all accept arguments in a uniform, coherent way (I ran into this the other day but don’t remember the specific case).
- The commands that operate on a repository (rather than a working copy) do not accept raw paths: for example, if the repository is at ~/repos/silly_svn_repo, one cannot use that as an argument. One must type instead file:///PATH-TO-HOME/repos/silly_svn_repo. This is stupid. Paths without “file://” should be treated as local file URLs by default.
- Shitty/nonexistent man pages. Instead one must type “COMMAND help”. It also bothers me that when one enters a command with bad parameters, one gets just a terse error message followed by “Type ’svn help’ for usage”. I would rather have it print help by default when a bad command or bad parameters are entered.
- It is difficult or impossible to split up an existing repository or combine multiple repositories using the supplied tools. For example, for splitting, the svndumpfilter tool has issues with renames/deletions. This is a known flaw in the tool. Combining repositories only works properly if the histories do not overlap in time at all. Otherwise the history in the resulting repo will be borked.
- Tags and branches are not entities that Subversion knows about; instead they are merely a convention of the users.
- Its commands don’t pipe into a pager by default if they output more than one terminal screen. This is an annoyance.
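The behavior I would like for repository paths is easy to describe in code. Here is a minimal Python sketch of it, assuming POSIX-style paths; `to_repo_url` is a hypothetical helper of mine, not part of Subversion, and the https URL is just a placeholder.

```python
import os
from pathlib import Path


def to_repo_url(arg):
    """Return `arg` unchanged if it already names a URL scheme, else a file:// URL."""
    if "://" in arg:
        return arg
    # Expand "~" and make the path absolute before converting it to a URL.
    absolute = os.path.abspath(os.path.expanduser(arg))
    return Path(absolute).as_uri()


print(to_repo_url("~/repos/silly_svn_repo"))           # a file:// URL under the home directory
print(to_repo_url("https://example.org/svn/project"))  # left as it is
```

A small wrapper script could run every repository argument through something like this before handing it to the real command.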
I sympathize with Linus Torvalds’ opinion of Subversion—it is broken by design.
Rational ClearCase
This is a classic example of overengineering, feature accretion, and poor design. I can’t say anything good about ClearCase. Unfortunately it seems to be used in many large corporations (this is one reason not to work for such a company). I would rather use CVS or even no version control than ClearCase.
- It is much more bloated and probably even slower than Subversion, especially if you use dynamic views.
- Integration between operating systems is very painful.
- There are different UIs for each OS.
- Its reference manual is a book of over 1000 pages. That’s bigger than most Robert Jordan novels. And that is only the user’s manual; there is an administrator’s manual and others. WTF. And these aren’t even freely available. I’m not sure if they are even included with each user license of ClearCase.
- It is absurdly expensive. Each license is over $4000 per year.
- There seems to be no way from the command line to list all files that you have made changes to with a single command. Instead you must resort to writing a shell script or piping between programs and using backticks or shell loops. So instead of saying something like “ct status” and getting a list of all modified files, you must type something like “ct lsco -r -cview|xargs ct diff -pred -quiet”.
- You have to explicitly list files when you check in using the command line tools.
- Hardly anyone (or perhaps no one) fully understands it. It’s too massive and poorly designed.
- The GUI tools on Unix/Linux are ancient—no tooltips, no mousewheel support, etc.
- Did I mention that it is incredibly slow?
- It is difficult to import a directory tree. On the first occasion I needed to do this, I didn’t know of a command to do it, so I ended up scripting `clearcase add’ for each file in a tree then waiting for a couple hours. When I had to do this again later, I tried to use the single command that adds a tree, but it required ClearCase administrator privileges. Go figure.
- The command line program’s name is “cleartool”. That’s an awful lot of typing.
- With dynamic views, there are a couple of different syntaxes for specifying a window of time to look at. These have undocumented differing semantics (I scoured the Jordanesque documentation for this and saw no mention, so it’s probably a bug). For the curious, see my previous post.
- Coders spend lots of time fighting with the VCS rather than coding.
- Google “clearcase evil twin”.
- It tends to require full-time administrators. That’s a lot of overhead.
- Atomic commits and “commit sets” are not supported (?). Only single files can be checked in at a time. This is unacceptable in a modern version control system.
- It’s not open source.
Common Limitations
There are also problems common to all three of the previously mentioned VCSs due to their centralized nature.
- One must have access to the repository server to view or diff older revisions or (probably) view commit messages.
- Only privileged people have the ability to commit.
- If the system uses a lock/unlock concurrency model rather than an edit/merge model, you cannot do any development unless you have access to the repository server. ClearCase is this way, or at least was set up in such a way when I used it.
- They are slower than decentralized VCSs because most operations are non-local.
- A centralized repository is a single point of failure: if the server dies, you had better have good backups; if the server goes down for a day, development will come to a standstill.
- Centralized development does not scale well. As the number of developers increases, so do lock contention (in a lock/unlock system) and merge necessity (in an edit/merge system).
Mercurial
I have been using Mercurial for my own projects for several months now. Because I have only been using it for my own stuff, I have not exercised its merge capabilities very well. Nevertheless, I still identify many advantages:
- Being decentralized, every operation save pushing and pulling from other repositories is local. You have access to all the capabilities of the system even without a network connection.
- Mercurial is fast (probably second fastest on most operations, with only git being faster). For example… need some benchmarks.
- By its decentralized nature, history is nonlinear and merges are tracked.
- It ships with tools to convert subversion, git, darcs, and CVS repositories.
- A Mercurial repository is typically more compact than a Subversion repository of the same stuff.
- It is easy to split or combine existing Mercurial repositories, and it ships with tools to do this (and they actually work).
- It is easy to write extensions. Much of the program is written in Python.
- It has good support for email. Revisions can be emailed directly and emailed revisions can be imported without much trouble.
- By its distributed nature, one can do work as normal on one feature, and once that feature is ready to be merged into another repository, the changesets can be rewritten into one “final draft”. How often do you commit a bunch of changes only to realize immediately afterward that you forgot one file, or that you made a typo in what you just checked in? With Mercurial you can combine those revisions into one (at least before you have merged with other repositories, at which point it gets messy) and make sure no “garbage” revisions appear in the project’s history.
- Although it is decentralized, it can be used in a centralized fashion (one “master” repository that everyone checks into). This is how I use it with my own projects.
- It ships with some graphical tools for viewing history and such.
- It can be easily set up to use graphical merge tools if they are available.
- It has built-in patch capabilities, a la quilt.
- “Pulling” between repositories scales well. I understand the Linux kernel is developed in this way: Linus at the top, who pulls from a small number of committers he trusts, who pull from a small number of committers they trust… and so on. Merging changes gets distributed throughout this tree of contributors. The Linux kernel currently has over 1000 contributors.
- All contributors have access to all the features of the version control system. There is no technical distinction between those who can commit and those who cannot. Instead, the distinction is social and just defines whose changes get incorporated into the “official” project.
- By its decentralized nature, each working copy of a repository is in itself a repository. So there is not a single point of failure. Explicit backups are less important than with a centralized VCS.
- It can supposedly interoperate with git repositories.
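Merge tracking falls out of history being a graph of changesets rather than a line. The following Python sketch shows the idea under simplifying assumptions; it is not Mercurial's data structure or API, and the `History` class and changeset names are hypothetical.

```python
class History:
    """A toy changeset graph: each changeset records its parent changesets."""

    def __init__(self):
        self.parents = {}  # changeset id -> tuple of parent ids

    def commit(self, node, *parents):
        self.parents[node] = parents

    def ancestors(self, node):
        """All changesets reachable from `node`, including `node` itself."""
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return seen

    def missing(self, target, source):
        """Changesets in `source`'s history that `target` does not yet contain."""
        return self.ancestors(source) - self.ancestors(target)


history = History()
history.commit("root")
history.commit("trunk1", "root")
history.commit("branch1", "root")
history.commit("merge1", "trunk1", "branch1")  # a merge has two parents
history.commit("branch2", "branch1")

print(history.missing("trunk1", "branch1"))  # before the merge: {'branch1'}
print(history.missing("merge1", "branch2"))  # after the merge: {'branch2'}
```

Because the merge changeset remembers both parents, a second merge from the same branch only has to consider the work done since the first one. That is the bookkeeping CVS and Subversion leave to the user.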
One slight disadvantage of Mercurial compared to Subversion is that in the former, one cannot check out just a piece of a repository—one must check out the entire thing.
Here’s how I rank these version control systems that I have experience with:
1. Mercurial
2. Subversion
3. CVS
4. ClearCase
Understand that the difference for me between ranks 1 and 2 is immense: I quite dislike the three systems other than Mercurial. Many of Mercurial’s benefits come from its decentralized nature.
I would almost always recommend Mercurial or probably any other free, distributed VCS (git and bazaar come to mind) over Subversion, ClearCase, or CVS, but it depends on whom you will be working with, the development platform, and how hard it would be for your collaborators to learn new concepts and new tools.