Has this ever happened to you? I’m implementing a new feature on a project, and it would be the perfect application for a new third-party library. I download the library and reference it from my project in my IDE. I write some code that uses the new library and, like the good little developer that I am, I run all the tests in the project’s regression suite. I add the library to my source control system, or perhaps some other shared repository, and I check the whole thing in. The next thing I know, I’ve got a big ugly e-mail sitting in my inbox with “BUILD FAILED - Compilation Error” in the subject line. Doh! I forgot to update the build scripts with a reference to the new library.
The Problem
As it turns out, the broken build has nothing to do with a syntax error or faulty logic that fails a test; the build script is simply out of sync with my development environment. Arguably I should have caught this error by running a full build from the command line before I checked in, but why should I have to leave my IDE to build the code? And more importantly, why should I have to add a reference to the new library in my build scripts when I’ve already added it to my IDE’s project definition? At this point I have two different metadata files that describe the dependencies of my project in two different formats, and I have to maintain both of them. That’s repetition, and it’s a clear violation of the Don’t Repeat Yourself (DRY) principle.
The problem in the story above is just one manifestation of what goes wrong when information is duplicated in different parts of a system. Other problems can arise from this duplication as well. The story described what can happen when a new library is added, but what if we’re removing an external dependency from our application? We might remove the reference in our IDE project but forget to remove it from the build scripts. This can lead to a bloated deployment package full of unnecessary library binaries. Also, class names from the unused library could end up conflicting with legitimate classes from our application or from other libraries we add in the future. The probability of either of these problems becoming a major issue is relatively low, but when these types of bugs do surface, they are generally extremely difficult to resolve.
The Solutions
So the question becomes: how do we avoid this duplication? There are a couple of approaches that I’ve seen.
The first way to keep your build system DRY is to use only one file (or set of files). Instead of maintaining both an IDE project and a build script, combine the two into one metadata file that describes the structure and dependencies of your project and can be consumed by both your IDE and your build system. This is the approach that Microsoft has taken with Visual Studio 2005 and MSBuild: the IDE uses the same solution and project files as the build tool. Had I been using a system like this in the example at the beginning of this article, my build would never have broken, because the update I made to my IDE project would also have applied to my build system.
The drawback to this method is that until there is some project definition standard, your projects have to be coupled to a particular IDE and build tool. In the Microsoft arena this isn’t too much of a problem, but if your project is built on a platform that isn’t dominated by a single tool vendor, the flexibility to switch IDEs easily could be valuable.
The other option for avoiding duplication in your build infrastructure is to generate one set of files from the other. On previous projects, we’ve used scripts to parse our VS solution and project files and create NAnt fileset definitions for our build scripts. There are also tools that go the other way and create your IDE project definitions from your build files. The Maven project from Apache does a great job of this. When using Maven, you define your project’s dependencies, along with other useful metadata, in a simple XML file. You maintain all of the information about your project there and then simply use Maven to generate the project files for your IDE of choice. Eclipse, IntelliJ IDEA, and NetBeans, among others, are supported out of the box.
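To make the first direction concrete (deriving build-script entries from the IDE project file), here is a minimal sketch in Python. It is not the script we actually used. It assumes an MSBuild-format project file in which each third-party library appears as a Reference element with a HintPath child, and it emits a NAnt fileset with one include per library. The function names, the sample project, and the fileset id "project.references" are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# XML namespace used by MSBuild-format project files (VS 2005 era).
MSBUILD_NS = {"msb": "http://schemas.microsoft.com/developer/msbuild/2003"}

# Hypothetical, stripped-down project file used only for this example.
SAMPLE_CSPROJ = r"""
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup>
    <Reference Include="System" />
    <Reference Include="log4net">
      <HintPath>..\lib\log4net.dll</HintPath>
    </Reference>
  </ItemGroup>
</Project>
""".strip()


def library_paths(csproj_xml: str) -> list[str]:
    """Return the HintPath of every Reference that has one, in document order."""
    root = ET.fromstring(csproj_xml)
    paths = []
    for ref in root.iterfind(".//msb:Reference", MSBUILD_NS):
        hint = ref.find("msb:HintPath", MSBUILD_NS)
        # Framework assemblies such as "System" carry no HintPath; skip them.
        if hint is not None and hint.text:
            # Normalise Windows separators to forward slashes.
            paths.append(hint.text.strip().replace("\\", "/"))
    return paths


def nant_fileset(paths: list[str], fileset_id: str = "project.references") -> str:
    """Render the paths as a NAnt-style fileset element, returned as a string."""
    fileset = ET.Element("fileset", {"id": fileset_id})
    for path in paths:
        ET.SubElement(fileset, "include", {"name": path})
    ET.indent(fileset)  # pretty-print; requires Python 3.9+
    return ET.tostring(fileset, encoding="unicode")


if __name__ == "__main__":
    print(nant_fileset(library_paths(SAMPLE_CSPROJ)))
```

Run as part of the build, a script along these lines would regenerate the fileset from the project file on every build, so a library added in the IDE reaches the build script without anyone editing it by hand.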
The downside to this approach is that once your project files are generated, any modifications you make to them directly will be lost the next time you change the build file and regenerate them. This means that any new dependencies have to be defined by hacking XML instead of using the nice GUI provided by your IDE. Maven does have IDE plugins for Eclipse and IDEA that put a more user-friendly interface on top of the build files, which certainly makes this less of an issue.
So whether you use the same files to drive your IDE and build tool, or you generate one from the other, keep your build system DRY and avoid the duplication that is so often associated with this part of a software system.