Clean pom.xml

Andrey Lebedev
Jul 9, 2024

This article, to a certain extent, alludes to Uncle Bob’s “Clean Code” and “Clean Architecture” but focuses specifically on how we manage our dependencies in projects. Clearly, this is not only about Maven and its configuration stored in pom.xml files. The concepts described below can be easily applied to any software project management tool, be it Gradle or Ant.

Let’s face the truth: clean code and clean architecture principles are not widely applied in practice. Dependency management fares even worse: it is swept under the carpet even by those who write decent, clean code in the rest of the project. For some reason, probably because of the limited expressive power of the languages used in tools like Maven, producing clean project configuration files is considered a second-class job, an unimportant task. Meanwhile, neglecting to keep your pom.xml in order grows your overall technical debt like a snowball rolling down a hill. The effort required to clean things up later (and the moment when this becomes necessary inevitably arrives) is considerably higher than the effort of keeping things tidy every day, starting from the very beginning of a new project or module.

In this article, I will consider three main topics:

  1. Malpractices related to dependencies and build configuration.
  2. Best practices for dependency management.
  3. Automated tools (primarily Maven plugins) that can help keep things in order.

Preface Blabbering

But first, before we start, let me fantasise a bit and draw your attention to a statement I firmly believe: your build and dependency configuration (like pom.xml files) is code. Hence, one ought to apply the same philosophy and best practices to it as to the rest of the project. Yet, as I mentioned earlier, the very XML used for Maven configuration has a number of limitations that prevent certain things from being expressed in a clearer and tidier way. Take, for example, code duplication. In certain cases it is inevitable that the same dependency, even with a parameterised version, is repeated in multiple pom.xml files of your project. How cool would it be if we could define a dependency as a tailor-made tag, which would be not only a concise form of dependency declaration but also a self-explanatory piece of code?

For example, in one place we could redefine the commons-lang3 dependency

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.14.0</version>
</dependency>

as simple as

<commons-lang3/>

I have seen many cases where much more obscure dependencies were used in a project. Sometimes the reason for their presence is absolutely unclear. They have runtime scope, so you discover that they are needed only when you run your application. In many cases, what we all do is add a comment above such a dependency:

<!--
    Don't delete this dependency, otherwise logging won't work
-->
<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>1.2.11</version>
</dependency>

It would be great to be able to replace it with just

<logging-implementation/>

or even

<logging-implementation-do-not-delete/>

An alluring idea, don’t you think?

Well, these are just dreams. A quick search turned up no examples of such an add-on to Maven. The main problem with this idea is that a custom Maven plugin cannot introduce custom tags: the schema of the pom.xml document is hard-coded in the Maven engine, so it seems that this is not possible without forking and recompiling Maven itself.

Never Do These Things

Bloody copy-paste

The most common root of an untidy Maven configuration is copy-paste. Yes, we have all been there. I myself have often been caught by an instinctive desire to copy an existing pom.xml file or its contents to create a new Maven module. We know that configuration can sometimes be a boring chore, so why not take something that already does its job well?

Well, the problem is that by doing so you inevitably drag in a lot of things that you don’t need. YAGNI and KISS are exactly about that.

Just-in-case Dependencies

The second major reason for cluttered pom.xml files is adding dependencies in advance, often following the “just-in-case” principle. A frequently seen situation is when we first ponder what kind of libraries we might possibly need. For example, let’s add some Apache Commons library just in case: what if we need it? It is difficult to get rid of this habit if one, as a developer, is used to adding methods that nobody uses or, generally speaking, pieces of code that are not directly required by the business requirements.

How can we do it differently? The answer is simple: follow the TDD principles. First, write a piece of test code. Does it compile? Do you have all the needed dependencies? No? Go and add only those dependencies that are absolutely needed to make your code compile. This is exactly the same technique you apply to the main code, which you put in place only to make your test pass.
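As a minimal sketch of this idea, assume the very first test is written with JUnit 5 and fails to compile because no test framework is on the classpath. The only dependency the build needs at that point might be the one below. The choice of JUnit and the version number are illustrative assumptions, not something prescribed by this article:

```xml
<!-- Added only because the first test would not compile without it. -->
<!-- The version is illustrative; use whatever your project manages. -->
<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>5.10.2</version>
    <scope>test</scope>
</dependency>
```

The test scope keeps the library off the main classpath, so production code cannot start depending on it by accident.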

Fat Dependencies

Consider adding the smallest possible dependencies. What I mean by that is that some libraries, especially frameworks, come as a variety of artifacts that overlap with or include each other. When you have a choice, try to find a library that does the one thing needed for your purposes. Try to avoid general-purpose fat libraries that include everything. There are exceptions to this rule, of course. For example, Spring ecosystem libraries tend to be well-designed and have a sensible decomposition. Some of them may look like general-purpose libraries, but generally they are dedicated to a specific domain (like Spring Data or Spring Security).

Hidden Transitive Dependencies

Another harmful practice is to rely on transitive dependencies. Say you need some class from the commons-lang3 library (let’s call it library “A”). Instead of pulling in this library directly and explicitly specifying it in your pom.xml, one may find that it is already on the classpath, brought in transitively by some other library (library “B”). The problems with this approach are:

  • it creates coupling between your specific class and library “B”. When you need to refactor your code and move library “B” around, it will be difficult to understand what binds this library to your module’s code and why (the tools for finding out whether a library is used or not are covered in the tools chapter below);
  • it hides the actual version of library “A” from developers. Of course, one can run mvn dependency:tree and see the version, but that’s not efficient;
  • last but not least, a transitive dependency can disappear at any moment after an update of library “B”. Needless to say, this brings additional fuss into the project: finding out what happened and why the code no longer compiles can waste the team’s resources.
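A minimal sketch of the alternative: if your own code imports classes from commons-lang3, declare it directly in the module’s pom.xml, even though library “B” already brings it in. The version shown is the one used in the example earlier in this article; in a real project it would more likely come from a shared property or a BOM:

```xml
<!-- Declared explicitly because our own code uses its classes, -->
<!-- regardless of which other libraries happen to pull it in transitively. -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.14.0</version>
</dependency>
```

With the explicit declaration in place, the version is visible in the pom.xml itself, and the dependency survives any change in the transitive dependencies of library “B”.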

Having Different Versions of the Same Library in the Classpath

This sort of case can happen for multiple reasons:

  • multiple “fat” libraries in the classpath bearing various versions of the same library;
  • poor project hygiene, when different modules declare different versions of the same library (i.e. they don’t use a shared version parameter or a Bill of Materials module);
  • a combination of the two cases above.

The aftermath of this untidiness is manifold: from weird runtime errors, when class or method signatures do not match what the calling code was compiled against, to unpredictable behaviour caused by differences between the versions. Which version is actually used? A hint: Maven picks the one nearest to your project in the dependency tree (and, at equal depth, the one declared first), but until you analyse the tree you don’t know.

Tolerate Circular Dependencies

Funnily enough, in projects where circular dependencies exist, the engineers are often super defensive about this solution. It is usually justified by the supposed “impossibility” of configuring things differently. In reality it is always, in bold capital letters ALWAYS, the result of a badly designed decomposition into modules, and it can always be solved. Admittedly, solving it will sometimes require a great deal of refactoring, but this is such a nasty and huge technical debt that postponing it makes things worse every day.

Do That Instead

Test-Driven Dependency Development

As mentioned above, add a dependency only when your code strictly needs it, either to compile or to make a test go green. Of course, it is not a panacea. If some test and the corresponding main code are deleted, you will need some tooling to find out which libraries are no longer needed (see the chapter below). Nevertheless, it will surely simplify dependency management and reduce the number of “ballast” libraries.

Explicitly Define Needed Dependencies

One objection that may be raised at this point goes like this.

What if library “B” brings in library “A” as a transitive dependency? Let’s assume that library “B” is already on the classpath, i.e. explicitly defined in the pom.xml. What if, in the newly added code, we want to use classes from library “A” directly? Our code will compile and our tests will eventually pass without any changes to the list of dependencies. Hence, one may say: TDD doesn’t really allow us to follow this principle.

The answer to this remark is both “yes” and “no”. “Yes” in the sense that this sort of situation can happen, and some dependencies might lurk as transitive ones and never be pulled up to the top level of the dependency tree. “No” in the sense that there are always design decisions that can help you mitigate this sort of problem. In this particular case it is modularisation. If you follow the single responsibility principle, which applies not only to classes, you will end up with a multi-module application with a clear separation of concerns. This keeps the scope of each module small and minimises the overlap between fat libraries.

Keep Your Dependencies Up-to-date

This is a controversial topic, and I have encountered different practices in different teams. The main argument against updating dependencies is the principle “it works, so don’t touch it”. It is clearly a justifiable approach for projects with no solid acceptance-driven test coverage, where user stories lack corresponding integration tests. Here, however, we are talking about a sane and healthy approach, where the project is thoroughly covered by automated tests (including, equally importantly, automated integration tests). In this case, you can update the dependencies without fear: your tests either pass or fail. The benefit is clear: newer versions of libraries usually fix known vulnerabilities and bugs, and often perform better.

Manage Dependencies Through a Bill of Materials (BOM)

A seemingly obvious but still not widely adopted approach is to keep all dependency management in one module, usually called a “BOM”. Practices vary as to whether this module is shared by all projects in the organisation or is managed separately for each multi-module project. My two cents, based on my experience: maintain the same Bill of Materials at least for all applications that are deployed in the same landscape (such as in a microservices architecture) and managed by the same team.
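As a sketch of how a module might consume such a BOM, Maven’s import scope can be used inside dependencyManagement. The coordinates com.example:example-bom and its version are hypothetical placeholders, and the sketch assumes that this BOM manages the version of commons-lang3:

```xml
<dependencyManagement>
    <dependencies>
        <!-- Hypothetical BOM coordinates; replace with your own BOM module. -->
        <dependency>
            <groupId>com.example</groupId>
            <artifactId>example-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <!-- No version here: it is assumed to be managed by the imported BOM. -->
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
    </dependency>
</dependencies>
```

Modules then declare what they use, while the question of which version is answered in exactly one place.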

Tools to Keep Things Tidy

Version Collision Detection

A good way of preventing at least some of the problems listed above is an automatic check for dependency version conflicts. There is a nice plugin for that, the Apache Maven Enforcer Plugin, which offers a rule called “Dependency Convergence”. Setting up this plugin is a no-brainer and is well covered by its documentation, so I won’t go into the details here. I will just reiterate that having this plugin enabled is a very good practice, since it can save you a good deal of time.
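For orientation only, a minimal configuration of this rule might look like the sketch below. The plugin version and the execution id are illustrative assumptions; check the plugin documentation for the current release:

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-enforcer-plugin</artifactId>
    <!-- Illustrative version; use the one your build manages. -->
    <version>3.4.1</version>
    <executions>
        <execution>
            <!-- The id is an arbitrary name chosen for this sketch. -->
            <id>enforce-dependency-convergence</id>
            <goals>
                <goal>enforce</goal>
            </goals>
            <configuration>
                <rules>
                    <!-- Fails the build when the same artifact resolves -->
                    <!-- to different versions along different paths. -->
                    <dependencyConvergence/>
                </rules>
            </configuration>
        </execution>
    </executions>
</plugin>
```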

Unused Dependencies Detection

It won’t hurt to perform a sanity check of your project every now and then. The most common case, however, is cleaning and tidying up a project that doesn’t yet meet best-practice standards. A good starting point for a proper design change and a reconfiguration of modules is detecting unused dependencies. Here the Maven Dependency Plugin comes into play. Its “dependency:analyze” goal will do the trick and, again, we won’t discuss the setup details. One important warning, however: this tool can give false positives, i.e. report some dependencies as unused while they are actually used at runtime (a typical example would be JDBC drivers). This can make things a bit more complex, if not dangerous, if your project lacks sufficient integration tests to detect a missing runtime dependency.
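One possible way to live with such false positives is to run the analysis as part of the build and list the known runtime-only artifacts explicitly. The sketch below assumes a PostgreSQL JDBC driver as the runtime-only dependency. The plugin version, the execution id and the decision to fail the build on warnings are illustrative choices, not recommendations from this article:

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-dependency-plugin</artifactId>
    <!-- Illustrative version; use the one your build manages. -->
    <version>3.6.1</version>
    <executions>
        <execution>
            <!-- The id is an arbitrary name chosen for this sketch. -->
            <id>analyze-dependencies</id>
            <goals>
                <!-- Same analysis as dependency:analyze, meant for use inside the build. -->
                <goal>analyze-only</goal>
            </goals>
            <configuration>
                <failOnWarning>true</failOnWarning>
                <ignoredUnusedDeclaredDependencies>
                    <!-- Assumed example: a JDBC driver that is only loaded at runtime. -->
                    <ignoredUnusedDeclaredDependency>org.postgresql:postgresql</ignoredUnusedDeclaredDependency>
                </ignoredUnusedDeclaredDependencies>
            </configuration>
        </execution>
    </executions>
</plugin>
```

The ignore list doubles as documentation: it records, in the build itself, which dependencies are intentionally present without any compile-time usage.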

Conclusion

Keeping dependency management tidy is an often overlooked topic, as it is considered a low-return investment. Even projects with 100% test code coverage sometimes have “dirty” configuration in their Maven pom.xml files. The fact is that an untidy configuration is a time bomb, a nasty lurking technical debt that can undermine your plans and quality after a few years of the product development lifecycle.

Keep it clean from the very beginning. Don’t postpone cleaning it up in an inherited legacy project. Don’t neglect it and don’t treat your poor little pom.xml files as second-class citizens!

Andrey Lebedev

PhD in CS, a Software engineer with more than 20 years of experience. Check my LinkedIn profile for more information: https://www.linkedin.com/in/andremoniy/