Demystifying the Pursuit of 100% Test Code Coverage

Andrey Lebedev
5 min read · Nov 14, 2023

Full test coverage is not a goal

The topic of 100% test code coverage has long been a subject of controversy for me and nearly every colleague I have known in the IT industry. The main objection I have encountered countless times is: “Achieving 100% test code coverage is impossible, overly difficult, and a waste of time.” In this article, I aim to summarise my thoughts and experiences on this matter and present several theses.

Let’s begin with this premise:

Achieving 100% code coverage is not a goal, but a by-product of applying the correct development methodology.

Indeed, without practicing Test-Driven Development (TDD) — writing tests before the production code — achieving 100% code coverage becomes an exceedingly arduous and, importantly, almost pointless task. In my more than 20 years as a software engineer, I have repeatedly attempted and abandoned this goal because of the sheer tedium of writing tests solely to achieve high coverage, which ultimately added no value. Why? The quality of tests written after the fact for existing code is generally low. I have seen many pointless tests, some of my own among them, that did nothing but ensure the production code did not throw an exception. This brings us to a key principle of TDD:

Your test should initially fail in the development cycle, confirming that it truly tests what it is meant to.

A method to assess test quality is to deliberately corrupt your production code (while keeping it compilable) and see if the test still passes. Many post-hoc tests fail this check: they still pass, whereas they shouldn’t. For simplicity, let’s term this measure “Sensitivity”: the more sensitive a test, the better.
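
To make Sensitivity concrete, here is a minimal JUnit 5 sketch (all class and method names are hypothetical). The first test only asserts that nothing is thrown, so it survives a corrupted implementation; the second pins the expected result and fails as soon as the logic is mutated.

```java
import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Hypothetical production class used for illustration.
class PriceCalculator {
    double applyDiscount(double price, double rate) {
        // Corrupt this line (e.g. change "-" to "+") and see which test notices.
        return price * (1.0 - rate);
    }
}

class PriceCalculatorTest {

    // Low Sensitivity: passes as long as no exception is thrown,
    // even if the discount logic is silently broken.
    @Test
    void doesNotThrow() {
        assertDoesNotThrow(() -> new PriceCalculator().applyDiscount(100.0, 0.1));
    }

    // High Sensitivity: fails immediately once the formula is corrupted.
    @Test
    void appliesTenPercentDiscount() {
        assertEquals(90.0, new PriceCalculator().applyDiscount(100.0, 0.1), 1e-9);
    }
}
```

Mutation-testing tools such as PIT for Java automate exactly this corrupt-and-rerun loop at scale.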

Another quality metric is how quickly and easily a test can identify which part of the production code is faulty in the event of a regression. It is obvious that small unit tests excel in this respect, unlike integration or end-to-end tests. However, writing small unit tests in advance considerably slows down development. Let’s label this measure “Traceability”: the higher the Traceability, the better.

The third dimension of test quality I wish to discuss is Robustness to production code changes, specifically during refactoring. This measure is inversely correlated with Traceability: high-level integration or end-to-end tests should survive a refactoring untouched, whereas unit tests often require a complete redesign afterwards, which casts doubt on their necessity, since every rewrite reduces their value.

This leads me to believe that the conventional testing pyramid is flawed if the goal is to develop the right product in the right way while keeping costs reasonable. I am convinced that the ideal testing model resembles not a pyramid but an inverted pear — a topic deserving its own discussion.

The Testing Pear

From a business standpoint, the ultimate measure of test quality is the cost of development and maintenance: the lower the cost, the higher the quality.

In summary: the overall quality of a test suite is a point in a four-dimensional space, with Sensitivity, Traceability, Robustness, and Cost as its axes. Finding the balance and maximising overall value along all four is understandably challenging.

The methodology that best balances these measures is BDD (Behaviour-Driven Development), also known as Specification by Example or Acceptance-Test Driven Development.

In BDD, you start with high-level integration tests that directly reflect the acceptance criteria of a user story. Then, working outside-in, you drive the implementation deeper into the production code until your integration tests pass.
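
As a minimal illustration (all names and numbers are hypothetical), an acceptance-level test can mirror a user-story criterion almost verbatim and is written before any of the code it exercises:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class LoyaltyDiscountAcceptanceTest {

    // Acceptance criterion, copied from a hypothetical user story:
    // "A registered customer with at least 100 loyalty points
    //  receives a 5% discount on their order."
    @Test
    void customerWithEnoughPointsGetsLoyaltyDiscount() {
        Shop shop = new Shop();
        Customer alice = shop.registerCustomer("Alice", 120);
        assertEquals(47.50, shop.priceOrder(alice, 50.00), 1e-9);
    }
}

// In genuine outside-in BDD these classes would not exist yet;
// they are grown step by step until the failing test above passes.
record Customer(String name, int loyaltyPoints) {}

class Shop {
    Customer registerCustomer(String name, int points) {
        return new Customer(name, points);
    }

    double priceOrder(Customer customer, double amount) {
        double discount = customer.loyaltyPoints() >= 100 ? 0.05 : 0.0;
        return amount * (1.0 - discount);
    }
}
```

The test reads like the story itself, which is precisely the point of Specification by Example.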

Proper BDD practice naturally leads to 100% code coverage.

Once achieved, keep it

My second thesis addresses the importance of maintaining 100% code coverage.

Consider language-enforced constructs that intrude into your production code, such as Java’s checked exceptions. Sometimes the logic branch introduced by a catch block is never reached at runtime, for example when reading a static resource packaged with the application. Testing these scenarios can be tedious, tempting developers to leave the code untested and lower the overall coverage threshold. However, allowing even a small gap between an acceptable threshold, say 95%, and the maximum 100% opens the door to regression bugs.
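
Here is a hedged Java sketch of the problem (the class and resource names are hypothetical). The resource is packaged with the application, so the catch branch is practically unreachable at runtime, yet it still counts against coverage:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class BannerLoader {

    // Reads a static resource bundled into the artifact.
    String loadBanner() {
        try (InputStream in = BannerLoader.class.getResourceAsStream("/banner.txt")) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // In practice this branch never executes: the resource is
            // always on the classpath. How does a test ever reach it?
            throw new IllegalStateException("Cannot read banner", e);
        }
    }
}
```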

Having less than 100% code coverage is akin to sleeping under a short blanket: either your feet or your chest will be cold.

Yes, this image was generated by DALL-E; that’s why the guy in the picture has six toes :D

In my experience, even a 2% drop in code coverage can point to a serious bug that renders a critical business logic branch in the application unreachable. This may seem to contradict the earlier statement that pre-written tests detect such issues, but no one is infallible: even the most meticulous developers can err, and their tests can be flawed.

Returning to the previous example of a difficult-to-test side branch not linked to any user story scenario, it’s vital to remember a key clean code and architecture metric: testability. If something is untestable, it signals the need for refactoring. In the case of the checked exception, introducing a testable helper service to read the resource by path could be a solution.
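
A minimal sketch of that refactoring, reusing the hypothetical names from above: resource reading moves behind an interface, so a test can substitute an implementation whose read() throws, finally making the catch branch reachable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// The seam: resource access is extracted behind an interface.
interface ResourceReader {
    byte[] read(String path) throws IOException;
}

class ClasspathResourceReader implements ResourceReader {
    @Override
    public byte[] read(String path) throws IOException {
        try (InputStream in = getClass().getResourceAsStream(path)) {
            if (in == null) {
                throw new IOException("Resource not found: " + path);
            }
            return in.readAllBytes();
        }
    }
}

class BannerLoader {
    private final ResourceReader reader;

    BannerLoader(ResourceReader reader) {
        this.reader = reader;
    }

    String loadBanner() {
        try {
            return new String(reader.read("/banner.txt"), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Now reachable in a test: pass a ResourceReader stub
            // whose read() throws IOException.
            throw new IllegalStateException("Cannot read banner", e);
        }
    }
}
```

A test can now construct BannerLoader with a stub reader that throws and assert that loadBanner() wraps the failure, covering the branch without any classpath tricks.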

Not a panacea, but better than nothing

Last but certainly not least, it must be acknowledged that full test coverage does not eliminate the risk of bugs in the code; it does, however, significantly mitigate that risk. In my experience, the bugs that do slip through typically stem, on closer inspection, from a deviation from strict BDD/TDD discipline: scenarios that were not fully implemented, or overlooked discrepancies between the values outlined in the story and those used in the tests. To minimise this risk, peer code review proves invaluable. A key responsibility of a peer reviewer is to verify that the implemented automated tests precisely match the acceptance criteria laid out in the user stories. BDD and TDD are not merely practices but disciplines of code development, and full test coverage is a testament to how rigorously these disciplines are followed.

Summary

In conclusion, striving for 100% test coverage is not an aim in itself but the byproduct of diligently applying BDD/TDD methodologies. Yet, once this level of coverage is attained, it becomes crucial to maintain it to prevent the introduction of bugs. Importantly, while 100% coverage does not guarantee an absence of bugs, any that are discovered are likely to be the result of deviations from BDD/TDD practices. Thus, complete test coverage serves as both a shield against defects and a reflection of our commitment to rigorous development disciplines.


Andrey Lebedev

PhD in CS, a software engineer with more than 20 years of experience. Check my LinkedIn profile for more information: https://www.linkedin.com/in/andremoniy/