Lean Testing, or Why Unit Tests are Worse than You Think
An economic perspective on testing
Testing is a controversial topic. People have strong convictions about testing approaches, Test-Driven Development being the most prominent example. Clear empirical evidence is missing, which invites strong claims. I advocate, first, for an economic perspective on testing. Second, I claim that focusing too much on unit tests is not the most economic approach. I coin this testing philosophy “Lean Testing.”
The main argument is as follows: different kinds of tests have different costs and benefits. You have finite resources to distribute into testing. You want to get the most out of your tests, so use the most economic testing approach. For many domains (e.g. GUIs), tests other than unit tests give you more bang for your buck.
Confidence and Tests
The article 'Write tests. Not too many. Mostly integration' and the related video by Kent C. Dodds express the ideas behind Lean Testing well. He introduces three dimensions with which to measure tests:
- Cost (cheap vs. expensive)
- Speed (fast vs. slow)
- Confidence (low vs. high; e.g. 'a click doesn't work' vs. 'the checkout doesn't work')
The following is the 'Testing Trophy' suggesting how to distribute your testing resources.
Compared to Fowler's Testing Pyramid, the trophy adds confidence as a dimension. Another difference is that unit tests no longer cover the largest area.
One of Kent C. Dodds' major insights is that you should actually consider the confidence a test gives you: "The more your tests resemble the way your software is used, the more confidence they can give you."
Return on Investment of Tests
The return on investment (ROI) of an end-to-end test is higher than that of a unit test, because an end-to-end test covers a larger area of the code base. Even taking its higher cost into account, it provides disproportionately more confidence.
Moreover, end-to-end tests exercise the critical paths your users actually take, whereas unit tests may cover corner cases that are rarely, if ever, encountered in practice. The individual parts may work while the whole does not. These points can be found in 'Unit Test Fetish' by Martin Sústrik.
Further, Kent C. Dodds claims that integration tests provide the best balance of cost, speed and confidence, and I subscribe to that claim. Unfortunately, we don't have empirical evidence showing that this is actually true. Still, my argument goes like this: end-to-end tests provide the greatest confidence. If they weren't so costly to write and slow to run, we would only use end-to-end tests (although better tools like Cypress mitigate these downsides). Unit tests are cheaper to write and faster to run, but each one covers only a small part of the system, and that part might not even be critical. Integration tests lie between the two, which is why they provide the best balance.
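To make the trade-off concrete, here is a minimal end-to-end sketch in Cypress. The route and the `data-test` selectors are hypothetical and would depend on your application:

```typescript
// cypress/e2e/checkout.cy.ts -- a minimal end-to-end sketch.
// It drives a real browser against the running application, which is why it
// yields high confidence but is comparatively slow and needs more setup.
describe('checkout', () => {
  it('lets a user put a product into the cart and reach the checkout', () => {
    cy.visit('/');                                      // real app, real browser
    cy.get('[data-test=product-card]').first().click(); // hypothetical selectors
    cy.get('[data-test=add-to-cart]').click();
    cy.get('[data-test=cart-link]').click();
    cy.contains('Checkout');                            // assert on what the user sees
  });
});
```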
As an aside: the terms “integration test,” and even more so “end-to-end test,” seem to generate intense fear in some people. Such tests are supposedly brittle, hard to set up and slow to run. The main idea, however, is simply to mock less.
In the React context of Kent C. Dodds’ article, integration testing means not using shallow rendering: an integration test covers several components at once. Such a test is easier to write and more stable, because you do not have to mock as much and you are less likely to test implementation details.
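As a sketch of what that looks like with React Testing Library (the `SearchPage` component and the visible texts are made-up stand-ins for your own code):

```typescript
// SearchPage.test.tsx -- an 'integration test' in the React sense:
// the page is rendered together with its real child components,
// nothing is shallow-rendered or mocked away.
import '@testing-library/jest-dom';
import { render, screen, fireEvent } from '@testing-library/react';
import { SearchPage } from './SearchPage'; // hypothetical component under test

test('shows results after the user submits a query', () => {
  render(<SearchPage />); // full render, children included

  fireEvent.change(screen.getByLabelText('Search'), {
    target: { value: 'lean testing' },
  });
  fireEvent.click(screen.getByRole('button', { name: 'Search' }));

  // Assert on what the user sees, not on component internals,
  // so the test survives refactorings of the component tree.
  expect(screen.getByText(/results for "lean testing"/i)).toBeInTheDocument();
});
```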
In the backend world, an integration test runs against a real database and makes real HTTP requests against your controller endpoints. It is no problem to spin up a Docker database container beforehand and reset its state after each test. Again, such tests run fast, are easy to write, and are reliable and resilient against code changes.
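A minimal sketch of such a backend test, assuming a Node service tested with Jest, supertest and Testcontainers; the `createApp` factory and the `/articles` endpoint are hypothetical, and per-test state reset is omitted for brevity:

```typescript
// articles.integration.test.ts -- hits a real HTTP endpoint backed by a
// real PostgreSQL instance running in a throwaway Docker container.
import request from 'supertest';
import { PostgreSqlContainer, StartedPostgreSqlContainer } from '@testcontainers/postgresql';
import { createApp } from './app'; // hypothetical factory returning an Express app

let container: StartedPostgreSqlContainer;
let app: ReturnType<typeof createApp>;

beforeAll(async () => {
  container = await new PostgreSqlContainer().start();            // spin up the Docker DB
  app = createApp({ databaseUrl: container.getConnectionUri() }); // point the app at it
}, 60_000); // allow time for the image to be pulled on first run

afterAll(async () => {
  await container.stop();
});

test('POST /articles persists an article and GET /articles returns it', async () => {
  await request(app)
    .post('/articles')
    .send({ title: 'Lean Testing' })
    .expect(201);

  const response = await request(app).get('/articles').expect(200);
  expect(response.body).toEqual(
    expect.arrayContaining([expect.objectContaining({ title: 'Lean Testing' })])
  );
});
```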
Code Coverage
Another point is that code coverage has diminishing returns. In practice most people seem to agree, since most projects set their lower bound for coverage to around 80%. There is also supporting research, such as 'Exploding Software-Engineering Myths.' A common way to enforce such a bound is shown in the sketch below.
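For illustration only (the exact numbers are a matter of taste, and Jest is just one possible test runner), such a lower bound can be enforced in the test configuration:

```typescript
// jest.config.ts -- enforce a pragmatic lower bound instead of chasing 100%.
import type { Config } from 'jest';

const config: Config = {
  collectCoverage: true,
  coverageThreshold: {
    // Fail the test run if coverage drops below roughly 80%.
    global: { lines: 80, branches: 80 },
  },
};

export default config;
```

What follows are more general arguments about coverage.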
Even with 100% code coverage, you still trust your dependencies, which may, in principle, have 0% code coverage.
For many products, it is acceptable for the common cases to work but not the exotic ones ('Unit Test Fetish'). If low code coverage lets a corner-case bug slip through that affects 0.1% of your users, you might survive. If your time to market increases because of high code coverage demands, you might not. And "just because you ran a function or ran a line does not mean it will work for the range of inputs you are allowing" (source).
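A tiny illustration of that last point, with a made-up function: the single test below executes every line of `averageRating`, so coverage reports 100%, yet the function still misbehaves for an input the test never exercises.

```typescript
// averageRating sketch: 100% line coverage, still buggy for some inputs.
export function averageRating(ratings: number[]): number {
  let sum = 0;
  for (const rating of ratings) {
    sum += rating;
  }
  return sum / ratings.length; // NaN for an empty array -- never exercised below
}

// This single test covers every line above and turns the coverage report green...
test('averages a list of ratings', () => {
  expect(averageRating([4, 5, 3])).toBe(4);
});
// ...but averageRating([]) still returns NaN, and no coverage metric will tell you.
```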
Code Quality and Unit Tests
There is the claim that making your code unit-testable will improve its quality. Many arguments and some empirical evidence in favor of that claim exist, so I will shed light on the other side.
The article ‘Unit Test Fetish’ states that unit tests are an anti-architecture device. Architecture is what makes software able to change. Unit tests ossify the internal structure of the code. Here is an example:
"Imagine you have three components, A, B and C. You have written extensive unit test suite to test them. Later on you decide to refactor the architecture so that functionality of B will be split among A and C. you now have two new components with different interfaces. All the unit tests are suddenly rendered useless. Some test code may be reused but all in all the entire test suite has to be rewritten."
This means that unit tests increase maintenance liabilities because they are less resilient to code changes. Coupling between modules and their tests is introduced! Tests are system modules as well. See ‘Why Most Unit Testing is Waste’ for these points.
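A small sketch of that coupling, with hypothetical names: the unit test below pins the current decomposition into a `PriceService` (component A) and a `TaxCalculator` (component B). Fold the tax logic into `PriceService` and the test must be rewritten, even though the observable behaviour stays the same.

```typescript
// priceService.test.ts -- a unit test coupled to the current module structure.
interface TaxCalculator {                    // component B and its interface
  taxFor(netPrice: number): number;
}

class PriceService {                         // component A, depends on B
  constructor(private readonly taxes: TaxCalculator) {}
  grossPrice(netPrice: number): number {
    return netPrice + this.taxes.taxFor(netPrice);
  }
}

test('adds tax to the net price', () => {
  // The mock hard-codes B's interface. If the tax rule later moves into
  // PriceService itself (B disappears), this test breaks and must be rewritten,
  // although grossPrice still returns exactly the same values as before.
  const taxes: TaxCalculator = { taxFor: (net) => net * 0.19 };
  expect(new PriceService(taxes).grossPrice(100)).toBeCloseTo(119);
});
```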
There are also some psychological arguments. For example, if you value unit-testability, you will prefer a design that is easier to test over a design that is harder to test but otherwise better, because you know you'll spend a lot more time writing tests. Further points can be found in 'Giving Up on Test-First Development'.
The article 'Test-induced Design Damage' by David Heinemeier Hansson claims that, to accommodate unit-testing objectives, code is worsened through otherwise needless indirection. The question is whether extra indirection and decoupled code are always better. Do they not have a cost? What if you decouple two components that are always used together? Was it worth decoupling them? You can claim that indirection is always worth it, but you cannot, at the very least, dismiss the harder navigation through the code base, both when reading it and at run-time.
Conclusion
An economic point of view helps to reconsider the return on investment of unit tests. Consider the confidence a test provides. Integration tests provide the best balance of cost, speed and confidence. Be careful with code coverage, as aspirations that are too high are likely counter-productive. Be skeptical about the code-quality-improving powers of making code unit-testable.
To be clear, I do not advocate never writing unit tests. I hope I have provided a fresh perspective on testing. In a future article, I plan to show concretely how to implement a good integration test for both a frontend and a backend project.
If you desire clear, albeit unnuanced, instructions, here is what you should do: Use a typed language. Focus on integration and end-to-end tests. Use unit tests only where they make sense (e.g. pure algorithmic code with complex corner cases). Be economic. Be lean.
Sources
- Write tests. Not too many. Mostly integration
- Unit Test Fetish
- Test-induced Design Damage
- Exploding Software-Engineering Myths
- Why Most Unit Testing is Waste
Additional Notes
One of the problems with discussing the costs and benefits of unit tests is that the boundary between unit and integration tests is fuzzy. The terminology is ambiguous, so people tend to talk at cross purposes.
To make it clear, low code coverage does not imply fewer bugs. As the late Dijkstra said (1969): “Testing shows the presence, not the absence of bugs.”
There is research that didn’t find Test-Driven Development (TDD) improving coupling and cohesion metrics. TDD and unit tests aren’t synonyms, but in the context of this article it’s still interesting: ‘Does Test-Driven Development Really Improve Software Design Quality?’ Another article, ‘Unit Testing Doesn’t Affect Codebases the Way You Would Think’, analyzes code bases and finds that code with more unit tests has more cyclomatic complexity per method, more lines of code per method and similar nesting depth.
This article focused on how to distribute your automated testing budget across the different kinds of tests. Let's take a step back and consider reducing the automated testing budget altogether. Then we'd have more time to think about the problems, find better solutions and explore. This is especially important for GUIs, where there is often no 'correct' behavior, only 'good' behavior. Paradoxically, reducing your automated testing budget might lead to a better product. See also ‘Hammock Driven Development’.
There is a difference between library and app code; the former has different requirements and constraints, and 100% code coverage via unit tests likely makes sense there. There is a difference between frontend and backend code. There is a difference between code for nuclear reactors and games. Each project is different; its constraints and risks are different. Thus, to be lean, you should adjust your testing approach to the project you're working on.