Tests - Why & How To Make Them Better

Almost everyone agrees that testing is important, but it isn't always easy to explain why. Often we explain the need in terms of correctness - that "tests mean we can prove that the system is bug free". Unfortunately, with the exception of trivially small systems, this is never true. All a comprehensive, automated test suite can do is reduce the likelihood of defects; it cannot eliminate them entirely. So why bother?

To talk about tests solely in terms of defects and correctness really only covers a small part of their value. There are at least two other areas that tests help address that are likely to have a much greater impact, and both are about the future, not the present.

The lion's share of the cost of a software system is not in the initial development but in the ongoing development and maintenance. After all, software is never finished, only abandoned. Therefore anything that can be done to help minimise the ongoing cost of maintenance is extremely valuable. A test suite does this by helping identify regressions in the system's behaviour as it evolves over time. The faster a regression is identified, the faster it can be fixed and the smaller the impact it has.

This is one of the reasons that it is important to write tests that not only follow the happy path but also test the unhappier paths, where things are not as expected. That means testing with incorrect or unexpected parameters to ensure the system handles them gracefully, and simulating network outages or inconsistent data to verify the system can recover or at least makes such anomalies observable. These paths, the edges, are where it is most likely that things have been overlooked, and they are therefore the most likely source of defects and regressions. Ensuring they are covered in your test suite means that should regressions appear in these areas, they can be dealt with efficiently before an end user discovers that features that were working before don't any more.
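As a rough sketch of what covering the edges can look like, here is a small Python example using the standard library's unittest. The load_profile function, its fetch parameter and the shape of the result it returns are all invented for illustration; the point is only that the suite exercises a bad parameter and a simulated network outage alongside the happy path.

```python
import unittest


def load_profile(user_id, fetch):
    """Hypothetical function: fetch a profile, degrading gracefully on outage."""
    if not isinstance(user_id, int) or user_id <= 0:
        raise ValueError(f"user_id must be a positive integer, got {user_id!r}")
    try:
        return {"status": "ok", "profile": fetch(user_id)}
    except ConnectionError as error:
        # Make the anomaly observable instead of swallowing it silently.
        return {"status": "unavailable", "reason": str(error)}


class LoadProfileTests(unittest.TestCase):
    def test_given_a_reachable_service_load_profile_returns_the_profile(self):
        result = load_profile(7, fetch=lambda user_id: {"name": "Ada"})

        self.assertEqual(result, {"status": "ok", "profile": {"name": "Ada"}})

    def test_given_a_negative_user_id_load_profile_raises_a_value_error(self):
        with self.assertRaises(ValueError):
            load_profile(-1, fetch=lambda user_id: {})

    def test_given_a_network_outage_load_profile_reports_unavailable(self):
        # Simulate the outage with a stand-in fetch that always fails.
        def failing_fetch(user_id):
            raise ConnectionError("network down")

        result = load_profile(7, fetch=failing_fetch)

        self.assertEqual(result["status"], "unavailable")
        self.assertEqual(result["reason"], "network down")
```

Saved to a file, this can be run with python -m unittest. Passing the fetch function in as a parameter is just one way to make an outage easy to simulate; patching or a fake server would serve the same purpose.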

Some take these ideas to more aggressive extremes with things like fuzz and mutation testing. The former bombards the system under test with random and often incorrect input to show that it is robust and deals gracefully with an inconsistent view of the world. The latter helps ensure that a system's test suite is itself comprehensive by making random changes, mutations, to the behaviour of the system, which the test suite should then catch.
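To make both ideas a little more concrete, here is a deliberately tiny, hand-rolled sketch in Python. Everything in it is an assumption made for illustration: parse_quantity is a made-up function under test, the mutant is written by hand rather than generated, and the two "suites" are single functions. Real fuzzing and mutation testing tools automate all of this and are far more thorough.

```python
import random
import string


def parse_quantity(text):
    """Hypothetical function under test: parse a whole number from 1 to 99."""
    try:
        number = int(text)
    except ValueError:
        raise ValueError(f"not a whole number: {text!r}") from None
    if not 1 <= number <= 99:
        raise ValueError(f"out of range: {number}")
    return number


def parse_quantity_mutant(text):
    """Hand-made mutation of parse_quantity: the lower bound 1 became 0."""
    number = int(text)
    if not 0 <= number <= 99:
        raise ValueError(f"out of range: {number}")
    return number


def fuzz(parser, runs=2000, seed=0):
    """Fuzzing: feed random strings in; only ValueError is an acceptable error."""
    rng = random.Random(seed)  # seeded so a failing input can be reproduced
    for _ in range(runs):
        length = rng.randint(0, 6)
        text = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            result = parser(text)
        except ValueError:
            continue  # bad input was rejected cleanly
        # Anything that was accepted must be a valid quantity.
        assert 1 <= result <= 99, (text, result)


def happy_path_suite(parser):
    """A weak suite: it only checks one valid input."""
    return parser("5") == 5


def boundary_suite(parser):
    """A stronger suite: it also checks that zero is rejected."""
    try:
        parser("0")
    except ValueError:
        return parser("5") == 5
    return False
```

Calling fuzz(parse_quantity) completes quietly because every random string is either parsed into a valid quantity or rejected with a ValueError. On the mutation side, happy_path_suite(parse_quantity_mutant) still passes, which tells you that suite has a gap, whereas boundary_suite(parse_quantity_mutant) fails and so catches the mutation.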

Another key source of value in your tests is that they are a form of documentation. They give other developers an opportunity to see how a given function, module or system is supposed to be used. In other words, they provide context. This is invaluable to new developers coming onto a project. Sure, they can look at the code itself and try to understand how it all fits together... But sometimes that is not altogether straightforward, particularly in large or complex systems. High level tests can often demonstrate how it all fits together, and what the expected behaviours are, much more succinctly than the code itself.

If you think about tests in terms of documentation, it becomes clear fairly quickly that how you structure your tests is important. There is little obvious value in a test called minurl_works. What is minurl? What does it do? What defines works? We all know naming things is hard, but tests benefit hugely from a more literate approach. Terse should not be a word that even enters your thoughts. For example, what if we had named that test given_a_full_and_valid_url_minurl_produces_a_shortened_one? That one name, for a happy path test, says everything you need to know about the test in a single sentence. And yes, it is a sentence, one you can read verbatim and understand. And yes, I did use underscores between words, and no - it's not just because I'm a Python developer. Sentences (in English at least) are just easier to parse quickly whenTheWordsAreMoreVisuallySeparatedThanCamelCasingThemWouldAllowFor.
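Here is roughly how that might look in Python. The minurl below is a hypothetical stand-in written only so the example runs (the hashing scheme and the min.example domain are my own assumptions), and the test_ prefix is added on the assumption that a runner such as pytest is collecting tests by name.

```python
import hashlib


def minurl(url):
    """Hypothetical stand-in for minurl: map a URL to a short, stable one."""
    token = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    return f"https://min.example/{token}"


def test_given_a_full_and_valid_url_minurl_produces_a_shortened_one():
    full_url = "https://www.example.com/articles/2020/01/why-tests-matter?ref=home"

    short_url = minurl(full_url)

    assert len(short_url) < len(full_url)


def test_given_the_same_url_twice_minurl_produces_the_same_short_url():
    full_url = "https://www.example.com/articles/2020/01/why-tests-matter?ref=home"

    first = minurl(full_url)
    second = minurl(full_url)

    assert first == second
```

When one of these fails, the name in the test report already tells you which behaviour broke, before you have opened the file.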

This also leads on to another key way of thinking about test structure. Tests should be laid out in a formulaic manner, following a repeatable pattern of:

Arrange
Act
Assert

The first part of the test should arrange all the pieces: for example, creating the data structures and objects you need, perhaps using factories to ease the creation of more complex data structures.

This should be followed by performing the actual action that you want the test to check on. This might be making a request and collecting the response. Or it could be triggering some kind of action that you expect to mutate data.

Finally you come to assert that the expected behaviour actually occurred: that the response was a 200 OK, or that a new user was added to the system.
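Put together, a test following the three steps might look something like this in Python. The UserRegistry class is a hypothetical in-memory store invented for the example, and the comments are only there to label the three sections.

```python
class UserRegistry:
    """Hypothetical in-memory user store, invented for this example."""

    def __init__(self):
        self._users = {}

    def add(self, username, email):
        if username in self._users:
            raise ValueError(f"username already taken: {username}")
        self._users[username] = email

    def __contains__(self, username):
        return username in self._users

    def __len__(self):
        return len(self._users)


def test_given_a_new_username_add_stores_the_user():
    # Arrange: build the objects and data the test needs.
    registry = UserRegistry()
    registry.add("grace", "grace@example.com")

    # Act: perform the single action under test.
    registry.add("ada", "ada@example.com")

    # Assert: check that the expected behaviour occurred.
    assert "ada" in registry
    assert len(registry) == 2
```

Even without the comments, the blank lines between the three sections are usually enough for a reader to see where each one starts.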

Those who work in the front end more often may more readily recognise the words given, when, then. This really amounts to the same thing as the triple-A approach described above. I'm not only talking about behavioural tests either - the same structure is equally valid for unit tests and integration tests as it is for functional and UI testing.

Using this repeated structural pattern for your tests will make it much easier for those not familiar with the code base to understand what is going on. I'm fairly sure it will also be easier for you to come back to it a week or so later and slide right back into your own mental model of how it all fits together too!

This pattern will often mean you end up repeating yourself over and over again. In our normal system code it is considered good practice to extract code that is needed in a few places to avoid repeating yourself. Maintenance can happen in one place and everything that depends on it is able to take advantage immediately. But this does come at a cost. Abstractions, which these extractions inevitably become, reduce our ability to understand everything that is going on at that one point in the code. We need to build up internal mental models of how the various layers, abstractions and systems interact with each other in a less localised way. In considering tests as a form of documentation the pressures are almost reversed. You want the tests to be as simple and straightforward to understand as possible. You don't want people to have to jump all around the code base to work out what is going on. So for tests, repetition can be a good thing. Reducing readability, and thereby the reader's ability to fully understand what a test is doing, just to save yourself a handful of keystrokes is not a good trade-off once you look beyond the simple view of a test as an indicator of correctness.
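As a small illustration, the two Python tests below repeat their arrange step on purpose rather than pulling it out into a shared helper. The order_total function is hypothetical, made up purely so there is something to test.

```python
def order_total(prices, discount_percent=0):
    """Hypothetical function under test: sum prices, then apply a discount."""
    subtotal = sum(prices)
    return subtotal - subtotal * discount_percent / 100


def test_given_no_discount_order_total_is_the_sum_of_the_prices():
    prices = [10, 20, 30]

    total = order_total(prices)

    assert total == 60


def test_given_a_ten_percent_discount_order_total_is_reduced_by_a_tenth():
    prices = [10, 20, 30]  # repeated deliberately, not extracted

    total = order_total(prices, discount_percent=10)

    assert total == 54
```

Each test can be read top to bottom on its own. Moving the prices into a module-level constant or a fixture would save a line per test, but the reader would then have to go and find it to check where 60 and 54 come from.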

Correctness is only a small part of the overall value of a comprehensive test suite. Significant additional value comes from making ongoing development and maintenance smoother and cheaper, as well as from providing better context to developers starting on your product or coming back to it after time spent away.