Hypothesis will speed up your testing process and improve your software quality, but when first starting out people often struggle to figure out exactly how to use it.
Until you’re used to thinking in this style of testing, it’s not always obvious what the invariants of your code actually are, and people get stuck trying to come up with interesting ones to test.
Fortunately, there’s a simple invariant which every piece of software should satisfy, and which can be remarkably powerful as a way to uncover surprisingly deep bugs in your software.
The invariant is this: the software shouldn’t crash. Or sometimes, it should only crash in defined ways.
There is then a standard test you can write for most of your code that asserts this invariant.
It consists of two steps:
- Pick a function in your code base that you want to be better tested.
- Call it with random data.
This style of testing is usually called fuzzing.
You may need to work out how to generate your domain objects. Hypothesis has an extensive library of tools (called ‘strategies’ in Hypothesis terminology) for generating custom types, but if you can, start somewhere where the types you need aren’t too complicated to generate.
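As a rough sketch of what generating a simple domain object can look like, here is one way to do it with the `builds` strategy. The `Order` class and `order_summary` function are hypothetical, made up purely for illustration; substitute your own types.

```python
from dataclasses import dataclass

from hypothesis import given, strategies as st


@dataclass
class Order:
    # Hypothetical domain object, used only for illustration.
    item: str
    quantity: int


# builds() calls Order(...) with keyword arguments drawn from these strategies.
orders = st.builds(
    Order,
    item=st.text(min_size=1),
    quantity=st.integers(min_value=1, max_value=1000),
)


def order_summary(order):
    # Hypothetical function under test.
    return f"{order.quantity} x {order.item}"


@given(orders)
def test_order_summary_does_not_crash(order):
    # No assertions yet: the only claim is that this call does not raise.
    order_summary(order)
```

The bounds on `quantity` are an assumption about what a valid order looks like. Encoding that kind of constraint in the strategy keeps invalid inputs out of the test in the first place.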
Chances are pretty good that you’ll find something wrong this way if you pick a sufficiently interesting entry point. For example, there’s a long track record of people trying to test interesting properties of their text handling and getting Unicode errors when text() gives them something that their code didn’t know how to handle.
You’ll probably get exceptions here that you don’t care about: for example, some arguments to functions may not be valid. Set up your test to ignore those.
So at this point you’ll have something like this:
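```python
from hypothesis import given, reject
from hypothesis.strategies import integers, text


@given(integers(), text())
def test_some_stuff(x, y):
    try:
        # my_function stands in for the function you want to test
        my_function(x, y)
    except SomeExpectedException:
        # an exception we already know is possible: discard this example
        reject()
```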
In this example we generate two values - one integer, one text - and pass them to your test function. Hypothesis will repeatedly call the test function with values drawn from these strategies, trying to find one that produces an unexpected exception.
When an exception we know is possible happens (e.g. a ValueError because some argument was out of range) we call reject. This discards the example, and Hypothesis won’t count it towards the ‘budget’ of examples it is allowed to run.
This is already a pretty good starting point and does have a decent tendency to flush out bugs. You’ll often find cases where you forgot some boundary condition and your code misbehaves as a result. But there’s still plenty of room to improve.
There are now two directions you can go in from here:
- Try to assert some things about the function’s result. Anything at all. What type is it? Can it be None? Does it have any relation at all to the input values that you can check? It doesn’t have to be clever - even very trivial properties are useful here.
- Start making your code more defensive.
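For the first direction, a sketch of what "anything at all" might look like is below. The function `normalize_whitespace` is a hypothetical stand-in for your own code, and the properties are deliberately unambitious.

```python
from hypothesis import given, strategies as st


def normalize_whitespace(s):
    # Hypothetical function under test: collapse runs of whitespace
    # into single spaces and drop leading/trailing whitespace.
    return " ".join(s.split())


@given(st.text())
def test_normalize_whitespace_basic_properties(s):
    result = normalize_whitespace(s)
    # What type is it?
    assert isinstance(result, str)
    # Does it relate to the input? It should never be longer.
    assert len(result) <= len(s)
    # Trivial properties of the output itself.
    assert "  " not in result
    assert result == result.strip()
```

None of these checks fully specify the function, but each one rules out a class of mistakes that a crash-only test would miss.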
The second direction is probably the more productive one.
The goal is to turn faulty assumptions in your code into crashes instead of silent corruption of your application state. You can do this in a couple of ways:
- Add argument checking to your functions (Hypothesis uses a dedicated InvalidArgument exception for this case, but you can raise whatever errors you find appropriate).
- Add a whole bunch of assertions into your code itself. Even when it’s hard to reason about formal properties of your code, it’s usually relatively easy to add local properties, and assertions are a great way to encode them. John Regehr has a good post on this subject if you want to know more about it.
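Both ideas can be sketched together in a few lines. The `chunk` function here is hypothetical, and using `ValueError` for bad arguments is just one reasonable choice.

```python
from hypothesis import given, reject, strategies as st


def chunk(items, size):
    # Hypothetical function under test: split items into pieces of
    # at most `size` elements.
    # Argument checking: fail loudly on input we do not support.
    if size <= 0:
        raise ValueError("size must be positive")
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    # Local assertions: nothing was lost, and no chunk is empty or oversized.
    assert sum(len(c) for c in chunks) == len(items)
    assert all(0 < len(c) <= size for c in chunks)
    return chunks


@given(st.lists(st.integers()), st.integers())
def test_chunk_does_not_crash(items, size):
    try:
        chunk(items, size)
    except ValueError:
        # Invalid arguments are an expected, defined way to fail.
        reject()
```

The test itself still only checks that nothing crashes. The assertions inside `chunk` are what turn a subtly wrong result into a visible failure.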
This approach will make your code more robust even if you don’t find any bugs in it during testing (and you’ll probably find bugs in it during testing), and it gives you a nice easy route into property based testing by letting you focus on only one half of the problem at a time.
Once you think you’ve got the hang of this, a good next step is to start looking for places with complex optimizations or encode/decode pairs in your code, as both are fairly easy properties to test and rich sources of bugs.
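For an encode/decode pair, the property is usually that decoding an encoded value gives you back what you started with. Here is a minimal sketch using a hypothetical run-length encoder; the `encode` and `decode` functions are illustrative only.

```python
from hypothesis import given, strategies as st


def encode(s):
    # Hypothetical run-length encoder: "aaab" -> [("a", 3), ("b", 1)]
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            # Same character as the current run: extend it.
            out[-1] = (ch, out[-1][1] + 1)
        else:
            # Different character: start a new run.
            out.append((ch, 1))
    return out


def decode(pairs):
    # Inverse of encode: expand each (character, count) pair.
    return "".join(ch * n for ch, n in pairs)


@given(st.text())
def test_decode_inverts_encode(s):
    assert decode(encode(s)) == s
```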