Each of these articles focuses on a detailed understanding of one particular
aspect of using Hypothesis. They’re not in-depth looks at the internals, but
are focused on practical questions of how to use it.
If you’re new to Hypothesis, we recommend skipping this section for now and
checking out the intro section instead.
So this is that piece. I’ll try to give a from-scratch introduction to the why and what of Hypothesis. It’s primarily
intended for potential PhD supervisors, but should be of general interest as well (especially if you
work in this field).
Why should I care about Hypothesis from a research point of view?
The short version:
Hypothesis takes an existing style of testing (property-based testing) which has proven highly effective in practice
and makes it accessible to a much larger audience. It does so by taking several previously unconnected ideas from the existing
research literature on testing and verification, and combining them to produce a novel implementation that has proven very effective
Hypothesis has a very different underlying implementation to any other
property-based testing system. As far as I know, it’s an entirely novel
design that I invented.
Central to this design is the following feature set which every
Hypothesis strategy supports automatically (the only way to break
this is by having the data generated depend somehow on external
All generated examples can be safely mutated
All generated examples can be saved to disk (this is important because
Hypothesis remembers and replays previous failures).
All generated examples can be shrunk
All invariants that hold in generation must hold during shrinking
(though the probability distribution can of course change, so things
which are only supported with high probability may not be).
(Essentially no other property-based testing system manages even one of these claims,
let alone all of them.)
The initial mechanisms for supporting this were fairly complicated, but
after passing through a number of iterations I hit on a very powerful
underlying design that unifies all of these features.
It’s still fairly complicated in implementation, but most of that is
optimisations and things needed to make the core idea work. More
importantly, the complexity is quite contained: A fairly small kernel
handles all of the complexity, and there is little to no additional
complexity (at least, compared to how it normally looks) in defining
new strategies, etc.
This article will give a high level overview of that model and how
In writing it, though, I forgot that there was a halfway house which is
also somewhat bad (but significantly less so) that you see in a couple
This is when the shrinking is not type-based, but still follows the
classic shrinking API that takes a value and returns a lazy list of
shrinks of that value. Examples of libraries that do this are
This works reasonably well and solves the major problems with
type-directed shrinking, but it’s still somewhat fragile and, importantly,
does not compose nearly as well as the approaches that Hypothesis
or test.check take.
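As a rough illustration of the classic shrinking API described above, here is a minimal sketch in Python. The names `shrink_int` and `minimize` are hypothetical and not taken from any particular library; the shrinker takes a value and lazily yields simpler candidates, and a greedy loop keeps the first candidate that still fails.

```python
from typing import Callable, Iterator


def shrink_int(n: int) -> Iterator[int]:
    """Lazily yield candidates simpler than n, most aggressive first."""
    if n == 0:
        return
    yield 0
    # Step back towards n in halving increments: n - n/2, n - n/4, ...
    delta = n // 2 if n > 0 else -((-n) // 2)
    while delta != 0:
        yield n - delta
        delta = delta // 2 if delta > 0 else -((-delta) // 2)


def minimize(value: int, fails: Callable[[int], bool]) -> int:
    """Greedily move to the first shrink that still fails, until none does."""
    improved = True
    while improved:
        improved = False
        for candidate in shrink_int(value):
            if fails(candidate):
                value = candidate
                improved = True
                break
    return value


# Usage: find a small integer that still triggers a (hypothetical) failure.
smallest = minimize(1000, lambda n: n >= 10)
```

Note that the shrinker is written by hand for one kind of value. Shrinking a pair or a list of such values needs another hand-written shrinker built on top of this one, which is one place where this style composes less well.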
Ideally, as well as not being based on the types of the values being
generated, shrinking should not be based on the actual values generated
This may seem counter-intuitive, but it actually works pretty well.
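A minimal sketch of one way this can work, assuming (this is not spelled out above) that the shrinker operates on the recorded sequence of low-level choices the generator consumed, and then re-runs the generator on the simplified sequence. The `Choices` class and the `even_numbers` generator are hypothetical names for illustration, not Hypothesis APIs.

```python
from typing import Callable, List


class Choices:
    """Replays a recorded list of integers; missing entries read as 0."""

    def __init__(self, recorded: List[int]) -> None:
        self.recorded = recorded
        self.index = 0

    def draw(self, upper: int) -> int:
        """Return the next recorded choice, clamped to the range [0, upper]."""
        raw = self.recorded[self.index] if self.index < len(self.recorded) else 0
        self.index += 1
        return min(raw, upper)


def even_numbers(choices: Choices) -> List[int]:
    """Hypothetical generator: a list of up to 5 even integers."""
    length = choices.draw(5)
    return [2 * choices.draw(1000) for _ in range(length)]


def shrink_choices(
    recorded: List[int],
    generate: Callable[[Choices], List[int]],
    fails: Callable[[List[int]], bool],
) -> List[int]:
    """Delete or lower individual choices while the regenerated value still fails."""
    current = list(recorded)
    improved = True
    while improved:
        improved = False
        candidates = []
        for i in range(len(current)):
            candidates.append(current[:i] + current[i + 1:])  # delete a choice
            for smaller in (0, current[i] // 2, current[i] - 1):
                if 0 <= smaller < current[i]:
                    candidates.append(current[:i] + [smaller] + current[i + 1:])
        for candidate in candidates:
            if fails(generate(Choices(candidate))):
                current = candidate
                improved = True
                break
    return current


def has_large(xs: List[int]) -> bool:
    """Stand-in for a failing test: any element above 100 counts as a failure."""
    return any(x > 100 for x in xs)


minimal = shrink_choices([4, 300, 17, 950, 2], even_numbers, has_large)
value = even_numbers(Choices(minimal))
```

Because every candidate is produced by re-running the generator, the shrunk value is still a list of even numbers: the shrinker never needs to know anything about the values themselves.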
One of the big differences between Hypothesis and Haskell QuickCheck is
how shrinking is handled.
Specifically, the way shrinking is handled in Haskell QuickCheck is bad
and the way it works in Hypothesis (and also in test.check and EQC) is
good. If you’re implementing a property-based testing system, you should
use the good way. If you’re using a property-based testing system and it
doesn’t use the good way, you need to know about this failure mode.
Unfortunately many (and possibly most) implementations of property-based
testing are based on Haskell’s QuickCheck and so make the same mistake.
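To make the failure mode concrete, here is a minimal sketch in Python. It assumes the problem in question is that a shrinker chosen purely by the type of the value knows nothing about the invariants of the generator that produced it. `gen_even`, `shrink_any_int` and `minimize` are hypothetical names for illustration, not QuickCheck or Hypothesis APIs.

```python
import random
from typing import Callable, Iterator


def gen_even(rng: random.Random) -> int:
    """Hypothetical generator with an invariant: the result is always even."""
    return 2 * rng.randint(0, 500)


def shrink_any_int(n: int) -> Iterator[int]:
    """Shrinker selected by type alone; it is unaware of gen_even's invariant."""
    if n > 0:
        yield n // 2
        yield n - 1


def minimize(n: int, fails: Callable[[int], bool]) -> int:
    """Repeatedly replace n with the first shrink that still fails the test."""
    while True:
        for candidate in shrink_any_int(n):
            if fails(candidate):
                n = candidate
                break
        else:
            return n  # no shrink fails any more


def fails(n: int) -> bool:
    """Stand-in for a test that fails on any value above 10."""
    return n > 10


rng = random.Random(0)
start = gen_even(rng)
while not fails(start):  # draw until the generator produces a failing value
    start = gen_even(rng)

shrunk = minimize(start, fails)
# start is even by construction; shrunk is odd, a value gen_even never produces.
```

The reported counterexample is therefore one the test could never have received from its own generator, which is exactly the kind of result that makes a failure confusing to debug.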
This is one of the most common first questions about Hypothesis.
People generally assume that the number of tests run will depend on
the specific strategies used, but that’s generally not the case.
Instead Hypothesis has a fairly fixed set of heuristics to determine
how many times to run, which are mostly independent of the data
But how many runs is that?
The short answer is 200. Assuming you have a default configuration
and everything is running smoothly, Hypothesis will run your test
200 times.
The longer answer is “It’s complicated”. It will depend on the exact
behaviour of your tests and the value of some settings. In this article
I’ll try to clear up some of the specifics of which settings
affect the answer and how.