These articles each focus on a detailed understanding of one particular aspect of using Hypothesis. They’re not in-depth looks at the internals; they concentrate on practical questions of how to use it.

If you’re new to Hypothesis, we recommend skipping this section for now and checking out the intro section instead.

Hypothesis for Computer Science Researchers

I’m in the process of trying to turn my work on Hypothesis into a PhD, and I realised that I don’t have a good self-contained summary of why researchers should care about it.

So this is that piece. I’ll try to give a from-scratch introduction to the why and what of Hypothesis. It’s primarily intended for potential PhD supervisors, but should be of general interest as well (especially if you work in this field).

Why should I care about Hypothesis from a research point of view?

The short version:

Hypothesis takes an existing style of testing (property-based testing) that has proven highly effective in practice and makes it accessible to a much larger audience. It does so by taking several previously unconnected ideas from the existing research literature on testing and verification and combining them into a novel implementation that has itself worked very well in practice.

The long version is the rest of this article.

How Hypothesis Works

Hypothesis has a very different underlying implementation to any other property-based testing system. As far as I know, it’s an entirely novel design that I invented.

Central to this design is the following feature set which every Hypothesis strategy supports automatically (the only way to break this is by having the data generated depend somehow on external global state):

  1. All generated examples can be safely mutated.
  2. All generated examples can be saved to disk (this is important because Hypothesis remembers and replays previous failures).
  3. All generated examples can be shrunk.
  4. All invariants that hold in generation must hold during shrinking (though the probability distribution can of course change, so properties that only hold with high probability may not).

(Essentially no other property-based testing system satisfies even one of these properties, let alone all of them.)
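
As a minimal sketch of the fourth property, the snippet below builds a strategy whose values are always even and asks Hypothesis for the smallest one that is at least 11. The names `evens` and `smallest` are made up for this illustration; `find`, `integers` and `map` are part of the public Hypothesis API. Whatever value shrinking settles on, it is still produced by the strategy, so it is still even.

```python
from hypothesis import find, strategies as st

# Every value this strategy produces is even, by construction.
evens = st.integers().map(lambda n: n * 2)

# find() returns the simplest example it can reach that satisfies the
# condition. Shrinking re-runs the strategy, so the doubling is applied
# to every candidate and the "always even" invariant is preserved.
smallest = find(evens, lambda n: n >= 11)

print(smallest)
```

Nothing in the snippet tells Hypothesis how to shrink a doubled integer; the invariant survives because the shrunk value is generated the same way as the original.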

The initial mechanisms for supporting this were fairly complicated, but after passing through a number of iterations I hit on a very powerful underlying design that unifies all of these features.

It’s still fairly complicated in implementation, but most of that is optimisations and things needed to make the core idea work. More importantly, the complexity is quite contained: a fairly small kernel handles all of it, and there is little to no additional complexity (at least, compared to how it normally looks) in defining new strategies, etc.

This article will give a high level overview of that model and how it works.

Compositional shrinking

In my last article about shrinking, I discussed the problems with basing shrinking on the type of the values to be shrunk.

In writing it, though, I forgot that there is a halfway house, also somewhat bad (but significantly less so), that you see in a couple of implementations.

This is when the shrinking is not type based, but still follows the classic shrinking API that takes a value and returns a lazy list of shrinks of that value. Examples of libraries that do this are theft and QuickTheories.

This works reasonably well and solves the major problems with type-directed shrinking, but it’s still somewhat fragile and, importantly, does not compose nearly as well as the approaches that Hypothesis or test.check take.
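
To make the classic API concrete, here is a rough sketch of it in plain Python: a shrinker is a function from a value to a lazy sequence of simpler values, and a greedy loop keeps taking the first candidate that still fails. The names `shrink_list` and `minimize` are hypothetical and are not taken from theft or QuickTheories.

```python
from typing import Callable, Iterator, List


def shrink_list(xs: List[int]) -> Iterator[List[int]]:
    """Lazily yield every copy of xs with exactly one element removed."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]


def minimize(value, shrink: Callable, fails: Callable):
    """Greedily replace value with the first shrink that still fails."""
    improved = True
    while improved:
        improved = False
        for candidate in shrink(value):
            if fails(candidate):
                value = candidate
                improved = True
                break  # restart shrinking from the smaller value
    return value


# A "failure" here is any list whose elements sum to at least 10.
result = minimize([3, 1, 4, 1, 5, 9, 2, 6], shrink_list, lambda xs: sum(xs) >= 10)
print(result)  # [9, 6]
```

The composition problem shows up as soon as the generated value is derived from something else: if the lists were produced by transforming some other data, this shrinker knows nothing about that transformation, and a separate shrinker has to be written by hand for the derived values.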

Ideally, as well as not being based on the types of the values being generated, shrinking should not be based on the actual values generated at all.

This may seem counter-intuitive, but it actually works pretty well.

Integrated vs type-based shrinking

One of the big differences between Hypothesis and Haskell QuickCheck is how shrinking is handled.

Specifically, the way shrinking is handled in Haskell QuickCheck is bad and the way it works in Hypothesis (and also in test.check and EQC) is good. If you’re implementing a property-based testing system, you should use the good way. If you’re using a property-based testing system and it doesn’t use the good way, you need to know about this failure mode.

Unfortunately, many (and possibly most) implementations of property-based testing are based on Haskell’s QuickCheck and so make the same mistake.
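
The failure mode can be illustrated with a small simulation in plain Python. This is not QuickCheck’s actual code: `SHRINKERS`, `shrink_int` and `shrink_by_type` are hypothetical names, and the point is only that the shrinker is looked up from the type of the value, so it knows nothing about the generator that produced it.

```python
def shrink_int(n):
    """Yield integers simpler than n, ignoring how n was generated."""
    if n < 0:
        yield -n
    elif n > 0:
        yield n // 2
        yield n - 1


# Type-based shrinking: the shrinker is chosen from the value's type alone.
SHRINKERS = {int: shrink_int}


def shrink_by_type(value, fails):
    """Greedily shrink value using the shrinker registered for its type."""
    shrink = SHRINKERS[type(value)]
    improved = True
    while improved:
        improved = False
        for candidate in shrink(value):
            if fails(candidate):
                value, improved = candidate, True
                break
    return value


# Suppose the generator only ever produces even numbers and 1000 failed.
counterexample = shrink_by_type(1000, lambda n: n >= 11)
print(counterexample)  # 11, an odd number the generator could never produce
```

The reported counterexample violates the generator’s own invariant, so a test that relies on its input being even may now fail for a reason unrelated to the original bug.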

How many times will Hypothesis run my test?

This is one of the most common first questions about Hypothesis.

People often assume that the number of test runs depends on the specific strategies used, but that’s generally not the case. Instead, Hypothesis has a fairly fixed set of heuristics for deciding how many times to run, and these are mostly independent of the data being generated.

But how many runs is that?

The short answer is 200. Assuming you have a default configuration and everything is running smoothly, Hypothesis will run your test 200 times.

The longer answer is “It’s complicated”. It will depend on the exact behaviour of your tests and the value of some settings. In this article I’ll try to clear up some of the specifics of which settings affect the answer and how.

Anatomy of a Hypothesis Based Test

What happens when you run a test using Hypothesis? This article will help you understand.

