Hypothesis

The Hypothesis continuous release process

Tuesday, February 27, 2018

development-process

If you watch the Hypothesis changelog, you’ll notice the rate of releases sped up dramatically in 2017. We released over a hundred different versions, sometimes multiple times a day.

This is all thanks to our continuous release process. We’ve completely automated the process of releasing, so every pull request that changes code gets a new release, without any human input. In this post, I’ll explain how our continuous releases work, and why we find it so useful.

Smarkets's funding of Hypothesis

Monday, January 08, 2018

python

Happy new year everybody!

In this post I’d like to tell you about one of the nice things that happened in 2017: the Hypothesis work that was funded by Smarkets. Smarkets are an exchange for peer-to-peer trading of bets but, more importantly for us, they are fairly heavy users of Hypothesis for the Python part of their stack.

The Threshold Problem

Thursday, September 28, 2017

details python technical

In my last post I mentioned the problem of bug slippage: When you start with one bug, reduce the test case, and end up with another bug.

I’ve run into another related problem twice now, and it’s not one I’ve seen talked about previously.

The problem is this: Sometimes shrinking makes a bug seem much less interesting than it actually is.

When multiple bugs attack

Tuesday, September 26, 2017

details python technical

When Hypothesis finds an example triggering a bug, it tries to shrink the example down to something simpler that triggers it. This is a pretty common feature, and most property-based testing libraries implement something similar (though there are a number of differences between them). Stand-alone test case reducers are also fairly common, as it’s a useful thing to be able to do when reporting bugs in external projects - rather than submitting a giant file triggering the bug, a good test case reducer can often shrink it down to a couple of lines.

But there’s a problem with doing this: How do you know that the bug you started with is the same as the bug you ended up with?

This isn’t just an academic question. It’s very common for the bug you started with to slip to another one.

Consider for example, the following test:

from hypothesis import given, strategies as st

def mean(ls):
    return sum(ls) / len(ls)


@given(st.lists(st.floats()))
def test(ls):
    assert min(ls) <= mean(ls) <= max(ls)

This has a number of interesting ways to fail: We could pass NaN, we could pass [-float('inf'), +float('inf')], we could pass numbers which trigger a precision error, etc.

But after test case reduction, we’ll pass the empty list and it will fail because we tried to take the min of an empty sequence.
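
The failure modes listed above can be checked by hand in plain Python, without Hypothesis. The following is only a sketch: `outcome` is a hypothetical helper introduced here for illustration, and it reports the exception type rather than the full traceback.

```python
def mean(ls):
    return sum(ls) / len(ls)


def outcome(ls):
    """Run the property by hand and report how it failed, if it did."""
    try:
        assert min(ls) <= mean(ls) <= max(ls)
    except Exception as exc:
        return type(exc).__name__
    return "ok"


nan = float("nan")
inf = float("inf")

print(outcome([1.0, 2.0, 3.0]))            # ok
print(outcome([nan]))                      # AssertionError: comparisons with NaN are False
print(outcome([-inf, inf]))                # AssertionError: -inf + inf is NaN
print(outcome([3002399751580415.0] * 3))   # AssertionError: rounding puts the mean below the min
print(outcome([]))                         # ValueError: min() of an empty sequence
```

Three of these inputs fail the assertion, while the empty list fails with a different exception type on the same line, which is exactly the kind of slip described here.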

This isn’t necessarily a huge problem - we’re still finding a bug after all (though in this case as much in the test as in the code under test) - and sometimes it’s even desirable - you find more bugs this way, and sometimes they’re ones that Hypothesis would have missed - but often it’s not, and an interesting and rare bug slips to a boring and common one.

Historically Hypothesis has had a better answer to this than most - because of the Hypothesis example database, all intermediate bugs are saved and a selection of them will be replayed when you rerun the test. So if you fix one bug then rerun the test, you’ll find the other bugs that were previously being hidden from you by that simpler bug.

But that’s still not a great user experience - it means that you’re not getting nearly as much information as you could be, and you’re fixing bugs in Hypothesis’s priority order rather than yours. Wouldn’t it be better if Hypothesis just told you about all of the bugs it found and you could prioritise them yourself?

Well, as of Hypothesis 3.29.0, released a few weeks ago, now it does!

If you run the above test now, you’ll get the following:

Falsifying example: test(ls=[nan])
Traceback (most recent call last):
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 671, in run
    print_example=True, is_final=True
  File "/home/david/hypothesis-python/src/hypothesis/executors.py", line 58, in default_new_style_executor
    return function(data)
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 120, in run
    return test(*args, **kwargs)
  File "broken.py", line 8, in test
    def test(ls):
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 531, in timed_test
    result = test(*args, **kwargs)
  File "broken.py", line 9, in test
    assert min(ls) <= mean(ls) <= max(ls)
AssertionError

Falsifying example: test(ls=[])
Traceback (most recent call last):
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 671, in run
    print_example=True, is_final=True
  File "/home/david/hypothesis-python/src/hypothesis/executors.py", line 58, in default_new_style_executor
    return function(data)
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 120, in run
    return test(*args, **kwargs)
  File "broken.py", line 8, in test
    def test(ls):
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 531, in timed_test
    result = test(*args, **kwargs)
  File "broken.py", line 9, in test
    assert min(ls) <= mean(ls) <= max(ls)
ValueError: min() arg is an empty sequence

You can add @seed(67388524433957857561882369659879357765) to this test to reproduce this failure.
Traceback (most recent call last):
  File "broken.py", line 12, in <module>
    test()
  File "broken.py", line 8, in test
    def test(ls):
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 815, in wrapped_test
    state.run()
  File "/home/david/hypothesis-python/src/hypothesis/core.py", line 732, in run
    len(self.falsifying_examples,)))
hypothesis.errors.MultipleFailures: Hypothesis found 2 distinct failures.

(The stack traces are a bit noisy, I know. We have an issue open about cleaning them up).

All of the different bugs are minimized simultaneously and take full advantage of Hypothesis’s example shrinking, so each bug is as easy (or hard) to read as if it were the only bug we’d found.

This isn’t perfect: The heuristic we use for determining if two bugs are the same is whether they have the same exception type and the exception is thrown from the same line. This will necessarily conflate some bugs that are actually different - for example, [float('nan')], [-float('inf'), float('inf')] and [3002399751580415.0, 3002399751580415.0, 3002399751580415.0] each trigger the assertion in the test, but they are arguably “different” bugs.
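
To make the heuristic concrete, here is a rough sketch of a key built from the exception type and the line it was raised from. The names `bug_key` and `key_for` are hypothetical and this is not how Hypothesis implements it internally; it only illustrates the idea using the standard `traceback` module.

```python
import traceback


def mean(ls):
    return sum(ls) / len(ls)


def check(ls):
    assert min(ls) <= mean(ls) <= max(ls)


def bug_key(exc):
    """Hypothetical key: exception type plus location of the innermost Python frame."""
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    return (type(exc).__name__, frame.filename, frame.lineno)


def key_for(ls):
    """Return the bug key for this input, or None if the check passes."""
    try:
        check(ls)
    except Exception as exc:
        return bug_key(exc)
    return None


nan, inf = float("nan"), float("inf")

# Same exception type, same line: treated as one bug, even though
# the underlying causes are arguably different.
print(key_for([nan]) == key_for([-inf, inf]))

# Same line but a different exception type: treated as a distinct bug.
print(key_for([nan]) == key_for([]))
```

The first comparison is true and the second is false, which matches the two failures reported in the output above.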

But that’s OK. The heuristic is deliberately conservative - the point is not that it can distinguish whether any two examples are the same bug, just that any two examples it distinguishes are different enough that it’s interesting to show both, and this heuristic definitely manages that.

As far as I know this is a first in property-based testing libraries (though something like it is common in fuzzing tools, and theft is hot on our tail with something similar) and there’s been some interesting related but mostly orthogonal research in Erlang QuickCheck.

It was also surprisingly easy.

A lot of things went right in writing this feature, some of them technical, some of them social, and some somewhere in between.

The technical ones are fairly straightforward: Hypothesis’s core model turned out to be very well suited to this feature. Because Hypothesis has a single unified intermediate representation which defines a total ordering for simplicity, adapting Hypothesis to shrink multiple things at once was quite easy - whenever we attempt a shrink and it produces a different bug than the one we were looking for, we compare it to our existing best example for that bug and replace it if the current one is better (or we’ve discovered a new bug). We then just repeatedly run the shrinking process for each bug we know about until they’ve all been fully shrunk.
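
The loop described above can be sketched on a toy problem. This is not Hypothesis’s implementation: it shrinks plain lists of non-negative integers rather than Hypothesis’s intermediate representation, it identifies a bug by exception type only, and every name in it (`toy_test`, `simplicity`, `candidates`, `shrink_all`) is made up for illustration.

```python
def toy_test(xs):
    """A hypothetical test with two distinct ways to fail."""
    if any(x > 10 for x in xs):
        raise ValueError("element too large")
    assert len(xs) < 3


def bug_of(xs):
    """Return a label for the bug triggered by xs, or None if the test passes."""
    try:
        toy_test(xs)
    except Exception as exc:
        return type(exc).__name__
    return None


def simplicity(xs):
    """Total order on examples: shorter first, then lexicographically smaller."""
    return (len(xs), xs)


def candidates(xs):
    """Yield simpler variants of xs: delete an element, or make one smaller."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        for smaller in sorted({0, x // 2, x - 1}):
            if 0 <= smaller < x:
                yield xs[:i] + [smaller] + xs[i + 1:]


def shrink_all(initial):
    best = {}  # bug label -> simplest example seen so far for that bug

    def consider(xs):
        key = bug_of(xs)
        if key is None:
            return False
        # Keep xs if it reveals a new bug or improves on a known one,
        # whichever bug we happened to be shrinking at the time.
        if key not in best or simplicity(xs) < simplicity(best[key]):
            best[key] = xs
            return True
        return False

    consider(initial)
    changed = True
    while changed:  # repeat until no bug's example can be improved
        changed = False
        for key in list(best):
            for cand in candidates(best[key]):
                if consider(cand):
                    changed = True
    return best


print(shrink_all([50, 3, 7, 20]))
# {'ValueError': [11], 'AssertionError': [0, 0, 0]}
```

The starting example only triggers the `ValueError`, but while shrinking it the loop stumbles on an input that fails the assertion instead, records it as a second bug, and then shrinks both to their simplest forms.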

This is in a sense not surprising - I’ve been thinking about the problem of multiple-shrinking for a long time and, while this is the first time it’s actually appeared in Hypothesis, the current choice of model was very much informed by it.

The social ones are perhaps more interesting. Certainly I’m very pleased with how they turned out here.

The first is that this work emerged tangentially from the recent Stripe funded work - Stripe paid me to develop some initial support for testing Pandas code with Hypothesis, and I observed a bunch of bug slippage happening in the wild while I was testing that (it turns out there are quite a lot of ways to trigger exceptions from Pandas - they weren’t really Pandas bugs so much as bugs in the Pandas integration, but they still slipped between several different exception types), so that was what got me thinking about this problem again.

Not by accident, this feature also greatly simplified the implementation of the new deadline feature that Smarkets funded, which was going to have to have a lot of logic about how deadlines and bugs interacted, but all that went away as soon as we were able to handle multiple bugs sensibly.

This has been a relatively consistent theme in Hypothesis development - practical problems tend to spark related interesting theoretical developments. It’s not a huge exaggeration to say that the fundamental Hypothesis model exists because I wanted to support testing Django nicely. So the recent funded development from Stripe and Smarkets has been a great way to spark a lot of seemingly unrelated development and improve Hypothesis for everyone, even outside the scope of the funded work.

Another thing that really helped here is our review process, and the review from Zac in particular.

This wasn’t the feature I originally set out to develop. It started out life as a much simpler feature that used much of the same machinery, and just had a goal of avoiding slipping to new errors altogether. Zac pushed back with some good questions around whether this was really the correct thing to do, and after some experimentation and feedback I eventually hit on the design that led to displaying all of the errors.

Our review handbook emphasises that code review is a collaborative design process, and I feel this was a particularly good example of that. We’ve created a great culture of code review, and we’re reaping the benefits (and if you want to get in on it, we could always use more people able and willing to do review…).

All told, I’m really pleased with how this turned out. I think it’s a nice example of getting a lot of things right up front and this resulting in a really cool new feature.

I’m looking forward to seeing how it behaves in the wild. If you notice any particularly fun examples, do let me know, or write up a post about them yourself!

Moving Beyond Types

Sunday, July 16, 2017

alternatives details python technical

If you look at the original property-based testing library, the Haskell version of QuickCheck, tests are very closely tied to types: The way you typically specify a property is by inferring the data that needs to be generated from the types the test function expects for its arguments.

This is a bad idea.

Solving the Water Jug Problem from Die Hard 3 with TLA+ and Hypothesis

Wednesday, April 05, 2017

intro python technical

This post was originally published on the author’s personal site. It is reproduced here with his permission.

In the movie Die Hard with a Vengeance (aka Die Hard 3), there is this famous scene where John McClane (Bruce Willis) and Zeus Carver (Samuel L. Jackson) are forced to solve a problem or be blown up: Given a 3 gallon jug and 5 gallon jug, how do you measure out exactly 4 gallons of water?

(The video title is wrong. It's Die Hard 3.)

Hypothesis for Computer Science Researchers

Thursday, March 09, 2017

details python technical

I’m in the process of trying to turn my work on Hypothesis into a PhD and I realised that I don’t have a good self-contained summary as to why researchers should care about it.

So this is that piece. I’ll try to give a from scratch introduction to the why and what of Hypothesis. It’s primarily intended for potential PhD supervisors, but should be of general interest as well (especially if you work in this field).

Why should I care about Hypothesis from a research point of view?

The short version:

Hypothesis takes an existing style of testing (property-based testing), which has proven highly effective in practice, and makes it accessible to a much larger audience. It does so by taking several previously unconnected ideas from the existing research literature on testing and verification, and combining them to produce a novel implementation that has itself worked very well in practice.

The long version is the rest of this article.

How Hypothesis Works

Saturday, December 10, 2016

details python technical

Hypothesis has a very different underlying implementation to any other property-based testing system. As far as I know, it’s an entirely novel design that I invented.

Central to this design is the following feature set which every Hypothesis strategy supports automatically (the only way to break this is by having the data generated depend somehow on external global state):

All generated examples can be safely mutated
All generated examples can be saved to disk (this is important because Hypothesis remembers and replays previous failures).
All generated examples can be shrunk
All invariants that hold in generation must hold during shrinking (though the probability distribution can of course change, so things which are only supported with high probability may not be).

(Essentially no other property based systems manage one of these claims, let alone all)

The initial mechanisms for supporting this were fairly complicated, but after passing through a number of iterations I hit on a very powerful underlying design that unifies all of these features.

It’s still fairly complicated in implementation, but most of that is optimisations and things needed to make the core idea work. More importantly, the complexity is quite contained: A fairly small kernel handles all of the complexity, and there is little to no additional complexity (at least, compared to how it normally looks) in defining new strategies, etc.

This article will give a high level overview of that model and how it works.

Compositional shrinking

Thursday, December 08, 2016

alternatives details python technical

In my last article about shrinking, I discussed the problems with basing shrinking on the type of the values to be shrunk.

In writing it though I forgot that there was a halfway house which is also somewhat bad (but significantly less so) that you see in a couple of implementations.

This is when the shrinking is not type based, but still follows the classic shrinking API that takes a value and returns a lazy list of shrinks of that value. Examples of libraries that do this are theft and QuickTheories.

This works reasonably well and solves the major problems with type directed shrinking, but it’s still somewhat fragile and importantly does not compose nearly as well as the approaches that Hypothesis or test.check take.

Ideally, as well as not being based on the types of the values being generated, shrinking should not be based on the actual values generated at all.

This may seem counter-intuitive, but it actually works pretty well.

Integrated vs type based shrinking

Monday, December 05, 2016

alternatives details python technical

One of the big differences between Hypothesis and Haskell QuickCheck is how shrinking is handled.

Specifically, the way shrinking is handled in Haskell QuickCheck is bad and the way it works in Hypothesis (and also in test.check and EQC) is good. If you’re implementing a property based testing system, you should use the good way. If you’re using a property based testing system and it doesn’t use the good way, you need to know about this failure mode.

Unfortunately many (and possibly most) implementations of property based testing are based on Haskell’s QuickCheck and so make the same mistake.

3.6.0 Release of Hypothesis for Python

Monday, October 31, 2016

news non-technical python

This is a release announcement for the 3.6.0 release of Hypothesis for Python. It’s a bit of an emergency release.

Hypothesis 3.5.0 inadvertently added a dependency on GPLed code (see below for how this happened) which this release removes. This means that if you are running Hypothesis 3.5.x then there is a good chance you are in violation of the GPL and you should update immediately.

Apologies for any inconvenience this may have caused.

Another invariant to test for encoders

Monday, October 17, 2016

intro properties python technical

The encode/decode invariant is one of the most important properties to know about for testing your code with Hypothesis or other property-based testing systems, because it captures a very common pattern and is very good at finding bugs.

But how do you go beyond it? If encoders are that common, surely there must be other things to test with them?

Seeking funding for deeper integration between Hypothesis and pytest

Saturday, October 01, 2016

news non-technical python

Probably the number one complaint I hear from Hypothesis users is that it “doesn’t work” with py.test fixtures. This isn’t true, but it does have one very specific limitation in how it works that annoys people: It only runs function scoped fixtures once for the entire test, not once per example. Because of the way people use function scoped fixtures for handling stateful things like databases, this often causes people problems.

I’ve been maintaining for a while that this is impossible to fix without some changes on the pytest end.

The good news is that this turns out not to be the case. After some conversations with pytest developers, some examining of other pytest plugins, and a bunch of prototyping, I’m pretty sure it’s possible. It’s just really annoying and a lot of work.

So that’s the good news. The bad news is that this isn’t going to happen without someone funding the work.

I’ve now spent about a week of fairly solid work on this, and what I’ve got is quite promising: The core objective of running pytest fixtures for every example works fairly seamlessly.

But it’s now in the long tail of problems that will need to be squashed before this counts as an actual production ready releasable piece of work. A number of things don’t work. For example, currently it’s running some module scoped fixtures once per example too, which it clearly shouldn’t be doing. It also currently has some pretty major performance problems that are bad enough that I would consider them release blocking.

As a result I’d estimate there’s easily another 2-3 weeks of work needed to get this out the door.

Which brings us to the crux of the matter: 2-3 additional weeks of free work on top of the one I’ve already done is 3-4 weeks more free work than I particularly want to do on this feature, so without sponsorship it’s not getting finished.

I typically charge £400/day for work on Hypothesis (this is heavily discounted off my normal rates), so 2-3 weeks comes to £4000 to £6000 (roughly $5000 to $8000) that has to come from somewhere.

I know there are a number of companies out there using pytest and Hypothesis together. I know from the amount of complaining about this integration that this is a real problem you’re experiencing. So, I think this money should come from those companies. Besides helping to support a tool you’ve already got a lot of value out of, this will expand the scope of what you can easily test with Hypothesis a lot, and will be hugely beneficial to your bug finding efforts.

This is a model that has worked well before with the funding of the recent statistics work by Jean-Louis Fuchs and Adfinis-SyGroup, and I’m confident it can work well again.

If you work at such a company and would like to talk about funding some or part of this development, please email me at [email protected].

3.5.0 and 3.5.1 Releases of Hypothesis for Python

Friday, September 23, 2016

news non-technical python

This is a combined release announcement for two releases. 3.5.0 was released yesterday, and 3.5.1 has been released today after some early bug reports in 3.5.0.

Changes

3.5.0 - 2016-09-22

This is a feature release.

fractions() and decimals() strategies now support min_value and max_value parameters. Thanks go to Anne Mulhern for the development of this feature.
The Hypothesis pytest plugin now supports a --hypothesis-show-statistics parameter that gives detailed statistics about the tests that were run. Huge thanks to Jean-Louis Fuchs and Adfinis-SyGroup for funding the development of this feature.
There is a new event() function that can be used to add custom statistics.

Additionally there have been some minor bug fixes:

In some cases Hypothesis should produce fewer duplicate examples (this will mostly only affect cases with a single parameter).
py.test command line parameters are now under an option group for Hypothesis (thanks to David Keijser for fixing this)
Hypothesis would previously error if you used function annotations on your tests under Python 3.4.
The repr of many strategies using lambdas has been improved to include the lambda body (this was previously supported in many but not all cases).

3.5.1 - 2016-09-23

This is a bug fix release.

Hypothesis now runs cleanly in -b and -bb modes, avoiding mixing bytes and unicode.
unittest.TestCase tests would not have shown up in the new statistics mode. Now they do.
Similarly, stateful tests would not have shown up in statistics and now they do.
Statistics now print with pytest node IDs (the names you’d get in pytest verbose mode).

Notes

Aside from the above changes, there are a couple of big things behind the scenes of this release that make it a big deal.

The first is that the flagship chunk of work, statistics, is a long-standing want to have that has never quite been prioritised. By funding it, Jean-Louis and Adfinis-SyGroup successfully bumped it up to the top of the priority list, making it the first funded feature in Hypothesis for Python!

Another, less significant but still important, is that this release marks the first real break with an unofficial Hypothesis for Python policy of not having any dependencies other than the standard library and backports. This release adds a dependency on the uncompyle6 package. This may seem like an odd choice, but it was invaluable for fixing the repr behaviour, which in turn was really needed for providing good statistics for filter and recursive strategies.

Hypothesis vs. Eris

Sunday, September 04, 2016

intro python technical

Eris is a library for property-based testing of PHP code, inspired by the mature frameworks that other languages provide like QuickCheck, Clojure’s test.check and of course Hypothesis.

Here is a side-by-side comparison of some basic and advanced features that have been implemented in both Hypothesis and Eris, which may help developers coming from either Python or PHP and looking at the other side.

How many times will Hypothesis run my test?

Wednesday, August 31, 2016

details faq python technical

This is one of the most common first questions about Hypothesis.

People generally assume that the number of tests run will depend on the specific strategies used, but that’s generally not the case. Instead Hypothesis has a fairly fixed set of heuristics to determine how many times to run, which are mostly independent of the data being generated.

But how many runs is that?

The short answer is 200. Assuming you have a default configuration and everything is running smoothly, Hypothesis will run your test 200 times.

The longer answer is “It’s complicated”. It will depend on the exact behaviour of your tests and the value of some settings. In this article I’ll try to clear up some of the specifics of which settings affect the answer and how.

Generating recursive data

Friday, August 19, 2016

intro python technical

Sometimes you want to generate data which is recursive. That is, in order to draw some data you may need to draw some more data from the same strategy. For example we might want to generate a tree structure, or arbitrary JSON.

Hypothesis has the recursive function in the hypothesis.strategies module to make this easier to do. This is an article about how to use it.

How do I use pytest fixtures with Hypothesis?

Tuesday, August 09, 2016

faq python technical

pytest is a great test runner, and is the one Hypothesis itself uses for testing (though Hypothesis works fine with other test runners too).

It has a fairly elaborate fixture system, and people are often unsure how that interacts with Hypothesis. In this article we’ll go over the details of how to use the two together.

What is Hypothesis?

Sunday, July 24, 2016

intro python

Hypothesis is a library designed to help you write what are called property-based tests.

The key idea of property based testing is that rather than writing a test that tests just a single scenario, you write tests that describe a range of scenarios and then let the computer explore the possibilities for you rather than having to hand-write every one yourself.

In order to contrast this with the sort of tests you might be used to, when talking about property-based testing we tend to describe the normal sort of testing as example-based testing.

Property-based testing can be significantly more powerful than example-based testing, because it automates the most time-consuming part of writing tests - coming up with the specific examples - and will usually perform it better than a human would. This allows you to focus on the parts that humans are better at - understanding the system, its range of acceptable behaviours, and how they might break.

You don’t need a library to do property-based testing. If you’ve ever written a test which generates some random data and uses it for testing, that’s a property-based test. But having a library can help you a lot, making your tests easier to write, more robust, and better at finding bugs. In the rest of this article we’ll see how.

3.4.2 Release of Hypothesis for Python

Wednesday, July 13, 2016

news non-technical python

This is a bug fix release, fixing a number of problems with the settings system:

Test functions defined using @given can now be called from other threads (Issue #337)
Attempting to delete a settings property would previously have silently done the wrong thing. Now it raises an AttributeError.
Creating a settings object with a custom database_file parameter was silently getting ignored and the default was being used instead. Now it’s not.

Notes

For historic reasons, _settings.py had been excluded from the requirement to have 100% branch coverage. Issue #337 would have been caught by a coverage requirement: the code in question simply couldn’t have worked, but it was not covered by any tests, so it slipped through.

As part of the general principle that bugs shouldn’t just be fixed without addressing the reason why the bug slipped through in the first place, I decided to impose the coverage requirements on _settings.py as well, which is how the other two bugs were found. Both of these had code that was never run during tests - in the case of the deletion bug there was a __delete__ descriptor method that was never being run, and in the case of the database_file one there was a check later that could never fire because the internal _database field was always being set in __init__.

I feel like this experiment thoroughly validated that 100% coverage is a useful thing to aim for. Unfortunately it also pointed out that the settings system is much more complicated than it needs to be. I’m unsure what to do about that - some of its functionality is a bit too baked into the public API to lightly change, and I don’t think it’s worth breaking that just to simplify the code.

Hypothesis for Python 3.4.1 Release

Saturday, July 09, 2016

news non-technical python

This is a bug fix release for a single bug:

On Windows when running two Hypothesis processes in parallel (e.g. using pytest-xdist) they could race with each other and one would raise an exception due to the non-atomic nature of file renaming on Windows and the fact that you can’t rename over an existing file. This is now fixed.

Notes

My tendency of doing immediate patch releases for bugs is unusual but generally seems to be appreciated. In this case this was a bug that was blocking a py.test merge.

I suspect this is not the last bug around atomic file creation on Windows. Cross platform atomic file creation seems to be a harder problem than I would have expected.

Calculating the mean of a list of numbers

Monday, July 04, 2016

intro properties python technical

Consider the following problem:

You have a list of floating point numbers. No nasty tricks - these aren’t NaN or Infinity, just normal “simple” floating point numbers.

Now: Calculate the mean (average). Can you do it?

It turns out this is a hard problem. It’s hard to get it even close to right. Let’s see why.

Testing as a Complete Specification

Thursday, June 30, 2016

intro properties python technical

Sometimes you’re lucky enough to have problems where the result is completely specified by a few simple properties.

This doesn’t necessarily correspond to them being easy! Many such problems are actually extremely fiddly to implement.

It does mean that they’re easy to test though. Let’s see how.

Testing Configuration Parameters

Monday, June 13, 2016

properties python technical

A lot of applications end up growing a complex configuration system, with a large number of different knobs and dials you can turn to change behaviour. Some of these are just for performance tuning, some change operational concerns, some have other functions.

Testing these is tricky. As the number of parameters goes up, the number of possible configurations goes up exponentially. Manual testing of the different combinations quickly becomes completely unmanageable, not to mention extremely tedious.

Fortunately, this is somewhere where property-based testing in general and Hypothesis in particular can help a lot.

Evolving toward property-based testing with Hypothesis

Sunday, June 05, 2016

intro properties python technical

Many people are quite comfortable writing ordinary unit tests, but feel a bit confused when they start with property-based testing. This post shows how two ordinary programmers started with normal Python unit tests and nudged them incrementally toward property-based tests, gaining many advantages on the way.

Guest Posts Welcome

Sunday, May 29, 2016

news non-technical

I would like to see more posters on the hypothesis.works blog. I’m particularly interested in experience reports from people who use Hypothesis in the wild. Could that be you?

Testing Optimizers

Sunday, May 29, 2016

example properties python technical

We’ve previously looked into testing performance optimizations using Hypothesis, but this article is about something quite different: It’s about testing code that is designed to optimize a value. That is, you have some function and you want to find arguments to it that maximize (or minimize) its value.

As well as being an interesting subject in its own right, this will also nicely illustrate the use of Hypothesis’s data() functionality, which allows you to draw more data after the test has started, and will introduce a useful general property that can improve your testing in a much wider variety of settings.

Exploring Voting Systems with Hypothesis

Thursday, May 26, 2016

example python technical

Hypothesis is, of course, a library for writing tests.

But from an implementation point of view this is hardly noticeable. Really it’s a library for constructing and exploring data and using it to prove or disprove hypotheses about it. It then has a small testing library built on top of it.

It’s far more widely used as a testing library, and that’s really where the focus of its development lies, but with the find function you can use it just as well to explore your data interactively.
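A minimal sketch of find used for exploration, assuming a toy ballot model in which each vote names one of three candidates. The predicate and its name are made up for this illustration and are not taken from the article.

```python
from collections import Counter

from hypothesis import find, strategies as st


def plurality_winner_lacks_majority(votes):
    """True when the most popular candidate has at most half of the votes."""
    top_count = Counter(votes).most_common(1)[0][1]
    return 2 * top_count <= len(votes)


# Ask Hypothesis for a simple election in which nobody wins a majority.
ballots = st.lists(st.sampled_from(["A", "B", "C"]), min_size=1)
votes = find(ballots, plurality_winner_lacks_majority)
print(votes)
```

find returns a shrunk example satisfying the predicate, so the printed election should be small, although exactly which one comes back is up to Hypothesis.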

In this article we’ll go through an example of doing this, by using it to take a brief look at one of my other favourite subjects: Voting systems.

Announcing Hypothesis Legacy Support

Thursday, May 19, 2016

news python

For a brief period, Python 2.6 was supported in Hypothesis for Python. Because Python 2.6 has been end of lifed for some time, I decided this wasn’t a priority and support was dropped in Hypothesis 2.0.

I’ve now added it back, but under a more restrictive license.

If you want to use Hypothesis on Python 2.6, you can now do so by installing the hypothesislegacysupport package. This will allow you to run Hypothesis on Python 2.6.

Note that by default this is licensed under the GNU Affero General Public License 3.0. If you want to use it in commercial software you will likely want to buy a commercial license. Email us at [email protected] to discuss details.

What is Property Based Testing?

Saturday, May 14, 2016

non-technical philosophy

I get asked this a lot, and I write property based testing tools for a living, so you’d think I have a good answer to this, but historically I haven’t. This is my attempt to fix that.

Historically the definition of property based testing has been “The thing that QuickCheck does”. As a working definition this has served pretty well, but the problem is that it makes us unable to distinguish what the essential features of property-based testing are and what are just accidental features that appeared in the implementations that we’re used to.

As the author of a property based testing system which diverges quite a lot from QuickCheck, this troubles me more than it would most people, so I thought I’d set out some of my thoughts on what property based testing is and isn’t.

This isn’t intended to be definitive, and it will evolve over time as my thoughts do, but it should provide a useful grounding for further discussion.

Generating the right data

Wednesday, May 11, 2016

intro python technical

One thing that often causes people problems is figuring out how to generate the right data to fit their data model. You can start with just generating strings and integers, but eventually you want to be able to generate objects from your domain model. Hypothesis provides a lot of tools to help you build the data you want, but sometimes the choice can be a bit overwhelming.

Here’s a worked example to walk you through some of the details and help you get to grips with how to use them.
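This is not the article's own worked example. It is a minimal sketch of the general idea, using a hypothetical Project class and st.builds to go from primitive strategies to domain objects.

```python
from dataclasses import dataclass

from hypothesis import given, strategies as st


@dataclass
class Project:
    # Hypothetical domain object used only for this illustration.
    name: str
    priority: int


# builds() calls Project(...) with arguments drawn from the given strategies.
projects = st.builds(
    Project,
    name=st.text(min_size=1),
    priority=st.integers(min_value=1, max_value=5),
)


@given(st.lists(projects))
def test_sorting_by_priority_keeps_every_project(ps):
    ordered = sorted(ps, key=lambda p: p.priority)
    assert len(ordered) == len(ps)
    assert all(a.priority <= b.priority for a, b in zip(ordered, ordered[1:]))
```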

You Don't Need Referential Transparency

Monday, May 02, 2016

non-technical

It’s a common belief that in order for property based testing to be useful, your code must be referentially transparent. That is, it must be a pure function with no side effects that just takes input data and produces output data and is solely defined by what input data produces what output data.

This is, bluntly, complete and utter nonsense with no basis in reality.

Testing performance optimizations

Friday, April 29, 2016

intro properties python technical

Once you’ve flushed out the basic crashing bugs in your code, you’re going to want to look for more interesting things to test.

The next easiest thing to test is code where you know what the right answer is for every input.

Obviously in theory you think you know what the right answer is - you can just run the code. That’s not very helpful though, as that’s the answer you’re trying to verify.

But sometimes there is more than one way to get the right answer, and you choose the one you run in production not because it gives a different answer but because it gives the same answer faster.
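A minimal sketch of that situation, with two made-up implementations of the same function: an obviously correct quadratic version and a faster set-based one. The test asserts only that they agree on every input Hypothesis generates.

```python
from hypothesis import given, strategies as st


def has_duplicates_slow(xs):
    """Obviously correct reference: compare every pair of elements."""
    return any(
        xs[i] == xs[j]
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
    )


def has_duplicates_fast(xs):
    """Optimized version: rely on a set to drop repeated values."""
    return len(set(xs)) != len(xs)


@given(st.lists(st.integers()))
def test_fast_matches_slow(xs):
    assert has_duplicates_fast(xs) == has_duplicates_slow(xs)
```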

Rule Based Stateful Testing

Tuesday, April 19, 2016

intro python technical

Hypothesis’s standard testing mechanisms are very good for testing things that can be considered direct functions of data. But suppose you have some complex stateful system or object that you want to test. How can you do that?

In this article we’ll see how to use Hypothesis’s rule based state machines to define tests that generate not just simple data, but entire programs using some stateful object. These will give the same level of boost to testing the behaviour of the object as you get to testing the data it accepts.
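To give a flavour of what such a state machine looks like, here is a minimal sketch. The Stack class is a hypothetical object under test, checked against a plain Python list acting as the model. It is an illustration of the API, not the article's example.

```python
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, precondition, rule


class Stack:
    """Hypothetical stateful object under test."""

    def __init__(self):
        self._items = []

    def push(self, value):
        self._items.append(value)

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)


class StackMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.stack = Stack()
        self.model = []  # simple reference model of the expected contents

    @rule(value=st.integers())
    def push(self, value):
        self.stack.push(value)
        self.model.append(value)

    @precondition(lambda self: len(self.model) > 0)
    @rule()
    def pop(self):
        assert self.stack.pop() == self.model.pop()

    @invariant()
    def sizes_agree(self):
        assert len(self.stack) == len(self.model)


# Expose the machine as a unittest-style test case that test runners can collect.
TestStack = StackMachine.TestCase
```

Hypothesis then generates sequences of push and pop calls, which is the sense in which it is producing whole programs rather than single values.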

QuickCheck in Every Language

Saturday, April 16, 2016

alternatives technical

There are a lot of ports of QuickCheck, the original property based testing library, to a variety of different languages.

Some of them are good. Some of them are very good. Some of them are OK. Many are not.

I thought it would be worth keeping track of which are which, so I've put together a list.

The Purpose of Hypothesis

Saturday, April 16, 2016

non-technical principles writing-good-software

What is Hypothesis for?

From the perspective of a user, the purpose of Hypothesis is to make it easier for you to write better tests.

From my perspective as the primary author, that is of course also a purpose of Hypothesis. I write a lot of code, it needs testing, and the idea of trying to do that without Hypothesis has become nearly unthinkable.

But, on a large scale, the true purpose of Hypothesis is to drag the world kicking and screaming into a new and terrifying age of high quality software.

The Encode/Decode invariant

Saturday, April 16, 2016

intro properties python technical

One of the simplest types of invariant to find once you move past just fuzzing your code is asserting that two different operations should produce the same result, and one of the simplest instances of that is looking for encode/decode pairs. That is, you have some function that takes a value and encodes it as another value, and another that is supposed to reverse the process.

This is ripe for testing with Hypothesis because it has a natural completely defined specification: Encoding and then decoding should be exactly the same as doing nothing.

Let’s look at a concrete example.
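One possible instance of the invariant, sketched here with a small run-length encoder written for this illustration (the article's own example may differ):

```python
from hypothesis import given, strategies as st


def encode(s):
    """Run-length encode a string into a list of [character, count] pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return runs


def decode(runs):
    """Reverse encode(): expand each [character, count] pair."""
    return "".join(ch * count for ch, count in runs)


@given(st.text())
def test_decode_inverts_encode(s):
    # Encoding and then decoding should be the same as doing nothing.
    assert decode(encode(s)) == s
```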

Anatomy of a Hypothesis Based Test

Saturday, April 16, 2016

details python technical

What happens when you run a test using Hypothesis? This article will help you understand.

The Economics of Software Correctness

Friday, April 15, 2016

non-technical writing-good-software

You have probably never written a significant piece of correct software.

That’s not a value judgement. It’s certainly not a criticism of your competence. I can say with almost complete confidence that every non-trivial piece of software I have written contains at least one bug. You might have written small libraries that are essentially bug free, but the chance that you have written a non-trivial bug free program is tantamount to zero.

I don’t even mean this in some pedantic academic sense. I’m talking about behaviour where, if someone spotted it and pointed it out to you, you would probably admit that it’s a bug. It might even be a bug that you cared about.

Why is this?

Getting started with Hypothesis

Friday, April 15, 2016

intro properties python technical

Hypothesis will speed up your testing process and improve your software quality, but when first starting out people often struggle to figure out exactly how to use it.

Until you’re used to thinking in this style of testing, it’s not always obvious what the invariants of your code actually are, and people get stuck trying to come up with interesting ones to test.

Fortunately, there’s a simple invariant which every piece of software should satisfy, and which can be remarkably powerful as a way to uncover surprisingly deep bugs in your software.