Skip to content

Proptests

V0ldek edited this page Apr 6, 2023 · 4 revisions

Property testing is a fascinating subject, and our crate lends itself really well to proptests. We use the proptest crate for them.

What is property testing?

In informal terms, property testing is a more abstract way of testing, where instead of explicitly defining an input and the expected output, and then asserting actual == expected, we define rules for how input should look like and what properties the output for that input should satisfy. The testing framework then takes care of generating a wide coverage of the possible inputs.

More formally, we test an algorithm $f: I \rightarrow O$ by defining a set of possible inputs, $X \subseteq I$, and a property that we expect the output of our algorithm to have on inputs from $X$, $p: 2^O$. The framework then generates $k$ inputs, $x_1, \ldots, x_k \in X$, and for each asserts that $p(f(x_i))$ holds. Sort of a fuzzy model checking.

This is an extremely potent testing technique, that can fish out bugs that would be difficult to test for otherwise. To test against a given case classically the developer has to think of that case and create the input.

Example

First, read the proptest book up to a point where you get bored.

A good case study is the small_test proptest suite we have. SmallSet256 is a heavily optimised set that holds u8 values in two 128-bit wide bitmasks.

Let's consider this test:

#[test]
fn contains(btree_set in collection::btree_set(any_elem(), 0..=MAX_SET_SIZE)) {
    let vec: Vec<u8> = btree_set.iter().copied().collect();
    let slice: &[u8] = &vec;
    let small_set: SmallSet256 = slice.into();

    for elem in 0..=MAX_ELEM {
        assert_eq!(btree_set.contains(&elem), small_set.contains(elem));
    }
}

The any_elem function is a generation strategy that returns any u8 value. So in this case the set $X$ is all BTreeSet<u8> instances. The function we test here is the From<&[u8]> impl for SmallSet256. The property is that after calling into each value of u8 it is in the result set if and only if it was in the source.

Designing good proptests

While testing doesn't usually sound fun to developers, designing proptests is actually a great little challenge.

Let us take classification proptests as an example, located in classifier_correctness_test.rs. The goal here is to proptest the overall classifier pipeline. The domain of all the valid inputs here is the domain of all possible strings – the classifiers shouldn't care about correctness of the JSON, but we do require valid UTF-8.

Overwhelming domain

Designing a proptest for $X = \mathit{String}$ right off the bat would be very hard. We know nothing about the structure of an $x \in X$ and so what properties could we test apart from trivial ones? Say the classifier returns Opening(7); we can check that input[7] == b'{' or input[7] == b'[', but that's not enough – it could be within a string, and then it wouldn't have been correct to classify as an opening! So we'd need to see if it is delimited by double quotes, which is a very non-local property – we need to count how many quotes from the start there are; moreover quotes can be escaped with backslashes, which can themselves be escaped with backslashes...

We quickly run into an issue where we need an equivalent of $f$ to test $f$. Which is sometimes fine, because we have multiple implementations of the same function, e.g. the AVX2 and the nosimd classifiers. This kind of differential testing is also valuable, especially when testing independent implementations – if our engine returns the same answer as 10 different JSONPath engines written by different people, then the probability that they're all wrong is miniscule (we're basically employing Bayesian reasoning here). However, testing one rsonpath implementation with another rsonpath implementation is much more precarious – the prior probability of "we at rsonpath screwed the same thing twice" is much higher.

Technique 1. – simplifying the domain/iterative design

To feel less overwhelmed let's just simplify the domain. Instead of considering all the strings we'll exclude quotes and backslashes from our inputs completely. On this limited domain the family of properties "if the classifier returned Structural(idx), then input[idx] is a byte representing that structural" is enough to test correctness.

This has the obvious downside of not testing on the entire domain but is a start and is much easier than getting the entire proptest suite right the first time around. Next iterations of design can introduce more complex inputs to the domain, e.g. quotes but without escapes, and expand the suite.

Technique 2. – testing via isomorphism

We can decrease the prior probability of writing the same $f$ incorrectly twice by choosing an isomorphic domain for which an implementation is simpler. In other words, if we have $f: X \rightarrow O$, pick an isomorphism $h: X \rightarrow X'$ for which there exists a relatively simple $f': X' \rightarrow O$ such that $f'(x') = f(h^{-1}(x))$. Then we generate random inputs from $X'$, get the expected answer from $f'$ (which is by assumption hard to get wrong), and then run $f(h^{-1}(x))$ to compare the output.

As an example consider $f: \mathbf{N} \rightarrow \mathcal{P}(\mathbf{N})$ that is supposed to factorize a natural number. It's hard to write $f$, but if we take the isomorphism $h = f$ then $f'$ is simply identity. The inverse $h^{-1}$ is taking a product of the input numbers. So now our job is just to generate random inputs from $\mathcal{P}(\mathbf{N})$, multiply them together, and run $f$ on the result. We expect to get the same answer.

In our classifier example, instead of generating random strings, we generate random streams of tokens: either one of the structural characters, or "garbage", which is any string that doesn't contain any structural characters, quotes, or escapes. On such input there is an obvious $f'$ which simply ignores "garbage" – it's a trivial function that is hard to get wrong. Now $h^{-1}$ is simply stringification of the input. We therefore run the classifier on such stringified input and expect to get the same stream of structural tokens, without the garbage. The prior probability of getting our setup wrong is just the probability of $f'$ or $h^{-1}$ having a bug, but they are trivial, short functions that are easy to verify. With that we replaced the dependance on some other classifier, which takes hundreds of lines of relatively complex logic, so we can have much higher confidence in our test results.

Next steps

Now we can expand the suite by considering a larger domain. Using technique 2. we can add a token of "doubly-quoted string" that won't contain any quotes or backslashes inside, and is expected to be ignored. Such small iterative improvements can now be used to arrive at a comprehensive test suite. This is an open issue, #20, and a good starter for learning proptests!