Migrate most shrinker functions to the ir #3962
Conversation
@@ -2292,12 +2327,16 @@ def _pop_ir_tree_node(self, ir_type: IRTypeName, kwargs: IRKWargsType) -> IRNode

```python
        # (in fact, it is possible that giving up early here results in more time
        # for useful shrinks to run).
        if node.ir_type != ir_type:
            # needed for try_shrinking_nodes to see what node was *attempted*
            # to be drawn.
            self.invalid_at = (ir_type, kwargs)
            self.mark_invalid(f"(internal) want a {ir_type} but have a {node.ir_type}")
```
Now that we've moved to a stronger-typed representation in the ir, we do pretty badly on a class of problems: crossing typed boundaries. In particular, shrinking one_of(A, B) where A has a different type/kwargs than B. Before, we would shrink the integer drawn for one_of, causing it to choose A instead of B. A interprets the bits differently than B, but we would get lucky most of the time and find a counterexample; and once we're in A we do more robust shrinking. In the ir we're much more strict and mark_invalid as soon as we hit a misalignment, so the initial coincidental slip from B to A never happens.
One thing that works here is drawing randomly instead of giving up when we hit a misalignment, which I would say is basically equivalent to master behavior.
Concretely:
# condition avoids trivial counterexamples
assert minimal(st.integers(101, 200) | st.integers(0, 100), lambda n: n not in [101, 0]) == 102
fails on this branch and passes on master (I'll add this as a test case as well).
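For readers reproducing this outside the test suite, here is a rough public-API sketch of the same scenario (illustrative only; minimal above is Hypothesis's internal test helper, and the test name and settings here are assumptions):

```python
# Public-API sketch of the crossing-strategies case above (not the PR's test):
# the property is deliberately falsified for every n outside {101, 0}, and the
# question is whether the shrinker can slip from the integers(0, 100) branch
# back into integers(101, 200) to report the minimal example (102 on master,
# per the assertion above).
from hypothesis import given, settings, strategies as st

@given(st.integers(101, 200) | st.integers(0, 100))
@settings(database=None)
def test_shrinks_across_one_of_branches(n):
    assert n in (101, 0)
```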
idle thoughts: I wonder if a short "randomly generate ir trees smaller than our current value" phase when the shrinker would otherwise exit as stuck would get us out of many of these coincidental slips. Possibly not, since if we didn't have to exploit some fixed structure while shrinking, we likely would have found this counterexample during the generation phase.
I haven’t touched Hypothesis in a few years, so take this with a grain of salt, but I vaguely recall that having the shrinker introduce randomness into the data stream has tended to be a bad idea.
Normally, the shrinker tends to make its input both shorter and simpler over time, despite the fact that (999 random bytes) is technically “smaller” than (1000 zero bytes).
But if the shrinker starts reintroducing randomness, this falls apart, and the shrinker will get obsessed with some high-entropy input because it happens to be shorter than the current low-entropy input.
Seconded on this. Specifically the problem that tends to result is that the performance becomes terrible, because you end up repeatedly reintroducing lots of expensive-to-shrink regions in order to make minor size gains.
My guess is that you could possibly fix this with some sort of low-entropy generation mode (e.g. always generate one of the two smallest values for a given type) but I'm not sure how robust that would be and it potentially still has the same problem.
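As a rough illustration of what such a low-entropy mode could look like (purely hypothetical; low_entropy_integer is not part of this PR or of Hypothesis's internals):

```python
# Hypothetical sketch of a "low-entropy generation mode": when the shrinker has
# to invent a value rather than reuse a recorded one, restrict the choice to
# the two smallest values the type admits instead of drawing fully at random.
import random

def low_entropy_integer(min_value: int, max_value: int, shrink_towards: int = 0) -> int:
    # clamp the preferred value into range, then offer it and its neighbour
    base = min(max(shrink_towards, min_value), max_value)
    return random.choice([base, min(base + 1, max_value)])
```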
That intuition makes sense to me. I think performance may still be ok if we only introduce randomness as a last-resort phase in the shrinker when we are otherwise at a fixpoint. iirc the shrinker used to have some dynamic enabling logic similar to this. Of course we're now complicating the shrink pass logic and have to justify the tradeoff in complexity as worthwhile. Alternatively low-entropy generation also seems reasonable.
It just feels quite bad to not be able to shrink st.integers() | st.floats() because we happened to hit an error in the floats region first, and we are seeing a few regressions in the shrink-quality test suite due to this (though how much that maps to reality is unclear to me).
```python
# try shrinking from both sides towards shrink_towards
shrink_towards = kwargs["shrink_towards"]
Integer.shrink(
    abs(shrink_towards - value),
    lambda n: self.try_shrinking_nodes(nodes, shrink_towards + n),
)
Integer.shrink(
    abs(shrink_towards - value),
    lambda n: self.try_shrinking_nodes(nodes, shrink_towards - n),
)
```
This is pretty brute force and does ~double the shrinks necessary in the worst case.
I think master behavior for integer shrinking is something like "shrink towards 0 on the current sign (only positive or only negative) and occasionally try and flip the sign bit". I think master gets a bit luckier/smarter than this in practice, because a straight-line translation of that here shrinks minimal(st.integers(), lambda n: not (-1 < n < 2))
to 2 instead of -1; to find -1 it needs to pursue the worse shrink of 2 to -2 and then shrink -2 to -1.
A complicating factor here is that shrinking behavior differs between bounded and unbounded integers:
assert minimal(st.integers(), lambda n: not (-1 < n < 2)) == -1
assert minimal(st.integers(-5, 5), lambda n: not (-1 < n < 2)) == 2
assert minimal(st.integers(max_value=5), lambda n: not (-1 < n < 2)) == -1
assert minimal(st.integers(min_value=-5), lambda n: not (-1 < n < 2)) == -1
which we can't change until we move the ordering to the ir.
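For illustration, here is a straight-line sketch of the master-like behavior described above, i.e. shrink on the current side of shrink_towards and then try a single sign flip (shrink_signed, still_fails, and the tie-breaking order are assumptions made for this example, not Hypothesis internals):

```python
# Hypothetical sketch: shrink the distance to shrink_towards on the current
# side only, then make one attempt at the mirrored value on the other side,
# rather than running a full Integer.shrink in both directions.
def shrink_signed(value, shrink_towards, still_fails):
    # still_fails(candidate) -> bool: True if the test still fails for candidate
    sign = 1 if value >= shrink_towards else -1
    distance = abs(value - shrink_towards)
    # find the smallest failing distance on this side (a real implementation
    # would reuse Integer.shrink's search rather than a linear scan)
    for smaller in range(distance + 1):
        if still_fails(shrink_towards + sign * smaller):
            distance = smaller
            break
    best = shrink_towards + sign * distance
    # one cheap sign-flip attempt; accept it only if it is simpler under the
    # ordering the unbounded examples above suggest (smaller magnitude first,
    # positive before negative) -- an assumed ordering, not Hypothesis's code
    mirrored = shrink_towards - sign * distance
    if still_fails(mirrored) and (abs(mirrored), mirrored < 0) < (abs(best), best < 0):
        best = mirrored
    return best
```

On the minimal(st.integers(), ...) example above, this straight-line translation lands on 2 rather than -1, matching the limitation described.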
"to find -1 it needs to pursue the worse shrink of 2 to -2 and then shrink -2 to -1"
I suspect it's getting lucky with some sort of punning here, as the integers strategy is a mixture under the hood. I don't think it can reliably follow this strategy - there's no backtracking in the shrinker - but it might be doing something that causes previous bytes for a larger integer width to be reinterpreted as a negative integer.
yeah, good point. master fails on a trivial extension of this problem:
assert minimal(st.integers(101, 200) | st.integers(0, 100), lambda n: n not in [101, 102, 0]) == 103
so it's just getting lucky.
Can we get a secondary (log) y-axis on the plot showing proportional change, in addition to the absolute number? +800 calls is quite different on a base of 50 than on a base of 2,000!
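For concreteness, here is one way such an axis could be added (a sketch with made-up data; matplotlib and the numbers are assumptions, not the benchmark's actual plotting code):

```python
# Sketch of a secondary log-scale axis for proportional change (assumed data
# and plotting library; the benchmark's real plotting code may differ).
import matplotlib.pyplot as plt
import numpy as np

old = np.array([50, 200, 800, 2000])   # assumed baseline shrink-call counts
new = np.array([55, 210, 1600, 1900])  # assumed counts on this branch

fig, ax = plt.subplots()
x = np.arange(len(old))
ax.bar(x - 0.2, old, width=0.4, label="old")
ax.bar(x + 0.2, new, width=0.4, label="new")
ax.set_ylabel("shrink calls (absolute)")
ax.legend(loc="upper left")

ax2 = ax.twinx()                 # secondary y-axis sharing the same x-axis
ax2.plot(x, new / old, "ko-")
ax2.set_yscale("log")            # log scale so 2x and 0.5x changes look symmetric
ax2.set_ylabel("new / old (proportional change, log scale)")
plt.show()
```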
I've updated the graph to include an
Fwiw this looks fine to me - the large-negative multipliers are relative to a small baseline, and the tails with large absolute changes aren't that large in relative terms. Might as well look into it, but don't stress too much, and consider doing so as a follow-up later.
agree it doesn't look hugely out of line, but I won't feel confident in this pull until I look into it, for my own sake more than anything 😄 I updated the benchmark for 50b2b13 + d81d50e. It's a bit misleading because we're basically getting a free constant shrink boost by disabling the dfa, so I would mentally downweight the improvements a bit. That said, it's looking quite a bit better. More investigation of where we do badly to come.
Nice! You could get a buffer-without-dfas baseline if you wanted to, but I don't think this is really necessary - you're already measuring the change as we're going to ship it, and it's looking good. I guess in principle the downside might be some loss of shrink quality, but I don't expect anyone to notice that.
this fixes a nasty flaky error that I spent many hours tracking down
@Zac-HD how important would you say the
I'm OK with taking a regression on these, since I'm confident that we can fix it later and we're seeing a substantial win overall. Just make sure to track them in an issue, along with the other shrinker issues noted in comments / review below.
I read through the whole diff, and it looks good to me - I've tagged everywhere that I thought we might want to revisit later, but overall I'll be happy with minimal changes to get the tests passing.
And - this is a huge contribution. Thanks, Liam 😍
ah good, now we get to the fun step of debugging newly flaky tests 🙂
looks like an actual failure: https://github.com/HypothesisWorks/hypothesis/actions/runs/9240572106/job/25421012169?pr=3962. We're shrinking to
I pushed two fixes:
Updated graphs in description. Here's the cause of the above failure.

We could potentially solve a large number of these occurrences by removing discarded portions of the buffer before caching. A more general approach, which also handles "larger because drawing from different size buckets", is not caching interesting buffer data at all and relying on the ir tree corresponding to the smallest buffer.

Reproducer:

```python
import traceback

from hypothesis import given, settings, strategies as st

@given(st.integers(min_value=0))
@settings(database=None)
def f(x):
    assert x <= 100

for i in range(500):
    print("-" * 50, i)
    try:
        f()
    except Exception as e:
        assert "x=101" in "".join(traceback.format_exception(e))
```
oops, I read the wrong number — shrinking is a mean of 2.12x faster, but a median of 1.38x faster. I fixed the release notes. I think the median is the more useful and realistic number to report, though getting big wins on the tail end is of course also important.
I looked into the stateful regression and I'm confident we can improve it in the near future. The short of it is we'll need to support slips from the integer 0 to the boolean False, and realign pseudo choices; i.e. it should be allowed to draw with min_value=0, max_value=0 from an ir tree whose node has value=2. I've completely skipped the problematic tests for now.
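As a toy illustration of the "pseudo choice" point (realign_integer_draw is a hypothetical helper for this comment, not the PR's code):

```python
# Hypothetical sketch: a draw whose kwargs admit only one legal value is forced,
# so a mismatched value in the recorded ir node need not invalidate the attempt.
def realign_integer_draw(node_value, min_value, max_value):
    if min_value == max_value:
        return min_value  # pseudo choice: ignore whatever the node recorded
    if min_value <= node_value <= max_value:
        return node_value  # the recorded value still fits the new bounds
    return None  # genuine misalignment; the caller would mark_invalid here
```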
🎊
part of #3921.
This is a big one, apologies! Hard to avoid; the remaining functions had cascading dependencies on each other.
I've benchmarked these changes by hooking minimal with derandomize=False and 5 trials. Benchmark graphs (collapsed in the original comment) are available for: latest (866256e), old (a9f50c6), old (d9a077d), old (f7d0d29), old (d81d50e), old (901d636).
I think there are a few places we are still not optimal:
- We Integer.shrink from the negative and positive direction (see review comment).
- We sort_key the buffers, which costs a call. This will be improved when we move our ordering function to the ir tree.

Conversations of note: