Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate pass_to_descendant and redistribute_block_pairs shrinker passes #3929

Merged
merged 10 commits into from
Mar 19, 2024

Conversation

tybug
Copy link
Member

@tybug tybug commented Mar 18, 2024

I think these are the last remaining directly translatable passes. The remaining two groups are "block programs" (roughly corresponding to new "node programs" with a single instruction X, no -) and "everything else".

@tybug tybug requested a review from Zac-HD as a code owner March 18, 2024 04:46
Comment on lines +1266 to +1267
if next_node.ir_type == "integer" and bits_to_bytes(
node.value.bit_length()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to keep this condition, but otherwise the test for this fails: the ir tree for @given(st.integers(), st.integers() looks like [integer {"max_value": 128}, integer, integer {"max_value": 128}, integer], where the 128 draws are for choosing a forced endpoint. Without checking for equal sizes, we try to redistribute the endpoint draw to the actual draw, which of course does nothing.

As noted in the inline comment, there are plenty of other ways this pass can get tripped up! But we can look at improving or removing it later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I'm tempted to modify the forced-endpoint logic so that it doesn't do this, e.g. by using the weights= parameter instead of making two draws. Thoughts?

Copy link
Member Author

@tybug tybug Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, perfect! Let's do that. Having an ir-strategy correspond to two ir nodes felt bad.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, problem...we quickly run into our self-imposed width limit of 255 for weights. I imagine this was imposed because Sampler performance blows up otherwise. But we also definitely don't want to be constructing the 2**32 element list just to say "uniform everywhere except the endpoints", for memory reasons, among others (potentially float loss).

Maybe we need a better structure for expressing weights, which allows us to specify "upweight just a few elements from this very large range"? Potentially specifying start and end indices where the weighting should apply uniformly to that range, instead of one weight per element.

Copy link
Member

@Zac-HD Zac-HD Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that - new interface idea: weights= takes a mapping of {value: weight}, such that len <= 255 and 0 <= sum-of-weights <= 1. Then we use a Sampler to pick either a value, or the remaining probability mass which means "pick according to unweighted".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right path forward. But it's fiddly to do so while maintaining correct invariants about forced and children count. Would you object to leaving it for a future PR? I suspect we may have some back and forth on this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, just write a note in #3921 and we can get to it later 👍

@tybug
Copy link
Member Author

tybug commented Mar 18, 2024

Interesting failures! CI succeeds again as a pseudo fuzzer 🙂. This failure can be addressed by making sure the second node we choose isn't forced. Will have to look at the other failure in more detail tomorrow.

Comment on lines +1266 to +1267
if next_node.ir_type == "integer" and bits_to_bytes(
node.value.bit_length()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I'm tempted to modify the forced-endpoint logic so that it doesn't do this, e.g. by using the weights= parameter instead of making two draws. Thoughts?

@tybug
Copy link
Member Author

tybug commented Mar 19, 2024

I can't reproduce the above test_finds_multiple_failures_in_generation flake locally (n=9_000). I think there's a good chance it was an unrelated rare flake, especially since the shrinker passes here look unrelated. But I've been bit by subtle relationships before.

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - let's merge this now, and revisit the integers refactoring later 👍

@Zac-HD Zac-HD enabled auto-merge March 19, 2024 04:40
@Zac-HD Zac-HD merged commit 501d2ba into HypothesisWorks:master Mar 19, 2024
51 checks passed
@tybug tybug deleted the shrinker-ir-descendant branch March 19, 2024 05:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants