-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix sources of nondeterminism in egglog #439
Conversation
@ezrosent, I ended up having to use indexmap in generic join, which might slow down egglog. Should we just merge it for now? |
Can we instead just use a build-time flag which users can use to opt in for determinism? |
That's a good idea, but perhaps determinism should be opt-out? |
Some thoughts:
|
Reverted the feature, so this should be good to go! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
@oflatt Update to main to get benchmarking! |
3f0507f
to
f10226f
Compare
CodSpeed Performance ReportMerging #439 will degrade performances by 14.5%Comparing Summary
Benchmarks breakdown
|
I imagine some of the changes on shorter running benchmarks might not be significant - FWIW I just opened a PR to reduce our benchmarks to just the longer more meaningful ones (#444). Otherwise, it seems like there is too much uncertainty introduced by the indeterminacy of the memory allocator. The slowdown in the |
We should try:
|
reference to ahash as fast hash map with ability to set seed https://docs.rs/ahash/latest/ahash/ |
|
f10226f
to
334f911
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we still depend on rustc-hash
? We might as well use the same hasher for everything right?
Not sure why, but things are still non-deterministic with this change |
Since I updated the hash function in PR#445 Update hashbrown, I'd recommend you rebase onto main. My goal there was to update to the latest versions of the hashing dependencies. I didn't notice that egglog also uses rustc-hash, so that ought to be updated, since they recently did significant improvements improving both performance and hash strength. It's no longer the Firefox-derived hash function; its collision weaknesses, which were problematic for rustc, have been fixed. Now, rustc-hash is a mildly stripped down version of foldhash, making tradeoffs for usage in a compiler (I've written on the differences between them in my notes). It's probably only worth using one of the two. Update: fixed in PR#456 Update symbol_table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Thalia's comment
One thought: Another source of nondeterminism that can happen if you're running more than one instance of egglog in a single process is symbol_table. The numbers symbol_table hands out will not be deterministic (there are various process-global counters being incremented and the like. Different numbers will cause hashmap to store strings or symbols in different spots, hence iteration order will be different too. |
@thaliaarchi thanks for the suggestions @ezrosent hmm yeah, that sounds like it would do it. I assumed wrongly that symbol gen was deterministic. Sounds like I should make a "nondeterministic" feature flag for now, and we can revisit this question at a later date. The best option is probably to somehow sort the resulting serialized egraph, since that would work even with parallel egglog. |
c9727fc
to
798fe04
Compare
8593ad5
to
9f34210
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, but please add a regression comment
Regression is caused by the swap from hashmap to index maps, which have a slower insertion and lookup speed. To recover the performance, use the nondeterministic flag. Looking to the future, we'd like to remove the dependency on symbol gen library and use the fixed seed solution. |
No description provided.