In Intern::new compute the hash before acquiring the mutex #28

stepancheg · 2021-11-15T07:01:05Z

This reduces lock contention.

This also unlocks an option to do partitioning by hash, which should
reduce contention greatly.
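A minimal sketch of the partitioning idea, not the code in this PR: the hash is computed before any lock is taken and then selects one of several independently locked shards. The names ShardedSet and SHARDS and the shard count of 32 are assumptions made for illustration. The inner std HashSet hashes the value a second time with its own hasher; a raw table would reuse the precomputed hash instead.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashSet;
use std::hash::{BuildHasher, Hash, Hasher};
use std::sync::Mutex;

// Hypothetical shard count, chosen only for illustration.
const SHARDS: usize = 32;

// Hypothetical type: a set of strings split across independently locked shards.
struct ShardedSet {
    hasher: RandomState,
    shards: Vec<Mutex<HashSet<String>>>,
}

impl ShardedSet {
    fn new() -> Self {
        ShardedSet {
            hasher: RandomState::new(),
            shards: (0..SHARDS).map(|_| Mutex::new(HashSet::new())).collect(),
        }
    }

    // Hashing needs only shared access to the RandomState, so no lock is held.
    fn hash_of(&self, value: &str) -> u64 {
        let mut h = self.hasher.build_hasher();
        value.hash(&mut h);
        h.finish()
    }

    // Returns true if the value was newly inserted.
    fn insert(&self, value: &str) -> bool {
        let hash = self.hash_of(value); // computed before locking
        let shard = &self.shards[(hash as usize) % SHARDS];
        let mut set = shard.lock().unwrap(); // only this shard is locked
        if set.contains(value) {
            false
        } else {
            set.insert(value.to_owned());
            true
        }
    }
}
```

Threads whose values land in different shards do not wait on each other, and the time spent hashing never counts against any lock.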

droundy · 2021-11-15T15:54:50Z

Just a check: can you confirm that you've run benchmarks and this makes a difference? How big a difference does it make, and under what kind of workload?

stepancheg · 2021-11-15T16:03:05Z

No, I didn't run benchmarks, sorry. Testing unreleased versions is not trivial in our repo setup.

But there are more reasons to make this change. Another one is this: the current implementation relies on Borrow. So it is possible to intern without allocation when, for example, looking up a String from a str, but it is not possible to intern

struct Pair(String, String);

from

struct PairRef<'a>(&'a str, &'a str);

because a Pair cannot be borrowed as a PairRef. Switching from HashSet to RawTable makes this possible.
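A rough sketch of why a table keyed by a precomputed hash avoids the Borrow restriction. It uses a std HashMap from hash to bucket as a stand-in for RawTable, so it is not the implementation from this PR, and PairTable and its methods are hypothetical names. The lookup hashes the two borrowed strings directly and compares them field by field, so a Pair is allocated only when the value is not already stored.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

#[derive(Debug, PartialEq)]
struct Pair(String, String);

#[derive(Clone, Copy)]
struct PairRef<'a>(&'a str, &'a str);

// Hypothetical stand-in for a raw table: buckets keyed by a precomputed hash.
struct PairTable {
    state: RandomState,
    buckets: HashMap<u64, Vec<Pair>>,
}

impl PairTable {
    fn new() -> Self {
        PairTable { state: RandomState::new(), buckets: HashMap::new() }
    }

    // Hash the borrowed fields directly; no Pair has to exist yet.
    fn hash(&self, key: PairRef<'_>) -> u64 {
        let mut h = self.state.build_hasher();
        key.0.hash(&mut h);
        key.1.hash(&mut h);
        h.finish()
    }

    // Returns the stored Pair, allocating only if it was not present.
    fn intern(&mut self, key: PairRef<'_>) -> &Pair {
        let hash = self.hash(key);
        let bucket = self.buckets.entry(hash).or_default();
        // Compare field by field instead of going through Borrow.
        let pos = match bucket.iter().position(|p| p.0 == key.0 && p.1 == key.1) {
            Some(pos) => pos,
            None => {
                bucket.push(Pair(key.0.to_owned(), key.1.to_owned()));
                bucket.len() - 1
            }
        };
        &bucket[pos]
    }

    fn len(&self) -> usize {
        self.buckets.values().map(Vec::len).sum()
    }
}
```

With HashSet<Pair> the same lookup would need Pair: Borrow<Q> for some Q that can be built from two &str values, and Borrow::borrow can only return a reference into the existing Pair, so no such Q is available.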

stepancheg · 2021-11-15T18:23:32Z

I reimplemented the library in our project with two PRs I submitted and more changes.

This is the implementation. https://gist.github.com/stepancheg/9d2ebac23b27ff8d21a1bcf494b5ac7c

The speedup in our program (not just internment) is about 1% (edit). However, the speedup is against version 0.4.0.

droundy · 2021-11-16T01:38:19Z

I'm wondering why https://docs.rs/hashbrown/0.11.2/hashbrown/hash_map/struct.HashMap.html#method.raw_entry_mut wouldn't have worked for this purpose? I don't like that the raw API describes itself as "unsafe and experimental". It seems like raw_entry_mut would give the advantages you describe. Am I missing something?

stepancheg · 2021-11-16T02:08:02Z

raw_entry_mut should probably work. I didn't know such a function existed.

droundy · 2021-11-24T13:44:12Z

Okay, I'm now thinking that this implementation could open users of internment up to hash collision attacks. It computes the hash with DefaultHasher rather than with the HashMap's own hasher, so the hash is predictable instead of being keyed by a RandomState, and an attacker who can generate values that are interned could generate many values with collisions. There are various use cases for internment that I could imagine where this attack path might be open. On the other hand, one might point out that since Intern leaks memory, it's not safe to let attackers generate unlimited Intern values in any case... Not sure if this matters, but I'm reluctant to sidestep the protections built into the std HashMap.

droundy · 2021-11-25T03:32:27Z

I've confirmed from the docs that DefaultHasher::new() generates a SipHasher with zero for the keys, which means we would be vulnerable to DoS attacks. Probably you shouldn't let potential attackers provide data for internment anyway, but I'm also not comfortable forgoing the standard protection provided by the standard library.
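To make the difference concrete, here is a small sketch contrasting the two ways of building a hasher. The function names fixed_key_hash and keyed_hash are made up for illustration. Every hasher created through DefaultHasher::new() starts from the same state, while a hasher built from a RandomState depends on that state's keys, which normally differ from one RandomState value to the next.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, Hash, Hasher};

// Hypothetical helper: hashes with a hasher from DefaultHasher::new().
// All hashers created this way start from the same state.
fn fixed_key_hash(value: &str) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

// Hypothetical helper: hashes with a hasher built from a caller-supplied
// RandomState, so the result depends on that state's keys.
fn keyed_hash(state: &RandomState, value: &str) -> u64 {
    let mut h = state.build_hasher();
    value.hash(&mut h);
    h.finish()
}
```

A single RandomState still gives the same hash for the same input every time, which is what a table needs; it is only the keys themselves that an outsider cannot easily predict.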

I'll think about how to store a RandomState outside a Mutex so we can compute hashes securely before taking the lock.

stepancheg · 2021-11-25T06:20:12Z

"I'll think about how to store a RandomState outside a Mutex"

static RANDOM_STATE: OnceCell<RandomState> = ... should not have noticeable performance overhead, no?
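A sketch of what that could look like, assuming std::sync::OnceLock (Rust 1.70 or later) in place of the OnceCell type named above. The TABLE static, the secure_hash and intern functions, and the linear scan over a Vec are all illustrative stand-ins, not internment's actual data structures.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};
use std::sync::{Mutex, OnceLock};

// Randomly keyed hasher state, initialized once and then readable
// without taking the table's lock.
static RANDOM_STATE: OnceLock<RandomState> = OnceLock::new();

// Hypothetical stand-in for the interner's table: (hash, value) entries.
static TABLE: Mutex<Vec<(u64, String)>> = Mutex::new(Vec::new());

// Hypothetical helper: hash a value with the shared RandomState.
fn secure_hash<T: Hash + ?Sized>(value: &T) -> u64 {
    let state = RANDOM_STATE.get_or_init(RandomState::new);
    let mut h = state.build_hasher();
    value.hash(&mut h);
    h.finish()
}

// Returns the index of the interned value, inserting it if needed.
fn intern(value: &str) -> usize {
    let hash = secure_hash(value); // computed before the lock is taken
    let mut table = TABLE.lock().unwrap();
    if let Some(i) = table.iter().position(|(h, s)| *h == hash && s == value) {
        return i;
    }
    table.push((hash, value.to_owned()));
    table.len() - 1
}
```

After the first call, get_or_init only has to check that the value is already initialized, so the cost of reading the shared state should be small next to the hashing itself.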

droundy · 2021-11-29T17:18:33Z

Probably, and that's probably how I'd do this. If you want to give it a shot, it might happen sooner.


droundy · 2021-11-29T17:22:57Z

BTW, the reason I'm picky about optimization changes is that I've been bitten in the past by "optimizations" which ended up introducing serious bugs (only discovered months later after they had corrupted many people's data), without even a demonstrated benefit. I doubt that's the case here (among other things, internment is unlikely to affect anyone's on-disk format), but want to ensure that I understand the trade-offs, and that we don't sacrifice my ability to fix any bugs either in this pull request, or that arise in the future.

In Intern::new compute the hash before acquiring the mutex

145813a

So there's smaller lock contention. This also unlocks an option to do partitioning by hash, which should reduce contention greatly.

stepancheg mentioned this pull request Nov 16, 2021

Use 32 per-type container lists instead of one #29

Merged
