HashMaps get slow over time #17851
Comments
Well, this could be caused by chaining in the hash map. If too many entries end up with the same hash value, collisions can slow down the hash table quite a bit, which might be what's happening here. I'm not sure about the specific internals of std.HashMap, though.
If the issue really is because of old, now-unused tombstone entries, then I assume we should add a way to compact them. Can you check whether your use case / benchmark would be fixed by periodically recreating the hash map / moving its entries to a new instance?
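A minimal sketch of that suggested workaround, assuming a managed `std.AutoHashMap(u32, u32)` and Zig 0.11-era APIs (the `recreate` helper name and its signature are ours, not from the thread):

```zig
const std = @import("std");

/// Move all entries into a freshly allocated map so accumulated tombstones are dropped.
fn recreate(allocator: std.mem.Allocator, map: *std.AutoHashMap(u32, u32)) !void {
    var fresh = std.AutoHashMap(u32, u32).init(allocator);
    errdefer fresh.deinit();
    try fresh.ensureTotalCapacity(map.count());

    var it = map.iterator();
    while (it.next()) |entry| {
        fresh.putAssumeCapacity(entry.key_ptr.*, entry.value_ptr.*);
    }

    map.deinit();
    map.* = fresh;
}
```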
@SpexGuy would have the most insight on this.
If you make this change, which counts tombstones against the available capacity, then it's super fast again:

```diff
diff --git a/lib/std/hash_map.zig b/lib/std/hash_map.zig
index 40a412bf3..cfc104780 100644
--- a/lib/std/hash_map.zig
+++ b/lib/std/hash_map.zig
@@ -1410,7 +1410,7 @@ pub fn HashMapUnmanaged(
             self.keys()[idx] = undefined;
             self.values()[idx] = undefined;
             self.size -= 1;
-            self.available += 1;
+            // self.available += 1;
         }

         /// If there is an `Entry` with a matching key, it is deleted from
```

See this result, applied to latest master:
@mrjbq7 The relevant PR that introduced this behavior: #10337
Thanks for the context @rohlem, it's pretty unusable by default.
This might be a better approach; it compares the capacity to non-empty buckets (meaning filled or deleted):

```diff
diff --git a/lib/std/hash_map.zig b/lib/std/hash_map.zig
index 40a412bf3..d35b010b2 100644
--- a/lib/std/hash_map.zig
+++ b/lib/std/hash_map.zig
@@ -725,7 +725,7 @@ pub fn HashMapUnmanaged(
         // execute when determining if the hashmap has enough capacity already.
         /// Number of available slots before a grow is needed to satisfy the
         /// `max_load_percentage`.
-        available: Size = 0,
+        deleted: Size = 0,

         // This is purely empirical and not a /very smart magic constant™/.
         /// Capacity of the first grow when bootstrapping the hashmap.
@@ -927,14 +927,14 @@ pub fn HashMapUnmanaged(
             if (self.metadata) |_| {
                 self.initMetadatas();
                 self.size = 0;
-                self.available = @as(u32, @truncate((self.capacity() * max_load_percentage) / 100));
+                self.deleted = 0;
             }
         }

         pub fn clearAndFree(self: *Self, allocator: Allocator) void {
             self.deallocate(allocator);
             self.size = 0;
-            self.available = 0;
+            self.deleted = 0;
         }

         pub fn count(self: *const Self) Size {
@@ -1041,9 +1041,6 @@ pub fn HashMapUnmanaged(
                 metadata = self.metadata.? + idx;
             }

-            assert(self.available > 0);
-            self.available -= 1;
-
             const fingerprint = Metadata.takeFingerprint(hash);
             metadata[0].fill(fingerprint);
             self.keys()[idx] = key;
@@ -1113,7 +1110,7 @@ pub fn HashMapUnmanaged(
                 old_key.* = undefined;
                 old_val.* = undefined;
                 self.size -= 1;
-                self.available += 1;
+                self.deleted += 1;

                 return result;
             }
@@ -1360,9 +1357,9 @@ pub fn HashMapUnmanaged(
                 // Cheap try to lower probing lengths after deletions. Recycle a tombstone.
                 idx = first_tombstone_idx;
                 metadata = self.metadata.? + idx;
+                // We're using a slot previously a tombstone.
+                self.deleted -= 1;
             }
-            // We're using a slot previously free or a tombstone.
-            self.available -= 1;

             metadata[0].fill(fingerprint);
             const new_key = &self.keys()[idx];
@@ -1410,7 +1407,7 @@ pub fn HashMapUnmanaged(
             self.keys()[idx] = undefined;
             self.values()[idx] = undefined;
             self.size -= 1;
-            self.available += 1;
+            self.deleted += 1;
         }

         /// If there is an `Entry` with a matching key, it is deleted from
@@ -1453,16 +1450,16 @@ pub fn HashMapUnmanaged(
             @memset(@as([*]u8, @ptrCast(self.metadata.?))[0 .. @sizeOf(Metadata) * self.capacity()], 0);
         }

-        // This counts the number of occupied slots (not counting tombstones), which is
+        // This counts the number of occupied slots including tombstones, which is
         // what has to stay under the max_load_percentage of capacity.
         fn load(self: *const Self) Size {
             const max_load = (self.capacity() * max_load_percentage) / 100;
-            assert(max_load >= self.available);
-            return @as(Size, @truncate(max_load - self.available));
+            assert(max_load >= self.size + self.deleted);
+            return @as(Size, @truncate(max_load - self.size - self.deleted));
         }

         fn growIfNeeded(self: *Self, allocator: Allocator, new_count: Size, ctx: Context) Allocator.Error!void {
-            if (new_count > self.available) {
+            if (new_count > (self.size + self.deleted)) {
                 try self.grow(allocator, capacityForSize(self.load() + new_count), ctx);
             }
         }
@@ -1480,7 +1477,7 @@ pub fn HashMapUnmanaged(
             const new_cap = capacityForSize(self.size);
             try other.allocate(allocator, new_cap);
             other.initMetadatas();
-            other.available = @truncate((new_cap * max_load_percentage) / 100);
+            other.deleted = 0;

             var i: Size = 0;
             var metadata = self.metadata.?;
@@ -1515,7 +1512,7 @@ pub fn HashMapUnmanaged(
             defer map.deinit(allocator);
             try map.allocate(allocator, new_cap);
             map.initMetadatas();
-            map.available = @truncate((new_cap * max_load_percentage) / 100);
+            map.deleted = 0;

             if (self.size != 0) {
                 const old_capacity = self.capacity();
@@ -1593,7 +1590,7 @@ pub fn HashMapUnmanaged(
             allocator.free(slice);

             self.metadata = null;
-            self.available = 0;
+            self.deleted = 0;
         }

         /// This function is used in the debugger pretty formatters in tools/ to fetch the
```
If you don't like the addition in …
I was looking at TigerBeetle's Zig Tracking issue tigerbeetle/tigerbeetle#1191 and they say:
Perhaps this is the reason it's so slow.
Okay, here's a possible fix. Here is a possible implementation of `rehash()`:

```zig
/// Rehash the map, in-place.
pub fn rehash(self: *Self, ctx: anytype) void {
    const mask = self.capacity() - 1;

    var metadata = self.metadata.?;
    var keys_ptr = self.keys();
    var values_ptr = self.values();
    var curr: Size = 0;

    while (curr < self.capacity()) {
        if (!metadata[curr].isUsed()) {
            if (!metadata[curr].isFree()) {
                metadata[curr].fingerprint = Metadata.free;
                assert(metadata[curr].isFree());
            }
            curr += 1;
            continue;
        }

        var hash = ctx.hash(keys_ptr[curr]);
        var fingerprint = Metadata.takeFingerprint(hash);
        var idx = @as(usize, @truncate(hash & mask));

        while (idx < curr and metadata[idx].isUsed()) {
            idx += 1;
        }

        if (idx < curr) {
            assert(!metadata[idx].isUsed());
            metadata[idx].fingerprint = fingerprint;
            metadata[idx].used = 1;
            keys_ptr[idx] = keys_ptr[curr];
            values_ptr[idx] = values_ptr[curr];

            metadata[curr].fingerprint = Metadata.free;
            metadata[curr].used = 0;
            keys_ptr[curr] = undefined;
            values_ptr[curr] = undefined;

            curr += 1;
        } else if (idx == curr) {
            if (metadata[idx].fingerprint == Metadata.free) {
                metadata[idx].fingerprint = fingerprint;
            }
            curr += 1;
        } else {
            while (metadata[idx].isUsed() and (idx <= curr or metadata[idx].fingerprint == Metadata.free)) {
                idx = (idx + 1) & mask;
            }
            assert(idx != curr);

            if (idx > curr and metadata[idx].isUsed()) {
                var tmpfingerprint = metadata[idx].fingerprint;
                var tmpkey = keys_ptr[idx];
                var tmpvalue = values_ptr[idx];

                metadata[idx].fingerprint = Metadata.free;
                keys_ptr[idx] = keys_ptr[curr];
                values_ptr[idx] = values_ptr[curr];

                metadata[curr].fingerprint = tmpfingerprint;
                keys_ptr[curr] = tmpkey;
                values_ptr[curr] = tmpvalue;
            } else {
                assert(!metadata[idx].isUsed());
                metadata[idx].fingerprint = fingerprint;
                metadata[idx].used = 1;
                keys_ptr[idx] = keys_ptr[curr];
                values_ptr[idx] = values_ptr[curr];

                metadata[curr].fingerprint = Metadata.free;
                metadata[curr].used = 0;
                keys_ptr[curr] = undefined;
                values_ptr[curr] = undefined;

                curr += 1;
            }
        }
    }

    self.available = @as(u32, @truncate((self.capacity() * max_load_percentage) / 100)) - self.size;
}
```

Here's a test case for it:

```zig
test "std.hash_map rehash" {
    var map = AutoHashMap(u32, u32).init(std.testing.allocator);
    defer map.deinit();

    var prng = std.rand.DefaultPrng.init(0);
    const random = prng.random();

    const count = 6 * random.intRangeLessThan(u32, 10_000, 100_000);

    var i: u32 = 0;
    while (i < count) : (i += 1) {
        try map.put(i, i);
        if (i % 3 == 0) {
            try expectEqual(map.remove(i), true);
        }
    }

    map.rehash();

    try expectEqual(map.count(), count * 2 / 3);

    i = 0;
    while (i < count) : (i += 1) {
        if (i % 3 == 0) {
            try expectEqual(map.get(i), null);
        } else {
            try expectEqual(map.get(i).?, i);
        }
    }
}
```

This passes, and it also fixes the original performance issue. Could a maintainer give me some advice on how to move from this to a PR, and how to add things like the context type checking that is in the rest of the implementation?
I've been reading over the history of this issue with interest, and it seems to me like it would be super nice if the Zig standard library's hash map didn't have this kind of performance degradation by default. #19923 is a great partial fix, but it requires the user to periodically call rehash(). I'm wondering if it might be worth, as a long-term plan, switching over to an implementation that avoids this particular degradation of performance without placing additional requirements on the user. From @mrjbq7's comment above, it sounds like TigerBeetle was looking into Facebook's F14 hash map, but I haven't been able to find out if they've pursued that path further. Maybe the F14 hash map would be a good implementation to explore? I'm still reading about how it works and don't have a strong understanding of its advantages/disadvantages, so I'm hoping someone who does can chime in here. Other than that, I'm curious if anyone has thoughts on the best way to address this issue in the long run.
I have found this benchmark: https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-01-overview/
I agree that, long term, the standard library should have a different map that does not degrade performance like this, and when that is available the rehash() PR becomes unnecessary. However, I think waiting to merge rehash() is not a great idea. It's now been 6 months since this issue was reported and a workaround contributed, and it's not merged yet, and a new release was made with the same issue. If the long term is really far away, I think it does a disservice to users not to provide a possible solution.
Yep, we're on the same page! I think your PR is a good fix in the short term; I didn't mean to suggest that it not get merged in the meantime. I do think this issue should be kept open until an alternative hash map is implemented, since tombstone-induced performance degradation is something that an alternative hash map should avoid regardless of its other performance characteristics. I've been looking over the ones rofrol posted, and in the survey of different hashmaps, it looks like the benchmark here is the closest to the one that this issue is based on, if I'm understanding it correctly?
See: ziglang/zig#17851. Users were noticing that frame render times got slower over time. I believe (thanks to the community for pointing it out) that this is the culprit. This change works around the issue by clearing and reinitializing the LRU after a certain number of evictions. When the Zig issue has a better resolution (either rehash() as a workaround or a better hash implementation overall) we can change this.
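A hedged sketch of that kind of workaround (not Ghostty's actual code; the type, fields, and eviction threshold are illustrative): count evictions and, once a threshold is crossed, clear the map so its metadata, including every tombstone, is reset.

```zig
const std = @import("std");

const Cache = struct {
    map: std.AutoHashMap(u64, u32),
    evictions: usize = 0,
    max_evictions: usize = 10_000, // illustrative threshold

    fn evict(self: *Cache, key: u64) void {
        if (self.map.remove(key)) self.evictions += 1;
        if (self.evictions >= self.max_evictions) {
            // clearRetainingCapacity keeps the allocation but resets the metadata,
            // which also discards the accumulated tombstones.
            self.map.clearRetainingCapacity();
            self.evictions = 0;
        }
    }
};
```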
@mrjbq7 did good work which will be useful either way; I hope it gets merged soon. We can get the improved performance without requiring the user to call rehash() manually, by rehashing automatically every R operations. The rehash window R is linear in the map's size; the Linear Probing With Tombstones paper derives a Θ-bound on the optimal factor in terms of the load factor, but since only a rather small range of load-factor values makes any sense in practice, practical concerns will completely dominate that, and we can just provide an empirically good default rehash factor.
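To make the idea concrete, here is a hedged sketch of what automatic rehashing could look like from user code, written as a wrapper outside the standard library (the wrapper name, the `window` parameter, and the existence of a managed `rehash()` method are assumptions, the latter based on the patch proposed earlier in this thread):

```zig
const std = @import("std");

/// Wrap AutoHashMap and trigger the proposed rehash() every `window` removals,
/// so the caller never has to remember to compact tombstones.
fn AutoRehashingMap(comptime K: type, comptime V: type, comptime window: usize) type {
    return struct {
        inner: std.AutoHashMap(K, V),
        removals: usize = 0,

        const Self = @This();

        fn init(allocator: std.mem.Allocator) Self {
            return .{ .inner = std.AutoHashMap(K, V).init(allocator) };
        }

        fn deinit(self: *Self) void {
            self.inner.deinit();
        }

        fn put(self: *Self, key: K, value: V) !void {
            try self.inner.put(key, value);
        }

        fn remove(self: *Self, key: K) bool {
            const removed = self.inner.remove(key);
            if (removed) {
                self.removals += 1;
                if (self.removals >= window) {
                    self.inner.rehash(); // assumes the rehash() proposed above is available
                    self.removals = 0;
                }
            }
            return removed;
        }
    };
}
```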
My currently favored solution to this problem is to delete the non-array hash map implementation from the standard library. Automatically rehashing after every R insert/deletion operations leads to non-uniform performance characteristics. Needing to rehash manually is also undesirable. I think that makes this hash map implementation inferior to ArrayHashMap, where you additionally get the benefit of having keys and values sequentially in a well-defined order. The use case for a slightly more optimized hash map that does not need this property is rare enough - or so central to an application's core operation, as in the case of TigerBeetle - that it can be satisfied by a third party package, or should be part of the application itself, and have application-specific optimizations.
I noticed that the HashMap iterator showed up prominently in Instruments when quickly resizing Ghostty. I think this is related to the [tombstone issue](ziglang/zig#17851), where the `next()` function has to skip unused meta-nodes. In that same issue, Andrew is suggesting that the non-array hashmap might get deleted from the standard library. After switching to `AutoArrayHashMapUnmanaged`, iteration barely shows up anymore. Deletion from the pin list should also be fast as swapRemove is used (order does not need to be preserved). Question is if insertion performance is negatively affected, though I'm not seeing anything obvious. Still, checking this PR for any perf regressions might be a good idea. If this pans out, there are more places where this switch might be beneficial.
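A hedged sketch of that kind of switch (not the Ghostty code; the keys, values, and pin-list shape are made up): `AutoArrayHashMapUnmanaged` stores entries in dense arrays, so iteration never has to skip tombstones, and `swapRemove` gives cheap deletion when order does not matter.

```zig
const std = @import("std");

fn example(allocator: std.mem.Allocator) !void {
    var pins: std.AutoArrayHashMapUnmanaged(u64, u32) = .{};
    defer pins.deinit(allocator);

    try pins.put(allocator, 1, 100);
    try pins.put(allocator, 2, 200);

    // Order does not need to be preserved, so swapRemove is fine.
    _ = pins.swapRemove(1);

    // Iteration walks the dense key/value arrays directly; no tombstones to skip.
    for (pins.keys(), pins.values()) |k, v| {
        std.debug.print("{d} => {d}\n", .{ k, v });
    }
}
```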
Zig Version
0.11.0
Steps to Reproduce and Observed Behavior
I'm having some performance issues with HashMaps that seem to degrade quite a lot over time. This test case creates a map of 2 million items, then keeps decrementing map values, removing an item every 3rd loop iteration and inserting a new one so the map stays at 2 million items.
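The original snippet is not reproduced above; what follows is a hedged reconstruction of that loop from the description (key choice, block size, and the exact decrement pattern are guesses):

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const n: u32 = 2_000_000;
    var map = std.AutoHashMap(u32, u32).init(allocator);
    defer map.deinit();
    try map.ensureTotalCapacity(n);

    var i: u32 = 0;
    while (i < n) : (i += 1) map.putAssumeCapacity(i, 3);

    var oldest: u32 = 0; // next key to remove
    var newest: u32 = n; // next key to insert
    var timer = try std.time.Timer.start();

    var block: u32 = 0;
    while (block < 50) : (block += 1) {
        var op: u32 = 0;
        while (op < 1_000_000) : (op += 1) {
            // Decrement some live value; every 3rd operation retire one key and
            // insert a fresh one so the map stays at 2 million entries.
            if (map.getPtr(oldest + (op % n))) |v| v.* -%= 1;
            if (op % 3 == 0) {
                _ = map.remove(oldest);
                oldest += 1;
                try map.put(newest, 3);
                newest += 1;
            }
        }
        std.debug.print("block {d}: {d} ms\n", .{ block, timer.lap() / std.time.ns_per_ms });
    }
}
```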
It gets slower and slower over time...
This issue occurs with both the C allocator and the page allocator.
This performance issue goes away with the ArrayHashMap.
On Discord, Protty gives this comment:
Expected Behavior
I would expect it not to degrade over time, and to take roughly the same time per block, like ArrayHashMap.