HashMaps get slow over time #17851

Open
mrjbq7 opened this issue Nov 4, 2023 · 19 comments
Labels: bug, optimization, standard library

@mrjbq7
Contributor

mrjbq7 commented Nov 4, 2023

Zig Version

0.11.0

Steps to Reproduce and Observed Behavior

I'm seeing HashMap performance degrade quite a lot over time. This test case creates a map of 2 million items, each with a value of 3, then repeatedly picks a random item and decrements its value. When an item's value is already 1 (its third visit), the item is removed and a new one is inserted, so the map stays at 2 million items.

const std = @import("std");

pub fn main() !void {
    var map = std.AutoHashMap(u64, u64).init(std.heap.page_allocator);
    defer map.deinit();

    var list = std.ArrayList(u64).init(std.heap.page_allocator);
    defer list.deinit();

    // Insert initial items

    var start = std.time.milliTimestamp();
    var i: u64 = 0;
    while (i < 2_000_000) : (i += 1) {
        try map.put(i, 3);
        try list.append(i);
    }
    var end = std.time.milliTimestamp();

    std.debug.print("inserting {} took {} ms\n", .{ map.count(), end - start });

    // Loop, decrementing and inserting more

    var prng = std.rand.DefaultPrng.init(0);
    const random = prng.random();

    start = std.time.milliTimestamp();
    while (i < 250_000_000) : (i += 1) {
        var index = random.uintLessThan(usize, list.items.len);

        var j = list.items[index];

        var k = map.get(j).?;
        if (k == 1) {
            _ = map.remove(j);
            try map.put(i, 3);
            list.items[index] = i;
        } else {
            try map.put(j, k - 1);
        }

        if (i % 1_000_000 == 0) {
            end = std.time.milliTimestamp();
            std.debug.print("{} block took {} ms\n", .{ i, end - start });
            start = std.time.milliTimestamp();
        }
    }

    // Remove all the items

    while (list.items.len > 0) {
        var j = list.pop();
        _ = map.remove(j);
    }
}

It gets slower and slower over time:

➜  zig run -O ReleaseFast src/maptest.zig
inserting 2000000 took 121 ms
2000000 block took 0 ms
3000000 block took 59 ms
4000000 block took 59 ms
5000000 block took 65 ms
6000000 block took 69 ms
7000000 block took 70 ms
8000000 block took 75 ms
9000000 block took 76 ms
10000000 block took 78 ms
11000000 block took 79 ms
12000000 block took 81 ms
13000000 block took 85 ms
14000000 block took 86 ms
15000000 block took 100 ms
16000000 block took 91 ms
17000000 block took 92 ms
18000000 block took 95 ms
19000000 block took 99 ms
20000000 block took 102 ms
21000000 block took 105 ms
22000000 block took 110 ms
23000000 block took 114 ms
24000000 block took 123 ms
25000000 block took 128 ms
26000000 block took 132 ms
27000000 block took 140 ms
28000000 block took 151 ms
29000000 block took 162 ms
30000000 block took 173 ms
31000000 block took 190 ms
32000000 block took 207 ms
33000000 block took 222 ms
34000000 block took 244 ms
35000000 block took 271 ms
36000000 block took 300 ms
37000000 block took 338 ms
38000000 block took 369 ms
39000000 block took 417 ms
40000000 block took 469 ms
41000000 block took 525 ms
42000000 block took 593 ms
43000000 block took 663 ms
44000000 block took 749 ms
45000000 block took 853 ms
46000000 block took 966 ms
47000000 block took 1102 ms
48000000 block took 1293 ms
49000000 block took 1419 ms
50000000 block took 1600 ms
51000000 block took 1848 ms
52000000 block took 2253 ms
53000000 block took 2527 ms
54000000 block took 2918 ms
55000000 block took 3312 ms
56000000 block took 3867 ms
57000000 block took 4437 ms
58000000 block took 5236 ms
59000000 block took 6115 ms
60000000 block took 7031 ms
61000000 block took 8098 ms
62000000 block took 9338 ms
63000000 block took 10516 ms
64000000 block took 12541 ms
65000000 block took 14416 ms
66000000 block took 15888 ms
67000000 block took 18784 ms
...

This issue occurs with both the C allocator and the page allocator.

This performance issue goes away with the ArrayHashMap.

On Discord, Protty gives this comment:

We actually ran into this in TigerBeetle: Zig's HashMap implements remove() by storing a tombstone entry. During lookup/insert, tombstones are skipped when looking for occupied or empty slots to prevent unrelated remove()s from breaking the probe path of existing entries.

The problem is that it doesn't rehash / shrink the table after enough remove()s, so if you delete, say, 2/3 of the map's entries, subsequent lookups/inserts have to skip past 2/3 of the map's worth of tombstones to get to their normal probe destination, which increasingly reduces perf.

We've actually started looking into implementing Facebook's F14 HashMap down the line. It's similar to Zig's in terms of insert/lookup but doesn't use tombstone values and instead cleans them up on remove() without rehashing.
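
To make the tombstone effect described above concrete, here is a minimal toy sketch. It is not std.HashMap's code: `ToyTable`, its 16-slot capacity, the identity "hash", and linear probing are all assumptions chosen for readability, and the syntax follows the Zig 0.11 style used elsewhere in this thread. It keeps only 4 live keys but replaces them repeatedly, then counts how many slots a lookup for a missing key has to inspect.

```zig
const std = @import("std");

// Toy linear-probing table with tombstones. Hypothetical illustration only:
// std.HashMap's real layout, hash function and bookkeeping are different.
const Slot = enum { free, used, tombstone };

const ToyTable = struct {
    const capacity = 16; // power of two, so `& mask` wraps around
    const mask = capacity - 1;

    state: [capacity]Slot = [_]Slot{.free} ** capacity,
    keys: [capacity]u64 = [_]u64{0} ** capacity,
    live: usize = 0,

    fn home(key: u64) usize {
        return @as(usize, @intCast(key & mask)); // identity "hash", for readability
    }

    /// Slots inspected while looking up `key`. The walk ends at the key or at
    /// the first free slot; tombstones have to be stepped over.
    fn probeCount(self: *const ToyTable, key: u64) usize {
        var idx = home(key);
        var probes: usize = 0;
        while (probes < capacity) {
            probes += 1;
            if (self.state[idx] == .free) break;
            if (self.state[idx] == .used and self.keys[idx] == key) break;
            idx = (idx + 1) & mask;
        }
        return probes;
    }

    /// Insert `key`, recycling the first tombstone seen on the probe path.
    fn put(self: *ToyTable, key: u64) void {
        std.debug.assert(self.live < capacity);
        var idx = home(key);
        var first_tombstone: ?usize = null;
        var probes: usize = 0;
        while (probes < capacity) : (probes += 1) {
            if (self.state[idx] == .free) break;
            if (self.state[idx] == .used and self.keys[idx] == key) return;
            if (self.state[idx] == .tombstone and first_tombstone == null) first_tombstone = idx;
            idx = (idx + 1) & mask;
        }
        const target = first_tombstone orelse idx;
        self.state[target] = .used;
        self.keys[target] = key;
        self.live += 1;
    }

    /// Remove `key` by leaving a tombstone in its slot.
    fn remove(self: *ToyTable, key: u64) bool {
        var idx = home(key);
        var probes: usize = 0;
        while (probes < capacity) : (probes += 1) {
            if (self.state[idx] == .free) return false;
            if (self.state[idx] == .used and self.keys[idx] == key) {
                self.state[idx] = .tombstone;
                self.live -= 1;
                return true;
            }
            idx = (idx + 1) & mask;
        }
        return false;
    }
};

pub fn main() void {
    var table = ToyTable{};

    var key: u64 = 0;
    while (key < 4) : (key += 1) table.put(key);
    std.debug.print("before churn: live={d}, probes for a missing key={d}\n", .{ table.live, table.probeCount(100) });

    // Keep 4 live keys while replacing each of them several times.
    while (key < 16) : (key += 1) {
        _ = table.remove(key - 4);
        table.put(key);
    }
    std.debug.print("after churn:  live={d}, probes for a missing key={d}\n", .{ table.live, table.probeCount(100) });
}
```

In this toy, the live count stays at 4 throughout, yet the lookup for the missing key 100 goes from 1 probe to 16, because every free slot on its path has been replaced by a tombstone or a live key. The real map is far larger and hashes its keys, but the same mechanism would explain the steadily growing block times in the benchmark.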

Expected Behavior

I would expect performance not to degrade over time, and each block to take roughly the same time, as it does with ArrayHashMap:

$ zig run -O ReleaseFast src/maptest.zig   
inserting 2000000 took 126 ms
2000000 block took 0 ms
3000000 block took 116 ms
4000000 block took 124 ms
5000000 block took 132 ms
6000000 block took 142 ms
7000000 block took 146 ms
8000000 block took 151 ms
9000000 block took 148 ms
10000000 block took 145 ms
11000000 block took 146 ms
12000000 block took 145 ms
13000000 block took 144 ms
14000000 block took 145 ms
15000000 block took 144 ms
16000000 block took 145 ms
17000000 block took 144 ms
18000000 block took 145 ms
19000000 block took 145 ms
20000000 block took 144 ms
21000000 block took 144 ms
22000000 block took 145 ms
23000000 block took 145 ms
24000000 block took 144 ms
25000000 block took 144 ms
26000000 block took 146 ms
27000000 block took 146 ms
28000000 block took 145 ms
29000000 block took 143 ms
30000000 block took 145 ms
31000000 block took 143 ms
32000000 block took 144 ms
33000000 block took 145 ms
34000000 block took 148 ms
35000000 block took 145 ms
36000000 block took 144 ms
37000000 block took 144 ms
38000000 block took 145 ms
39000000 block took 149 ms
40000000 block took 148 ms
41000000 block took 145 ms
42000000 block took 144 ms
43000000 block took 144 ms
44000000 block took 152 ms
...
@mrjbq7 mrjbq7 added the bug Observed behavior contradicts documented or intended behavior label Nov 4, 2023
@VisenDev

VisenDev commented Nov 4, 2023

Well, this could be caused by chaining in the hash map. If too many entries have the same resulting hash value, collisions can slow down the hash table quite a bit, which might be what's happening here. I'm not sure about the specific internals of std.HashMap, though.

@rohlem
Contributor

rohlem commented Nov 4, 2023

If the issue really is caused by old, now-unused tombstone entries, then I assume we should add a shrink/cleanup function to remove those entries and shrink the map. (I haven't found any such method currently.)
For now creating a new hash map and moving the entries over should have the same effect.

Can you check whether your use case / benchmark would be fixed by periodically recreating the hash map / moving its entries to a new instance?
I assume a shrink function can be implemented more efficiently than moving all entries, but if the general performance degradation remains then the issue would have to be with some other aspect of your usage scenario.
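
A minimal sketch of the "create a new hash map and move the entries over" workaround suggested above. The helper name `rebuild` and the `Map` alias are hypothetical; the calls used (`init`, `ensureTotalCapacity`, `iterator`, `putAssumeCapacity`, `std.mem.swap`) are from the managed std.AutoHashMap API as of Zig 0.11, and other versions may need small adjustments.

```zig
const std = @import("std");

const Map = std.AutoHashMap(u64, u64);

/// Copy every live entry into a fresh table, then swap it in.
/// Tombstones are not entries, so none of them carry over.
/// `rebuild` is a hypothetical helper, not part of std.
fn rebuild(map: *Map) !void {
    var fresh = Map.init(map.allocator);
    errdefer fresh.deinit();
    try fresh.ensureTotalCapacity(map.count());

    var it = map.iterator();
    while (it.next()) |entry| {
        fresh.putAssumeCapacity(entry.key_ptr.*, entry.value_ptr.*);
    }

    std.mem.swap(Map, map, &fresh);
    fresh.deinit(); // `fresh` now holds the old, tombstone-laden table
}

pub fn main() !void {
    var map = Map.init(std.heap.page_allocator);
    defer map.deinit();

    var i: u64 = 0;
    while (i < 1000) : (i += 1) {
        try map.put(i, i);
    }

    // Remove every third key, leaving tombstones behind.
    i = 0;
    while (i < 1000) : (i += 3) {
        _ = map.remove(i);
    }

    try rebuild(&map);
    std.debug.print("entries after rebuild: {d}\n", .{map.count()});
}
```

In the benchmark from this issue, something like `rebuild(&map)` could be called once per timed block to check whether the slowdown disappears. It costs a full copy each time, so it is a diagnostic and stopgap rather than a fix.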

@nektro
Contributor

nektro commented Nov 4, 2023

@SpexGuy would have the most insight on this

@mrjbq7
Contributor Author

mrjbq7 commented Nov 4, 2023

If you make this change, which counts tombstones against the available capacity, then it's super fast again:

diff --git a/lib/std/hash_map.zig b/lib/std/hash_map.zig
index 40a412bf3..cfc104780 100644
--- a/lib/std/hash_map.zig
+++ b/lib/std/hash_map.zig
@@ -1410,7 +1410,7 @@ pub fn HashMapUnmanaged(
             self.keys()[idx] = undefined;
             self.values()[idx] = undefined;
             self.size -= 1;
-            self.available += 1;
+            // self.available += 1;
         }
 
         /// If there is an `Entry` with a matching key, it is deleted from

See this result:

➜  build git:(master) ✗ ./stage3/bin/zig run -O ReleaseFast ../maptest.zig
inserting 2000000 took 106 ms
2000000 block took 0 ms
3000000 block took 56 ms
4000000 block took 58 ms
5000000 block took 63 ms
6000000 block took 68 ms
7000000 block took 71 ms
8000000 block took 74 ms
9000000 block took 106 ms
10000000 block took 70 ms
11000000 block took 72 ms
12000000 block took 77 ms
13000000 block took 74 ms
14000000 block took 73 ms
15000000 block took 74 ms
16000000 block took 74 ms
17000000 block took 76 ms
18000000 block took 75 ms
19000000 block took 76 ms
20000000 block took 77 ms
21000000 block took 77 ms
22000000 block took 77 ms
23000000 block took 119 ms
24000000 block took 71 ms
25000000 block took 71 ms
26000000 block took 71 ms
27000000 block took 72 ms
28000000 block took 72 ms
29000000 block took 72 ms
30000000 block took 72 ms
31000000 block took 72 ms
32000000 block took 73 ms
33000000 block took 73 ms
34000000 block took 72 ms
35000000 block took 73 ms
36000000 block took 73 ms
37000000 block took 73 ms
38000000 block took 73 ms
39000000 block took 74 ms
40000000 block took 73 ms
41000000 block took 74 ms
42000000 block took 73 ms
43000000 block took 74 ms
44000000 block took 75 ms
45000000 block took 74 ms
46000000 block took 75 ms
47000000 block took 74 ms
48000000 block took 76 ms
49000000 block took 76 ms
50000000 block took 76 ms
51000000 block took 75 ms
52000000 block took 76 ms
53000000 block took 77 ms
54000000 block took 77 ms
55000000 block took 77 ms
56000000 block took 78 ms
57000000 block took 146 ms
58000000 block took 77 ms
59000000 block took 77 ms
60000000 block took 78 ms
61000000 block took 77 ms
62000000 block took 78 ms
63000000 block took 78 ms
64000000 block took 79 ms
65000000 block took 87 ms
66000000 block took 85 ms
67000000 block took 82 ms
68000000 block took 82 ms
69000000 block took 81 ms
70000000 block took 80 ms
71000000 block took 80 ms
72000000 block took 82 ms
73000000 block took 87 ms
74000000 block took 83 ms
75000000 block took 82 ms
76000000 block took 81 ms
77000000 block took 83 ms
78000000 block took 83 ms
79000000 block took 87 ms
80000000 block took 84 ms
81000000 block took 84 ms
82000000 block took 87 ms
83000000 block took 84 ms
84000000 block took 84 ms
85000000 block took 82 ms
86000000 block took 83 ms
87000000 block took 87 ms
88000000 block took 85 ms
89000000 block took 83 ms
90000000 block took 88 ms
91000000 block took 95 ms
92000000 block took 96 ms
93000000 block took 92 ms
94000000 block took 91 ms
95000000 block took 91 ms
96000000 block took 90 ms
97000000 block took 86 ms
98000000 block took 89 ms
99000000 block took 87 ms
100000000 block took 85 ms
...

Applied to latest master

➜  build git:(master) ✗ ./stage3/bin/zig version
0.12.0-dev.1396+f6de3ec96

@rohlem
Contributor

rohlem commented Nov 4, 2023

@mrjbq7 The relevant PR that introduced this behavior: #10337
(There's since been a073211, which afaict only moved code around.)
Based on the discussion back then, it seemed to be a space vs. performance trade-off.
As I suggested in #10337 (comment), we could introduce a comptime parameter to the type for controlling whether or not to count tombstones as being available, so the user can pick which metric they value more.
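
A rough sketch of what such a comptime switch could look like, reduced to the bookkeeping alone. `SlotBudget` and the parameter name `tombstones_count_as_available` are invented for illustration; nothing here is existing std API, and a real change would have to thread the parameter through HashMapUnmanaged.

```zig
const std = @import("std");

/// Hypothetical bookkeeping model, not std.HashMap itself.
/// `tombstones_count_as_available` is an invented name for the proposed switch.
fn SlotBudget(comptime tombstones_count_as_available: bool) type {
    return struct {
        const Self = @This();

        size: u32 = 0,
        available: u32, // inserts left before the table must grow

        fn insert(self: *Self) void {
            std.debug.assert(self.available > 0);
            self.size += 1;
            self.available -= 1;
        }

        fn remove(self: *Self) void {
            self.size -= 1;
            // true:  favors space; the slot is handed back even though a
            //        tombstone still sits in it.
            // false: favors speed; the tombstone keeps counting toward the
            //        load, so a grow (and a clean table) comes sooner.
            if (tombstones_count_as_available) self.available += 1;
        }

        fn mustGrow(self: *const Self) bool {
            return self.available == 0;
        }
    };
}

pub fn main() void {
    var compact = SlotBudget(true){ .available = 4 };
    var fast = SlotBudget(false){ .available = 4 };

    var n: u32 = 0;
    while (n < 4) : (n += 1) {
        compact.insert();
        fast.insert();
    }
    compact.remove();
    fast.remove();

    std.debug.print("compact must grow: {}\n", .{compact.mustGrow()});
    std.debug.print("fast must grow:    {}\n", .{fast.mustGrow()});
}
```

After four inserts and one removal, the `true` variant reports room for another insert while the `false` variant asks for a grow. The `false` branch corresponds to commenting out `self.available += 1` in the one-line patch earlier in this thread.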

@mrjbq7
Contributor Author

mrjbq7 commented Nov 4, 2023

Thanks for the context, @rohlem. As it stands, it's pretty unusable by default.

@mrjbq7
Contributor Author

mrjbq7 commented Nov 4, 2023

This might be a better approach: it compares the capacity to the number of non-empty buckets (meaning filled or deleted):

diff --git a/lib/std/hash_map.zig b/lib/std/hash_map.zig
index 40a412bf3..d35b010b2 100644
--- a/lib/std/hash_map.zig
+++ b/lib/std/hash_map.zig
@@ -725,7 +725,7 @@ pub fn HashMapUnmanaged(
         // execute when determining if the hashmap has enough capacity already.
         /// Number of available slots before a grow is needed to satisfy the
         /// `max_load_percentage`.
-        available: Size = 0,
+        deleted: Size = 0,
 
         // This is purely empirical and not a /very smart magic constant™/.
         /// Capacity of the first grow when bootstrapping the hashmap.
@@ -927,14 +927,14 @@ pub fn HashMapUnmanaged(
             if (self.metadata) |_| {
                 self.initMetadatas();
                 self.size = 0;
-                self.available = @as(u32, @truncate((self.capacity() * max_load_percentage) / 100));
+                self.deleted = 0;
             }
         }
 
         pub fn clearAndFree(self: *Self, allocator: Allocator) void {
             self.deallocate(allocator);
             self.size = 0;
-            self.available = 0;
+            self.deleted = 0;
         }
 
         pub fn count(self: *const Self) Size {
@@ -1041,9 +1041,6 @@ pub fn HashMapUnmanaged(
                 metadata = self.metadata.? + idx;
             }
 
-            assert(self.available > 0);
-            self.available -= 1;
-
             const fingerprint = Metadata.takeFingerprint(hash);
             metadata[0].fill(fingerprint);
             self.keys()[idx] = key;
@@ -1113,7 +1110,7 @@ pub fn HashMapUnmanaged(
                 old_key.* = undefined;
                 old_val.* = undefined;
                 self.size -= 1;
-                self.available += 1;
+                self.deleted += 1;
                 return result;
             }
 
@@ -1360,9 +1357,9 @@ pub fn HashMapUnmanaged(
                 // Cheap try to lower probing lengths after deletions. Recycle a tombstone.
                 idx = first_tombstone_idx;
                 metadata = self.metadata.? + idx;
+                // We're using a slot previously a tombstone.
+                self.deleted -= 1;
             }
-            // We're using a slot previously free or a tombstone.
-            self.available -= 1;
 
             metadata[0].fill(fingerprint);
             const new_key = &self.keys()[idx];
@@ -1410,7 +1407,7 @@ pub fn HashMapUnmanaged(
             self.keys()[idx] = undefined;
             self.values()[idx] = undefined;
             self.size -= 1;
-            self.available += 1;
+            self.deleted += 1;
         }
 
         /// If there is an `Entry` with a matching key, it is deleted from
@@ -1453,16 +1450,16 @@ pub fn HashMapUnmanaged(
             @memset(@as([*]u8, @ptrCast(self.metadata.?))[0 .. @sizeOf(Metadata) * self.capacity()], 0);
         }
 
-        // This counts the number of occupied slots (not counting tombstones), which is
+        // This counts the number of occupied slots including tombstones, which is
         // what has to stay under the max_load_percentage of capacity.
         fn load(self: *const Self) Size {
             const max_load = (self.capacity() * max_load_percentage) / 100;
-            assert(max_load >= self.available);
-            return @as(Size, @truncate(max_load - self.available));
+            assert(max_load >= self.size + self.deleted);
+            return @as(Size, @truncate(max_load - self.size - self.deleted));
         }
 
         fn growIfNeeded(self: *Self, allocator: Allocator, new_count: Size, ctx: Context) Allocator.Error!void {
-            if (new_count > self.available) {
+            if (new_count > (self.size + self.deleted)) {
                 try self.grow(allocator, capacityForSize(self.load() + new_count), ctx);
             }
         }
@@ -1480,7 +1477,7 @@ pub fn HashMapUnmanaged(
             const new_cap = capacityForSize(self.size);
             try other.allocate(allocator, new_cap);
             other.initMetadatas();
-            other.available = @truncate((new_cap * max_load_percentage) / 100);
+            other.deleted = 0;
 
             var i: Size = 0;
             var metadata = self.metadata.?;
@@ -1515,7 +1512,7 @@ pub fn HashMapUnmanaged(
             defer map.deinit(allocator);
             try map.allocate(allocator, new_cap);
             map.initMetadatas();
-            map.available = @truncate((new_cap * max_load_percentage) / 100);
+            map.deleted = 0;
 
             if (self.size != 0) {
                 const old_capacity = self.capacity();
@@ -1593,7 +1590,7 @@ pub fn HashMapUnmanaged(
             allocator.free(slice);
 
             self.metadata = null;
-            self.available = 0;
+            self.deleted = 0;
         }
 
         /// This function is used in the debugger pretty formatters in tools/ to fetch the

@mrjbq7
Contributor Author

mrjbq7 commented Nov 4, 2023

If you don't like the addition in growIfNeeded, you could replace available with a filled counter that accumulates (size + deleted).

@mrjbq7
Contributor Author

mrjbq7 commented Nov 4, 2023

In case it wasn't obvious from the issue text, this is the current HashMap behavior:

[Image: Screenshot 2023-11-04 at 4 24 08 PM]

I can't imagine TigerBeetle wants this as default either.

@mrjbq7
Contributor Author

mrjbq7 commented Nov 5, 2023

I was looking at TigerBeetle's Zig Tracking issue tigerbeetle/tigerbeetle#1191 and they say:

Stdlib performance. std.HashMap and std.sort are our main perf bottlenecks at the moment, improving their performance directly improves our transactions-per-second and p100.

Perhaps this is the reason it's so slow.

@mrjbq7
Contributor Author

mrjbq7 commented Nov 6, 2023

Okay, here's a possible fix:

  1. Implement HashMapUnmanaged.rehash, which solves the original performance issue when called once per 1-million-iteration block of the benchmark loop.
  2. Change the code to track filled instead of available, where filled is size + deleted.
  3. When filled reaches the max load, call rehash().
  4. This means put() will occasionally take a bit longer when it has to rehash, but the performance degradation from the original issue goes away.
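
The counter bookkeeping in steps 2 and 3 can be sketched in isolation. `FillTracker` and its method names are hypothetical, slot storage is left out entirely, and the 80% load limit is an assumption (it matches std.hash_map's default as far as I know).

```zig
const std = @import("std");

const max_load_percentage = 80; // assumed value for this sketch

/// Hypothetical counters only; slot storage and the real rehash are left out.
const FillTracker = struct {
    capacity: u32,
    size: u32 = 0, // live entries
    deleted: u32 = 0, // tombstones

    /// Step 2: slots that are not free, whether live or tombstoned.
    fn filled(self: FillTracker) u32 {
        return self.size + self.deleted;
    }

    fn maxLoad(self: FillTracker) u32 {
        return (self.capacity * max_load_percentage) / 100;
    }

    /// Step 3: rehash once live entries plus tombstones reach the load limit.
    fn needsRehash(self: FillTracker) bool {
        return self.filled() >= self.maxLoad();
    }

    fn insertIntoFreeSlot(self: *FillTracker) void {
        self.size += 1;
    }

    fn insertIntoTombstone(self: *FillTracker) void {
        self.size += 1;
        self.deleted -= 1;
    }

    fn remove(self: *FillTracker) void {
        self.size -= 1;
        self.deleted += 1;
    }

    /// An in-place rehash clears every tombstone and keeps the live entries.
    fn rehash(self: *FillTracker) void {
        self.deleted = 0;
    }
};

pub fn main() void {
    var t = FillTracker{ .capacity = 16 };

    var n: u32 = 0;
    while (n < 4) : (n += 1) t.insertIntoFreeSlot();

    // Replace a live entry eight times; each new key lands in a free slot.
    n = 0;
    while (n < 8) : (n += 1) {
        t.remove();
        t.insertIntoFreeSlot();
    }
    std.debug.print("size={d} deleted={d} needs rehash: {}\n", .{ t.size, t.deleted, t.needsRehash() });

    t.rehash();
    std.debug.print("size={d} deleted={d} needs rehash: {}\n", .{ t.size, t.deleted, t.needsRehash() });
}
```

With a capacity of 16 the load limit is 12 slots, so 4 live entries plus 8 tombstones trigger the rehash even though the map is only a quarter full of live data. That is the occasional slower put() described in step 4.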

Here's a possible implementation of rehash:

        /// Rehash the map, in-place.
        pub fn rehash(self: *Self, ctx: anytype) void {
            const mask = self.capacity() - 1;

            var metadata = self.metadata.?;
            var keys_ptr = self.keys();
            var values_ptr = self.values();
            var curr: Size = 0;

            while (curr < self.capacity()) {
                if (!metadata[curr].isUsed()) {
                    if (!metadata[curr].isFree()) {
                        metadata[curr].fingerprint = Metadata.free;
                        assert(metadata[curr].isFree());
                    }

                    curr += 1;
                    continue;
                }

                var hash = ctx.hash(keys_ptr[curr]);
                var fingerprint = Metadata.takeFingerprint(hash);
                var idx = @as(usize, @truncate(hash & mask));

                while (idx < curr and metadata[idx].isUsed()) {
                    idx += 1;
                }

                if (idx < curr) {
                    assert(!metadata[idx].isUsed());
                    metadata[idx].fingerprint = fingerprint;
                    metadata[idx].used = 1;
                    keys_ptr[idx] = keys_ptr[curr];
                    values_ptr[idx] = values_ptr[curr];

                    metadata[curr].fingerprint = Metadata.free;
                    metadata[curr].used = 0;
                    keys_ptr[curr] = undefined;
                    values_ptr[curr] = undefined;

                    curr += 1;
                } else if (idx == curr) {
                    if (metadata[idx].fingerprint == Metadata.free) {
                        metadata[idx].fingerprint = fingerprint;
                    }

                    curr += 1;
                } else {
                    while (metadata[idx].isUsed() and (idx <= curr or metadata[idx].fingerprint == Metadata.free)) {
                        idx = (idx + 1) & mask;
                    }
                    assert(idx != curr);

                    if (idx > curr and metadata[idx].isUsed()) {
                        var tmpfingerprint = metadata[idx].fingerprint;
                        var tmpkey = keys_ptr[idx];
                        var tmpvalue = values_ptr[idx];

                        metadata[idx].fingerprint = Metadata.free;
                        keys_ptr[idx] = keys_ptr[curr];
                        values_ptr[idx] = values_ptr[curr];

                        metadata[curr].fingerprint = tmpfingerprint;
                        keys_ptr[curr] = tmpkey;
                        values_ptr[curr] = tmpvalue;
                    } else {
                        assert(!metadata[idx].isUsed());
                        metadata[idx].fingerprint = fingerprint;
                        metadata[idx].used = 1;
                        keys_ptr[idx] = keys_ptr[curr];
                        values_ptr[idx] = values_ptr[curr];

                        metadata[curr].fingerprint = Metadata.free;
                        metadata[curr].used = 0;
                        keys_ptr[curr] = undefined;
                        values_ptr[curr] = undefined;

                        curr += 1;
                    }
                }
            }

            self.available = @as(u32, @truncate((self.capacity() * max_load_percentage) / 100)) - self.size;
        }

Here's a test case for it:

test "std.hash_map rehash" {
    var map = AutoHashMap(u32, u32).init(std.testing.allocator);
    defer map.deinit();

    var prng = std.rand.DefaultPrng.init(0);
    const random = prng.random();

    const count = 6 * random.intRangeLessThan(u32, 10_000, 100_000);

    var i: u32 = 0;
    while (i < count) : (i += 1) {
        try map.put(i, i);
        if (i % 3 == 0) {
            try expectEqual(map.remove(i), true);
        }
    }

    map.rehash();

    try expectEqual(map.count(), count * 2 / 3);

    i = 0;
    while (i < count) : (i += 1) {
        if (i % 3 == 0) {
            try expectEqual(map.get(i), null);
        } else {
            try expectEqual(map.get(i).?, i);
        }
    }
}

This passes and it also fixes the original performance issue, making std.HashMap twice as fast as std.ArrayHashMap in the original benchmark.

Could a maintainer give me some advice on how to move from this to a PR, and how to add things like the context type checking that is in getOrPutAssumeCapacityAdapted and maybe needs to be in rehash also?

@mrjbq7
Contributor Author

mrjbq7 commented Nov 6, 2023

➜  build git:(master) ✗ ./stage3/bin/zig run -O ReleaseFast ../maptest.zig --zig-lib-dir ~/Projects/zig/lib
inserting 2000000 took 102 ms
2000000 block took 0 ms
3000000 block took 58 ms
4000000 block took 59 ms
5000000 block took 63 ms
6000000 block took 66 ms
7000000 block took 67 ms
8000000 block took 70 ms
9000000 block took 72 ms
10000000 block took 70 ms
11000000 block took 70 ms
12000000 block took 69 ms
13000000 block took 69 ms
14000000 block took 73 ms
15000000 block took 70 ms
16000000 block took 68 ms
17000000 block took 69 ms
18000000 block took 69 ms
19000000 block took 72 ms
20000000 block took 72 ms
21000000 block took 69 ms
22000000 block took 70 ms
23000000 block took 69 ms
24000000 block took 69 ms
25000000 block took 70 ms
26000000 block took 68 ms
27000000 block took 69 ms
28000000 block took 69 ms
29000000 block took 69 ms
30000000 block took 72 ms
31000000 block took 71 ms
32000000 block took 71 ms
33000000 block took 69 ms
34000000 block took 70 ms
35000000 block took 69 ms
36000000 block took 69 ms
37000000 block took 69 ms
38000000 block took 68 ms
39000000 block took 69 ms
40000000 block took 69 ms
41000000 block took 72 ms
42000000 block took 69 ms
43000000 block took 69 ms
44000000 block took 70 ms
45000000 block took 69 ms
46000000 block took 69 ms
47000000 block took 69 ms
48000000 block took 70 ms
49000000 block took 69 ms
50000000 block took 69 ms
51000000 block took 71 ms
52000000 block took 71 ms
53000000 block took 70 ms
54000000 block took 69 ms
55000000 block took 69 ms
56000000 block took 69 ms
57000000 block took 69 ms
58000000 block took 69 ms
59000000 block took 69 ms
60000000 block took 69 ms
61000000 block took 69 ms
62000000 block took 72 ms
63000000 block took 69 ms
64000000 block took 70 ms
65000000 block took 70 ms
66000000 block took 70 ms
67000000 block took 70 ms
68000000 block took 72 ms
69000000 block took 71 ms
70000000 block took 70 ms
71000000 block took 71 ms
72000000 block took 71 ms
73000000 block took 73 ms
74000000 block took 69 ms
75000000 block took 69 ms
76000000 block took 69 ms
77000000 block took 69 ms
78000000 block took 70 ms
79000000 block took 70 ms
80000000 block took 70 ms
81000000 block took 69 ms
82000000 block took 69 ms
83000000 block took 69 ms
84000000 block took 73 ms
85000000 block took 70 ms
86000000 block took 71 ms
87000000 block took 70 ms
88000000 block took 70 ms
89000000 block took 70 ms
90000000 block took 70 ms
91000000 block took 69 ms
92000000 block took 70 ms
93000000 block took 73 ms
94000000 block took 72 ms
95000000 block took 72 ms
96000000 block took 70 ms
97000000 block took 70 ms
98000000 block took 70 ms
99000000 block took 80 ms
100000000 block took 85 ms
101000000 block took 72 ms
102000000 block took 77 ms
103000000 block took 72 ms
104000000 block took 72 ms
105000000 block took 79 ms
106000000 block took 73 ms
107000000 block took 71 ms
108000000 block took 71 ms
109000000 block took 72 ms
110000000 block took 72 ms
111000000 block took 86 ms
112000000 block took 71 ms
113000000 block took 70 ms
114000000 block took 71 ms
115000000 block took 78 ms
116000000 block took 78 ms
117000000 block took 78 ms
118000000 block took 74 ms
119000000 block took 71 ms
120000000 block took 76 ms
121000000 block took 78 ms
122000000 block took 71 ms
123000000 block took 71 ms
124000000 block took 71 ms
125000000 block took 75 ms
126000000 block took 71 ms
127000000 block took 74 ms
128000000 block took 72 ms
129000000 block took 80 ms
130000000 block took 83 ms
131000000 block took 74 ms
132000000 block took 72 ms
133000000 block took 74 ms
134000000 block took 75 ms
135000000 block took 73 ms
136000000 block took 74 ms
137000000 block took 74 ms
138000000 block took 73 ms
139000000 block took 70 ms
140000000 block took 69 ms
141000000 block took 70 ms
142000000 block took 69 ms
143000000 block took 69 ms
144000000 block took 69 ms
145000000 block took 69 ms
146000000 block took 73 ms
147000000 block took 69 ms
148000000 block took 74 ms
149000000 block took 70 ms
150000000 block took 72 ms
151000000 block took 69 ms
152000000 block took 72 ms
153000000 block took 69 ms
154000000 block took 70 ms
155000000 block took 70 ms
156000000 block took 71 ms
157000000 block took 70 ms
158000000 block took 69 ms
159000000 block took 69 ms
160000000 block took 72 ms
161000000 block took 71 ms
162000000 block took 69 ms
163000000 block took 75 ms
164000000 block took 73 ms
165000000 block took 71 ms
166000000 block took 69 ms
167000000 block took 73 ms
168000000 block took 70 ms
169000000 block took 69 ms
170000000 block took 69 ms
171000000 block took 69 ms
172000000 block took 71 ms
173000000 block took 69 ms
174000000 block took 69 ms
175000000 block took 75 ms
176000000 block took 69 ms
177000000 block took 70 ms
178000000 block took 72 ms
179000000 block took 71 ms
180000000 block took 73 ms
181000000 block took 72 ms
182000000 block took 69 ms
183000000 block took 70 ms
184000000 block took 69 ms
185000000 block took 68 ms
186000000 block took 69 ms
187000000 block took 69 ms
188000000 block took 70 ms
189000000 block took 72 ms
190000000 block took 69 ms
191000000 block took 71 ms
192000000 block took 72 ms
193000000 block took 69 ms
194000000 block took 69 ms
195000000 block took 69 ms
196000000 block took 69 ms
197000000 block took 69 ms
198000000 block took 77 ms
199000000 block took 76 ms
200000000 block took 73 ms
201000000 block took 71 ms
202000000 block took 71 ms
203000000 block took 73 ms
204000000 block took 73 ms
205000000 block took 72 ms
206000000 block took 72 ms
207000000 block took 68 ms
208000000 block took 70 ms
209000000 block took 72 ms
210000000 block took 73 ms
211000000 block took 69 ms
212000000 block took 70 ms
213000000 block took 71 ms
214000000 block took 69 ms
215000000 block took 70 ms
216000000 block took 82 ms
217000000 block took 72 ms
218000000 block took 69 ms
219000000 block took 72 ms
220000000 block took 75 ms
221000000 block took 72 ms
222000000 block took 68 ms
223000000 block took 69 ms
224000000 block took 72 ms
225000000 block took 69 ms
226000000 block took 72 ms
227000000 block took 69 ms
228000000 block took 72 ms
229000000 block took 70 ms
230000000 block took 69 ms
231000000 block took 72 ms
232000000 block took 72 ms
233000000 block took 69 ms
234000000 block took 69 ms
235000000 block took 74 ms
236000000 block took 75 ms
237000000 block took 76 ms
238000000 block took 76 ms
239000000 block took 77 ms
240000000 block took 76 ms
241000000 block took 78 ms
242000000 block took 77 ms
243000000 block took 76 ms
244000000 block took 76 ms
245000000 block took 77 ms
246000000 block took 77 ms
247000000 block took 76 ms
248000000 block took 76 ms
249000000 block took 75 ms

@mrjbq7
Contributor Author

mrjbq7 commented Nov 6, 2023

If rehash is not public but is called from getOrPutAssumeCapacityAdapted, then maybe we don't need the const typechecks, which would be redundant with the ones done earlier?

@andrewrk andrewrk added standard library This issue involves writing Zig code for the standard library. contributor friendly This issue is limited in scope and/or knowledge of Zig internals. labels Nov 7, 2023
@andrewrk andrewrk added this to the 0.13.0 milestone Nov 7, 2023
@SilasLock

I've been reading over the history of this issue with interest, and it seems to me like it would be super nice if the Zig standard library's hash map didn't have this kind of performance degradation by default.

#19923 is a great partial fix, but it requires the user to periodically call rehash(). It's a non-trivial task for a user to realize that they're encountering performance problems with a hash map in the first place, and then to further realize those performance problems are caused by tombstones gunking up the process for each insert/lookup, and then to further realize that they need to call rehash() to fix the problem. That's a lot to ask of users of the standard library, and I'd be concerned that many people might miss one of those steps.

I'm wondering if it might be worth, as a long-term plan, switching over to an implementation that avoids this particular degradation of performance without placing additional requirements on the user. From @mrjbq7's comment above, it sounds like TigerBeetle was looking into Facebook's F14 hash map

We've actually started looking into implementing Facebook's F14 HashMap down the line. It's similar to Zig's in terms of insert/lookup but doesn't use tombstone values and instead cleans them up on remove() without rehashing.

but I haven't been able to find out if they've pursued that path further. Maybe the F14 hash map would be a good implementation to explore? I'm still reading about how it works and don't have a strong understanding of its advantages/disadvantages, so I'm hoping someone who does can chime in here.

Other than that, I'm curious if anyone has thoughts on the best way to address this issue in the long run.

@rofrol
Contributor

rofrol commented May 12, 2024

@mrjbq7
Contributor Author

mrjbq7 commented May 12, 2024

I agree that in the long term there should be a different map that does not degrade in performance like this, and when that is available, the rehash() PR becomes unnecessary.

However, I think waiting to merge rehash() is not a great idea. It’s now been 6 months since this issue was reported and a workaround contributed and it’s not merged yet, and a new release was made with the same issue. If the long term is really far away, I think it does a disservice to users not to provide a possible solution.

@SilasLock

However, I think waiting to merge rehash() is not a great idea. It’s now been 6 months since this issue was reported and a workaround contributed and it’s not merged yet, and a new release was made with the same issue. If the long term is really far away, I think it does a disservice to users not to provide a possible solution.

Yep, we're on the same page! I think your PR is a good fix in the short term; I didn't mean to suggest that it not get merged in the meantime.

I do think this issue should be kept open until an alternative hash map is implemented, since tombstone-induced performance degradation is something that an alternative hash map should avoid regardless of its other performance characteristics. I've been looking over the ones rofrol posted:

I have found this benchmark https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-01-overview/

and in the survey of different hashmaps, it looks like the benchmark here is the closest to the one that this issue is based on, if I'm understanding it correctly?

mitchellh added a commit to ghostty-org/ghostty that referenced this issue May 18, 2024
See: ziglang/zig#17851

Users were noticing that frame render times got slower over time. I
believe (thanks to community for pointing it out) that this is the
culprit.

This works around this issue by clearing and reinitializing the LRU
after a certain number of evictions. When the Zig issue has a better
resolution (either rehash() as a workaround or a better hash
implementation overall) we can change this.
@tau-dev
Contributor

tau-dev commented Jul 15, 2024

@mrjbq7 did good work which will be useful either way; I hope it gets merged soon. We can get the improved performance without requiring the user to rehash() manually by simply requiring any adapter context to have an additional hashStored function capable of hashing non-adapted keys. The map could then just rehash automatically every R insert/delete operations.

The rehash window R is linear in the map's size; the Linear Probing With Tombstones paper derives a Θ-bound on the optimal factor in terms of the load factor, but since only a rather small range of values for the load factor makes any sense in practice, practical concerns will completely dominate that and we can just provide an empirically good default rehash factor.
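A rough sketch of the idea, written as a user-side wrapper rather than a change to std. The type AutoRehashMap, its rehash_factor field, and the floor of 16 are all made up for illustration, and it assumes std.AutoHashMap has the rehash() method from the PR. Because it uses the default context, it sidesteps the hashStored question for adapted keys.

```zig
const std = @import("std");

// Hypothetical wrapper, not part of std: counts insert/delete operations
// and rehashes once the count reaches rehash_factor * (current size).
const AutoRehashMap = struct {
    map: std.AutoHashMap(u64, u64),
    mutations: usize = 0,
    rehashes: usize = 0,
    // Assumed default; a real value would have to be chosen empirically.
    rehash_factor: usize = 1,

    fn init(allocator: std.mem.Allocator) AutoRehashMap {
        return .{ .map = std.AutoHashMap(u64, u64).init(allocator) };
    }

    fn deinit(self: *AutoRehashMap) void {
        self.map.deinit();
    }

    fn put(self: *AutoRehashMap, key: u64, value: u64) !void {
        try self.map.put(key, value);
        self.noteMutation();
    }

    fn remove(self: *AutoRehashMap, key: u64) bool {
        const removed = self.map.remove(key);
        self.noteMutation();
        return removed;
    }

    fn noteMutation(self: *AutoRehashMap) void {
        self.mutations += 1;
        // The window R grows linearly with the map's size; the floor of 16
        // avoids rehashing on every operation while the map is tiny.
        const size: usize = self.map.count();
        const window = self.rehash_factor * @max(size, 16);
        if (self.mutations >= window) {
            self.map.rehash();
            self.mutations = 0;
            self.rehashes += 1;
        }
    }
};
```

With a window linear in the size, the cost of each rehash is spread over the operations between rehashes, although the individual operation that triggers it still pays the full price.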

@andrewrk andrewrk removed the contributor friendly This issue is limited in scope and/or knowledge of Zig internals. label Aug 8, 2024
@andrewrk
Member

andrewrk commented Aug 8, 2024

My currently favored solution to this problem is to delete the non-array hash map implementation from the standard library.

Automatically rehashing after every R insert/deletion operations leads to non-uniform performance characteristics. Needing to rehash manually is also undesirable. I think that makes this hash map implementation inferior to ArrayHashMap, where you additionally get the benefit of having keys and values sequentially in a well-defined order.

The use case for a slightly more optimized hash map that does not need this property is rare enough - or so central to an application's core operation, as in the case of TigerBeetle - that it can be satisfied by a third party package, or should be part of the application itself, and have application-specific optimizations.
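To illustrate the ArrayHashMap alternative, here is a small sketch of the same remove/insert churn on std.AutoArrayHashMap. The helper name churnArrayMap and the sizes are made up, and it assumes a std version where the managed AutoArrayHashMap with init(allocator) is available.

```zig
const std = @import("std");

// Hypothetical helper: the remove/insert churn from this issue, run
// against std.AutoArrayHashMap instead of std.AutoHashMap.
fn churnArrayMap(allocator: std.mem.Allocator, rounds: u64) !usize {
    var map = std.AutoArrayHashMap(u64, u64).init(allocator);
    defer map.deinit();

    var i: u64 = 0;
    while (i < 1000) : (i += 1) try map.put(i, 3);

    var round: u64 = 0;
    while (round < rounds) : (round += 1) {
        // swapRemove moves the last entry into the vacated slot, so the
        // key and value arrays stay dense. Insertion order is not kept;
        // orderedRemove would keep it at a higher cost.
        _ = map.swapRemove(round);
        try map.put(1000 + round, 3);
    }

    // keys() is a contiguous slice with exactly one element per live entry.
    return map.keys().len;
}
```

Since entries live in dense arrays, iteration only ever visits live entries, however many removals came before.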

@andrewrk andrewrk modified the milestones: 0.14.0, 0.15.0 Aug 8, 2024
cryptocode added a commit to cryptocode/ghostty that referenced this issue Aug 10, 2024
I noticed that the HashMap iterator showed up prominently in Instruments when quickly
resizing Ghostty.

I think this is related to the [tombstone issue](ziglang/zig#17851),
where the `next()` function has to skip unused meta-nodes.

In that same issue, Andrew is suggesting that the non-array hashmap might get deleted from the
standard library.

After switching to `AutoArrayHashMapUnmanaged`, iteration barely shows up anymore.

Deletion from the pin list should also be fast as swapRemove is used (order does not need to be preserved).

Question is if insertion performance is negatively affected, though I'm not seeing anything obvious.
Still, checking this PR for any perf regressions might be a good idea.

If this pans out, there are more places where this switch might be beneficial.
igor84 pushed a commit to igor84/zig that referenced this issue Aug 11, 2024
SammyJames pushed a commit to SammyJames/zig that referenced this issue Aug 13, 2024
Rexicon226 pushed a commit to Rexicon226/zig that referenced this issue Aug 13, 2024
richerfu pushed a commit to richerfu/zig that referenced this issue Oct 28, 2024