Skip to content

Commit

Permalink
MINOR: [C++] Fix the wrong comment in #45104 (#45111)
Browse files Browse the repository at this point in the history
### Rationale for this change

In my last typo fix in #45104 I made a mistake: https://github.com/apache/arrow/pull/45104/files#diff-a1f24396d20cd510ae48838e684fbc5ba7b23475702ae849a2d13bf48e5251aaR284 I was too careless. The size of a block in swiss table is much more subtle than I thought. It turns out the original `12KB` was right.

### What changes are included in this PR?

Removing my last wrong change and elaborate a bit about how the number `12KB` comes.

### Are these changes tested?

### Are there any user-facing changes?

Authored-by: Rossi Sun <[email protected]>
Signed-off-by: Rossi Sun <[email protected]>
  • Loading branch information
zanmato1984 authored Dec 26, 2024
1 parent 0a00e25 commit 52b3184
Showing 1 changed file with 7 additions and 2 deletions.
9 changes: 7 additions & 2 deletions cpp/src/arrow/compute/key_map_internal.cc
Original file line number Diff line number Diff line change
Expand Up @@ -281,13 +281,18 @@ void SwissTable::early_filter_imp(const int num_keys, const uint32_t* hashes,
// When we reach this limit, we need to break processing of any further rows and resize.
//
uint64_t SwissTable::num_groups_for_resize() const {
// Resize small hash tables when 50% full (up to 32KB).
// Resize large hash tables when 75% full.
// Consider N = 9 (aka 2 ^ 9 = 512 blocks) as small.
// When N = 9, a slot id takes N + 3 = 12 bits, rounded up to 16 bits. This is also the
// number of bits needed for a key id. Since each slot stores a status byte and a key
// id, then a slot takes 1 byte + 16 bits = 3 bytes. Therefore a block of 8 slots takes
// 24 bytes. The threshold of a small hash table ends up being 24 bytes * 512 = 12 KB.
constexpr int log_blocks_small_ = 9;
uint64_t num_slots = 1ULL << (log_blocks_ + 3);
if (log_blocks_ <= log_blocks_small_) {
// Resize small hash tables when 50% full.
return num_slots / 2;
} else {
// Resize large hash tables when 75% full.
return num_slots * 3 / 4;
}
}
Expand Down

0 comments on commit 52b3184

Please sign in to comment.