During university, I had the opportunity to study 2040S - Data Structures & Algorithms, which sparked my curiosity about how different hashmap implementations resolve hash collisions (e.g. linear probing, chaining with linked lists).
Each approach has its own trade-offs, but we were taught that they generally share the same asymptotic time complexity, theoretically at least. I wanted to put that claim to the test and see whether these approaches actually translate to similar real-world performance.
#include "ChainingHashMap.hpp"
#include <iostream>
int main () {
benn::ChainingHashMap<int, int> x;
//...
}
- Insertion: `x[1] = 10;` or `x.insert({1, 10});`
- Deletion: `x.remove(1);`
- Existence of a key: `x.count(1);`
- Size and bucket count: `x.size();`, `x.getBucketCount();`
- Generic types: `benn::ChainingHashMap<double, bool> x;`, `benn::OpenAddressingHashMap<std::string, int> y;`

A combined example of these operations is shown below.
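Here's a minimal end-to-end sketch combining the operations above. It uses only the calls listed in this section; the values in the comments assume `count()` returns 0/1 and `size()` reflects the number of stored keys.

```cpp
#include "ChainingHashMap.hpp"
#include <iostream>

int main() {
    benn::ChainingHashMap<int, int> x;

    // Insert via operator[] and insert()
    x[1] = 10;
    x.insert({2, 20});

    // Check existence, size, and bucket count
    std::cout << "count(1): " << x.count(1) << "\n";         // 1
    std::cout << "size: " << x.size() << "\n";               // 2
    std::cout << "buckets: " << x.getBucketCount() << "\n";

    // Remove a key
    x.remove(1);
    std::cout << "count(1) after remove: " << x.count(1) << "\n"; // 0
}
```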
This project uses CMake:

```sh
mkdir build && cd build && cmake .. && make
```
This project uses Google Test:

```sh
cd build && ./tests/open_addressing_hashmap
cd build && ./tests/chaining_hashmap
```
My CPU stats:
```
Run on (12 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x12)
Load Average: 1.78, 2.28, 2.49
```
To be honest, I'm not sure how to explain this graph.

My initial hypothesis: while resizing the table introduces a temporary performance drop due to a complete rehash, that resizing cost is amortized across the many find operations invoked afterwards (hence the performance increase).

I'd be happy to be proven wrong, though, or to be offered alternative explanations (and ways to test them).
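One way I could probe this hypothesis (a sketch, not something already in the repo; it assumes the project links against Google Benchmark, which the hardware report above appears to come from): bulk-insert different numbers of keys, then measure find throughput at sizes just below and just above a resize boundary, which `getBucketCount()` can help locate. If the amortization story holds, lookup throughput should jump right after a resize rather than degrade.

```cpp
#include "ChainingHashMap.hpp"
#include <benchmark/benchmark.h>

// Bulk-insert state.range(0) keys, then measure lookup (count) throughput.
// Running this across sizes on either side of a resize boundary shows whether
// a freshly resized (sparser) table really answers finds faster.
static void BM_FindAfterInsert(benchmark::State& state) {
    const int n = static_cast<int>(state.range(0));
    benn::ChainingHashMap<int, int> map;
    for (int i = 0; i < n; ++i) {
        map.insert({i, i});
    }
    int key = 0;
    for (auto _ : state) {
        auto found = map.count(key);
        benchmark::DoNotOptimize(found);
        key = (key + 1) % n;
    }
}
BENCHMARK(BM_FindAfterInsert)->RangeMultiplier(2)->Range(1 << 10, 1 << 20);

BENCHMARK_MAIN();
```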
- Quadratic Probing (see the probe-sequence sketch below)
- Double Hashing (see the probe-sequence sketch below)
- Implement `.start()` and `.end()`
- Create my own `LinkedList` instead of using `std::list` for `ChainingHashMap`
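For the two probing variants, the index sequences I have in mind look roughly like this (a sketch only; `baseHash`, the step choice, and the modulus handling are placeholders, not the actual `OpenAddressingHashMap` internals):

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Placeholder base hash; the real map would reuse its existing hash function.
std::size_t baseHash(const std::string& key) {
    return std::hash<std::string>{}(key);
}

// Quadratic probing: the i-th attempt is offset by i^2 from the home bucket.
std::size_t quadraticProbe(const std::string& key, std::size_t i, std::size_t bucketCount) {
    return (baseHash(key) + i * i) % bucketCount;
}

// Double hashing: a second hash gives a per-key step size between attempts.
// The step must be non-zero (and ideally coprime with bucketCount).
std::size_t doubleHashProbe(const std::string& key, std::size_t i, std::size_t bucketCount) {
    const std::size_t step = 1 + (baseHash(key) % (bucketCount - 1));
    return (baseHash(key) + i * step) % bucketCount;
}
```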