
Capped Hashmap #25

Merged: 16 commits into sigma0-dev:main, Jan 30, 2024
Conversation

ppoliani (Contributor)

Implementation of a HashMap collection with a capped size, based on the ideas discussed in #2.

@ppoliani ppoliani changed the title [WIP] Capped Hashmap Capped Hashmap Jan 28, 2024
@ppoliani ppoliani changed the title Capped Hashmap [WIP] Capped Hashmap Jan 28, 2024
@ppoliani ppoliani changed the title [WIP] Capped Hashmap Capped Hashmap Jan 28, 2024
@mimoo mimoo left a comment

Nice! Thanks for the PR. I've added a few comments; IIUC, the capacity check is not working as intended.

I was also wondering about two different approaches that could have worked. One is to use IndexMap as a dependency instead of HashMap, since it remembers insertion order; I'm not sure how it performs, but it might just be easier to rely on that dependency (a sketch of the idea is below).

Another approach could have been to just wipe out the entire hashmap once we reach X entries. It's a bit of a nuclear approach, but maybe that would have been fine as well :P
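A minimal sketch of the IndexMap idea, assuming the indexmap crate; the CappedIndexMap type and its cap-handling details are illustrative, not part of this PR:

```rust
use indexmap::IndexMap;

/// Hypothetical capped map built on IndexMap.
struct CappedIndexMap<K, V> {
    inner: IndexMap<K, V>,
    capacity: usize,
}

impl<K: std::hash::Hash + Eq, V> CappedIndexMap<K, V> {
    fn insert(&mut self, k: K, v: V) -> Option<K> {
        self.inner.insert(k, v);
        if self.inner.len() > self.capacity {
            // IndexMap preserves insertion order, so index 0 is the oldest
            // entry; shift_remove_index keeps the remaining order intact.
            return self.inner.shift_remove_index(0).map(|(key, _)| key);
        }
        None
    }
}
```

This avoids maintaining a separate VecDeque at the cost of an extra dependency.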

src/committee/node.rs (resolved)
src/committee/node.rs (outdated, resolved)
src/lib.rs (outdated, resolved)
```rust
self.last_items.push_front(k);
}

if self.last_items.len() > self.capacity - 1 {
```
mimoo (Contributor)

I think we should have logs when we reach a quarter of the capacity, half of the capacity, 90% of the capacity, or something like that, so that we know something is happening and unfinished signatures are piling up (something like the sketch below).
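A minimal sketch of such threshold logging, assuming the log crate is available; the exact thresholds and the warn_on_fill helper are illustrative:

```rust
/// Hypothetical helper: warn each time the map lands on a notable fill level.
fn warn_on_fill(len: usize, capacity: usize) {
    // Comparing for equality (rather than >=) means each warning fires once
    // per fill-up instead of on every subsequent insert.
    if len == capacity / 4 || len == capacity / 2 || len == capacity * 9 / 10 {
        log::warn!(
            "capped hashmap at {len}/{capacity} entries; unfinished signatures may be piling up"
        );
    }
}
```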

mimoo (Contributor)

Also, imagine capacity is set to 5: this means the hashmap and vecdeque both have capacity 4. So when the insertion above makes inner and last_items full, the check here evaluates 4 > 4 = false and does nothing, which means the next insert will resize both the hashmap and the vecdeque. I think you made a mistake in the initialization: you should keep the capacity, but have the two structures allocate capacity + 1 instead (see the sketch below). Maybe a test checking the array capacity would help make sure the logic works :)
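A minimal sketch of the suggested fix; the field names follow the snippets in this thread, but the full type and eviction details are assumptions:

```rust
use std::collections::{HashMap, VecDeque};

struct CappedHashMap<K, V> {
    inner: HashMap<K, V>,
    last_items: VecDeque<K>,
    capacity: usize,
}

impl<K: std::hash::Hash + Eq + Copy, V> CappedHashMap<K, V> {
    fn new(capacity: usize) -> Self {
        Self {
            // One extra slot so that going one entry over the cap (just
            // before eviction) never forces either structure to resize.
            inner: HashMap::with_capacity(capacity + 1),
            last_items: VecDeque::with_capacity(capacity + 1),
            capacity,
        }
    }

    /// Inserts, evicting (and returning) the oldest key once the cap is hit.
    /// (Sketch only: re-inserting an existing key is not handled here.)
    fn insert(&mut self, k: K, v: V) -> Option<K> {
        self.inner.insert(k, v);
        self.last_items.push_front(k);
        if self.last_items.len() > self.capacity {
            let oldest = self.last_items.pop_back()?;
            self.inner.remove(&oldest);
            return Some(oldest);
        }
        None
    }
}
```

The test mimoo suggests could then assert that inner.capacity() stays unchanged after inserting well past the cap.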

@ppoliani ppoliani (Contributor, Author) commented Jan 29, 2024

> you should keep the capacity, but have the two structures have capacity + 1 instead

Oh shit, yes that's what I intended to do in the first place. Good point!

ppoliani (Contributor, Author)

I've read about the capacity and how it changes. There is a good explanation here.

I also tested it and it looks like it can double even though the length remains the same.

> it has to reserve "slack space", the amount of which depends on the implementation of the hashmap (and specifically its collision resolution algorithm).

I guess the more keys we insert, the higher the chance of a collision (even though some keys are removed), so it looks like it follows a conservative approach by doubling the capacity.

I believe this is neither a big deal nor a huge performance implication, given that we're not going to be storing thousands or millions of entries in the hash table. And besides, a capacity change does not necessarily mean entries are moved to a new memory location. A sketch of the experiment is below.
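A minimal sketch of that experiment; the numbers are illustrative, and whether capacity actually grows depends on how the standard hashmap handles deleted slots:

```rust
use std::collections::HashMap;

fn main() {
    let mut m: HashMap<u64, u64> = HashMap::with_capacity(8);
    let start = m.capacity();
    for i in 0..10_000u64 {
        m.insert(i, i);
        // Evict an older key so the length stays small and roughly constant.
        if i >= 4 {
            m.remove(&(i - 4));
        }
    }
    println!(
        "len = {}, capacity went from {} to {}",
        m.len(),
        start,
        m.capacity()
    );
}
```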


```rust
/// Inserts a new item into the collection. Returns Some(key), where key is
/// the key that was removed when we reached the max capacity; otherwise
/// returns None.
pub fn insert(&mut self, k: K, v: V) -> Option<K> {
```
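Usage implied by that doc comment, continuing the sketched type from earlier in this thread (capacity 2 is chosen purely for brevity):

```rust
let mut m = CappedHashMap::new(2);
assert_eq!(m.insert(1, "a"), None);
assert_eq!(m.insert(2, "b"), None);
// Exceeding the cap evicts the oldest key and returns it.
assert_eq!(m.insert(3, "c"), Some(1));
```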
mimoo (Contributor)

I don't think we should return the item that was removed, as this might be surprising behavior for something that is supposed to closely mimic a hashmap. If we want this API, it should be named something else (insert_and_get_removed_item or something).

ppoliani (Contributor, Author)

> I don't think we should return the item that was removed

This is actually quite handy, and it can help us do things like this: #2 (comment).

It's similar to what the core HashMap does with its insert fn: it returns the value of the replaced item, and they don't call it something like insert_and_get_replaced_item.

I believe we can call this add_entry to avoid any confusion with the HashMap::insert fn.

mimoo (Contributor)

I'm not saying this is not a useful function, just that it should have a different name so that it does not have surprising behavior, since HashMap::insert does not behave like this. HashMap::insert returns the element that was replaced at the place of insertion, which is different! A quick illustration is below.
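A quick sketch of the distinction, using only standard library behavior (the keys and values are arbitrary):

```rust
use std::collections::HashMap;

fn main() {
    let mut m = HashMap::new();
    assert_eq!(m.insert("a", 1), None);
    // HashMap::insert returns the previous *value* stored under the same key...
    assert_eq!(m.insert("a", 2), Some(1));
    // ...whereas the capped map's insert returns the *key* of a different,
    // evicted entry, which is why a distinct name is less surprising.
}
```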

src/capped_hashmap.rs (outdated, resolved)
```rust
.iter()
.filter(|key| *key != k)
.map(|key| *key)
.collect::<VecDeque<_>>();
```
mimoo (Contributor)

It kinda sucks that we have to go through the whole thing here :D I agree that we should assume that the key to remove is one of the last ones appended (and that the oldest stuff is probably just stale at this point). Maybe a LinkedList is better, as removing something at any index is easier (so a find followed by a remove)? In any case I think this would be better:

```rust
self.last_items
    .iter()
    .position(|key| key == &k)
    .and_then(|pos| self.last_items.remove(pos));
```
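How that one-liner could slot into a remove method (a sketch; the surrounding signature is an assumption based on the other snippets in this thread):

```rust
pub fn remove(&mut self, k: &K) -> Option<V> {
    // Scan from the most recent end (keys are pushed to the front), drop the
    // matching key from the recency queue, then remove the entry itself.
    if let Some(pos) = self.last_items.iter().position(|key| key == k) {
        self.last_items.remove(pos);
    }
    self.inner.remove(k)
}
```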

ppoliani (Contributor, Author)

yes this looks much better 👍

@ppoliani ppoliani (Contributor, Author) commented Jan 29, 2024

A LinkedList isn't faster on removals:

> This operation should compute in O(n) time.

Also, this is what the official docs suggest:

> NOTE: It is almost always better to use Vec or VecDeque because array-based containers are generally faster, more memory efficient, and make better use of CPU cache.

mimoo (Contributor)

ah yeah you'd need some sort of linkedlist + hashmap to know which two nodes to update to remove a node :D

src/capped_hashmap.rs (outdated, resolved)
@mimoo mimoo (Contributor) commented Jan 30, 2024

Nice! Thanks for addressing the nits :)

"cwd": "${workspaceFolder}"
},
]
}
mimoo (Contributor)
you probably didn't mean to push this file :o

@mimoo mimoo merged commit 899ec01 into sigma0-dev:main Jan 30, 2024
1 check passed
@ppoliani ppoliani deleted the feat/capped_hashmap branch January 30, 2024 08:28