Capped Hashmap #25
Conversation
Nice! Thanks for the PR. Added a few comments; IIUC the capacity check is not working as intended.

I was also wondering about two different approaches that could have worked: use `IndexMap` as a dependency instead of the `HashMap` (as it remembers insertion order). Not sure how it performs, but it might just be easier to rely on that dependency.

Another approach could have been to just wipe out the entire hashmap once we reach X entries. It's a bit of a nuclear approach, but maybe that would have been fine as well :P
src/capped_hashmap.rs (outdated)

```rust
self.last_items.push_front(k);
}

if self.last_items.len() > self.capacity - 1 {
```
I think we should have logs when we reach a quarter of the capacity, half of the capacity, 90% of the capacity, or something like that (so that we know something is happening and unfinished signatures are piling up).
Also, imagine `capacity` is set to 5; this means that the hashmap and vecdeque both have capacity 4. So when the insertion above makes `inner` and `last_items` full, the check here will be `4 > 4 == false` and do nothing. This means that the next insert will resize both the hashmap and vecdeque. I think you made a mistake in the initialization: you should keep the capacity, but have the two structures have `capacity + 1` instead. Maybe a test checking for the array capacity would help with making sure that the logic works :)
> you should keep the capacity, but have the two structures have capacity + 1 instead

Oh shit, yes that's what I intended to do in the first place. Good point!
I've read about the capacity and how it changes. There is a good explanation here. I also tested it, and it looks like it can double even though the length remains the same.

> it has to reserve "slack space", the amount of which depends on the implementation of the hashmap (and specifically its collision resolution algorithm).

I guess the more keys we insert, the higher the chance of a collision (even though some keys are removed), so it looks like it follows a conservative approach by doubling the capacity.

I believe this is neither a big deal nor a huge performance implication, given that we're not gonna be storing several thousand or millions of entries in the hash table. And besides, a capacity change does not mean entries are moved to a new memory location.
src/capped_hashmap.rs (outdated)

```rust
/// Inserts a new item to the collection. Returns Some(key) where key is the
/// key that was removed when we reach the max capacity. Otherwise returns None.
pub fn insert(&mut self, k: K, v: V) -> Option<K> {
```
I don't think we should return the item that was removed, as this might be surprising behavior for something that is supposed to closely mimic a hashmap. If we want this API, it should be named something else (`insert_and_get_removed_item` or something).
> I don't think we should return the item that was removed

This is actually quite handy and it can help us do things like this #2 (comment).

It's similar to what the core HashMap does with their `insert` fn. It returns the value of the replaced item, and they don't call it something like `insert_and_get_replaced_item`.

I believe we can call this `add_entry` to avoid any confusion with the `HashMap::insert` fn.
I'm not saying this is not a useful function, just that it should have a different name so that it does not have surprising behavior, as `HashMap::insert` does not behave like this. `HashMap::insert` returns the element that was removed at the place of insert, which is different!
src/capped_hashmap.rs (outdated)

```rust
.iter()
.filter(|key| *key != k)
.map(|key| *key)
.collect::<VecDeque<_>>();
```
it kinda sucks that we have to go through the whole thing here :D I agree that we should assume that the key to remove is one of the last ones appended (and that the oldest stuff is probably just stale at this point). Maybe a LinkedList is better, as removing something at any index is easier? (so a find followed by a remove). In any case I think this would be better:

```rust
self.last_items.iter().position(|key| key == &k).and_then(|pos| self.last_items.remove(pos));
```
yes this looks much better 👍
ah yeah you'd need some sort of linkedlist + hashmap to know which two nodes to update to remove a node :D
Nice! Thanks for addressing the nits :)
"cwd": "${workspaceFolder}" | ||
}, | ||
] | ||
} |
you probably didn't mean to push this file :o
Implementation of a HashMap collection with a capped size. Based on the ideas discussed in #2.