
Capped Hashmap #25

Merged Jan 30, 2024 (16 commits)
docs/mpc.md (1 addition, 1 deletion)

@@ -68,7 +68,7 @@ pub struct LocalSigningTask
Members keep track of such signing tasks in a local hashmap:

```diff
-signing_tasks: RwLock<HashMap<Txid, LocalSigningTask>>
+signing_tasks: RwLock<CappedHashMap<Txid, LocalSigningTask>>
```

The commitments created at this point are sent back to the orchestrator:
src/capped_hashmap.rs (172 additions, new file)

@@ -0,0 +1,172 @@
```rust
use std::{
    cmp::Eq,
    collections::{HashMap, VecDeque},
    hash::Hash,
};

pub struct CappedHashMap<K, V>
where
    K: Hash + Eq + Copy + Clone,
{
    capacity: usize,
    inner: HashMap<K, V>,
    last_items: VecDeque<K>,
}
```

```rust
impl<K, V> CappedHashMap<K, V>
where
    K: Hash + Eq + Copy + Clone,
{
    pub fn new(capacity: usize) -> Self {
        Self {
            // Keep one slot of margin: we always insert first and only then
            // check whether the collection is full, so without the margin that
            // transient extra element could make the hashmap double its capacity.
            capacity: capacity + 1,
            inner: HashMap::with_capacity(capacity + 1),
            last_items: VecDeque::with_capacity(capacity + 1),
        }
    }

    /// Inserts a new item into the collection. Returns Some(key) where key is
    /// the key that was evicted because we reached max capacity. Otherwise
    /// returns None.
    pub fn insert(&mut self, k: K, v: V) -> Option<K> {
```
**Reviewer:** I don't think we should return the item that was removed, as this might be surprising behavior for something that is supposed to closely mimic a hashmap. If we want this API it should be named something else (`insert_and_get_removed_item` or something).

**Author (ppoliani):**

> I don't think we should return the item that was removed

This is actually quite handy and it can help us do things like this #2 (comment). It's similar to what the core `HashMap` does with its `insert` fn: it returns the value of the replaced item, and they don't call it something like `insert_and_get_replaced_item`. I believe we can call this `add_entry` to avoid any confusion with the `HashMap::insert` fn.

**Reviewer:** I'm not saying this is not a useful function, just that it should have a different name so that it does not have surprising behavior, as `HashMap::insert` does not behave like this. `HashMap::insert` returns the element that was removed at the place of insert, which is different!

```rust
        let mut ret = None;

        // Replacing a value should not push any new items to last_items.
        if self.inner.insert(k, v).is_none() {
            self.last_items.push_front(k);
        }

        if self.last_items.len() > self.capacity - 1 {
```
**Reviewer:** I think we should have logs when we reach a quarter of the capacity, half of the capacity, 90% of the capacity, or something like that (so that we know something is happening and unfinished signatures are piling up).

**Reviewer:** Also, imagine capacity is set to 5: this means that the hashmap and vecdeque both have capacity 4. So when the insertion above makes `inner` and `last_items` full, the check here will be 4 > 4 = false and do nothing. This means that the next insert will resize both the hashmap and vecdeque. I think you made a mistake in the initialization: you should keep the capacity, but have the two structures have capacity + 1 instead. Maybe a test checking for the array capacity would help with making sure that the logic works :)

**Author (ppoliani, Jan 29, 2024):**

> you should keep the capacity, but have the two structures have capacity + 1 instead

Oh shit, yes that's what I intended to do in the first place. Good point!

**Author (ppoliani):** I've read about the capacity and how it changes. There is a good explanation here. I also tested it, and it looks like capacity can double even though the length remains the same:

> it has to reserve "slack space", the amount of which depends on the implementation of the hashmap (and specifically its collision resolution algorithm).

I guess the more keys we insert, the higher the chance of a collision (even though some keys are removed), so it looks like it follows a conservative approach by doubling the capacity. I believe this is neither a big deal nor a significant performance concern, given that we're not going to be storing several thousand or millions of entries in the hash table. And besides, a capacity change does not mean entries are moved to a new memory location.

```rust
            // Remove the oldest item. We can safely unwrap because we know
            // last_items is not empty at this point.
            let key = self.last_items.pop_back().unwrap();
            assert!(self.remove(&key).is_some());

            ret = Some(key);
        }

        ret
    }
```

```rust
    /// Removes a key from the map, returning the value at the key if the key
    /// was previously in the map.
    pub fn remove(&mut self, k: &K) -> Option<V> {
        let Some(v) = self.inner.remove(k) else {
            return None;
        };

        self.last_items = self
            .last_items
            .iter()
            .filter(|key| *key != k)
            .copied()
            .collect::<VecDeque<_>>();
```
**Reviewer:** It kinda sucks that we have to go through the whole thing here :D I agree that we should assume that the key to remove is one of the last ones appended (and that the oldest stuff is probably just stale at this point). Maybe a `LinkedList` is better, as removing something at any index is easier (a find followed by a remove)? In any case I think this would be better:

`self.last_items.iter().position(|key| key == k).and_then(|pos| self.last_items.remove(pos));`

**Author (ppoliani):** yes, this looks much better 👍

**Author (ppoliani, Jan 29, 2024):** A `LinkedList` isn't faster on removals; per its docs, `remove`:

> This operation should compute in O(n) time.

Also, this is what the official docs suggest:

> NOTE: It is almost always better to use `Vec` or `VecDeque` because array-based containers are generally faster, more memory efficient, and make better use of CPU cache.

**Reviewer:** Ah yeah, you'd need some sort of linked list + hashmap to know which two nodes to update to remove a node :D


```rust
        Some(v)
    }

    /// Returns a reference to the value corresponding to the key.
    pub fn get(&self, k: &K) -> Option<&V> {
        self.inner.get(k)
    }

    /// Returns a mutable reference to the value corresponding to the key.
    pub fn get_mut(&mut self, k: &K) -> Option<&mut V> {
        self.inner.get_mut(k)
    }

    /// Returns the number of elements in the collection.
    pub fn size(&self) -> usize {
        self.inner.len()
    }
}
```

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_insert_into_non_full_collection() {
        let mut col: CappedHashMap<u8, u8> = CappedHashMap::new(10);
        col.insert(1, 1);
        col.insert(2, 2);
        col.insert(3, 3);

        assert_eq!(*col.get(&1).unwrap(), 1);
        assert_eq!(*col.get(&2).unwrap(), 2);
        assert_eq!(*col.get(&3).unwrap(), 3);
    }

    #[test]
    fn test_insert_should_return_removed_key() {
        let mut col: CappedHashMap<u8, u8> = CappedHashMap::new(10);

        for i in 0..10 {
            col.insert(i, i);
        }

        for i in 10..30 {
            // the oldest key (i - 10) will be removed
            let key_removed = col.insert(i, i);
            assert!(key_removed.is_some());
            assert_eq!(key_removed.unwrap(), i - 10);
            assert_eq!(col.size(), 10);
        }

        // Note that we should have only the last 10 keys (20..30) in the
        // collection; all the previous ones were evicted by these new ones.
        for i in 0..20 {
            assert!(col.get(&i).is_none());
        }

        // after the cyclic inserts we still have a collection at full capacity.
        // We can remove one item...
        assert!(col.remove(&20).is_some());
        assert_eq!(col.size(), 9);

        // ... and now inserting a new item will not evict any existing one
        assert!(col.insert(31, 31).is_none());
    }

    #[test]
    fn test_insert_duplicate() {
        let mut col: CappedHashMap<u8, u8> = CappedHashMap::new(10);

        for i in 0..10 {
            col.insert(i, i);
        }

        assert_eq!(*col.get(&0).unwrap(), 0);
        assert_eq!(col.size(), 10);

        // Replacing should simply swap the value and not affect the size.
        // So although our collection is at full capacity, replacing an
        // existing key should not evict the oldest item...
        assert!(col.insert(0, 2).is_none());
        assert_eq!(*col.get(&0).unwrap(), 2);
        assert_eq!(col.size(), 10);

        // ...but inserting a new one should
        let key_removed = col.insert(10, 10);
        assert!(key_removed.is_some());
        assert_eq!(key_removed.unwrap(), 0);
        assert_eq!(col.size(), 10);
    }

    #[test]
    fn test_remove() {
        let mut col: CappedHashMap<u8, u8> = CappedHashMap::new(10);

        for i in 0..10 {
            col.insert(i, i);
        }

        for i in 0..10 {
            let v = col.remove(&i);
            assert!(v.is_some());
            assert_eq!(v.unwrap(), i);
            assert_eq!(col.size() as u8, 10 - i - 1);
        }

        // the collection is empty, so the next remove should return None
        let v = col.remove(&0);
        assert!(v.is_none());
    }
}
```
src/committee/node.rs (4 additions, 3 deletions)

@@ -1,5 +1,5 @@

```diff
 use std::{
-    collections::{BTreeMap, HashMap},
+    collections::BTreeMap,
     net::SocketAddr,
     sync::{Arc, RwLock},
 };
```
@@ -18,6 +18,7 @@ use serde::{Deserialize, Serialize};

```diff
 use crate::{
     bob_request::{BobRequest, SmartContract},
+    capped_hashmap::CappedHashMap,
     frost,
     mpc_sign_tx::get_digest_to_hash,
 };
```
@@ -35,7 +36,7 @@ pub struct NodeState {

```diff
     pub pubkey_package: frost::PublicKeyPackage,

     // TODO: ensure that this cannot grow like crazy? prune old tasks?
-    pub signing_tasks: RwLock<HashMap<Txid, LocalSigningTask>>,
+    pub signing_tasks: RwLock<CappedHashMap<Txid, LocalSigningTask>>,
 }

 #[derive(Clone)]
```
@@ -233,7 +234,7 @@ pub async fn run_server(

```diff
     let ctx = NodeState {
         key_package,
         pubkey_package,
-        signing_tasks: RwLock::new(HashMap::new()),
+        signing_tasks: RwLock::new(CappedHashMap::new(100)),
     };

     let server = Server::builder()
```
src/lib.rs (1 addition)

@@ -21,6 +21,7 @@ pub mod bob_request;

```diff
 /// 3. The zkBitcoin committee produce a collaborative schnorr signature to unlock the funds for Bob.
 pub mod mpc_sign_tx;

+pub mod capped_hashmap;
 //
 // Helpers
 //
```