Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function to efficiently hash entire non-root subtrees to guts #329

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

rklaehn
Copy link

@rklaehn rklaehn commented Jul 29, 2023

provides an efficient way to compute the hash for multiple chunks that do not start at chunk 0, as part of a larger tree.

This is useful for crates like bao, abao and bao-tree when using chunk groups.

Adding this required to disable some assertions that contained assumptions that are no longer true.

This gives a performance improvement of 4x when computing an outboard with a chunk group size of 16kb in bao-tree compared to using blake3::guts::ChunkState and blake3::guts::parent_cv to compute a hash for a chunk group.

UPDATE:

this works in the case of bao-tree, but the signature is not good since it can't work in the general case. E.g.

let data = [0u8; 2048];
let hash = hash_block(1, &data, false);

This does not make any sense at all, since there is no single root for chunks 1 and 2. I guess you could put in an assert to catch things like this?

provides an efficient way to compute the hash for multiple chunks that do
not start at chunk 0, as part of a larger tree.

This is useful for crates like bao, abao and bao-tree, especially when using
chunk groups.

Adding this required to disable some assertions that contained assumptions
that are no longer true.
@rklaehn
Copy link
Author

rklaehn commented Jul 31, 2023

I am not sure what is the best way to expose the required method to guts. It does not matter as long as there is a way to hash a non-root subtree that starts at a non-zero chunk.

src/lib.rs Outdated
@@ -1246,7 +1267,7 @@ impl Hasher {
// also. Convert it directly into an Output. Otherwise, we need to
// merge subtrees below.
if self.cv_stack.is_empty() {
debug_assert_eq!(self.chunk_state.chunk_counter, 0);
// debug_assert_eq!(self.chunk_state.chunk_counter, 0);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be disabled because the chunk counter can be set to a non zero value in the constructor now.

src/lib.rs Outdated
self.chunk_state.chunk_counter.count_ones() as usize,
"cv stack does not need a merge"
);
// debug_assert_eq!(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be disabled since because the chunk counter does not necessarily start at 0, it's current value is no longer equivalent to the number of chunks.

I don't think there is a way to reenable this except by adding an additional value to the state that tracks the start chunk.

@@ -1304,6 +1325,11 @@ impl Hasher {
self.final_output().root_hash()
}

fn finalize_node(&self, is_root: bool) -> Hash {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just exposes a way to create non root hashes

src/lib.rs Outdated
@@ -409,6 +409,18 @@ impl Output {
Hash(platform::le_bytes_from_words_32(&cv))
}

fn hash(&self, is_root: bool) -> Hash {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just exposes a way to create non root hashes

and add optional rayon support
@rklaehn rklaehn changed the title Add fn hash_block to guts Add function to efficiently hash entire non-root subtrees to guts Aug 1, 2023
@rklaehn
Copy link
Author

rklaehn commented Aug 1, 2023

I am not quite sure, but it seems that what this blake3 fork is doing would also be covered with this: fleek-network@0e15418

@rklaehn rklaehn force-pushed the more-guts branch 3 times, most recently from ecd9b41 to a050f34 Compare August 1, 2023 20:29
also add a little brute force test
@oconnor663-travel
Copy link

I'm out of the country and mostly offline until August 20, so I won't be able to take a proper look until then. But you might be interested in the new "guts" API I'm in the middle of:

Apologies for the big pile of undocumented code there :) But my hope is that I can port BLAKE3's existing SIMD implementations to that expanded API and then release that as a (permanently unstable) crate that other low-level callers can depend on. Here are a couple other projects I hope to support with the same API:

@rklaehn
Copy link
Author

rklaehn commented Aug 8, 2023

@oconnor663-travel this looks awesome. No stress, we have forked the crate and are depending on that one now, so there is not much urgency. Once a crate is out that allows us to compute subtree hashes efficiently, we will yank that one and go back to upstream.

Happy to close this if there is something better in the works. It would be nice if there was an example or a fn for the hash_subtree use case implemented in this PR.

Enjoy being offline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants