perf ⚡ Improved Event and Transaction commitment performance #133
Conversation
State commitment now calculates hashes in parallel, with bonsai trie commitments being run in a blocking-safe thread.
There are some inefficiencies when spawning tasks.
Other than that, I'm not familiar enough with the problem to have any opinion on the correctness of the algorithm.
```rust
// event hashes are calculated in parallel
let mut set = JoinSet::new();
for (i, event) in events.iter().cloned().enumerate() {
    let arc_event = Arc::new(event);
```
Using an `Arc` here just creates an unnecessary allocation. Your event is already cloned (because you called `.cloned()` on your iterator). `get_hash` takes a regular reference, no need to use reference counting:
```rust
for (i, event) in events.iter().cloned().enumerate() {
    set.spawn(async move { (i, get_hash::<H>(&event)) });
}
```
Also, be careful: if calling `get_hash` really is expensive, then you're probably slowing down your whole async executor. If you want to schedule a blocking call, use `spawn_blocking` instead of a regular `spawn`. That will make sure that Tokio runs the task on a thread that can afford to be blocked.
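To make the offloading idea concrete, here is a minimal std-only sketch of the same pattern: each cloned item is moved into its own worker (no `Arc` needed), and results carry their index so order can be restored. Tokio's `spawn_blocking` applies the same idea using a managed pool of blockable threads; `get_hash` here is a stand-in for illustration, not the real Pedersen/Poseidon hash.

```rust
use std::thread;

// Stand-in hash function (assumption for the sketch): the real code uses
// get_hash::<H> over Starknet events.
fn get_hash(event: &[u8]) -> u64 {
    event
        .iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
}

fn parallel_hashes(events: Vec<Vec<u8>>) -> Vec<u64> {
    // Move each (index, event) pair into its own thread: ownership of the
    // cloned event transfers into the closure, so no reference counting.
    let handles: Vec<_> = events
        .into_iter()
        .enumerate()
        .map(|(i, event)| thread::spawn(move || (i, get_hash(&event))))
        .collect();

    let mut out: Vec<(usize, u64)> =
        handles.into_iter().map(|h| h.join().unwrap()).collect();
    // Re-sort by index to restore input order (with tokio's JoinSet,
    // results arrive in completion order, so this step matters there too).
    out.sort_by_key(|&(i, _)| i);
    out.into_iter().map(|(_, h)| h).collect()
}

fn main() {
    let events = vec![vec![1u8, 2, 3], vec![4, 5], vec![6]];
    let sequential: Vec<u64> = events.iter().map(|e| get_hash(e)).collect();
    let parallel = parallel_hashes(events);
    assert_eq!(parallel, sequential);
    println!("parallel and sequential hashes agree: {parallel:?}");
}
```

The key point carries over directly: CPU-bound work belongs on threads that are allowed to block, not on the async executor's cooperative scheduler.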
Thanks, duly noted! This part of the algorithm has since been overhauled in #134, but I have applied your remark to the new code, my bad 😅
```diff
     events: &[Event],
     backend: &Arc<BonsaiDb<B>>,
-) -> Result<Felt252Wrapper, anyhow::Error>
+) -> Result<Felt252Wrapper, String>
```
Why move from `anyhow::Error` to `String`?
At the time this was for two reasons:

- No error was actually being returned, as `commit`s to the Bonsai db were being called with `unwrap`, so it seemed to me the error type was not important.
- I feel like we might be using anyhow a bit too much for errors that do not need as much genericity, so I am trying to cut down on that whenever I get the chance.
Let me know what you think though!
Here are my main points:
- anyhow is not generic, so you don't need to worry about monomorphisation costs.
- It works basically like a `String` minus the actual string allocation (there is an allocation under the hood as well, but it is much smaller in the vast majority of cases).
- You can turn any error type into an `anyhow::Error` just by calling the `?` operator. With a `String` you have to manually create it with `format!`, which can become redundant.
- `anyhow::Result<T>` or `Result<T, anyhow::Error>` is arguably clearer than `Result<T, String>`.
- You get to use cool macros to make the code shorter and cleaner.
```rust
function_that_fails()?;
// or
function_that_fails().context("function has failed")?;
// vs
function_that_fails().map_err(|err| format!("function has failed: {}", err))?;

bail!("my custom error");
// vs
return Err(String::from("my custom error"));

ensure!(condition, "oh no");
// vs
if !condition {
    return Err(String::from("oh no"));
}
```
That being said, when possible, I'd lean more toward creating a custom error type. But that really depends on whether the error needs to be caught or not. If you only need to print an error that's not supposed to occur, `anyhow` or `String` are functionally the same (though I like anyhow better). In all other cases, creating a custom error type is strictly better (though it takes a bit more effort to maintain).
Now that I'm reading your answers again, I'm not sure whether we're using the same definition of "generic". What do you mean by "do not need as much genericity"?
```rust
// transaction hashes are calculated in parallel
let mut set = JoinSet::new();
for (i, tx) in transactions.iter().cloned().enumerate() {
    let arc_tx = Arc::new(tx);
```
Same thing here: you don't need to allocate an `Arc`. Just move the value into the future without `Arc`ing it.

And for the same exact reason, you should be using `spawn_blocking` instead of `spawn`.
Noted! Our hash computation is actually different from the hash computation inside the Bonsai lib, which is the one that is expensive. In comparison, calculating the hash on our side is mostly negligible, with committing to the Bonsai db being the main bottleneck due to recursive hashing of the many nodes of the trie. Steps have been taken to run that in `spawn_blocking` where applicable though, as seen in recent commits.
```diff
     transactions: &[Transaction],
     chain_id: Felt252Wrapper,
     block_number: u64,
     backend: &Arc<BonsaiDb<B>>,
-) -> Result<Felt252Wrapper, BonsaiDbError>
+) -> Result<Felt252Wrapper, String>
```
Same thing here: why not `anyhow::Error`?
Same as above, cutting down on unused genericity in error return types.
Pull Request type
What is the current behavior?
`fetch_block` calls take too long, lasting over 1s in some cases for transaction/event-heavy blocks. This used to be due to inefficiencies in the way transaction and, in particular, event commitments were calculated, but is now caused by updating the state root after #134. This is due to the cost of calculating the Merkle root in the Bonsai db: as new nodes are added to the trie, its size grows rapidly since content nodes can only be added as leaves. This greatly increases the number of hashes to compute in order to determine the root.
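The cost argument can be made concrete with a toy cost model (an assumption for illustration, not the bonsai-trie internals): recomputing every internal node of a binary Merkle tree costs on the order of the trie size, while updating only the path from a changed leaf to the root costs on the order of its depth.

```rust
// Toy cost model for a binary Merkle tree over n_leaves leaves.

// A full rebuild hashes every internal node: n - 1 hashes.
fn rebuild_hashes(n_leaves: u64) -> u64 {
    n_leaves.saturating_sub(1)
}

// Updating a single leaf only rehashes the ~log2(n) nodes on its root path.
// Valid for n_leaves > 0.
fn path_update_hashes(n_leaves: u64) -> u64 {
    63 - n_leaves.leading_zeros() as u64
}

fn main() {
    for n in [1u64 << 10, 1 << 16, 1 << 20] {
        println!(
            "{n} leaves: full rebuild = {} hashes, single path update = {} hashes",
            rebuild_hashes(n),
            path_update_hashes(n)
        );
    }
}
```

With per-block work growing with the size of the whole trie rather than with the number of changed leaves, total sync time compounds block after block, which matches the super-linear growth observed in the benchmarks below.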
Performance testing and statistical analysis of Deoxys have shown this to be a major issue, causing heavy slowdowns of other Substrate processes and resulting in exponential $O(b^N)$ time complexity (measured by experiment).
Resolves #94
What is the new behavior?
While no direct changes could be made to the way the Merkle root is calculated (i.e., no changes were made to the underlying recursive algorithms in the bonsai trie lib), the following steps were taken to gain performance:
This has led to a significant increase in performance, down to a complexity of $O(N)$ (measured by experiment), with a decrease in execution time of up to 99% in some cases.
Benchmarking
Thorough benchmarking was applied over the first 5800 blocks. Testing of the unoptimized version of the code had to be halted at block 2900 due to ballooning complexity, while testing of the optimized version continued until block 5600. Benchmark results were also compared against growth in the number of events, transactions and contracts in new blocks to see if a strong link could be established with the slowdowns. This showed similarities between the rate of increase in contracts and execution time, while new events do not seem to impact performance. The rate of transaction increase follows a similar trend but has been shown to have little impact on performance, with updating the contract trie root remaining the major bottleneck.
The complete sets of benchmark results along with visual representations can be found here.
Does this introduce a breaking change?
No. Nodes with an existing local database can keep synchronizing new blocks, albeit at a faster rate.
Other information