perf ⚡ Improved Event and Transaction commitment performance #133

Merged
merged 9 commits into from
Feb 27, 2024
Conversation

Trantorian1
Member

@Trantorian1 Trantorian1 commented Feb 23, 2024

Pull Request type

  • Refactoring (no functional changes, no API changes)

What is the current behavior?

fetch_block calls take too long, lasting over 1s in some cases for transaction/event-heavy blocks. This used to be caused by inefficiencies in the way transaction and, in particular, event commitments were calculated, but is now caused by updating the state root after #134 .

This is due to the cost of calculating the Merkle Root in the Bonsai db: as new nodes are added to the trie, its size grows rapidly, since content nodes can only be added as leaves. This greatly increases the number of hashes that must be computed to determine the root.
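To illustrate why leaf growth inflates the work (this is a toy binary Merkle tree over a stdlib hash, not the bonsai-trie algorithm or the Pedersen hash Deoxys actually uses), note how the number of hash invocations needed for the root scales with the number of leaves:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for a cryptographic pair hash.
fn hash_pair(a: u64, b: u64) -> u64 {
    let mut h = DefaultHasher::new();
    (a, b).hash(&mut h);
    h.finish()
}

/// Root of a binary Merkle tree; returns (root, number of hashes computed).
fn merkle_root(leaves: &[u64]) -> (u64, usize) {
    let mut level: Vec<u64> = leaves.to_vec();
    let mut hashes = 0;
    while level.len() > 1 {
        let next: Vec<u64> = level
            .chunks(2)
            .map(|pair| {
                hashes += 1;
                // odd leftover node is hashed with itself
                hash_pair(pair[0], *pair.last().unwrap())
            })
            .collect();
        level = next;
    }
    (level[0], hashes)
}

fn main() {
    // every batch of new leaves adds more internal hashes to recompute
    let (_, h1) = merkle_root(&(0u64..1_000).collect::<Vec<_>>());
    let (_, h2) = merkle_root(&(0u64..10_000).collect::<Vec<_>>());
    println!("1k leaves: {h1} hashes, 10k leaves: {h2} hashes");
}
```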

Performance testing and statistical analysis of Deoxys has shown this to be a major issue, causing heavy slowdowns of other Substrate processes and resulting in exponential $O(b^N)$ time complexity (measured by experiment).

Resolves #94

What is the new behavior?

While no direct changes could be made to the way the Merkle Root is calculated (i.e. no changes were made to the underlying recursive algorithms in the bonsai trie lib), the following steps were taken to gain performance:

  • Contract trie root computations have been parallelized and are now run in a blocking-safe thread.
  • Transaction and event commitments have been parallelized, and are now computed independently and at the same time.
  • Transaction and event hash computations have also been parallelized.
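The fan-out in the steps above can be sketched as follows (a minimal stdlib illustration using `std::thread::scope`; the actual code uses tokio tasks, and `compute_hash` here is a hypothetical stand-in for the per-transaction/per-event hash):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

// Hypothetical stand-in for the per-item hash function.
fn compute_hash(item: &u64) -> u64 {
    let mut h = DefaultHasher::new();
    item.hash(&mut h);
    h.finish()
}

/// Hash every item on its own worker, keeping results in input order.
fn parallel_hashes(items: &[u64]) -> Vec<u64> {
    let mut out = vec![0u64; items.len()];
    thread::scope(|s| {
        for (slot, item) in out.iter_mut().zip(items) {
            // each worker writes its result into a disjoint slot
            s.spawn(move || *slot = compute_hash(item));
        }
    });
    out
}

fn main() {
    let hashes = parallel_hashes(&[1, 2, 3]);
    println!("{hashes:?}");
}
```

Writing into pre-allocated, index-matched slots avoids having to sort results by index after joining the workers.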

This has led to a significant increase in performance, down to a complexity of $O(N)$ (measured by experiment), with a decrease in execution time of up to 99% in some cases.

Benchmarking

Thorough benchmarking was performed over the first 5800 blocks. Testing of the unoptimized version of the code had to be halted at block 2900 due to ballooning complexity, while testing of the optimized version continued until block 5600.

Benchmark results were also compared against the growth in the number of events, transactions, and contracts in new blocks, to see whether a strong link to the slowdowns could be established. This showed similarities between the rate of increase in contracts and execution time, while new events do not seem to impact performance. The rate of transaction increase follows a similar trend but has been shown to have little impact on performance, with updating the contract trie root remaining the major bottleneck.

The complete sets of benchmark results along with visual representations can be found here.

💡 Note that these benchmarks were run on an Intel Xeon-E 2288G - 8c/16t. Due to the multithreaded nature of these improvements, performance might vary on CPUs with fewer threads.

Does this introduce a breaking change?

No. Nodes with an existing local database can keep synchronizing new blocks, now at a faster rate.

Other information

⚠️ The main bottleneck remains in the computation of the Merkle Root of the Bonsai Trie. Major improvements might still be made there.

@Trantorian1 Trantorian1 added the perf Improved performance label Feb 23, 2024
@Trantorian1 Trantorian1 self-assigned this Feb 23, 2024
Collaborator

@nils-mathieu nils-mathieu left a comment


There's some inefficiencies when spawning tasks.

Other than that, I'm not familiar enough with the problem to have any opinion on the correctness of the algorithm.

// event hashes are calculated in parallel
let mut set = JoinSet::new();
for (i, event) in events.iter().cloned().enumerate() {
    let arc_event = Arc::new(event);
Collaborator


Using an Arc here just creates an unnecessary allocation. Your event is already cloned (because you called .cloned() on your iterator). get_hash takes a regular reference, no need to use reference counting:

for (i, event) in events.iter().cloned().enumerate() {
    set.spawn(async move { (i, get_hash::<H>(&event)) });
}

Also, be careful. If calling get_hash really is expensive, then you're probably slowing down your whole async executor. If you want to schedule a blocking call, use spawn_blocking instead of a regular spawn. That will make sure that Tokio runs the task on a thread that can afford to be blocked.
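The idea behind spawn_blocking can be shown with the stdlib alone (a sketch, not the tokio API: `expensive` is a hypothetical stand-in for a CPU-heavy hash computation):

```rust
use std::thread;

// Hypothetical CPU-heavy job (stands in for hashing a large event).
fn expensive(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_mul(31).wrapping_add(x))
}

fn main() {
    // Hand the CPU-bound call to a dedicated OS thread so the thread
    // driving async tasks is never blocked -- the same idea that
    // tokio::task::spawn_blocking implements on its blocking pool.
    let handle = thread::spawn(|| expensive(1_000_000));

    // ...the executor thread stays free to poll other tasks here...

    let result = handle.join().unwrap();
    println!("hash result: {result}");
}
```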

Member Author


Thanks, duly noted! This part of the algorithm has since been overhauled in #134, but I have applied your remark to the new code, my bad 😅

events: &[Event],
backend: &Arc<BonsaiDb<B>>,
) -> Result<Felt252Wrapper, anyhow::Error>
) -> Result<Felt252Wrapper, String>
Collaborator


why move from anyhow::Error to String?

Member Author


At the time this was for two reasons:

  1. No error was actually being returned as commits to the Bonsai db were being called with unwrap, so it seemed to me the error type was not important.
  2. I feel like we might be using anyhow a bit too much for errors that do not need as much genericity, so I am trying to cut down on that whenever I get the chance.

Let me know what you think though!

Collaborator


Here are my main points:

  • anyhow is not generic, so you don't need to worry about monomorphisation costs.

  • It works basically like a String minus the actual string allocation (there is an allocation under the hood as well, but it is much smaller in the vast majority of cases).

  • You can turn any error type into an anyhow::Error just by using the ? operator. With a String you have to manually create it with format!, which can become redundant.

  • anyhow::Result<T> or Result<T, anyhow::Error> is arguably clearer than Result<T, String>.

  • You get to use cool macros to make the code shorter and cleaner.

function_that_fails()?;
// or
function_that_fails().context("function has failed")?;

// vs
function_that_fails().map_err(|err| format!("function has failed: {}", err))?;
bail!("my custom error");
// vs
return Err(String::from("my custom error"));

ensure!(condition, "oh no");
// vs
if !condition {
    return Err(String::from("oh no"));
}

That being said, when possible, I'd lean more toward creating a custom error type. But that really depends on whether the error needs to be caught or not. If you only need to print an error that's not supposed to occur, anyhow or String are functionally the same (though I like anyhow better). In all other cases, creating a custom error type is strictly better (though it takes a bit more effort to maintain).
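For illustration, a dedicated error type for this code path could look something like the following (variant and function names here are hypothetical, not taken from the PR):

```rust
use std::fmt;

// Hypothetical dedicated error type for commitment computation,
// as an alternative to String or anyhow::Error.
#[derive(Debug)]
enum CommitmentError {
    TrieCommit(String),
    MissingBlock(u64),
}

impl fmt::Display for CommitmentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CommitmentError::TrieCommit(msg) => write!(f, "trie commit failed: {msg}"),
            CommitmentError::MissingBlock(n) => write!(f, "block {n} not found"),
        }
    }
}

impl std::error::Error for CommitmentError {}

// Hypothetical fallible lookup returning the typed error.
fn lookup_block(n: u64) -> Result<(), CommitmentError> {
    if n > 100 { Err(CommitmentError::MissingBlock(n)) } else { Ok(()) }
}

fn main() {
    // callers can match on the variant instead of parsing a String
    match lookup_block(200) {
        Err(CommitmentError::MissingBlock(n)) => println!("missing block {n}"),
        other => println!("{other:?}"),
    }
}
```

Unlike String (and unlike anyhow without downcasting), this lets callers branch on the failure kind while still printing a readable message.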

Collaborator


Now that I'm reading your answers again, I'm not sure whether we're using the same definition of "generic". What do you mean by "do not need as much genericity" ?

// transaction hashes are calculated in parallel
let mut set = JoinSet::new();
for (i, tx) in transactions.iter().cloned().enumerate() {
    let arc_tx = Arc::new(tx);
Collaborator


Same thing here: you don't need to allocate an Arc. Just move the value into the future without Arcing it.

And for the same exact reason, you should be using spawn_blocking instead of spawn.

Member Author


Noted! Our hash computation is actually different from the hash computation inside the Bonsai lib, which is the expensive one. In comparison, calculating the hash on our side is mostly negligible, with committing to the Bonsai db being the main bottleneck due to the recursive hashing of the trie's many nodes. Steps have been taken to run that in spawn_blocking where applicable though, as seen in recent commits.

transactions: &[Transaction],
chain_id: Felt252Wrapper,
block_number: u64,
backend: &Arc<BonsaiDb<B>>,
) -> Result<Felt252Wrapper, BonsaiDbError>
) -> Result<Felt252Wrapper, String>
Collaborator


Same thing here: why not anyhow::Error?

Member Author


Same as above: cutting down on unused genericity in error return types.

@antiyro antiyro merged commit 3118703 into KasarLabs:main Feb 27, 2024
5 of 7 checks passed
@Trantorian1 Trantorian1 deleted the perf/fetch_block branch February 27, 2024 14:52
@github-actions github-actions bot locked and limited conversation to collaborators Feb 29, 2024