Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TEMP ON HOLD] tweak: Use a visitor pattern for more efficient SBOR traversals #1926

Draft
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

dhedey
Copy link
Contributor

@dhedey dhedey commented Sep 25, 2024

Note

This PR attempts to convert the "Pulling" event style SBOR traverser "exterior iterator" to a "Visiting" "interior iterator" style abstraction, to validate potential performance gains. Side node: Technically the "Pulling" style abstraction is actually a "Generator / Iterator / Coroutine without arguments" in new Rust terminology [article, 2023 blog - so we should probably just make it an Iterator of an Event - the main issue is it's a borrowing iterator, because its items refer to the traverser itself. It can actually also probably be rewritten as a generator when the gen block is stabilized.

This PR actually only goes half-way, to a "resumable visiting" pattern.

By attempting this, it was discovered we likely want to support both a pulling and a non-resumable visiting traverser. This would need to be done both at the untyped and typed layer.

That is because:

  • The visitor model can be up to ~10x more efficient for most use cases (decoding RawValue, validations, IndexedScryptoValue, re-encode ManifestValue to ScryptoValue)
  • BUT it would be very nasty to re-write the serialization as a visitor. Because of how serde serializers work (and lots of existing code!), it really favours an external pull-based system, which allows us to use the call frames/stack to keep track of how we're serializing things.

So quick wins:

  • Reinstate VecTraverser => UntypedPullingTraverser as its own thing - this lets us optimize UntypedTraverser => UntypedVisitingTraverser further to not be resumable, and allowing more declarative logic.
  • Also consider trialling a stack-allocated bounded array (or some kind of small-vec library)
  • Do the same with TypedTraverser and use it in PayloadValidator

Summary

Relates to #1860

Basically just a side-exploration on my off-day to trial a visitor pattern for SBOR traversals.
And makes VecTraverser use a shim to the new visitor pattern to validate it in tests.

I imagine it makes VecTraverser too slow in its current form, so we probably don't want to merge this.

But I'm interested to see if the RawSborValue decoding is significantly faster (I'm hoping it can be, due to various optimizations which can be done for calculate_value_tree_body_byte_length now).

When I have some more down time, I might investigate if we can do the same treatment to TypedTraverser, payload validation, etc.

Perf review:

Winners:

Test Base PR %
costing::decode_encoded_i8_array_to_manifest_raw_value 19.5±0.04ms 6.2±0.00ms -68.21%
costing::decode_encoded_tuple_array_to_manifest_raw_value 63.4±0.15ms 18.3±0.02ms -71.14%
costing::decode_rpd_to_manifest_raw_value 12.4±0.02µs 5.4±0.03µs -56.45%
costing::validate_sbor_payload 32.6±0.13µs 29.5±0.06µs -9.51%

And losers:

Test Base PR %
costing::decode_encoded_u8_array_to_manifest_raw_value 25.6±0.13µs 27.2±2.37µs +6.25%
costing::execute_transaction_reading_big_vec_substates 592.8±0.63ms 642.2±0.47ms +8.33%
schema::validate_payload 364.2±1.39µs 406.3±0.68µs +11.56%

This broadly fits with expectations.

  • Decoding to raw value is ~70% faster, and can likely be improved up to 85% I think if we make it non-resumable.
  • I think costing::validate_sbor_payload is an anomoly. Otherwise, validation is ~10% slower. This would be fixed when it's resumable.

Details

TBC

Testing

Existing tests pass... Let's take a look at the benchmark comparison for validation!

And make VecTraverser use a shim to the new visitor pattern to validate it in tests
Copy link

Phylum Report Link

Copy link

github-actions bot commented Sep 25, 2024

Docker tags
docker.io/radixdlt/private-scrypto-builder:3ebc88b536

Copy link

github-actions bot commented Sep 25, 2024

Benchmark for 3ebc88b

Click to view benchmark
Test Base PR %
costing::bench_prepare_wasm 44.7±0.07ms 44.6±0.10ms -0.22%
costing::decode_encoded_i8_array_to_manifest_raw_value 19.4±0.03ms 5.6±0.00ms -71.13%
costing::decode_encoded_i8_array_to_manifest_value 42.3±0.04ms 41.4±0.10ms -2.13%
costing::decode_encoded_tuple_array_to_manifest_raw_value 63.4±0.06ms 18.1±0.36ms -71.45%
costing::decode_encoded_tuple_array_to_manifest_value 98.0±0.14ms 105.6±0.28ms +7.76%
costing::decode_encoded_u8_array_to_manifest_raw_value 32.7±0.15µs 31.9±0.17µs -2.45%
costing::decode_encoded_u8_array_to_manifest_value 42.3±0.06ms 41.4±0.05ms -2.13%
costing::decode_rpd_to_manifest_raw_value 12.6±0.03µs 5.3±0.02µs -57.94%
costing::decode_rpd_to_manifest_value 11.0±0.04µs 10.9±0.02µs -0.91%
costing::deserialize_wasm 1272.0±5.36µs 1256.6±2.85µs -1.21%
costing::execute_transaction_creating_big_vec_substates 697.7±10.47ms 699.5±9.53ms +0.26%
costing::execute_transaction_reading_big_vec_substates 591.8±1.78ms 590.7±2.13ms -0.19%
costing::instantiate_flash_loan 872.3±411.13µs 985.4±605.11µs +12.97%
costing::instantiate_radiswap 1058.5±1216.17µs 1095.1±1629.37µs +3.46%
costing::spin_loop 21.7±0.04ms 20.8±0.05ms -4.15%
costing::validate_sbor_payload 32.6±0.06µs 29.0±0.07µs -11.04%
costing::validate_sbor_payload_bytes 258.1±1.51ns 251.3±5.90ns -2.63%
costing::validate_secp256k1 76.8±0.16µs 76.7±0.09µs -0.13%
costing::validate_wasm 33.6±0.05ms 33.8±0.04ms +0.60%
decimal::add/0 8.4±0.00ns 8.4±0.00ns 0.00%
decimal::add/rust-native 9.9±0.01ns 9.9±0.01ns 0.00%
decimal::add/wasmi 223.2±0.54ns 226.1±0.33ns +1.30%
decimal::add/wasmi-call-native 2.2±0.00µs 2.1±0.00µs -4.55%
decimal::div/0 186.6±0.16ns 186.4±0.13ns -0.11%
decimal::from_string/0 157.7±0.43ns 158.8±0.26ns +0.70%
decimal::mul/0 149.7±0.10ns 149.6±0.10ns -0.07%
decimal::mul/rust-native 148.0±0.18ns 150.5±0.23ns +1.69%
decimal::mul/wasmi 12.1±0.02µs 11.9±0.04µs -1.65%
decimal::mul/wasmi-call-native 2.3±0.00µs 2.3±0.01µs 0.00%
decimal::pow/0 719.1±1.14ns 689.1±0.25ns -4.17%
decimal::pow/rust-native 668.9±0.49ns 669.1±0.50ns +0.03%
decimal::pow/wasmi 58.2±0.23µs 56.8±0.12µs -2.41%
decimal::pow/wasmi-call-native 2.6±0.01µs 2.5±0.00µs -3.85%
decimal::root/0 8.1±0.01µs 8.2±0.00µs +1.23%
decimal::sub/0 8.2±0.01ns 8.2±0.00ns 0.00%
decimal::to_string/0 444.4±0.38ns 436.1±0.52ns -1.87%
precise_decimal::add/0 9.1±0.01ns 9.0±0.01ns -1.10%
precise_decimal::add/rust-native 10.8±0.06ns 10.7±0.03ns -0.93%
precise_decimal::add/wasmi 276.3±1.19ns 283.1±0.46ns +2.46%
precise_decimal::add/wasmi-call-native 2.9±0.00µs 2.8±0.01µs -3.45%
precise_decimal::div/0 317.3±0.40ns 317.9±1.18ns +0.19%
precise_decimal::from_string/0 210.0±0.91ns 209.0±0.62ns -0.48%
precise_decimal::mul/0 366.2±0.28ns 370.5±1.54ns +1.17%
precise_decimal::mul/rust-native 321.6±0.71ns 320.7±0.34ns -0.28%
precise_decimal::mul/wasmi 34.3±0.13µs 34.0±0.29µs -0.87%
precise_decimal::mul/wasmi-call-native 3.3±0.01µs 3.2±0.00µs -3.03%
precise_decimal::pow/0 1967.3±4.05ns 1931.1±8.12ns -1.84%
precise_decimal::pow/rust-native 1540.0±3.88ns 1544.3±2.81ns +0.28%
precise_decimal::pow/wasmi 164.0±1.71µs 167.3±2.94µs +2.01%
precise_decimal::pow/wasmi-call-native 5.8±0.01µs 5.7±0.02µs -1.72%
precise_decimal::root/0 58.4±0.06µs 58.6±0.05µs +0.34%
precise_decimal::sub/0 9.2±0.02ns 9.2±0.04ns 0.00%
precise_decimal::to_string/0 709.8±3.01ns 693.9±1.90ns -2.24%
schema::validate_payload 365.8±0.88µs 409.6±0.53µs +11.97%
transaction::radiswap 5.1±0.02ms 5.1±0.02ms 0.00%
transaction::transfer 1881.5±4.32µs 1730.2±3.60µs -8.04%
transaction_processing::prepare 2.6±0.00ms 2.3±0.00ms -11.54%
transaction_processing::prepare_and_decompile 6.8±0.03ms 6.5±0.02ms -4.41%
transaction_processing::prepare_and_decompile_and_recompile 30.6±1.60ms 32.2±0.31ms +5.23%
transaction_validation::validate_manifest 42.9±0.05µs 42.7±0.03µs -0.47%
transaction_validation::verify_bls_2KB 1062.5±11.54µs 1004.3±8.79µs -5.48%
transaction_validation::verify_bls_32B 1005.8±11.56µs 1003.7±8.39µs -0.21%
transaction_validation::verify_ecdsa 74.8±0.20µs 74.5±0.04µs -0.40%
transaction_validation::verify_ed25519 55.0±0.07µs 54.8±0.14µs -0.36%

@dhedey dhedey changed the title tweak: Use a visitor pattern for more efficient traversals [ON TEMP HOLD] tweak: Use a visitor pattern for more efficient traversals Sep 25, 2024
@dhedey dhedey changed the title [ON TEMP HOLD] tweak: Use a visitor pattern for more efficient traversals [ON TEMP HOLD] tweak: Use a visitor pattern for more efficient SBOR traversals Sep 25, 2024
@dhedey dhedey changed the title [ON TEMP HOLD] tweak: Use a visitor pattern for more efficient SBOR traversals [TEMP ON HOLD] tweak: Use a visitor pattern for more efficient SBOR traversals Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant