Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added serde feature and support #15

Merged
merged 3 commits into from
May 7, 2022
Merged

Added serde feature and support #15

merged 3 commits into from
May 7, 2022

Conversation

nlordell
Copy link
Owner

@nlordell nlordell commented May 6, 2022

This PR adds serde feature to the ethnum crate for adding serde serialization and deserialization support to both signed and unsigned integer types. The default serialization implementation encodes/decodes 0x-prefixed hexadecimal strings.

Additionally, new convenience conversion methods were added for parsing integers from optionally prefixed strings (so parsing 0x2a and 42 to the same value) as well as modules for use with #[serde(with = "...")] for much the same purpose.

For example:

#[derive(Serialize, Deserialize)]
struct Foo {
    #[serde(with = "ethnum::serde::prefixed")]
    value: U256,
}

would allow serializing the following JSON documents:

{ "value": "0x2a" }
{ "value": "42" }

Finally, the hex, octal and binary formatting implementation supports the - formatting flag now. For rust standard integer types:

assert_eq!(format!("{:#x}", -10_i16), "0xfff6");

This means, there is no way to print a sign when formatting in a non-decimal radix. So for I256, support was added so that you can write:

assert_eq!(format!("{:-#x}", I256::new(-10)), "-0xa");

Note that the {:#x} format specifier still behaves the same way as they do for primitive integer types (i.e. sign-extended 2's complement).

@nlordell
Copy link
Owner Author

nlordell commented May 6, 2022

@gakonst - Got around to taking some time to add serde support to ethnum. Do you mind taking a look at the implementation?

Also, I tried to address your comment from #7 (comment) by adding convenience methods for doing the parsing both 0x2a and 42.

Some(result) => result,
None => return Err(pie(PosOverflow)),

if can_not_overflow::<T>(radix, is_signed_ty, digits) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ported from latest more optimized from_str_radix implementation from the Rust standard library: https://github.com/rust-lang/rust/blob/7f9e013ba62e7bf77a95ba29bcb2c5740b7b2059/library/core/src/num/mod.rs#L1067

@nlordell nlordell mentioned this pull request May 6, 2022
Copy link

@gakonst gakonst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Lgtm. What are the highest leverage places where we can use this to achieve speedup over primitive_types/ethereum_types?

cc @rakita in case interesting for revm

@rakita
Copy link

rakita commented May 7, 2022

I checked it when I was looking to find fast div and I think that primitive-types was just faster as it uses knuth algo for division:
primitive_types: https://github.com/paritytech/parity-common/blob/436cb0827f0e3238ccb80d7d453f756d126c0615/uint/src/uint.rs#L854
and with intrinsics enabled it was even faster: paritytech/parity-common@2d7f7e2
ethnum:

for _ in 0..=shift {

maybe it can be backported to here as it is widely used algo. There is additional speedup from knuth algo that evmone uses from gmplib. This is a paper: https://gmplib.org/~tege/division-paper.pdf but that seems harder to backport

zkp_u256 is one other that implemented knuth but it had a little better comments there: https://github.com/0xProject/OpenZKP/blob/c28a5c66b6ee9b97bf177373ba148981df60b7fb/algebra/u256/src/arch/generic/knuth_division.rs#L96

@nlordell
Copy link
Owner Author

nlordell commented May 7, 2022

Awesome! That is a lot of great resources to look at.

@nlordell
Copy link
Owner Author

nlordell commented May 7, 2022

@rakita - so one trouble I have with benchmarking division is that there is a huge variation on performance depending on inputs. I re-ran the benchmarks against primitive-types and you are right that the performance of the long division at the end (the for (; shift>=0; --shift) loop at the end) is significantly worse than the primitive-types implementation (for some reason I thought Knuth was some generalization of long division over "words" and falsely assumed that the implementation in LLVM compiler runtime ☝️ was just a non-general - as in not for arbitrarily sized "big-integer" types - version of it).

How to run the benchmarks:
# First create a `primitive-types` baseline to compare against
cargo bench -p ethnum-bench --features primitive-types -- --save-baseline p div
# Now run the benchmarks comparing against the baseline
cargo bench -p ethnum-bench -- --baseline p div

Benchmark results on my early 2013 MBP:

U256::div (lo/lo)       time:   [50.672 ns 51.014 ns 51.334 ns]                              
                        change: [-44.193% -43.322% -42.463%] (p = 0.00 < 0.05)
                        Performance has improved.

U256::div (hi/lo)       time:   [155.89 ns 156.84 ns 157.83 ns]                              
                        change: [-20.304% -19.177% -18.016%] (p = 0.00 < 0.05)
                        Performance has improved.

U256::div (hi/hi)       time:   [414.45 ns 417.78 ns 421.05 ns]                              
                        change: [+233.79% +239.24% +244.38%] (p = 0.00 < 0.05)
                        Performance has regressed.

To me these results signal that the implementation here deals better with the case where the divisor fits within a u128. With that in mind, it should be possible to optimize the division implementation here to use something similar to the primitive-types implementation for the hi/hi case (i.e. division where both the divisor nor the dividend are greater than u128::MAX).

@nlordell nlordell merged commit dbc8b18 into main May 7, 2022
@rakita
Copy link

rakita commented May 7, 2022

@rakita - so one trouble I have with benchmarking division is that there is a huge variation on performance depending on inputs. I re-ran the benchmarks against primitive-types and you are right that the performance of the long division at the end (the for (; shift>=0; --shift) loop at the end) is significantly worse than the primitive-types implementation (for some reason I thought Knuth was some generalization of long division over "words" and falsely assumed that the implementation in LLVM compiler runtime point_up was just a non-general - as in not for arbitrarily sized "big-integer" types - version of it).

How to run the benchmarks:
Benchmark results on my early 2013 MBP:

U256::div (lo/lo)       time:   [50.672 ns 51.014 ns 51.334 ns]                              
                        change: [-44.193% -43.322% -42.463%] (p = 0.00 < 0.05)
                        Performance has improved.

U256::div (hi/lo)       time:   [155.89 ns 156.84 ns 157.83 ns]                              
                        change: [-20.304% -19.177% -18.016%] (p = 0.00 < 0.05)
                        Performance has improved.

U256::div (hi/hi)       time:   [414.45 ns 417.78 ns 421.05 ns]                              
                        change: [+233.79% +239.24% +244.38%] (p = 0.00 < 0.05)
                        Performance has regressed.

To me these results signal that the implementation here deals better with the case where the divisor fits within a u128. With that in mind, it should be possible to optimize the division implementation here to use something similar to the primitive-types implementation for the hi/hi case (i.e. division where both the divisor nor the dividend are greater than u128::MAX).

That sounds about right, Knuth is fastest for big ones. hm I didn't check earlier but this is something similar that intx (evmone) is doing: https://github.com/chfast/intx/blob/de3388d75bda494972c4dceb22ccd1b0f74e1e97/include/intx/intx.hpp#L1788-L1823
that is probably one of the reasons why I got a performance boost when tried intx.

@nlordell
Copy link
Owner Author

nlordell commented May 8, 2022

I didn't check earlier but this is something similar that intx (evmone) is doing

It looks like it. From what I understood, It has fast paths for division where the divisor is less than u64::MAX (using the 128-by-64-bit division using reciprocal multiplication from the libgmp paper you linked above) and another fast path if the divisor is less than u128::MAX and falling back to Knuth.

The code in intx is very clear and I should be able to port it. I'll try to find some time to do it in the coming weeks.

@nlordell nlordell deleted the serde-2 branch May 25, 2022 06:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants