Added `serde` feature and support #15

nlordell · 2022-05-06T10:06:30Z

This PR adds serde feature to the ethnum crate for adding serde serialization and deserialization support to both signed and unsigned integer types. The default serialization implementation encodes/decodes 0x-prefixed hexadecimal strings.

Additionally, new convenience conversion methods were added for parsing integers from optionally prefixed strings (so parsing 0x2a and 42 to the same value) as well as modules for use with #[serde(with = "...")] for much the same purpose.

For example:

#[derive(Serialize, Deserialize)]
struct Foo {
    #[serde(with = "ethnum::serde::prefixed")]
    value: U256,
}

would allow serializing the following JSON documents:

{ "value": "0x2a" }
{ "value": "42" }

Finally, the hex, octal and binary formatting implementation supports the - formatting flag now. For rust standard integer types:

assert_eq!(format!("{:#x}", -10_i16), "0xfff6");

This means, there is no way to print a sign when formatting in a non-decimal radix. So for I256, support was added so that you can write:

assert_eq!(format!("{:-#x}", I256::new(-10)), "-0xa");

Note that the {:#x} format specifier still behaves the same way as they do for primitive integer types (i.e. sign-extended 2's complement).

nlordell · 2022-05-06T10:08:20Z

@gakonst - Got around to taking some time to add serde support to ethnum. Do you mind taking a look at the implementation?

Also, I tried to address your comment from #7 (comment) by adding convenience methods for doing the parsing both 0x2a and 42.

nlordell · 2022-05-06T10:12:41Z

src/fmt.rs

-                Some(result) => result,
-                None => return Err(pie(PosOverflow)),
+
+    if can_not_overflow::<T>(radix, is_signed_ty, digits) {


Ported from latest more optimized from_str_radix implementation from the Rust standard library: https://github.com/rust-lang/rust/blob/7f9e013ba62e7bf77a95ba29bcb2c5740b7b2059/library/core/src/num/mod.rs#L1067

gakonst

This is great! Lgtm. What are the highest leverage places where we can use this to achieve speedup over primitive_types/ethereum_types?

cc @rakita in case interesting for revm

rakita · 2022-05-07T09:33:09Z

I checked it when I was looking to find fast div and I think that primitive-types was just faster as it uses knuth algo for division:
primitive_types: https://github.com/paritytech/parity-common/blob/436cb0827f0e3238ccb80d7d453f756d126c0615/uint/src/uint.rs#L854
and with intrinsics enabled it was even faster: paritytech/parity-common@2d7f7e2
ethnum:

ethnum-rs/src/intrinsics/native/divmod.rs

Line 166 in c6ad614

for _ in 0..=shift {

maybe it can be backported to here as it is widely used algo. There is additional speedup from knuth algo that evmone uses from gmplib. This is a paper: https://gmplib.org/~tege/division-paper.pdf but that seems harder to backport

zkp_u256 is one other that implemented knuth but it had a little better comments there: https://github.com/0xProject/OpenZKP/blob/c28a5c66b6ee9b97bf177373ba148981df60b7fb/algebra/u256/src/arch/generic/knuth_division.rs#L96

nlordell · 2022-05-07T13:05:05Z

Awesome! That is a lot of great resources to look at.

nlordell · 2022-05-07T17:45:24Z

@rakita - so one trouble I have with benchmarking division is that there is a huge variation on performance depending on inputs. I re-ran the benchmarks against primitive-types and you are right that the performance of the long division at the end (the for (; shift>=0; --shift) loop at the end) is significantly worse than the primitive-types implementation (for some reason I thought Knuth was some generalization of long division over "words" and falsely assumed that the implementation in LLVM compiler runtime ☝️ was just a non-general - as in not for arbitrarily sized "big-integer" types - version of it).

How to run the benchmarks:

# First create a `primitive-types` baseline to compare against
cargo bench -p ethnum-bench --features primitive-types -- --save-baseline p div
# Now run the benchmarks comparing against the baseline
cargo bench -p ethnum-bench -- --baseline p div

Benchmark results on my early 2013 MBP:

U256::div (lo/lo)       time:   [50.672 ns 51.014 ns 51.334 ns]                              
                        change: [-44.193% -43.322% -42.463%] (p = 0.00 < 0.05)
                        Performance has improved.

U256::div (hi/lo)       time:   [155.89 ns 156.84 ns 157.83 ns]                              
                        change: [-20.304% -19.177% -18.016%] (p = 0.00 < 0.05)
                        Performance has improved.

U256::div (hi/hi)       time:   [414.45 ns 417.78 ns 421.05 ns]                              
                        change: [+233.79% +239.24% +244.38%] (p = 0.00 < 0.05)
                        Performance has regressed.

To me these results signal that the implementation here deals better with the case where the divisor fits within a u128. With that in mind, it should be possible to optimize the division implementation here to use something similar to the primitive-types implementation for the hi/hi case (i.e. division where both the divisor nor the dividend are greater than u128::MAX).

rakita · 2022-05-07T20:16:22Z

@rakita - so one trouble I have with benchmarking division is that there is a huge variation on performance depending on inputs. I re-ran the benchmarks against primitive-types and you are right that the performance of the long division at the end (the for (; shift>=0; --shift) loop at the end) is significantly worse than the primitive-types implementation (for some reason I thought Knuth was some generalization of long division over "words" and falsely assumed that the implementation in LLVM compiler runtime point_up was just a non-general - as in not for arbitrarily sized "big-integer" types - version of it).

How to run the benchmarks:
Benchmark results on my early 2013 MBP:
U256::div (lo/lo)       time:   [50.672 ns 51.014 ns 51.334 ns]                              
                        change: [-44.193% -43.322% -42.463%] (p = 0.00 < 0.05)
                        Performance has improved.

U256::div (hi/lo)       time:   [155.89 ns 156.84 ns 157.83 ns]                              
                        change: [-20.304% -19.177% -18.016%] (p = 0.00 < 0.05)
                        Performance has improved.

U256::div (hi/hi)       time:   [414.45 ns 417.78 ns 421.05 ns]                              
                        change: [+233.79% +239.24% +244.38%] (p = 0.00 < 0.05)
                        Performance has regressed.
To me these results signal that the implementation here deals better with the case where the divisor fits within a u128. With that in mind, it should be possible to optimize the division implementation here to use something similar to the primitive-types implementation for the hi/hi case (i.e. division where both the divisor nor the dividend are greater than u128::MAX).

That sounds about right, Knuth is fastest for big ones. hm I didn't check earlier but this is something similar that intx (evmone) is doing: https://github.com/chfast/intx/blob/de3388d75bda494972c4dceb22ccd1b0f74e1e97/include/intx/intx.hpp#L1788-L1823
that is probably one of the reasons why I got a performance boost when tried intx.

nlordell · 2022-05-08T07:41:27Z

I didn't check earlier but this is something similar that intx (evmone) is doing

It looks like it. From what I understood, It has fast paths for division where the divisor is less than u64::MAX (using the 128-by-64-bit division using reciprocal multiplication from the libgmp paper you linked above) and another fast path if the divisor is less than u128::MAX and falling back to Knuth.

The code in intx is very clear and I should be able to port it. I'll try to find some time to do it in the coming weeks.

nlordell added 2 commits May 6, 2022 11:43

Added serde feature and support

523e38e

cleanup features and CI

7b4d145

nlordell commented May 6, 2022

View reviewed changes

documentation fixes

d5fc2c9

nlordell mentioned this pull request May 6, 2022

Impls #7

Open

gakonst approved these changes May 7, 2022

View reviewed changes

nlordell mentioned this pull request May 7, 2022

Improve udivmod Performance #16

Open

nlordell merged commit dbc8b18 into main May 7, 2022

nlordell deleted the serde-2 branch May 25, 2022 06:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added `serde` feature and support #15

Added `serde` feature and support #15

nlordell commented May 6, 2022

nlordell commented May 6, 2022

nlordell May 6, 2022

gakonst left a comment

rakita commented May 7, 2022

nlordell commented May 7, 2022

nlordell commented May 7, 2022

rakita commented May 7, 2022

nlordell commented May 8, 2022

Added serde feature and support #15

Added serde feature and support #15

Conversation

nlordell commented May 6, 2022

nlordell commented May 6, 2022

nlordell May 6, 2022

Choose a reason for hiding this comment

gakonst left a comment

Choose a reason for hiding this comment

rakita commented May 7, 2022

nlordell commented May 7, 2022

nlordell commented May 7, 2022

rakita commented May 7, 2022

nlordell commented May 8, 2022

Added `serde` feature and support #15

Added `serde` feature and support #15