New Formatter to replace DelayedFormat #1163

Closed
wants to merge 7 commits into from

Conversation

pitdicker
Collaborator

@pitdicker pitdicker commented Jul 1, 2023

Add a new formatter that can work without allocating and has less performance overhead, as mentioned in #1127 (comment).

I have made an effort to make the formatting code more readable, and split it up into reasonable commits. My first PR to chrono that adds functionality but ends up with fewer lines 😄.

I would like to depend on this PR to get #1035 over the finish line. Serialization worked without allocating before, and the new offset formatter should work without allocating in order to use it there.

This doesn't yet include extra benchmarks or the rest of my proposal for the new formatting API. But I have enough in another branch to test it all works.

Four standalone format* functions in chrono::format are deprecated: they were nearly unusable, since they take &mut Formatter<'_> as an argument, which can't be constructed outside of a Display implementation.


Why a new formatter?

This adds a new type Formatter<I, O>, that can replace DelayedFormat<I> in a new formatting API.

Our current formatting code has noticeable overhead (see #94). This is mostly caused by one design choice: if DelayedFormat::new is given an offset, it first formats a timezone name into a String. In the case of DateTime<FixedOffset> and DateTime<Local> this involves formatting the offset, as there is no name available. This is responsible for 20-25% of the ca. 40% overhead in the format_with_items benchmark.

Formatter<I, O> takes the offset as a generic, so it can delay formatting the timezone name until it is needed, which often is never.
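To illustrate the idea of delaying the offset formatting, here is a minimal sketch. It is not the code from this PR: `Item`, `CountingOffset`, `LazyFormatter` and `render` are hypothetical stand-ins, and the offset counts its own `Display` calls only so the laziness is observable.

```rust
use std::cell::Cell;
use std::fmt::{self, Display};

/// Hypothetical stand-in for chrono's formatting items.
pub enum Item {
    Literal(&'static str),
    Hour,
    Offset,
}

/// Hypothetical offset that records how often it is formatted.
pub struct CountingOffset<'a> {
    secs: i32,
    calls: &'a Cell<u32>,
}

impl Display for CountingOffset<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.calls.set(self.calls.get() + 1);
        let sign = if self.secs < 0 { '-' } else { '+' };
        let secs = self.secs.abs();
        write!(f, "{}{:02}:{:02}", sign, secs / 3600, secs / 60 % 60)
    }
}

/// Holds the offset as a generic `O` instead of a pre-formatted `String`.
pub struct LazyFormatter<'a, O> {
    hour: u32,
    offset: O,
    items: &'a [Item],
}

impl<O: Display> Display for LazyFormatter<'_, O> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for item in self.items {
            match item {
                Item::Literal(s) => f.write_str(s)?,
                Item::Hour => write!(f, "{:02}", self.hour)?,
                // The offset is formatted only when an item asks for it.
                Item::Offset => write!(f, "{}", self.offset)?,
            }
        }
        Ok(())
    }
}

/// Formats `items` and reports how many times the offset was formatted.
pub fn render(items: &[Item]) -> (String, u32) {
    let calls = Cell::new(0);
    let formatter = LazyFormatter {
        hour: 9,
        offset: CountingOffset { secs: 7200, calls: &calls },
        items,
    };
    let text = formatter.to_string();
    (text, calls.get())
}
```

With items that never mention the offset, `render` reports zero offset formatting calls; an eager design that builds the name `String` up front would pay for it every time.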

I made a couple of other changes that all reduce the overhead a tiny bit or help readability: split the formatting function into smaller methods on Formatter, make better use of match, format using a smaller integer type i32, and delay getting locale strings until use.

We also format to impl Write instead of String like in #1126 to work without alloc.
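As a rough sketch of what formatting to `impl Write` buys in a no-alloc setting: any sink that implements `core::fmt::Write` can receive the output, including a fixed-size stack buffer. `StackBuf` and `write_hour_minute` are hypothetical names for illustration and are not part of chrono.

```rust
use core::fmt::{self, Write};

/// Hypothetical fixed-capacity buffer; it needs no allocator.
pub struct StackBuf<const N: usize> {
    bytes: [u8; N],
    len: usize,
}

impl<const N: usize> StackBuf<N> {
    pub fn new() -> Self {
        StackBuf { bytes: [0; N], len: 0 }
    }

    pub fn as_str(&self) -> &str {
        // Only whole `&str` values are copied in, so the content is valid UTF-8.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap_or("")
    }
}

impl<const N: usize> Write for StackBuf<N> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        if end > N {
            // Out of space: report an error instead of allocating.
            return Err(fmt::Error);
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}

/// Writes "HH:MM" into any sink, whether a `String` or a stack buffer.
pub fn write_hour_minute(w: &mut impl Write, hour: u32, minute: u32) -> fmt::Result {
    write!(w, "{:02}:{:02}", hour, minute)
}
```

The same `write_hour_minute` also accepts a `&mut String` when `alloc` is available, so one code path can serve both configurations.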

All combined we have only about 10% overhead left compared to the format_manual benchmark, which seems good enough to me.

@pitdicker pitdicker force-pushed the new_formatter branch 3 times, most recently from 83f64e9 to 92e5477 Compare July 1, 2023 17:06
@pitdicker pitdicker force-pushed the new_formatter branch 3 times, most recently from 11f6c3b to e2d998d Compare July 8, 2023 11:51
@pitdicker
Collaborator Author

I added the ability to format with formatting parameters that can work without allocations in no_std.

We need two adapters for fmt::Write. One that can count the number of characters written, so we know how much padding is required. And another that can truncate if we write more than the maximum allowed characters.
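The two adapters could look roughly like this. This is a sketch under my own assumptions, not the PR's implementation: the names `CountingWriter`, `TruncatingWriter`, `NullWriter` and `write_padded` are hypothetical, and the two-pass approach (count first, then pad and write) is just one way to right-align without a temporary `String`.

```rust
use core::fmt::{self, Display, Write};

/// Counts the characters that pass through to the inner writer.
pub struct CountingWriter<W> {
    inner: W,
    chars: usize,
}

impl<W: Write> Write for CountingWriter<W> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.chars += s.chars().count();
        self.inner.write_str(s)
    }
}

/// Forwards at most `remaining` characters and drops the rest.
pub struct TruncatingWriter<W> {
    inner: W,
    remaining: usize,
}

impl<W: Write> Write for TruncatingWriter<W> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        if self.remaining == 0 {
            return Ok(());
        }
        // Byte index of the first character that no longer fits.
        let end = s.char_indices().nth(self.remaining).map_or(s.len(), |(i, _)| i);
        self.remaining -= s[..end].chars().count();
        self.inner.write_str(&s[..end])
    }
}

/// Discards everything; used for the counting pass.
pub struct NullWriter;

impl Write for NullWriter {
    fn write_str(&mut self, _: &str) -> fmt::Result {
        Ok(())
    }
}

/// Right-aligns `value` to `width` and cuts it off after `max` characters.
pub fn write_padded<W: Write>(
    out: &mut W,
    value: &impl Display,
    width: usize,
    max: usize,
) -> fmt::Result {
    // First pass: measure the output without storing it.
    let mut counter = CountingWriter { inner: NullWriter, chars: 0 };
    write!(counter, "{}", value)?;
    let shown = counter.chars.min(max);
    for _ in shown..width {
        out.write_char(' ')?;
    }
    // Second pass: write for real, truncated to `max` characters.
    let mut truncating = TruncatingWriter { inner: out, remaining: max };
    write!(truncating, "{}", value)
}
```

The cost of this design is that the value is formatted twice when padding is requested; in exchange nothing is buffered, which is what makes it usable in no_std.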

This does add some complexity, but not too much in my opinion. And it makes formatting with no_std support the same features as with std.

@pitdicker pitdicker force-pushed the new_formatter branch 12 times, most recently from 27c41a5 to 4df403b Compare July 15, 2023 06:31
@jtmoon79
Contributor

jtmoon79 commented Jul 15, 2023

I really like this PR for the performance benefits.

A request for your PRs @pitdicker: are you able to set your PRs to Draft until you feel done with additions? I know this won't be possible for all PRs as many related things can be in flux, but maybe where it is possible. While I don't mind doing PR reviews, it might be more efficient for both of us if I know that a PR is "still cooking" before deciding to dive in. Just a suggestion.

Thanks for putting so much great work into this crate! 🙂

@pitdicker
Collaborator Author

Thanks for the reviews 😄.

I intended this to be done. This PR and two others (maybe more are coming) are split off from my branch to add a new formatting API, which I keep working on, sorry.... Now at 50 commits 😞

In this PR I adjusted the benchmarks to be comparable to the to_rfc3339 benchmark, to get a second baseline besides format_manual.

Before:

bench_format            time:   [875.81 ns 877.40 ns 879.11 ns]
Found 9 outliers among 100 measurements (9.00%)
  8 (8.00%) high mild
  1 (1.00%) high severe

bench_format_with_items time:   [705.81 ns 707.47 ns 709.34 ns]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

bench_format_manual     time:   [589.20 ns 591.01 ns 593.20 ns]
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

bench_datetime_to_rfc3339
                        time:   [232.58 ns 233.11 ns 233.65 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

The interesting numbers are:

  • format_manual, using the formatting machinery of the standard library, is ca. 2.5× slower than to_rfc3339.
  • format_with_items and format are 1.2× and 1.5× slower respectively than format_manual.

With my WIP branch that relies on this PR the numbers are:

bench_format            time:   [727.49 ns 731.78 ns 736.64 ns]
                        change: [-13.988% -12.839% -11.527%] (p = 0.00 < 0.05)
                        Performance has improved.

bench_format_to_string  time:   [574.73 ns 577.34 ns 579.79 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

bench_format_with       time:   [441.08 ns 442.00 ns 443.02 ns]
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

bench_format_with_items time:   [564.46 ns 566.93 ns 569.68 ns]
                        change: [-20.521% -20.049% -19.585%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

bench_format_manual     time:   [590.28 ns 592.68 ns 595.31 ns]
                        change: [+0.1228% +1.0523% +1.9683%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

bench_datetime_to_rfc3339
                        time:   [169.69 ns 170.18 ns 170.66 ns]
                        change: [-27.113% -26.795% -26.469%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

bench_datetime_to_rfc3339_opts
                        time:   [138.26 ns 138.67 ns 139.11 ns]
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

The new benchmarks format_to_string and format_with use the new formatter directly without wrapping it in the old DelayedFormat.

  • The new format_to_string has comparable performance to format_manual
  • The new format_with takes only 0.75× the time of format_manual, and ca. 60% of the old format_with_items.

But I also found a way to optimize the to_rfc3339 methods a little more. format_with is 2.6× slower than that (no PR open for that yet).


Most of our time with formatting is spent converting numbers to strings of decimals. Printing the 9 digits of a nanosecond, for example, takes ca. 40% of the time spent in to_rfc3339.

The trick is to copy number formatting for small numbers from to_rfc3339: special-case numbers with only one or two digits to not go through the number formatting of the standard library.
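A minimal sketch of that kind of special case, assuming a zero-padded two-digit field such as hours or minutes. `write_two_digits` is a hypothetical helper written for illustration and is not claimed to match the code in to_rfc3339.

```rust
use core::fmt::{self, Write};

/// Writes `n` zero-padded to two digits.
/// Values below 100 bypass the standard library's integer formatting.
pub fn write_two_digits(w: &mut impl Write, n: u8) -> fmt::Result {
    if n < 100 {
        // Compute both ASCII digits directly.
        w.write_char((b'0' + n / 10) as char)?;
        w.write_char((b'0' + n % 10) as char)
    } else {
        // Fallback for the rare larger value.
        write!(w, "{:02}", n)
    }
}
```

Whether this is a win in practice depends on the sink and the items being formatted, so it is worth confirming with the benchmarks rather than assuming.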

@pitdicker
Collaborator Author

Disclaimer: Of course a lot of the performance depends on which formatting items are used. The benchmarked RFC 3339 format only uses numbers for example. So these numbers are only an indication.

@pitdicker
Collaborator Author

Included a commit from my WIP branch here: collect all formatting tests in the formatting module. So we can see a bit better which parts are covered.

@pitdicker pitdicker force-pushed the new_formatter branch 2 times, most recently from 5eca600 to 24b3fcb Compare July 23, 2023 08:28
@pitdicker pitdicker mentioned this pull request Jul 24, 2023
@codecov

codecov bot commented Aug 11, 2023

Codecov Report

Merging #1163 (cef3f54) into 0.4.x (a47e0e3) will increase coverage by 0.05%.
The diff coverage is 91.94%.

@@            Coverage Diff             @@
##            0.4.x    #1163      +/-   ##
==========================================
+ Coverage   91.24%   91.30%   +0.05%     
==========================================
  Files          38       38              
  Lines       17062    17039      -23     
==========================================
- Hits        15568    15557      -11     
+ Misses       1494     1482      -12     
Files Changed Coverage Δ
src/format/mod.rs 85.04% <ø> (ø)
src/format/formatting.rs 94.03% <91.94%> (+1.49%) ⬆️


Formatter { date, time, offset, items, locale: default_locale() }
}

/// Makes a new `DelayedFormat` value out of local date and time, UTC offset and locale.
Contributor

I think you meant:

  /// Makes a new `Formatter` value out of ...

Off: Offset + Display,
{
/// Makes a new `Formatter` value out of local date and time and UTC offset.
///
Contributor

I suggest mentioning

 /// Uses the `default_locale()` as the locale.
