
Conversation

weihanglo
Member

@weihanglo weihanglo commented Oct 17, 2025

What does this PR try to resolve?

This was found while experimenting with `-Zbuild-analysis` and ndjson.

From my tracing of the code with `cargo expand`, there basically shouldn't be any significant performance difference between `serde(flatten)` and inlining all the fields. Here are the differences between them:

  • The flatten version calls `Serialize::serialize_map` without a field-count hint, so the serializer cannot pre-allocate with `Vec::with_capacity`, whereas the inline version calls `Serialize::serialize_struct` with a known number of fields.
  • flatten ends up calling `Serializer::serialize_map` and inline calls `Serializer::serialize_struct`. In serde_json, `serialize_struct` actually calls `serialize_map`, so there is no difference on the serializer side.
  • There might be some function calls that are not inlined, like `FlatMapSerializer`, but I doubt that is costlier than an allocation.

Here is the cargo-expand'd result:

```rust
#[derive(Serialize)]
pub struct Foo<D: Serialize> {
    id: u8,
    #[serde(flatten)]
    data: D,
}

#[derive(Serialize)]
struct Bar {
    a: bool,
}

// Expands to

extern crate serde as _serde;
impl<D: Serialize> _serde::Serialize for Foo<D>
where
    D: _serde::Serialize,
{
    fn serialize<__S>(
        &self,
        __serializer: __S,
    ) -> _serde::__private228::Result<__S::Ok, __S::Error>
    where
        __S: _serde::Serializer,
    {
        let mut __serde_state = _serde::Serializer::serialize_map(
            __serializer,
            _serde::__private228::None,
        )?;
        _serde::ser::SerializeMap::serialize_entry(
            &mut __serde_state,
            "id",
            &self.id,
        )?;
        _serde::Serialize::serialize(
            &&self.data,
            _serde::__private228::ser::FlatMapSerializer(&mut __serde_state),
        )?;
        _serde::ser::SerializeMap::end(__serde_state)
    }
}

impl _serde::Serialize for Bar {
    fn serialize<__S>(
        &self,
        __serializer: __S,
    ) -> _serde::__private228::Result<__S::Ok, __S::Error>
    where
        __S: _serde::Serializer,
    {
        let mut __serde_state = _serde::Serializer::serialize_struct(
            __serializer,
            "Bar",
            false as usize + 1,
        )?;
        _serde::ser::SerializeStruct::serialize_field(
            &mut __serde_state,
            "a",
            &self.a,
        )?;
        _serde::ser::SerializeStruct::end(__serde_state)
    }
}
```

```rust
#[derive(Serialize)]
pub struct Foo<D: Serialize> {
    id: u8,
    a: bool,
}

// Expands to

impl<D: Serialize> _serde::Serialize for Foo<D> {
    fn serialize<__S>(
        &self,
        __serializer: __S,
    ) -> _serde::__private228::Result<__S::Ok, __S::Error>
    where
        __S: _serde::Serializer,
    {
        let mut __serde_state = _serde::Serializer::serialize_struct(
            __serializer,
            "Foo",
            false as usize + 1 + 1,
        )?;
        _serde::ser::SerializeStruct::serialize_field(
            &mut __serde_state,
            "id",
            &self.id,
        )?;
        _serde::ser::SerializeStruct::serialize_field(
            &mut __serde_state,
            "a",
            &self.a,
        )?;
        _serde::ser::SerializeStruct::end(__serde_state)
    }
}
```

How to test and review this PR?

CI passing.

One behavioral change is that `reason` will no longer be the first field. We could enable the `preserve_order` feature in serde_json if we wanted to, though that might cost more performance than it gains.
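For reference, if we ever did want insertion-ordered keys, the change would be a serde_json feature flag (a hypothetical sketch, not part of this PR), which swaps serde_json's map for an index map at some cost:

```toml
# Cargo.toml — hypothetical: keep JSON object keys in insertion order
serde_json = { version = "1.0", features = ["preserve_order"] }
```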

See the benchmark result in #16130 (comment)

@rustbot rustbot added A-console-output Area: Terminal output, colors, progress bar, etc. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 17, 2025
@rustbot
Collaborator

rustbot commented Oct 17, 2025

r? @epage

rustbot has assigned @epage.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@weihanglo
Member Author

I just used Claude Haiku 4.5 to generate a criterion benchmark script for me. The benchmark was run with `opt-level=3` and `debug=false`.

Here is the result:

| Message Type | master (d80156f) | `serde(flatten)` | Improvement |
|---|---|---|---|
| `BuildFinished` | 173.91 ns | 89.814 ns | 48.5% |
| `Artifact` | 1.4953 µs | 1.3515 µs | 9.6% |

BTW, it is interesting that I can directly pull in local path dependencies in -Zscript.

Click to expand the AI-generated benchmark script (reviewed by a human)

````rust
#!/usr/bin/env -S cargo +nightly -Zscript
---cargo
[package]
name = "bench_message_serialization"
edition = "2024"

[dependencies]
cargo = { path = "." }
cargo-util-schemas = { path = "crates/cargo-util-schemas" }
criterion = { version = "0.5" }
serde = { version = "1.0", features = ["derive"] }
serde_json = { version = "1.0", features = ["raw_value"] }

[profile.dev]
opt-level = 3
debug = false
---

//! # Machine Message Serialization Benchmark (Criterion-based)
//!
//! ## Quick Start
//! ```sh
//! cargo +nightly -Zscript bench_message_serialization.rs
//! ```
//!
//! ## Overview
//!
//! This benchmark uses criterion.rs for statistically rigorous benchmarking of
//! real message types from the cargo crate (`cargo::util::machine_message`):
//! - BuildFinished (simple)
//! - Artifact (complex)
//!
//! Compares:
//! - Original: Double serialization + string manipulation
//! - Optimized: Single serialization using `#[serde(flatten)]`
//!
//! ## Features
//!
//! - Multiple samples (100) for statistical reliability
//! - Automatic outlier detection and handling
//! - Confidence intervals and variance analysis
//! - Statistical significance testing

use cargo::core::Edition;
use cargo::core::Target;
use cargo::core::compiler::CrateType;
use cargo::util::machine_message::{Artifact, ArtifactDebuginfo, ArtifactProfile, BuildFinished};
use cargo_util_schemas::core::PackageIdSpec;
use criterion::{Criterion, black_box};
use serde::Serialize;
use std::path::PathBuf;
use std::time::Duration;

// ============================================================================
// Trait with Both Original and Optimized Methods
// ============================================================================

pub trait Message: Serialize {
    fn reason(&self) -> &str;

    fn to_json_string_original(&self) -> String {
        // Before: 2 serializations + string concatenation
        let json = serde_json::to_string(self).unwrap();
        assert!(json.starts_with("{\""));
        let reason = serde_json::json!(self.reason());
        format!("{{\"reason\":{},{}", reason, &json[1..])
    }

    fn to_json_string_optimized(&self) -> String {
        // After: 1 serialization using flatten
        #[derive(Serialize)]
        struct WithReason<'a, S: Serialize> {
            reason: &'a str,
            #[serde(flatten)]
            msg: &'a S,
        }
        let with_reason = WithReason {
            reason: self.reason(),
            msg: &self,
        };
        serde_json::to_string(&with_reason).unwrap()
    }
}

impl Message for BuildFinished {
    fn reason(&self) -> &str {
        "build-finished"
    }
}

impl<'a> Message for Artifact<'a> {
    fn reason(&self) -> &str {
        "compiler-artifact"
    }
}

// ============================================================================
// Test Data Builders
// ============================================================================

fn build_finished() -> BuildFinished {
    BuildFinished { success: true }
}

fn artifact<'a>() -> Artifact<'a> {
    let target = Box::leak(Box::new(Target::lib_target(
        "my_package",
        vec![CrateType::Rlib],
        PathBuf::from("/home/user/projects/my-package/src/lib.rs"),
        Edition::Edition2021,
    )));

    Artifact {
        package_id: PackageIdSpec::parse("my-package").unwrap(),
        manifest_path: PathBuf::from("/home/user/projects/my-package/Cargo.toml"),
        target,
        profile: ArtifactProfile {
            opt_level: "3",
            debuginfo: Some(ArtifactDebuginfo::Int(2)),
            debug_assertions: false,
            overflow_checks: false,
            test: false,
        },
        features: vec![
            "default".to_string(),
            "experimental".to_string(),
            "feature3".to_string(),
            "feature4".to_string(),
            "feature5".to_string(),
            "feature6".to_string(),
            "feature7".to_string(),
        ],
        filenames: vec![
            PathBuf::from("/home/user/projects/my-package/target/debug/deps/libmy_package.rlib"),
            PathBuf::from("/home/user/projects/my-package/target/debug/deps/libmy_package.d"),
        ],
        executable: None,
        fresh: false,
    }
}

// ============================================================================
// Benchmark Helper
// ============================================================================

fn benchmark_message<M: Message>(criterion: &mut Criterion, name: &str, msg: M) {
    let mut group = criterion.benchmark_group(name);
    group.sample_size(100);
    group.measurement_time(Duration::from_secs(5));
    group.warm_up_time(Duration::from_secs(3));

    group.bench_function("original", |b| {
        b.iter(|| black_box(&msg).to_json_string_original())
    });

    group.bench_function("optimized", |b| {
        b.iter(|| black_box(&msg).to_json_string_optimized())
    });

    group.finish();
}

// ============================================================================
// Main
// ============================================================================

fn main() {
    let mut criterion = Criterion::default()
        .measurement_time(Duration::from_secs(5))
        .sample_size(100)
        .warm_up_time(Duration::from_secs(3))
        .with_plots();

    eprintln!();
    eprintln!("╔══════════════════════════════════════════════════════════╗");
    eprintln!("║   Machine Message Serialization Benchmark (Criterion)    ║");
    eprintln!("║   Statistical Analysis • Real Cargo Types                ║");
    eprintln!("╚══════════════════════════════════════════════════════════╝");
    eprintln!();

    // Verify correctness before benchmarking
    {
        let bf = build_finished();
        let orig = bf.to_json_string_original();
        let opt = bf.to_json_string_optimized();
        assert_eq!(orig, opt, "BuildFinished outputs don't match!");

        let art = artifact();
        let orig = art.to_json_string_original();
        let opt = art.to_json_string_optimized();
        assert_eq!(orig, opt, "Artifact outputs don't match!");

        eprintln!("✓ Correctness check passed\n");
    }

    eprintln!("📊 Benchmarking BuildFinished (simple message)...");
    eprintln!();
    benchmark_message(&mut criterion, "BuildFinished", build_finished());
    eprintln!();

    eprintln!("📊 Benchmarking Artifact (complex message)...");
    eprintln!();
    benchmark_message(&mut criterion, "Artifact", artifact());
    eprintln!();

    criterion.final_summary();

    eprintln!();
    eprintln!("╔════════════════════════════════════════╗");
    eprintln!("║   Benchmark Complete                   ║");
    eprintln!("║   Results saved to: target/criterion/  ║");
    eprintln!("╚════════════════════════════════════════╝");
    eprintln!();
}
````

@weihanglo weihanglo added Performance Gotta go fast! A-json-output Area: JSON message output and removed A-console-output Area: Terminal output, colors, progress bar, etc. labels Oct 20, 2025
@epage
Contributor

epage commented Oct 20, 2025

> From my tracing of the code with `cargo expand`, there basically shouldn't be any significant performance difference between `serde(flatten)` and inlining all the fields.

Deserialize is where differences are noticed, but we're not really set up for that, so it should be fine.

@epage
Contributor

epage commented Oct 20, 2025

> Here is the result:
>
> | Message Type | master (d80156f) | `serde(flatten)` | Improvement |
> |---|---|---|---|
> | `BuildFinished` | 173.91 ns | 89.814 ns | 48.5% |
> | `Artifact` | 1.4953 µs | 1.3515 µs | 9.6% |
>
> BTW, it is interesting that I can directly pull in local path dependencies in -Zscript.

BuildFinished is relatively small, making it more sensitive to change.

I'm surprised we noticed that big of a difference for Artifact. Our old version does 3 distinct allocations (`to_string`, `json!`, `format!`) vs. just 1 (`to_string`); I'm surprised that makes this much of a difference. Likely the way to make the original version faster would be to stream all the content into the same string, which would be uglier.

Since this simplifies the code, I'm good with it. I do like having reason first though.

In case there is something unexpected, like key ordering, I looked up when this code was added (#6081) and it doesn't seem like we're losing anything with this change.

@epage epage added this pull request to the merge queue Oct 20, 2025
Merged via the queue into rust-lang:master with commit 802826e Oct 20, 2025
25 checks passed
@rustbot rustbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 20, 2025
@weihanglo weihanglo deleted the serialize branch October 20, 2025 19:04
bors added a commit to rust-lang/rust that referenced this pull request Oct 22, 2025
Update cargo submodule

7 commits in 367fd9f213750cd40317803dd0a5a3ce3f0c676d..344c4567c634a25837e3c3476aac08af84cf9203
2025-10-15 15:01:32 +0000 to 2025-10-21 21:29:43 +0000
- refactor: Centralize CONTEXT style (rust-lang/cargo#16135)
- chore(triagebot): `A-json-output` for machine_message.rs (rust-lang/cargo#16133)
- refactor: JSON message with less allocations (rust-lang/cargo#16130)
- More warning conversions (rust-lang/cargo#16126)
- fix(check): Fix suggested command for bin package (rust-lang/cargo#16127)
- fix(script): Remove name sanitiztion outside what is strictly required (rust-lang/cargo#16120)
- refactor: Centralize some more styling (rust-lang/cargo#16124)

r? ghost
@rustbot rustbot added this to the 1.92.0 milestone Oct 22, 2025