Check test cases with measurements #2161
Haven't looked at the code yet, but can you clarify how we avoid trying to benchmark e.g. 1000s of artifacts when a new benchmark or new scenario is added? In other words, IIUC the backfill being referenced shouldn't always apply, right? (At least not on the official collector?)
It's a bit complicated 😅 Details are in https://hackmd.io/wq30YNEIQMSFLWWcWDSI9A. The TLDR is:
Note that this PR does not actually allow backfill with the old system; it should work the same as before. It just allows backfill in the future with the new system, while being compatible with both systems.
.into_iter()
.map(|test_case| (*scenario, test_case))
})
.filter(|(_, test_case)| !already_computed.contains(test_case))
Shouldn't this condition be more complex? For example, if we add a new incr-patched scenario, we can't run the build for that without going through incr-clean (at least) - and probably the preceding patches, too.
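The dependency described in the comment above can be sketched as follows. This is an illustrative stand-in, not the actual rustc-perf types or logic: the point is that an incremental scenario cannot be backfilled in isolation, because it needs the incremental build (and possibly earlier patches) to have run first.

```rust
// Illustrative sketch only: `Scenario` here is a local stand-in for the
// real rustc-perf type, and `prerequisites` is a hypothetical helper.
// The idea: backfilling IncrPatched requires first running IncrFull
// (and likely the preceding patches), since incremental caches build
// on each other.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scenario {
    Full,
    IncrFull,
    IncrUnchanged,
    IncrPatched,
}

/// Scenarios that must run before `scenario` can produce valid results.
fn prerequisites(scenario: Scenario) -> Vec<Scenario> {
    match scenario {
        Scenario::Full | Scenario::IncrFull => vec![],
        // Both incremental variants need the initial incremental build.
        Scenario::IncrUnchanged | Scenario::IncrPatched => vec![Scenario::IncrFull],
    }
}

fn main() {
    // Backfilling IncrPatched alone is not enough; IncrFull must run too.
    assert_eq!(prerequisites(Scenario::IncrPatched), vec![Scenario::IncrFull]);
    assert!(prerequisites(Scenario::Full).is_empty());
    println!("ok");
}
```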
backend: &CodegenBackend,
target: &Target,
) -> Vec<CompileTestCase> {
self.patches
Are we expecting this to get more complicated later? It seems a bit wasteful (I guess fine in practice...) to produce a relatively large vec and then dedup most of it away.
if remaining_scenarios.is_empty() {
    continue;
}
remaining_scenarios.sort();
nit: if we're going to sort, maybe just use that for the dedup rather than intermediating through the HashSet?
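The nit can be illustrated with a minimal sketch (a plain `Vec<u32>` stands in for the real scenario type): `Vec::dedup` removes consecutive duplicates, so after a sort it removes all duplicates, with no intermediate `HashSet` needed.

```rust
fn main() {
    // Duplicates as they might accumulate while collecting remaining scenarios.
    let mut remaining_scenarios = vec![3u32, 1, 3, 2, 1];

    // Since we sort anyway, dedup directly on the sorted Vec instead of
    // routing through a HashSet first. dedup only removes *consecutive*
    // duplicates, which after sorting means all duplicates.
    remaining_scenarios.sort();
    remaining_scenarios.dedup();

    assert_eq!(remaining_scenarios, vec![1, 2, 3]);
    println!("{:?}", remaining_scenarios); // prints [1, 2, 3]
}
```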
With the new design, it will be possible to backfill results into the DB. For example, if you ask on a PR to see results for the cranelift backend (which is not benchmarked by default), the collector will go back and backfill cranelift backend data for the parent master commit.
To support that, we need to expand the notion of a benchmark being "done". Right now, we record an (artifact, benchmark_name) tuple (called a step) in the DB when a benchmark begins, and if we ever encounter the same tuple again, we don't benchmark it again. That's not ideal: if an error happened and no data was generated, you won't be able to retry the collection without removing everything for the given artifact from the DB. And mainly, you cannot backfill more results (e.g. by running only Debug first, and then backfilling Opt, which is also useful for local experiments).

This PR expands the concept of a benchmark being done by actually checking which compile-time test cases are present in the DB. We cheat a bit for better performance: if there is at least one recorded statistic in the DB for a given test case, we consider it to be done (so we essentially ignore missing iterations, but that should be a niche edge case).
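A minimal sketch of that "done" check, with the DB replaced by an in-memory map and simplified key types (the real code queries the database): a test case counts as done as soon as any statistic exists for it, regardless of how many iterations were recorded.

```rust
use std::collections::HashMap;

// Simplified stand-in for a compile-time test case key; the real
// CompileTestCase also carries profile, scenario, backend, and target.
type TestCase = (&'static str, &'static str); // (benchmark, scenario)

/// A test case is considered "done" if at least one statistic was
/// recorded for it, even if some iterations are missing.
fn is_done(stats: &HashMap<TestCase, Vec<f64>>, test_case: &TestCase) -> bool {
    stats.get(test_case).map_or(false, |s| !s.is_empty())
}

fn main() {
    let mut stats: HashMap<TestCase, Vec<f64>> = HashMap::new();
    stats.insert(("syn", "Full"), vec![1.23]); // one iteration recorded
    stats.insert(("serde", "IncrFull"), vec![]); // step started, no data

    assert!(is_done(&stats, &("syn", "Full")));
    // No statistics recorded => not done, so it can be retried or backfilled.
    assert!(!is_done(&stats, &("serde", "IncrFull")));
    assert!(!is_done(&stats, &("regex", "Full")));
    println!("ok");
}
```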
Even though this logic is mostly useful for the new scheme, which is not implemented yet, I decided to also implement it for the current benchmarking logic, because it's useful for local experiments.
Best reviewed commit by commit.