fix: perf instrumentation not available for cpp codspeed #17


Open · not-matthias wants to merge 7 commits into main from cod-1040-perf-instrumentation-not-available-for-cpp-codspeed

Conversation

not-matthias (Member) commented:

Moved from #13, so that it's linked to the correct Linear issue.

Blocked by CodSpeedHQ/instrument-hooks#6


codspeed-hq bot commented Jul 1, 2025

CodSpeed Instrumentation Performance Report

Merging #17 will improve performance by ×2.5

Comparing cod-1040-perf-instrumentation-not-available-for-cpp-codspeed (c943b12) with main (835fcda)

Summary

⚡ 25 improvements
✅ 17 untouched benchmarks
🆕 20 new benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| 🆕 BM_FibonacciIterative[50] | N/A | 174.4 ns | N/A |
| 🆕 BM_FibonacciRecursive[35] | N/A | 100.9 ms | N/A |
| BarTest[MyFixture] | 122.2 ns | 61.9 ns | +97.31% |
| DoubleTest[MyTemplatedFixture, double] | 122.2 ns | 61.9 ns | +97.31% |
| FooTest[MyFixture] | 122.2 ns | 61.9 ns | +97.31% |
| IntTest[MyTemplatedFixture, int] | 122.2 ns | 61.9 ns | +97.31% |
| TestA[MyTemplate1, int] | 122.2 ns | 61.9 ns | +97.31% |
| TestB[MyTemplate2, int, double] | 122.2 ns | 61.9 ns | +97.31% |
| BM_Capture[int_string_test] | 121.4 ns | 61.4 ns | +97.74% |
| BM_Capture[int_test] | 121.4 ns | 61.4 ns | +97.74% |
| BM_memcpy[64] | 334.4 ns | 303.3 ns | +10.26% |
| BM_memcpy[8] | 305.6 ns | 274.4 ns | +11.34% |
| BM_rand_vector | 182.8 ns | 122.8 ns | +48.87% |
| 🆕 BM_sleep_100ms | N/A | 436.7 ns | N/A |
| 🆕 BM_sleep_100us | N/A | 436.7 ns | N/A |
| 🆕 BM_sleep_10ms | N/A | 436.7 ns | N/A |
| 🆕 BM_sleep_10us | N/A | 436.7 ns | N/A |
| 🆕 BM_sleep_1ms | N/A | 436.7 ns | N/A |
| 🆕 BM_sleep_1us | N/A | 436.7 ns | N/A |
| 🆕 BM_sleep_50ms | N/A | 436.7 ns | N/A |
| ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


codspeed-hq bot commented Jul 1, 2025

CodSpeed WallTime Performance Report

Merging #17 will not alter performance

Comparing cod-1040-perf-instrumentation-not-available-for-cpp-codspeed (c943b12) with main (835fcda)

Summary

✅ 42 untouched benchmarks
🆕 20 new benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| 🆕 BM_FibonacciIterative[50] | N/A | 62.2 ns | N/A |
| 🆕 BM_FibonacciRecursive[35] | N/A | 66.6 ms | N/A |
| 🆕 BM_sleep_100ms | N/A | 100.1 ms | N/A |
| 🆕 BM_sleep_100us | N/A | 158.5 µs | N/A |
| 🆕 BM_sleep_10ms | N/A | 10.1 ms | N/A |
| 🆕 BM_sleep_10us | N/A | 68.4 µs | N/A |
| 🆕 BM_sleep_1ms | N/A | 1.1 ms | N/A |
| 🆕 BM_sleep_1us | N/A | 59.2 µs | N/A |
| 🆕 BM_sleep_50ms | N/A | 50.1 ms | N/A |
| 🆕 BM_sleep_50us | N/A | 108.5 µs | N/A |
| 🆕 BM_FibonacciIterative[50] | N/A | 62.2 ns | N/A |
| 🆕 BM_FibonacciRecursive[35] | N/A | 66.6 ms | N/A |
| 🆕 BM_sleep_100ms | N/A | 100.1 ms | N/A |
| 🆕 BM_sleep_100us | N/A | 158.5 µs | N/A |
| 🆕 BM_sleep_10ms | N/A | 10.1 ms | N/A |
| 🆕 BM_sleep_10us | N/A | 68.5 µs | N/A |
| 🆕 BM_sleep_1ms | N/A | 1.1 ms | N/A |
| 🆕 BM_sleep_1us | N/A | 59.5 µs | N/A |
| 🆕 BM_sleep_50ms | N/A | 50.1 ms | N/A |
| 🆕 BM_sleep_50us | N/A | 108.6 µs | N/A |

@GuillaumeLagrange (Contributor) left a comment:

We also have a performance regression introduced on the instrumented tests. I'm guessing this is due to the introduction of the instrument-hooks library; could we hunt down these changes and make them either nonexistent or as minimal as possible?

Call graphs show that we pick up instrumentation in Callgrind in a way that we did not before; if we don't change this, it will be a breaking change: https://codspeed.io/CodSpeedHQ/codspeed-cpp/branches/cod-1040-perf-instrumentation-not-available-for-cpp-codspeed?runnerMode=Instrumentation

@art049 (Member) commented Jul 1, 2025:

The regression is quite problematic since it appears to introduce a significant amount of overhead here. For interpreted languages, it doesn't make a big difference, but here, it seems to be substantial for micro benchmarks (even if the ones with regressions are very small). And we'll probably encounter the same issue when we port this to codspeed-rust.

@art049 (Member) left a comment:

Still a few changes left

@not-matthias force-pushed the cod-1040-perf-instrumentation-not-available-for-cpp-codspeed branch 11 times, most recently from c638d0d to e73c140 on July 3, 2025 at 15:06
@not-matthias (Member, Author) commented:

The benchmarks show the correct timings now. We're no longer optimizing away the memcpy, and some benchmarks even show improved performance, since we're placing the measurement_stop closer to the benchmark (so we're not measuring extra instructions).
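
To illustrate the point above, here is a rough sketch (a hypothetical benchmark, not code from this repo) of where the measurement now ends. The range-for over `benchmark::State` evaluates `StateIterator::operator!=` before every pass, and the PR calls `measurement_stop()` inside that check on the final pass (see the `State::StateIterator` diff further down), so the measured region closes right at the loop exit:

```cpp
#include <benchmark/benchmark.h>

#include <cstring>
#include <vector>

// Hypothetical micro-benchmark, for illustration only. The range-for below
// expands into calls to State::StateIterator::operator!= before each pass;
// with this PR, measurement_stop() fires inside that comparison on the very
// last check, so the measured region ends here at the loop exit instead of
// including additional framework instructions executed afterwards.
static void BM_example_memcpy(benchmark::State& state) {
  std::vector<char> src(state.range(0), 'x');
  std::vector<char> dst(state.range(0));
  for (auto _ : state) {
    std::memcpy(dst.data(), src.data(), src.size());
    benchmark::DoNotOptimize(dst.data());  // keep the copy observable
  }
  // Anything executed after the loop is no longer part of the measurement.
}
BENCHMARK(BM_example_memcpy)->Arg(8)->Arg(64);
BENCHMARK_MAIN();
```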

@not-matthias force-pushed the cod-1040-perf-instrumentation-not-available-for-cpp-codspeed branch 4 times, most recently from 87b977a to 8a6b904 on July 4, 2025 at 13:07
@GuillaumeLagrange (Contributor) left a comment:

LGTM if benchmark results don't vary too much, but this is still a breaking change.

```diff
@@ -1065,6 +1065,8 @@ struct State::StateIterator {
 bool operator!=(StateIterator const&) const {
   if (BENCHMARK_BUILTIN_EXPECT(cached_ != 0, true)) return true;
#ifdef CODSPEED_INSTRUMENTATION
   measurement_stop();
```
A contributor commented on this line:

This will have to be a major release because we are essentially changing benchmark result values for end users.

I agree that this is better here, but we'll have to keep this in mind.

@art049 (Member) commented Jul 8, 2025:

We have significant deltas on the following walltime benches:

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| 🆕 BM_sleep_100ns | N/A | 58.8 µs | N/A |
| 🆕 BM_sleep_100us | N/A | 158.7 µs | N/A |
| 🆕 BM_sleep_10ms | N/A | 10.1 ms | N/A |
| 🆕 BM_sleep_1ms | N/A | 1.1 ms | N/A |
| 🆕 BM_sleep_1ns | N/A | 58.6 µs | N/A |
| 🆕 BM_sleep_1us | N/A | 59.6 µs | N/A |
| 🆕 BM_sleep_100ns | N/A | 58.6 µs | N/A |
| 🆕 BM_sleep_100us | N/A | 158.5 µs | N/A |
| 🆕 BM_sleep_1ns | N/A | 58.5 µs | N/A |
| 🆕 BM_sleep_1us | N/A | 59.5 µs | N/A |

Is this expected? Do we have a resolution issue? @not-matthias

@art049 (Member) left a comment:

Also, it seems we have significant issues with instrumentation: this time the benchmarks are getting twice as fast, which looks like a problem in itself.

We need to be sure of what's happening, both for the walltime issue and for the instrumentation.

@not-matthias force-pushed the cod-1040-perf-instrumentation-not-available-for-cpp-codspeed branch from b3d124e to b93318f on July 9, 2025 at 08:52
@not-matthias (Member, Author) commented Jul 9, 2025:

> We have significant deltas on the following walltime benches:
>
> | Benchmark | BASE | HEAD | Change |
> | --- | --- | --- | --- |
> | 🆕 BM_sleep_100ns | N/A | 58.8 µs | N/A |
> | 🆕 BM_sleep_100us | N/A | 158.7 µs | N/A |
> | 🆕 BM_sleep_10ms | N/A | 10.1 ms | N/A |
> | 🆕 BM_sleep_1ms | N/A | 1.1 ms | N/A |
> | 🆕 BM_sleep_1ns | N/A | 58.6 µs | N/A |
> | 🆕 BM_sleep_1us | N/A | 59.6 µs | N/A |
> | 🆕 BM_sleep_100ns | N/A | 58.6 µs | N/A |
> | 🆕 BM_sleep_100us | N/A | 158.5 µs | N/A |
> | 🆕 BM_sleep_1ns | N/A | 58.5 µs | N/A |
> | 🆕 BM_sleep_1us | N/A | 59.5 µs | N/A |
>
> Is this expected? Do we have a resolution issue? @not-matthias

We are not calculating this ourselves, but rather take the measurements from Google Benchmark (@GuillaumeLagrange, please correct me if I'm wrong). So my assumption is that it is also measuring parts of the benchmark framework, which we don't have much control over.

EDIT: I removed the nanosecond benchmarks since they are just too short to be measured correctly (we also don't have them in Rust). Even the microsecond benchmarks are quite unstable in Rust:

- 100us: Rust
- 1us: Rust
- 100ns: Rust
- For C++ it's roughly 50-100%

I tried to time it manually using this code and still got 150 µs for a 100 µs sleep:

```cpp
auto start = std::chrono::high_resolution_clock::now();
std::this_thread::sleep_for(std::chrono::microseconds(100));
auto end = std::chrono::high_resolution_clock::now();
```
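
For anyone who wants to reproduce that check locally, here is a self-contained version (a minimal sketch; the iteration count and the 100 µs value are just illustrative):

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

int main() {
    using Clock = std::chrono::high_resolution_clock;
    // Time a 100 µs sleep a few times. sleep_for only guarantees a *minimum*
    // duration, so the observed time includes OS timer/scheduler wake-up
    // latency on top of the requested 100 µs.
    for (int i = 0; i < 5; ++i) {
        auto start = Clock::now();
        std::this_thread::sleep_for(std::chrono::microseconds(100));
        auto end = Clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
        std::printf("sleep_for(100us) took %lld us\n", static_cast<long long>(us));
    }
    return 0;
}
```

On a typical Linux machine this prints values well above 100 µs, which is consistent with the ~150 µs reading above; much of the overshoot comes from wake-up latency rather than from the benchmark harness.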

> Also, it seems we have significant issues with instrumentation: this time the benchmarks are getting twice as fast, which looks like a problem in itself.
>
> We need to be sure of what's happening, both for the walltime issue and for the instrumentation.

The instrumentation speedup is expected, since we moved the measurement_stop closer to the end of the benchmark, which means we're no longer measuring additional, unrelated instructions.

@not-matthias force-pushed the cod-1040-perf-instrumentation-not-available-for-cpp-codspeed branch 3 times, most recently from 0d2e29e to 364e3ec on July 10, 2025 at 17:09
@not-matthias force-pushed the cod-1040-perf-instrumentation-not-available-for-cpp-codspeed branch from b6d18f8 to c943b12 on July 10, 2025 at 17:51