-
Notifications
You must be signed in to change notification settings - Fork 1
fix: perf instrumentation not available for cpp codspeed #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: perf instrumentation not available for cpp codspeed #17
Conversation
CodSpeed Instrumentation Performance ReportMerging #17 will improve performances by ×2.5Comparing Summary
Benchmarks breakdown
|
CodSpeed WallTime Performance ReportMerging #17 will not alter performanceComparing Summary
Benchmarks breakdown
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also have performance regression introduced on instrumented test, I'm guessing this is due to the instrument-hooks library introduction, could we hunt changes to make them either non existant or as minimal as possible ?
Callgraphs show that we pickup instrumentation in callgrind in a way that we did not before, if we don't change thsi it will be a breaking change: https://codspeed.io/CodSpeedHQ/codspeed-cpp/branches/cod-1040-perf-instrumentation-not-available-for-cpp-codspeed?runnerMode=Instrumentation
The regression is quite problematic since it appears to introduce a significant amount of overhead here. For interpreted languages, it doesn't make a big difference, but here, it seems to be substantial for micro benchmarks (even if the ones with regressions are very small). And we'll probably encounter the same issue when we port this to codspeed-rust. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still a few changes left
c638d0d
to
e73c140
Compare
The benchmarks show the correct timings now. We're not optimizing away the memcpy and some benchmarks even have improved performance, since we're placing the |
87b977a
to
8a6b904
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
olgtm if becnhmark results don't vary to much, but this is still a breaking change
@@ -1065,6 +1065,8 @@ struct State::StateIterator { | |||
bool operator!=(StateIterator const&) const { | |||
if (BENCHMARK_BUILTIN_EXPECT(cached_ != 0, true)) return true; | |||
#ifdef CODSPEED_INSTRUMENTATION | |||
measurement_stop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will have to be a major release because we are essentially changing benchmark result values for end users.
I agree that this is better here, but we'll have to keep this in mind
We have significant deltas on the following walltime benches:
Is this expected? Do we have a resolution issue? @not-matthias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it seems we have significant issues with instrumentation. This time the benchmarks are getting twice as fast, this seems to be an issue.
We need to be sure of what's happening both for the walltime issue and the instrumentation
b3d124e
to
b93318f
Compare
We are not calculating this ourselves, but rather take the measurements from Google Benchmark (@GuillaumeLagrange pls correct me if i'm wrong). So my assumption is that it's also measuring parts of the benchmark framework which we don't have that much control over. EDIT: I removed the nanosecond benchmarks since they are just too short to be measured correctly (we also don't have them in Rust). Even the microsecond benchmarks are quite unstable in Rust: I tried to time it manually using this code and still got 150us for a 100us sleep: auto start = std::chrono::high_resolution_clock::now();
std::this_thread::sleep_for(std::chrono::microseconds(100));
auto end = std::chrono::high_resolution_clock::now();
The instrumentation speedup is expected, since we moved the |
0d2e29e
to
364e3ec
Compare
b6d18f8
to
c943b12
Compare
Moved from #13, so that it's linked to the correct Linear issue.
Blocked by CodSpeedHQ/instrument-hooks#6