-
Notifications
You must be signed in to change notification settings - Fork 436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ValueMap interface change #2117
ValueMap interface change #2117
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2117 +/- ##
=====================================
Coverage 79.3% 79.4%
=====================================
Files 121 121
Lines 20968 20968
=====================================
+ Hits 16646 16660 +14
+ Misses 4322 4308 -14 ☔ View full report in Codecov by Sentry. |
825d181
to
125c3fe
Compare
Thanks for review! main branch - metrics_counter ~4.8M iter/sec expandThroughput: 4,784,400 iterations/sec this branch - metrics_counter ~5.2M iter/sec expandThroughput: 5,175,800 iterations/sec main - histograms ~4.8M iter/sec expandThroughput: 4,726,400 iterations/sec this branch - metrics_histograms ~4.7M iter/sec expandThroughput: 4,820,400 iterations/sec Generally it looks that results fluctuate between the "stress_test" runs , not sure why... I didn't look too deep but there's there's at least two sources of randomness (1. generating attribute sets, 2. initializing HashMap), maybe this might be the issue. |
125c3fe
to
9f87703
Compare
removed redudant check from
|
9f87703
to
6968110
Compare
I was wrong regarding f64 NaN and Infinity, because actual type can be f64, and user can pass in anything he likes :) |
I am seeing a different results. For metrics_histogram, my stress test drops significantly with this PR branch!! Benchmarks is not a lot different. This usually indicates that some contention is getting introduced. Maybe #2117 (comment) is the reason? Could you move the index calculation to be completely outside. |
6968110
to
43d1878
Compare
Thanks for measuring it and sharing your results! Regarding performance in general I see at least few performance killers that would be easy to implement (and I would love to contribute).
Also collection phase needs some love too... which is another story :) |
Can you open separate issues so as to track this separately.
|
Sure,I'll definitely create separate tickets for there optimizations, but before I do that, I want to make sure
I have 1 (this) PR and 2 issue, that I want to implement before implementing/experimenting with optimizations, so I don't want to create an issue now, because I'll not be able to work on it anyway. |
bd47a96
to
bf24bca
Compare
badd0c6
to
c52c479
Compare
Just ran perf test of my box: Benchmarks for counter,histogram shows no difference in perf. (within noise level only). |
On my dev machine: counter: main=11.7 pr=13.4 |
713bb20
to
a503fef
Compare
This is insane difference... I wonder why this is so different on your platform compared to mine. #!/usr/bin/env bash
for i in $(seq 1 10);
do
git switch -
timeout 60s cargo run --release --package stress --bin metrics_histogram
done Then closed all applications, (to avoid any external noise) and went for a coffee for 10m :)) Here's the resultsmv@mv-t14:~/Projects/opentelemetry-rust$ ./run_stress.sh Switched to branch 'value-map-interface-change' Your branch is up to date with 'origin/value-map-interface-change'. Compiling opentelemetry_sdk v0.26.0 (/home/mv/Projects/opentelemetry-rust/opentelemetry-sdk) Compiling stress v0.1.0 (/home/mv/Projects/opentelemetry-rust/stress) Finished `release` profile [optimized] target(s) in 3.02s Running `target/release/metrics_histogram` Number of threads: 12Throughput: 5,077,200 iterations/sec Throughput: 5,085,400 iterations/sec Throughput: 5,081,000 iterations/sec Throughput: 5,081,200 iterations/sec Throughput: 4,509,600 iterations/sec Throughput: 4,172,000 iterations/sec Throughput: 4,189,000 iterations/sec Throughput: 4,204,800 iterations/sec Throughput: 4,220,800 iterations/sec Throughput: 4,221,200 iterations/sec Throughput: 4,230,400 iterations/sec Switched to branch 'main' Throughput: 4,423,000 iterations/sec Throughput: 4,401,000 iterations/sec Throughput: 4,396,800 iterations/sec Throughput: 4,409,800 iterations/sec Throughput: 4,406,000 iterations/sec Throughput: 4,432,800 iterations/sec Throughput: 4,409,600 iterations/sec Throughput: 4,451,600 iterations/sec Throughput: 4,396,800 iterations/sec Throughput: 4,402,200 iterations/sec Throughput: 4,395,600 iterations/sec Switched to branch 'value-map-interface-change' Throughput: 4,510,000 iterations/sec Throughput: 4,483,200 iterations/sec Throughput: 4,492,400 iterations/sec Throughput: 4,488,400 iterations/sec Throughput: 4,490,800 iterations/sec Throughput: 4,489,800 iterations/sec Throughput: 4,499,200 iterations/sec Throughput: 4,508,800 iterations/sec Throughput: 4,518,000 iterations/sec Throughput: 4,510,200 iterations/sec Throughput: 4,514,000 iterations/sec Switched to branch 'main' Throughput: 4,811,400 iterations/sec Throughput: 4,780,400 iterations/sec Throughput: 4,790,800 iterations/sec Throughput: 4,782,800 iterations/sec Throughput: 4,781,200 iterations/sec Throughput: 4,754,600 iterations/sec Throughput: 4,777,600 iterations/sec Throughput: 4,785,800 iterations/sec Throughput: 4,772,800 iterations/sec Throughput: 4,786,000 iterations/sec Throughput: 4,735,000 iterations/sec Switched to branch 'value-map-interface-change' Throughput: 4,493,200 iterations/sec Throughput: 4,463,600 iterations/sec Throughput: 4,472,000 iterations/sec Throughput: 4,466,200 iterations/sec Throughput: 4,470,000 iterations/sec Throughput: 4,470,000 iterations/sec Throughput: 4,485,600 iterations/sec Throughput: 4,482,200 iterations/sec Throughput: 4,469,600 iterations/sec Throughput: 4,498,200 iterations/sec Throughput: 4,509,000 iterations/sec Switched to branch 'main' Throughput: 5,150,600 iterations/sec Throughput: 5,133,000 iterations/sec Throughput: 5,132,400 iterations/sec Throughput: 5,138,200 iterations/sec Throughput: 5,127,600 iterations/sec Throughput: 5,133,200 iterations/sec Throughput: 5,133,800 iterations/sec Throughput: 5,127,600 iterations/sec Throughput: 5,138,600 iterations/sec Throughput: 5,140,000 iterations/sec Throughput: 5,147,000 iterations/sec Switched to branch 'value-map-interface-change' Throughput: 4,757,800 iterations/sec Throughput: 4,736,200 iterations/sec Throughput: 4,725,000 iterations/sec Throughput: 4,736,400 iterations/sec Throughput: 4,737,200 iterations/sec Throughput: 4,737,400 iterations/sec Throughput: 4,744,000 iterations/sec Throughput: 4,702,000 iterations/sec Throughput: 4,732,800 iterations/sec Throughput: 4,737,800 iterations/sec Throughput: 4,736,200 iterations/sec Switched to branch 'main' Throughput: 4,414,800 iterations/sec Throughput: 4,410,400 iterations/sec Throughput: 4,375,800 iterations/sec Throughput: 4,382,800 iterations/sec Throughput: 4,379,200 iterations/sec Throughput: 4,383,000 iterations/sec Throughput: 4,422,800 iterations/sec Throughput: 4,405,200 iterations/sec Throughput: 4,385,000 iterations/sec Throughput: 4,381,800 iterations/sec Throughput: 4,378,800 iterations/sec Switched to branch 'value-map-interface-change' Throughput: 4,756,200 iterations/sec Throughput: 4,727,400 iterations/sec Throughput: 4,734,200 iterations/sec Throughput: 4,729,000 iterations/sec Throughput: 4,732,200 iterations/sec Throughput: 4,732,000 iterations/sec Throughput: 4,735,400 iterations/sec Throughput: 4,736,000 iterations/sec Throughput: 4,720,200 iterations/sec Throughput: 4,721,600 iterations/sec Throughput: 4,736,000 iterations/sec Switched to branch 'main' Throughput: 4,688,600 iterations/sec Throughput: 4,658,400 iterations/sec Throughput: 4,676,400 iterations/sec Throughput: 4,695,200 iterations/sec Throughput: 4,672,600 iterations/sec Throughput: 4,656,400 iterations/sec Throughput: 4,657,000 iterations/sec Throughput: 4,659,400 iterations/sec Throughput: 4,660,000 iterations/sec Throughput: 4,656,000 iterations/sec Throughput: 4,674,400 iterations/sec I would say that results are very similar, sometimes |
a45d153
to
c1b4d15
Compare
@cijothomas and @lalitb thanks for sharing your results! I updated code by making it look as close as possible to what it was before (preserving existing bugs). |
c1b4d15
to
6420f8d
Compare
6420f8d
to
606d126
Compare
I just ran tests on another computer Here's the script that I used: #!/usr/bin/env bash
echo "CPU $(cat /proc/cpuinfo | grep 'name' | uniq)"
for i in $(seq 1 2);
do
echo "Current branch: $(git branch --show-current)"
echo "stress-test metrics_histogram"
timeout 30s cargo run --release --package stress --bin metrics_histogram
echo "stress-test metrics"
timeout 30s cargo run --release --package stress --bin metrics
git switch -
done Here's the full console outputfraillt@Fraillt-PC:~/HostProjects/opentelemetry-rust$ ./run_tests.sh Throughput: 12,594,800 iterations/sec Throughput: 12,642,800 iterations/sec Throughput: 12,645,600 iterations/sec Throughput: 12,571,800 iterations/sec stress-test metrics Throughput: 13,018,000 iterations/sec Throughput: 13,011,600 iterations/sec Throughput: 12,922,800 iterations/sec Throughput: 12,231,200 iterations/sec Throughput: 11,879,200 iterations/sec Switched to branch 'value-map-interface-change' Throughput: 11,383,000 iterations/sec Throughput: 11,422,600 iterations/sec Throughput: 11,432,800 iterations/sec Throughput: 11,428,800 iterations/sec stress-test metrics Throughput: 18,337,200 iterations/sec Throughput: 18,394,400 iterations/sec Throughput: 18,224,400 iterations/sec Throughput: 18,356,200 iterations/sec Throughput: 18,334,400 iterations/sec Switched to branch 'main' |
I think I know the reason for these results. Rust version: 1.70
Rust version: 1.72
Rust version: 1.75
Rust version: 1.78
Rust version: 1.81
script outcomefraillt@Fraillt-PC:~/HostProjects/opentelemetry-rust$ ./run_tests.sh 1.70-x86_64-unknown-linux-gnu unchanged - rustc 1.70.0 (90c541806 2023-05-31) Rust version: 1.70 Throughput: 16,570,200 iterations/sec stress-test metrics Throughput: 12,781,200 iterations/sec Switched to branch 'main' Throughput: 16,122,600 iterations/sec stress-test metrics Throughput: 17,320,600 iterations/sec Switched to branch 'value-map-interface-change' 1.72-x86_64-unknown-linux-gnu unchanged - rustc 1.72.1 (d5c2e9c34 2023-09-13) Rust version: 1.72 Throughput: 12,094,400 iterations/sec stress-test metrics Throughput: 12,431,200 iterations/sec Switched to branch 'main' Throughput: 12,051,000 iterations/sec stress-test metrics Throughput: 12,259,200 iterations/sec Switched to branch 'value-map-interface-change' 1.75-x86_64-unknown-linux-gnu unchanged - rustc 1.75.0 (82e1608df 2023-12-21) Rust version: 1.75 Throughput: 12,312,000 iterations/sec stress-test metrics Throughput: 12,505,000 iterations/sec Switched to branch 'main' Throughput: 12,621,400 iterations/sec stress-test metrics Throughput: 12,907,600 iterations/sec Switched to branch 'value-map-interface-change' 1.78-x86_64-unknown-linux-gnu unchanged - rustc 1.78.0 (9b00956e5 2024-04-29) Rust version: 1.78 Throughput: 11,197,800 iterations/sec stress-test metrics Throughput: 17,282,200 iterations/sec Switched to branch 'main' Throughput: 12,465,200 iterations/sec stress-test metrics Throughput: 12,887,200 iterations/sec Switched to branch 'value-map-interface-change' 1.81-x86_64-unknown-linux-gnu unchanged - rustc 1.81.0 (eeb90cda1 2024-09-04) Rust version: 1.81 Throughput: 11,088,800 iterations/sec stress-test metrics Throughput: 18,036,200 iterations/sec Switched to branch 'main' Throughput: 12,169,800 iterations/sec stress-test metrics Throughput: 12,572,400 iterations/sec Switched to branch 'value-map-interface-change' So how should we proceed? |
This is weird, I see different results now on the same machine, atlease don't see the degradation earlier observed with histogram. Would wait for someone else to confirm too. @fraillt - can you also gather the stats without using script (running manually), for these two rust versions in both release and debug(default) mode? CPU: CPU model name : AMD EPYC 7763 64-Core Processor, 8 cores rustc version: 1.70.0 histogram: counter: rustc version: 1.81 histogram: counter: |
CPU model name : AMD Ryzen 5 3600 6-Core Processor rustc version: 1.70.0 histogram: counter: rustc version: 1.81 histogram: counter: Rust version 1.70 with counter metrics is super weird... :/ |
Thanks @lalitb for your tests
I have mentioned that
Basically, the only functional change is that I removed
Which is probably ok, since histograms are unbounded anyway... Maybe you can test this explicitly (by adding this check in |
/// Some aggregators can do some computations before updating aggregator. | ||
/// This helps to reduce contention for aggregators because it makes | ||
/// [`Aggregator::update`] as short as possible. | ||
type PreComputedValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: PrecomputedValue
is not the best name as there is no precomputation happening for counters and gauges.
I think we could use something like MeasurementData
instead.
type PreComputedValue; | |
type MeasurementData; |
// Ignore NaN and infinity. | ||
// Only makes sense if T is f64, maybe this could be no-op for other cases? | ||
// TODO: uncomment once we know the reason for performance degradation | ||
// if f.is_infinite() || f.is_nan() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cijothomas Let's add this check as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for patiently waiting!
Changes
Absolute minimum changes to
ValueMap
in order to provide interface that can be applied for all kinds of metrics.I've tried to make this revision as small as possible, (leaving out optimization opportunities and bugs that noticed along the way), so it would be easier to review.
The benefit of this new interface is that it can be applied efficiently to all histograms. I have a proof in #2114 (which uses same interface) that it can be elegantly applied to
ExpoHistogram
as well.Few more points/notes that might be helpful for reviewer:
Histogram
might be a bit harder to review, because it required most changes, but essentially there's configuration extracted into newBucketsConfig
type, andAggregator
is implemented forMutex<Buckets<T>>
, the rest is basically compiler-driven-development.LastValue
,Sum
,PrecomputedSum
should be trivial to review, I just had to implementAggregator
interface forIncrement
andAssign
functionality, which is pretty trivial.AtomicTracker
andAtomicallyUpdate
had to be made as well, either to get rid of "unused code" warning, or required forIncrement
andAssign
. Generally these interfaces got simplified as well, as they no longer contain histogram-only related stuff.I really want to unify common/important code so all metrics could benefit from it.
And I really hope this revision is small enough to review efficiently :) Happy reviewing :)
Merge requirement checklist
CHANGELOG.md
files updated for non-trivial, user-facing changes