ValueMap interface change #2117

Merged 5 commits on Nov 1, 2024

Contributor

@fraillt fraillt commented Sep 14, 2024

Changes

This PR makes the absolute minimum changes to ValueMap needed to provide an interface that can be applied to all kinds of metrics.
I've tried to make this revision as small as possible (leaving out optimization opportunities and bugs that I noticed along the way), so that it is easier to review.
The benefit of this new interface is that it can be applied efficiently to all histograms. I have a proof in #2114 (which uses the same interface) that it can be elegantly applied to ExpoHistogram as well.

A few more notes that might be helpful for the reviewer:

  • Histogram might be a bit harder to review because it required the most changes, but essentially the configuration is extracted into a new BucketsConfig type and Aggregator is implemented for Mutex<Buckets<T>>; the rest is basically compiler-driven development.
  • LastValue, Sum, and PrecomputedSum should be trivial to review: I just had to implement the Aggregator interface for the Increment and Assign functionality.
  • Changes to AtomicTracker and AtomicallyUpdate had to be made as well, either to get rid of "unused code" warnings or because they were required for Increment and Assign. Generally these interfaces got simpler too, as they no longer contain histogram-only functionality.

I really want to unify common/important code so all metrics could benefit from it.
And I really hope this revision is small enough to review efficiently :) Happy reviewing :)

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@fraillt fraillt requested a review from a team September 14, 2024 14:59

codecov bot commented Sep 14, 2024

Codecov Report

Attention: Patch coverage is 97.75281% with 2 lines in your changes missing coverage. Please review.

Project coverage is 79.4%. Comparing base (e1860c7) to head (133f317).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| opentelemetry-sdk/src/metrics/internal/mod.rs | 93.7% | 2 Missing ⚠️ |
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #2117   +/-   ##
=====================================
  Coverage   79.3%   79.4%           
=====================================
  Files        121     121           
  Lines      20968   20968           
=====================================
+ Hits       16646   16660   +14     
+ Misses      4322    4308   -14     


@fraillt
Contributor Author

fraillt commented Sep 18, 2024

Thanks for the review!
Here are the stress test results (I ran them for a while to let them stabilize):

main branch - metrics_counter ~4.8M iter/sec

Throughput: 4,784,400 iterations/sec
Throughput: 4,781,200 iterations/sec
Throughput: 4,814,800 iterations/sec
Throughput: 4,815,000 iterations/sec
Throughput: 4,829,400 iterations/sec
Throughput: 4,833,400 iterations/sec
Throughput: 4,827,400 iterations/sec
Throughput: 4,836,600 iterations/sec
Throughput: 4,838,600 iterations/sec
Throughput: 4,822,400 iterations/sec
Throughput: 4,830,000 iterations/sec
Throughput: 4,832,400 iterations/sec

this branch - metrics_counter ~5.2M iter/sec

Throughput: 5,175,800 iterations/sec
Throughput: 5,176,200 iterations/sec
Throughput: 5,176,800 iterations/sec
Throughput: 5,182,800 iterations/sec
Throughput: 5,189,200 iterations/sec
Throughput: 5,193,600 iterations/sec
Throughput: 5,195,800 iterations/sec
Throughput: 5,189,400 iterations/sec
Throughput: 5,188,200 iterations/sec
Throughput: 5,184,800 iterations/sec
Throughput: 5,178,200 iterations/sec
Throughput: 5,184,800 iterations/sec

main - histograms ~4.8M iter/sec

Throughput: 4,726,400 iterations/sec
Throughput: 4,754,200 iterations/sec
Throughput: 4,773,600 iterations/sec
Throughput: 4,790,800 iterations/sec
Throughput: 4,789,200 iterations/sec
Throughput: 4,802,400 iterations/sec
Throughput: 4,788,200 iterations/sec
Throughput: 4,797,600 iterations/sec
Throughput: 4,794,800 iterations/sec
Throughput: 4,784,400 iterations/sec
Throughput: 4,786,200 iterations/sec
Throughput: 4,798,400 iterations/sec
Throughput: 4,780,000 iterations/sec

this branch - metrics_histograms ~4.7M iter/sec

Throughput: 4,820,400 iterations/sec
Throughput: 4,676,000 iterations/sec
Throughput: 4,718,800 iterations/sec
Throughput: 4,749,200 iterations/sec
Throughput: 4,730,000 iterations/sec
Throughput: 4,741,800 iterations/sec
Throughput: 4,744,000 iterations/sec
Throughput: 4,744,800 iterations/sec
Throughput: 4,739,000 iterations/sec
Throughput: 4,709,000 iterations/sec
Throughput: 4,741,200 iterations/sec

Generally, it looks like results fluctuate between stress-test runs, and I'm not sure why. I didn't look too deep, but there are at least two sources of randomness (1. generating attribute sets, 2. initializing the HashMap), so maybe that is the cause.
I ran these tests multiple times, and my general feeling is that histogram performance is probably the same; I didn't notice a difference.
But this branch really is faster with metrics_counter no matter how many times I measure.
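
Since the runs fluctuate, it can help to compare a mean and a spread instead of eyeballing the lines. Below is a minimal sketch that parses `Throughput: ... iterations/sec` lines like the ones above and reports the mean and sample standard deviation. The helper names `parse_throughput` and `summarize` are made up for illustration and are not part of the stress-test harness.

```rust
/// Parses a line such as "Throughput: 4,784,400 iterations/sec".
/// Returns None for any line that does not match that shape.
fn parse_throughput(line: &str) -> Option<u64> {
    let rest = line.trim().strip_prefix("Throughput:")?;
    let number = rest.trim().split_whitespace().next()?;
    // Drop the thousands separators before parsing.
    number.replace(',', "").parse().ok()
}

/// Returns (mean, sample standard deviation) over all throughput lines,
/// or None when fewer than two samples are present.
fn summarize(output: &str) -> Option<(f64, f64)> {
    let samples: Vec<f64> = output
        .lines()
        .filter_map(parse_throughput)
        .map(|v| v as f64)
        .collect();
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
    Some((mean, variance.sqrt()))
}

fn main() {
    // First three lines of the main-branch metrics_counter run above.
    let output = "Throughput: 4,784,400 iterations/sec\n\
                  Throughput: 4,781,200 iterations/sec\n\
                  Throughput: 4,814,800 iterations/sec";
    if let Some((mean, sd)) = summarize(output) {
        println!("mean = {:.0} iter/sec, sd = {:.0} iter/sec", mean, sd);
    }
}
```

Piping the full output of each branch through something like this makes it easier to tell a real difference from run-to-run noise.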

@fraillt
Contributor Author

fraillt commented Sep 18, 2024

Removed the redundant check from histogram.rs and exponential_histogram.rs.
I think it's not possible to get infinity or NaN when converting from an integer.

let f_value = measurement.into_float();
if f_value.is_infinite() || f_value.is_nan() {
    return;
}

@fraillt
Contributor Author

fraillt commented Sep 18, 2024

I was wrong regarding f64 NaN and Infinity: the actual type can be f64, and the user can pass in anything they like :)
Also, it looks like this check is missing for histogram on the main branch, so I guess this might be considered a bug fix?
I also added a histogram test for this specific case.
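
To illustrate the difference between integer and float inputs, here is a minimal, self-contained sketch. The `IntoFloat` trait and the `is_recordable` helper are hypothetical stand-ins and not the SDK's actual definitions; the sketch only assumes that every measurement can be converted to `f64`.

```rust
/// Hypothetical conversion trait standing in for the SDK's numeric abstraction.
trait IntoFloat {
    fn into_float(self) -> f64;
}

impl IntoFloat for u64 {
    fn into_float(self) -> f64 {
        self as f64
    }
}

impl IntoFloat for i64 {
    fn into_float(self) -> f64 {
        self as f64
    }
}

impl IntoFloat for f64 {
    fn into_float(self) -> f64 {
        self
    }
}

/// Returns false for values a histogram should skip (NaN or +/- infinity).
fn is_recordable<T: IntoFloat>(measurement: T) -> bool {
    measurement.into_float().is_finite()
}

fn main() {
    // Integer inputs always convert to a finite f64, even at the extremes.
    assert!(is_recordable(u64::MAX));
    assert!(is_recordable(i64::MIN));
    // f64 inputs are caller-controlled, so the check is still needed for them.
    assert!(!is_recordable(f64::NAN));
    assert!(!is_recordable(f64::INFINITY));
    assert!(is_recordable(1.5_f64));
}
```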

@cijothomas
Member

Thanks for review! Here's a stress test results (I ran them for a while to stabilize a bit):

main branch - metrics_counter ~4.8M iter/sec
this branch - metrics_counter ~5.2M iter/sec
main - histograms ~4.8M iter/sec
this branch - metrics_histograms ~4.7M iter/sec

Generally it looks that results fluctuate between the "stress_test" runs, not sure why... I didn't look too deep but there's at least two sources of randomness (1. generating attribute sets, 2. initializing HashMap), maybe this might be the issue. I tried to run these tests multiple times so general "feeling" is that histograms performance is probably the same... didn't notice a difference. But this branch really is faster with metrics_counter no matter how much I measure.

I am seeing different results. For metrics_histogram, my stress test throughput drops significantly with this PR branch! The benchmarks are not much different. This usually indicates that some contention is being introduced. Maybe #2117 (comment) is the reason? Could you move the index calculation completely outside the lock?

@fraillt fraillt requested a review from a team as a code owner September 20, 2024 08:30
@fraillt
Contributor Author

fraillt commented Sep 20, 2024

Thanks for measuring it and sharing your results!
I did more stress tests, and indeed it looks like histograms are a bit slower.
Anyway, I finally understood where the issue was. There are two contention points: one for acquiring the specific aggregator for an attribute set (RwLock) and another for updating it (Mutex).
So I updated the Aggregator interface to allow precomputing a value, which should reduce contention for histograms.
In terms of contention it should now be exactly the same as on the main branch, so performance should be identical (even though on the main branch there are more function calls between measure and update, and Counter metrics have an extra unused parameter, but this should be optimized away anyway).
I tried measuring with the stress test, but it really is hard to say: for Counter metrics it sometimes stabilizes at 5.2M iter/s and other times at 4.1M iter/s (and I tried to close all the windows on my Ubuntu machine to reduce noise).
My gut feeling is that it should be the same (if the optimizer is smart enough to optimize the main branch), but you're welcome to test; maybe you'll find something more :)
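
The two contention points and the precomputation can be sketched roughly as follows. This is a simplified illustration and not the code in this PR: the `ValueMap` and `Buckets` types here are stripped-down stand-ins, a plain `String` plays the role of the attribute set, and the bucket lookup via `partition_point` is an assumption about how an index might be derived.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

/// Stand-in for per-attribute-set histogram state (hypothetical).
struct Buckets {
    counts: Vec<u64>,
}

/// Stand-in for the map from attribute set to aggregator (hypothetical).
struct ValueMap {
    bounds: Vec<f64>,
    trackers: RwLock<HashMap<String, Arc<Mutex<Buckets>>>>,
}

impl ValueMap {
    fn new(bounds: Vec<f64>) -> Self {
        ValueMap { bounds, trackers: RwLock::new(HashMap::new()) }
    }

    fn measure(&self, key: &str, value: f64) {
        // Precompute the bucket index before taking any lock.
        let index = self.bounds.partition_point(|&b| b < value);

        // Contention point 1: look up the aggregator for this attribute set.
        let existing = self.trackers.read().unwrap().get(key).cloned();
        let tracker = match existing {
            Some(tracker) => tracker,
            None => {
                // Slow path: insert under the write lock if still missing.
                let mut map = self.trackers.write().unwrap();
                Arc::clone(map.entry(key.to_string()).or_insert_with(|| {
                    Arc::new(Mutex::new(Buckets { counts: vec![0; self.bounds.len() + 1] }))
                }))
            }
        };

        // Contention point 2: only the increment happens under the Mutex.
        tracker.lock().unwrap().counts[index] += 1;
    }

    fn counts(&self, key: &str) -> Option<Vec<u64>> {
        let map = self.trackers.read().unwrap();
        map.get(key).map(|t| t.lock().unwrap().counts.clone())
    }
}

fn main() {
    let map = ValueMap::new(vec![0.0, 5.0, 10.0]);
    map.measure("route=/home", 3.0);
    map.measure("route=/home", 5.0);
    map.measure("route=/home", 42.0);
    assert_eq!(map.counts("route=/home"), Some(vec![0, 2, 0, 1]));
}
```

The point of the sketch is the ordering: the search over the bounds runs before either lock is taken, so the time spent holding the Mutex is limited to a single increment.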

Regarding performance in general, I see at least a few performance killers whose fixes would be easy to implement (and I would love to contribute).

  • Change the hashing function (maybe using the rustc_hash crate). I really don't see a reason to have a cryptographically secure hashing function for attribute sets.
  • Wrap attributes in some sort of HashOnce<T> wrapper that would preserve the hashing result.
  • Implement some sort of sharding to reduce contention in multithreaded environments; even something as trivial as this might bring big improvements:
let hash = attribs.get_hash();
let trackers = &self.trackers[(hash % 8) as usize]; // the shard count is a trade-off between more memory and less contention.
// do the usual stuff: read_lock, get, write_lock, etc.
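
A slightly fuller sketch of the sharding idea, using only the standard library, might look like this. Everything here is hypothetical (`ShardedCounters`, `SHARDS`, `String` keys instead of attribute sets), and for brevity it always takes the write lock instead of trying a read lock first.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, RwLock};
use std::thread;

/// Number of shards: more shards cost more memory but reduce contention.
const SHARDS: usize = 8;

/// Hypothetical sharded counter map; each shard has its own lock.
struct ShardedCounters {
    shards: Vec<RwLock<HashMap<String, u64>>>,
}

impl ShardedCounters {
    fn new() -> Self {
        ShardedCounters {
            shards: (0..SHARDS).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    /// Picks a shard from the key's hash.
    fn shard_index(key: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() % SHARDS as u64) as usize
    }

    fn add(&self, key: &str, delta: u64) {
        // Only threads that land on the same shard contend for this lock.
        let mut shard = self.shards[Self::shard_index(key)].write().unwrap();
        *shard.entry(key.to_string()).or_insert(0) += delta;
    }

    fn get(&self, key: &str) -> Option<u64> {
        let shard = self.shards[Self::shard_index(key)].read().unwrap();
        shard.get(key).copied()
    }
}

fn main() {
    let counters = Arc::new(ShardedCounters::new());
    let handles: Vec<_> = (0..4)
        .map(|t| {
            let counters = Arc::clone(&counters);
            thread::spawn(move || {
                for _ in 0..1000 {
                    counters.add(&format!("key-{}", t), 1);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(counters.get("key-0"), Some(1000));
}
```

Note that collection would have to walk every shard, so a design like this trades simpler measurement-time locking for a more involved collect phase.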

The collection phase needs some love too, but that's another story :)

@cijothomas
Member

Regarding performance in general I see at least few performance killers that would be easy to implement (and I would love to contribute).

change hashing function (maybe using rustc_hash crate). I really don't see a reason to have cryptographically secure hashing functions for attribute sets.
wrap attributes into some sort of HashOnce wrapper, that would preserve hashing results.
implement some sort of sharding to reduce contention in multithreaded environment, something trivial like this might have big improvements.

Can you open separate issues so these can be tracked separately?

  1. Hashing - if there is a faster hash, we should explore it. If it introduces the possibility of triggering collisions with controlled input, then it should be behind a feature flag, so users can opt in to it for higher perf.
  2. I am not sure how to achieve that. In the hot path, the hash is calculated only once today. Are you referring to optimizing the non-hot path?
  3. Agree that the existing contention has room for improvement, but sharding is generally hard to implement. Here's a link to a previous attempt: Adding two level hashing in metrics hashmap #1564, which was abandoned due to challenges getting it correct, plus we had already reduced contention significantly with other changes. Happy to explore this further! (For comparison, OTel Rust's throughput is far lower than OTel .NET's throughput, though the latency is the same, indicating that OTel Rust has more contention. The key reason is that .NET has a built-in ConcurrentDictionary, whereas implementing such a thing in Rust would likely need unsafe code and/or rely on external crates. By default, we'd like to avoid unsafe and avoid external crates, but we are very open to adding them behind opt-in feature flags. The linked PR shows that simply replacing hash with hashbrown boosted perf.)
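
For reference, the HashOnce idea from the quoted list could be sketched as follows. This is a hypothetical illustration using only the standard library, not code from this PR or the SDK; whether it would help depends on how often the same attribute set is hashed, which is the open question in point 2.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper that hashes its value once, at construction.
#[derive(Debug, Clone, PartialEq, Eq)]
struct HashOnce<T> {
    value: T,
    hash: u64,
}

impl<T: Hash> HashOnce<T> {
    fn new(value: T) -> Self {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let hash = hasher.finish();
        HashOnce { value, hash }
    }
}

impl<T> Hash for HashOnce<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed only the cached u64; the wrapped value is not walked again.
        state.write_u64(self.hash);
    }
}

fn main() {
    let attrs = HashOnce::new(vec![("method", "GET"), ("status", "200")]);
    let mut counts: HashMap<HashOnce<Vec<(&str, &str)>>, u64> = HashMap::new();
    // Both lookups reuse the hash computed in `HashOnce::new`.
    *counts.entry(attrs.clone()).or_insert(0) += 1;
    *counts.entry(attrs.clone()).or_insert(0) += 1;
    assert_eq!(counts[&attrs], 2);
}
```

The map's own hasher still processes the cached `u64`, so the saving is limited to not re-hashing the attribute keys and values on each lookup.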

@fraillt
Contributor Author

fraillt commented Sep 29, 2024

Sure, I'll definitely create separate tickets for these optimizations, but before I do that, I want to make sure ValueMap can be applied to all metrics (measure and collection phases). This is important for at least a few reasons:

  • it makes sure that optimizations can be applied to all metrics (this revision is good proof that this is possible)
  • it is easier to review and iterate/experiment, as the optimization code will be localized to one type (ValueMap) instead of 5+ extra places (all metrics).

I have 1 PR (this one) and 2 issues that I want to implement before implementing/experimenting with optimizations, so I don't want to create an issue now, because I won't be able to work on it anyway.

@fraillt fraillt force-pushed the value-map-interface-change branch 2 times, most recently from bd47a96 to bf24bca Compare September 29, 2024 18:59

linux-foundation-easycla bot commented Sep 29, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: cijothomas / name: Cijo Thomas (1f54f90, 133f317)
  • ✅ login: fraillt / name: Mindaugas Vinkelis (606d126)
  • ✅ login: lalitb / name: Lalit Kumar Bhasin (5ff18ca, ffe0d4a)

@cijothomas
Member

Just ran perf tests on my box:

Benchmarks for counter and histogram show no difference in perf (within noise level).
Stress tests:
counter 12.4 (main) --> 13.2 (PR)
histogram 12.7 (main) --> 8.1 (PR)

@lalitb
Member

lalitb commented Oct 8, 2024

On my dev machine:

counter: main=11.7 pr=13.4
histogram: main=12.7 pr=8.1

@fraillt
Contributor Author

fraillt commented Oct 8, 2024

histogram: main=12.7 pr=8.1

This is an insane difference... I wonder why it is so different on your platform compared to mine.
I wrote a script, run_stress.sh:

#!/usr/bin/env bash
for i in $(seq 1 10);
do
  git switch -
  timeout 60s cargo run --release --package stress --bin metrics_histogram
done

Then I closed all applications (to avoid any external noise) and went for a coffee for 10 minutes :))

Here are the results:

mv@mv-t14:~/Projects/opentelemetry-rust$ ./run_stress.sh
Switched to branch 'value-map-interface-change'
Your branch is up to date with 'origin/value-map-interface-change'.
Compiling opentelemetry_sdk v0.26.0 (/home/mv/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/home/mv/Projects/opentelemetry-rust/stress)
Finished `release` profile [optimized] target(s) in 3.02s
Running `target/release/metrics_histogram`
Number of threads: 12

Throughput: 5,077,200 iterations/sec

Throughput: 5,085,400 iterations/sec

Throughput: 5,081,000 iterations/sec

Throughput: 5,081,200 iterations/sec

Throughput: 4,509,600 iterations/sec

Throughput: 4,172,000 iterations/sec

Throughput: 4,189,000 iterations/sec

Throughput: 4,204,800 iterations/sec

Throughput: 4,220,800 iterations/sec

Throughput: 4,221,200 iterations/sec

Throughput: 4,230,400 iterations/sec

Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
Compiling opentelemetry_sdk v0.26.0 (/home/mv/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/home/mv/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 3.93s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 4,423,000 iterations/sec

Throughput: 4,401,000 iterations/sec

Throughput: 4,396,800 iterations/sec

Throughput: 4,409,800 iterations/sec

Throughput: 4,406,000 iterations/sec

Throughput: 4,432,800 iterations/sec

Throughput: 4,409,600 iterations/sec

Throughput: 4,451,600 iterations/sec

Throughput: 4,396,800 iterations/sec

Throughput: 4,402,200 iterations/sec

Throughput: 4,395,600 iterations/sec

Switched to branch 'value-map-interface-change'
Your branch is up to date with 'origin/value-map-interface-change'.
Compiling opentelemetry_sdk v0.26.0 (/home/mv/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/home/mv/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 4.09s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 4,510,000 iterations/sec

Throughput: 4,483,200 iterations/sec

Throughput: 4,492,400 iterations/sec

Throughput: 4,488,400 iterations/sec

Throughput: 4,490,800 iterations/sec

Throughput: 4,489,800 iterations/sec

Throughput: 4,499,200 iterations/sec

Throughput: 4,508,800 iterations/sec

Throughput: 4,518,000 iterations/sec

Throughput: 4,510,200 iterations/sec

Throughput: 4,514,000 iterations/sec

Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
Compiling opentelemetry_sdk v0.26.0 (/home/mv/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/home/mv/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 3.91s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 4,811,400 iterations/sec

Throughput: 4,780,400 iterations/sec

Throughput: 4,790,800 iterations/sec

Throughput: 4,782,800 iterations/sec

Throughput: 4,781,200 iterations/sec

Throughput: 4,754,600 iterations/sec

Throughput: 4,777,600 iterations/sec

Throughput: 4,785,800 iterations/sec

Throughput: 4,772,800 iterations/sec

Throughput: 4,786,000 iterations/sec

Throughput: 4,735,000 iterations/sec

Switched to branch 'value-map-interface-change'
Your branch is up to date with 'origin/value-map-interface-change'.
Compiling opentelemetry_sdk v0.26.0 (/home/mv/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/home/mv/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 4.02s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 4,493,200 iterations/sec

Throughput: 4,463,600 iterations/sec

Throughput: 4,472,000 iterations/sec

Throughput: 4,466,200 iterations/sec

Throughput: 4,470,000 iterations/sec

Throughput: 4,470,000 iterations/sec

Throughput: 4,485,600 iterations/sec

Throughput: 4,482,200 iterations/sec

Throughput: 4,469,600 iterations/sec

Throughput: 4,498,200 iterations/sec

Throughput: 4,509,000 iterations/sec

Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
Compiling opentelemetry_sdk v0.26.0 (/home/mv/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/home/mv/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 3.90s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 5,150,600 iterations/sec

Throughput: 5,133,000 iterations/sec

Throughput: 5,132,400 iterations/sec

Throughput: 5,138,200 iterations/sec

Throughput: 5,127,600 iterations/sec

Throughput: 5,133,200 iterations/sec

Throughput: 5,133,800 iterations/sec

Throughput: 5,127,600 iterations/sec

Throughput: 5,138,600 iterations/sec

Throughput: 5,140,000 iterations/sec

Throughput: 5,147,000 iterations/sec

Switched to branch 'value-map-interface-change'
Your branch is up to date with 'origin/value-map-interface-change'.
Compiling opentelemetry_sdk v0.26.0 (/home/mv/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/home/mv/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 4.02s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 4,757,800 iterations/sec

Throughput: 4,736,200 iterations/sec

Throughput: 4,725,000 iterations/sec

Throughput: 4,736,400 iterations/sec

Throughput: 4,737,200 iterations/sec

Throughput: 4,737,400 iterations/sec

Throughput: 4,744,000 iterations/sec

Throughput: 4,702,000 iterations/sec

Throughput: 4,732,800 iterations/sec

Throughput: 4,737,800 iterations/sec

Throughput: 4,736,200 iterations/sec

Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
Compiling opentelemetry_sdk v0.26.0 (/home/mv/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/home/mv/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 3.90s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 4,414,800 iterations/sec

Throughput: 4,410,400 iterations/sec

Throughput: 4,375,800 iterations/sec

Throughput: 4,382,800 iterations/sec

Throughput: 4,379,200 iterations/sec

Throughput: 4,383,000 iterations/sec

Throughput: 4,422,800 iterations/sec

Throughput: 4,405,200 iterations/sec

Throughput: 4,385,000 iterations/sec

Throughput: 4,381,800 iterations/sec

Throughput: 4,378,800 iterations/sec

Switched to branch 'value-map-interface-change'
Your branch is up to date with 'origin/value-map-interface-change'.
Compiling opentelemetry_sdk v0.26.0 (/home/mv/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/home/mv/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 4.04s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 4,756,200 iterations/sec

Throughput: 4,727,400 iterations/sec

Throughput: 4,734,200 iterations/sec

Throughput: 4,729,000 iterations/sec

Throughput: 4,732,200 iterations/sec

Throughput: 4,732,000 iterations/sec

Throughput: 4,735,400 iterations/sec

Throughput: 4,736,000 iterations/sec

Throughput: 4,720,200 iterations/sec

Throughput: 4,721,600 iterations/sec

Throughput: 4,736,000 iterations/sec

Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
Compiling opentelemetry_sdk v0.26.0 (/home/mv/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/home/mv/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 3.90s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 4,688,600 iterations/sec

Throughput: 4,658,400 iterations/sec

Throughput: 4,676,400 iterations/sec

Throughput: 4,695,200 iterations/sec

Throughput: 4,672,600 iterations/sec

Throughput: 4,656,400 iterations/sec

Throughput: 4,657,000 iterations/sec

Throughput: 4,659,400 iterations/sec

Throughput: 4,660,000 iterations/sec

Throughput: 4,656,000 iterations/sec

Throughput: 4,674,400 iterations/sec

I would say that the results are very similar: sometimes my branch is faster, sometimes main. Maybe main is a bit faster on average...? But nothing like 12.7 vs 8.1...

@fraillt fraillt force-pushed the value-map-interface-change branch 2 times, most recently from a45d153 to c1b4d15 Compare October 8, 2024 20:03
@fraillt
Contributor Author

fraillt commented Oct 8, 2024

@cijothomas and @lalitb thanks for sharing your results!

I updated the code to make it look as close as possible to what it was before (preserving existing bugs).
Unfortunately, I cannot reproduce these results (see the comment above), and I'm really confused about why there is a 50% slowdown for histograms :/
Maybe you have some insights?
Could you run these tests again (even though I didn't change much; I mostly moved code around and restored a few minor bugs)?

@fraillt
Contributor Author

fraillt commented Oct 11, 2024

I just ran the tests on another computer.
Stress tests:
counter 12.7 (main) --> 18.3 (PR)
histogram 12.6 (main) --> 11.4 (PR)

Here's the script that I used:

#!/usr/bin/env bash
echo "CPU $(cat /proc/cpuinfo | grep 'name' | uniq)"
for i in $(seq 1 2);
do
  echo "Current branch: $(git branch --show-current)"
  echo "stress-test metrics_histogram"
  timeout 30s cargo run --release --package stress --bin metrics_histogram
  echo "stress-test metrics"
  timeout 30s cargo run --release --package stress --bin metrics
  git switch -
done

Here's the full console output:

fraillt@Fraillt-PC:~/HostProjects/opentelemetry-rust$ ./run_tests.sh
CPU model name : AMD Ryzen 5 3600 6-Core Processor
Current branch: main
stress-test metrics_histogram
Compiling opentelemetry_sdk v0.26.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 7.95s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 12,594,800 iterations/sec

Throughput: 12,642,800 iterations/sec

Throughput: 12,645,600 iterations/sec

Throughput: 12,571,800 iterations/sec

stress-test metrics
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 4.21s
Running target/release/metrics
Number of threads: 12

Throughput: 13,018,000 iterations/sec

Throughput: 13,011,600 iterations/sec

Throughput: 12,922,800 iterations/sec

Throughput: 12,231,200 iterations/sec

Throughput: 11,879,200 iterations/sec

Switched to branch 'value-map-interface-change'
Your branch is up to date with 'origin/value-map-interface-change'.
Current branch: value-map-interface-change
stress-test metrics_histogram
Compiling opentelemetry_sdk v0.26.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 8.37s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 11,383,000 iterations/sec

Throughput: 11,422,600 iterations/sec

Throughput: 11,432,800 iterations/sec

Throughput: 11,428,800 iterations/sec

stress-test metrics
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 4.11s
Running target/release/metrics
Number of threads: 12

Throughput: 18,337,200 iterations/sec

Throughput: 18,394,400 iterations/sec

Throughput: 18,224,400 iterations/sec

Throughput: 18,356,200 iterations/sec

Throughput: 18,334,400 iterations/sec

Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
fraillt@Fraillt-PC:~/HostProjects/opentelemetry-rust$

@fraillt
Contributor Author

fraillt commented Oct 13, 2024

I think I know the reason for these results.
It looks like different compiler versions produce really different results.
I ran these tests using several different compiler versions, and here's the summary:

Rust version: 1.70

  • counter 17.3 (main) --> 12.7 (PR)
  • histogram 16.0 (main) --> 16.5 (PR)

Rust version: 1.72

  • counter 12.2 (main) --> 12.4 (PR)
  • histogram 12.0 (main) --> 12.0 (PR)

Rust version: 1.75

  • counter 12.9 (main) --> 12.4 (PR)
  • histogram 12.5 (main) --> 12.4 (PR)

Rust version: 1.78

  • counter 12.9 (main) --> 17.7 (PR)
  • histogram 12.5 (main) --> 11.3 (PR)

Rust version: 1.81

  • counter 12.5 (main) --> 18.0 (PR)
  • histogram 12.4 (main) --> 11.2 (PR)

Script output:

fraillt@Fraillt-PC:~/HostProjects/opentelemetry-rust$ ./run_tests.sh
CPU model name : AMD Ryzen 5 3600 6-Core Processor
info: using existing install for '1.70-x86_64-unknown-linux-gnu'
info: default toolchain set to '1.70-x86_64-unknown-linux-gnu'

1.70-x86_64-unknown-linux-gnu unchanged - rustc 1.70.0 (90c541806 2023-05-31)

Rust version: 1.70
Current branch: value-map-interface-change
stress-test metrics_histogram
Compiling opentelemetry_sdk v0.26.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release [optimized] target(s) in 10.72s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 16,570,200 iterations/sec
Throughput: 16,536,400 iterations/sec
Throughput: 16,557,400 iterations/sec

stress-test metrics
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release [optimized] target(s) in 5.39s
Running target/release/metrics
Number of threads: 12

Throughput: 12,781,200 iterations/sec
Throughput: 12,685,600 iterations/sec
Throughput: 12,754,200 iterations/sec
Throughput: 12,713,400 iterations/sec

Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
Current branch: main
stress-test metrics_histogram
Compiling opentelemetry_sdk v0.26.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release [optimized] target(s) in 10.00s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 16,122,600 iterations/sec
Throughput: 16,096,600 iterations/sec
Throughput: 16,156,000 iterations/sec

stress-test metrics
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release [optimized] target(s) in 5.51s
Running target/release/metrics
Number of threads: 12

Throughput: 17,320,600 iterations/sec
Throughput: 17,350,600 iterations/sec
Throughput: 17,322,600 iterations/sec
Throughput: 17,328,600 iterations/sec

Switched to branch 'value-map-interface-change'
Your branch is up to date with 'origin/value-map-interface-change'.
info: using existing install for '1.72-x86_64-unknown-linux-gnu'
info: default toolchain set to '1.72-x86_64-unknown-linux-gnu'

1.72-x86_64-unknown-linux-gnu unchanged - rustc 1.72.1 (d5c2e9c34 2023-09-13)

Rust version: 1.72
Current branch: value-map-interface-change
stress-test metrics_histogram
Compiling opentelemetry_sdk v0.26.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release [optimized] target(s) in 10.46s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 12,094,400 iterations/sec
Throughput: 12,040,400 iterations/sec
Throughput: 12,075,600 iterations/sec

stress-test metrics
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release [optimized] target(s) in 5.01s
Running target/release/metrics
Number of threads: 12

Throughput: 12,431,200 iterations/sec
Throughput: 12,466,600 iterations/sec
Throughput: 12,448,400 iterations/sec
Throughput: 12,450,000 iterations/sec

Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
Current branch: main
stress-test metrics_histogram
Compiling opentelemetry_sdk v0.26.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release [optimized] target(s) in 9.70s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 12,051,000 iterations/sec
Throughput: 11,986,400 iterations/sec
Throughput: 12,063,000 iterations/sec
Throughput: 12,068,000 iterations/sec

stress-test metrics
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release [optimized] target(s) in 5.12s
Running target/release/metrics
Number of threads: 12

Throughput: 12,259,200 iterations/sec
Throughput: 12,269,800 iterations/sec
Throughput: 12,273,200 iterations/sec
Throughput: 12,266,200 iterations/sec

Switched to branch 'value-map-interface-change'
Your branch is up to date with 'origin/value-map-interface-change'.
info: using existing install for '1.75-x86_64-unknown-linux-gnu'
info: default toolchain set to '1.75-x86_64-unknown-linux-gnu'

1.75-x86_64-unknown-linux-gnu unchanged - rustc 1.75.0 (82e1608df 2023-12-21)

Rust version: 1.75
Current branch: value-map-interface-change
stress-test metrics_histogram
Compiling opentelemetry_sdk v0.26.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release [optimized] target(s) in 9.93s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 12,312,000 iterations/sec
Throughput: 12,450,000 iterations/sec
Throughput: 12,433,400 iterations/sec

stress-test metrics
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release [optimized] target(s) in 4.80s
Running target/release/metrics
Number of threads: 12

Throughput: 12,505,000 iterations/sec
Throughput: 12,544,400 iterations/sec
Throughput: 12,517,000 iterations/sec
Throughput: 12,484,200 iterations/sec
Throughput: 12,327,400 iterations/sec

Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
Current branch: main
stress-test metrics_histogram
Compiling opentelemetry_sdk v0.26.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release [optimized] target(s) in 9.27s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 12,621,400 iterations/sec
Throughput: 12,229,400 iterations/sec
Throughput: 12,538,400 iterations/sec
Throughput: 12,670,000 iterations/sec

stress-test metrics
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release [optimized] target(s) in 4.74s
Running target/release/metrics
Number of threads: 12

Throughput: 12,907,600 iterations/sec
Throughput: 12,931,200 iterations/sec
Throughput: 12,910,000 iterations/sec
Throughput: 12,938,000 iterations/sec
Throughput: 12,957,200 iterations/sec

Switched to branch 'value-map-interface-change'
Your branch is up to date with 'origin/value-map-interface-change'.
info: using existing install for '1.78-x86_64-unknown-linux-gnu'
info: default toolchain set to '1.78-x86_64-unknown-linux-gnu'

1.78-x86_64-unknown-linux-gnu unchanged - rustc 1.78.0 (9b00956e5 2024-04-29)

Rust version: 1.78
Current branch: value-map-interface-change
stress-test metrics_histogram
Compiling opentelemetry_sdk v0.26.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 9.19s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 11,197,800 iterations/sec
Throughput: 11,416,000 iterations/sec
Throughput: 11,394,800 iterations/sec
Throughput: 9,392,800 iterations/sec <--- open browser

stress-test metrics
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 4.96s
Running target/release/metrics
Number of threads: 12

Throughput: 17,282,200 iterations/sec
Throughput: 16,250,200 iterations/sec
Throughput: 17,872,600 iterations/sec
Throughput: 17,867,400 iterations/sec

Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
Current branch: main
stress-test metrics_histogram
Compiling opentelemetry_sdk v0.26.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 8.87s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 12,465,200 iterations/sec
Throughput: 12,610,600 iterations/sec
Throughput: 12,596,200 iterations/sec
Throughput: 12,080,200 iterations/sec

stress-test metrics
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 4.79s
Running target/release/metrics
Number of threads: 12

Throughput: 12,887,200 iterations/sec
Throughput: 12,956,800 iterations/sec
Throughput: 12,624,000 iterations/sec
Throughput: 12,252,000 iterations/sec
Throughput: 12,381,400 iterations/sec

Switched to branch 'value-map-interface-change'
Your branch is up to date with 'origin/value-map-interface-change'.
info: using existing install for '1.81-x86_64-unknown-linux-gnu'
info: default toolchain set to '1.81-x86_64-unknown-linux-gnu'

1.81-x86_64-unknown-linux-gnu unchanged - rustc 1.81.0 (eeb90cda1 2024-09-04)

Rust version: 1.81
Current branch: value-map-interface-change
stress-test metrics_histogram
Compiling opentelemetry_sdk v0.26.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 8.93s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 11,088,800 iterations/sec
Throughput: 11,227,200 iterations/sec
Throughput: 11,198,600 iterations/sec
Throughput: 11,189,600 iterations/sec

stress-test metrics
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 4.64s
Running target/release/metrics
Number of threads: 12

Throughput: 18,036,200 iterations/sec
Throughput: 18,069,200 iterations/sec
Throughput: 18,066,200 iterations/sec
Throughput: 18,009,000 iterations/sec
Throughput: 18,037,600 iterations/sec

Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
Current branch: main
stress-test metrics_histogram
Compiling opentelemetry_sdk v0.26.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/opentelemetry-sdk)
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 8.67s
Running target/release/metrics_histogram
Number of threads: 12

Throughput: 12,169,800 iterations/sec
Throughput: 12,409,200 iterations/sec
Throughput: 12,436,000 iterations/sec
Throughput: 11,985,000 iterations/sec

stress-test metrics
Compiling stress v0.1.0 (/mnt/c/Users/frail/Projects/opentelemetry-rust/stress)
Finished release profile [optimized] target(s) in 5.12s
Running target/release/metrics
Number of threads: 12

Throughput: 12,572,400 iterations/sec
Throughput: 12,559,000 iterations/sec
Throughput: 12,581,800 iterations/sec
Throughput: 12,460,600 iterations/sec

Switched to branch 'value-map-interface-change'
Your branch is up to date with 'origin/value-map-interface-change'.
fraillt@Fraillt-PC:~/HostProjects/opentelemetry-rust$

So how should we proceed?

@lalitb
Member

lalitb commented Oct 15, 2024

This is weird: I see different results now on the same machine, or at least I don't see the histogram degradation observed earlier. I would wait for someone else to confirm too. @fraillt - can you also gather the stats without using the script (running manually), for these two Rust versions in both release and debug (default) mode?

CPU: AMD EPYC 7763 64-Core Processor, 8 cores
Memory: 62G

rustc version: 1.70.0

histogram:
main: ~10.3K (release), ~5.2K (debug)
PR: ~10.8K (release), ~5.2K (debug)

counter:
main: ~10.6K (release), ~5.1K (debug)
PR: ~11.2K (release), ~5.1K (debug)

rustc version: 1.81

histogram:
main: ~10.5K (release), ~5.2K (debug)
PR: ~10.5K (release), ~5.2K (debug)

counter:
main: ~10.6K (release), ~5.1K (debug)
PR: ~11.1K (release), ~5.1K (debug)

@fraillt
Contributor Author

fraillt commented Oct 16, 2024

CPU: AMD Ryzen 5 3600 6-Core Processor

rustc version: 1.70.0

histogram:
main: ~15.9 (release), ~3.3 (debug)
PR: ~16.3 (release), ~3.6 (debug)

counter:
main: ~17.3 (release), ~4.1 (debug)
PR: ~12.6 (release), ~3.7 (debug)

rustc version: 1.81

histogram:
main: ~12.6 (release), ~3.1 (debug)
PR: ~11.6 (release), ~3.2 (debug)

counter:
main: ~12.3 (release), ~3.4 (debug)
PR: ~18.2 (release), ~4.0 (debug)


Rust version 1.70 with counter metrics is super weird... :/
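
To put the release-mode figures above side by side, here is a small sketch that computes the relative change of the PR against main. The numbers are copied from the comment above; the function name `relative_change_percent` is made up for illustration and is not part of the stress tooling.

```rust
/// Relative change of the PR figure versus the main figure, in percent.
/// Positive means the PR measured higher throughput than main.
fn relative_change_percent(main_tput: f64, pr_tput: f64) -> f64 {
    (pr_tput - main_tput) / main_tput * 100.0
}

fn main() {
    // (label, main, PR): release-mode figures from the comment above.
    let rows = [
        ("1.70 histogram", 15.9, 16.3),
        ("1.70 counter", 17.3, 12.6),
        ("1.81 histogram", 12.6, 11.6),
        ("1.81 counter", 12.3, 18.2),
    ];
    for (label, main_tput, pr_tput) in rows {
        let change = relative_change_percent(main_tput, pr_tput);
        println!("{label}: {change:+.1}%");
    }
}
```

With these inputs the counter swings in opposite directions: roughly -27% on 1.70 and roughly +48% on 1.81, while the histogram stays within single-digit percentages on both.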

@fraillt
Contributor Author

fraillt commented Oct 16, 2024

Thanks @lalitb for your tests

This is weird: I see different results now on the same machine, or at least I don't see the histogram degradation observed earlier.

As I mentioned earlier:

I updated code by making it look as close as possible to what it was before (preserving existing bugs).

Basically, the only functional change is that I removed

        if f_value.is_infinite() || f_value.is_nan() {
            return;
        }

Which is probably OK, since histograms are unbounded anyway...
It's weird, because on my machine I didn't observe any difference (whether I check for infinity and NaN or not), but maybe your machine is different...

Maybe you can test this explicitly (by adding this check in fn measure) and see if this will cause performance degradation?
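
For anyone who wants to try this, here is a minimal standalone sketch of what the check does. The `Totals` type and its `measure` method are hypothetical stand-ins, not the SDK's histogram code; only the `is_infinite() || is_nan()` condition is taken from the snippet above.

```rust
/// Hypothetical stand-in for a histogram's running state.
#[derive(Default)]
struct Totals {
    count: u64,
    sum: f64,
}

impl Totals {
    /// Records a measurement, ignoring NaN and +/- infinity.
    /// Returns `true` if the value was recorded.
    fn measure(&mut self, value: f64) -> bool {
        // Same condition as the removed check.
        if value.is_infinite() || value.is_nan() {
            return false;
        }
        self.count += 1;
        self.sum += value;
        true
    }
}

fn main() {
    let mut totals = Totals::default();
    for v in [1.5, f64::NAN, f64::INFINITY, 2.5] {
        totals.measure(v);
    }
    println!("count={} sum={}", totals.count, totals.sum);
}
```

Without the check, a single NaN added to `sum` leaves it NaN for every later measurement, which is the behavior the original guard prevented.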

@fraillt fraillt mentioned this pull request Oct 25, 2024
4 tasks
@cijothomas
Member

@fraillt @utpilla I think we can merge this PR, irrespective of the unexplainable stress test result variation. The changes are mostly internal, and we can keep optimizing this further.
I'll do another review, and mark approval today.

/// Some aggregators can do some computations before updating aggregator.
/// This helps to reduce contention for aggregators because it makes
/// [`Aggregator::update`] as short as possible.
type PreComputedValue;
Contributor

nit: PreComputedValue is not the best name, as there is no precomputation happening for counters and gauges.

I think we could use something like MeasurementData instead.

Suggested change
- type PreComputedValue;
+ type MeasurementData;
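
To illustrate the point of the associated type under discussion, here is a simplified sketch. It assumes a trait with only `update`; the SDK's actual `Aggregator` trait may have more items and different signatures. `Increment` and `Buckets` borrow names from the PR description but are reduced to the bare minimum, and `bucket_index` is a hypothetical helper.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Simplified, hypothetical version of the aggregator interface.
trait Aggregator {
    /// Data handed to `update`: the raw value, or something derived from it.
    type PreComputedValue;
    fn update(&self, value: Self::PreComputedValue);
}

/// Counter-like aggregator: nothing to precompute, the value passes through.
struct Increment(AtomicU64);

impl Aggregator for Increment {
    type PreComputedValue = u64;
    fn update(&self, value: u64) {
        self.0.fetch_add(value, Ordering::Relaxed);
    }
}

/// Histogram-like aggregator: one count per bucket plus an overflow bucket.
struct Buckets {
    counts: Vec<u64>,
}

impl Aggregator for Mutex<Buckets> {
    /// The bucket index, computed by the caller before the lock is taken.
    type PreComputedValue = usize;
    fn update(&self, index: usize) {
        self.lock().unwrap().counts[index] += 1;
    }
}

/// Finds the bucket for `value` given sorted upper bounds. Runs outside the lock.
fn bucket_index(bounds: &[f64], value: f64) -> usize {
    bounds.partition_point(|&bound| bound < value)
}

fn main() {
    let bounds = [0.0, 5.0, 10.0];
    let histogram = Mutex::new(Buckets { counts: vec![0; bounds.len() + 1] });
    let counter = Increment(AtomicU64::new(0));
    for v in [3.0, 5.0, 42.0] {
        histogram.update(bucket_index(&bounds, v));
        counter.update(1);
    }
    println!("{:?}", histogram.lock().unwrap().counts);
    println!("{}", counter.0.load(Ordering::Relaxed));
}
```

In this sketch only the histogram does real work ahead of `update`, which is the reviewer's point: for counters and gauges the associated type is just the measurement itself.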

// Ignore NaN and infinity.
// Only makes sense if T is f64, maybe this could be no-op for other cases?
// TODO: uncomment once we know the reason for performance degradation
// if f.is_infinite() || f.is_nan() {
Contributor

@cijothomas Let's add this check as well?

Member

@cijothomas cijothomas left a comment

Thanks for patiently waiting!

@cijothomas cijothomas merged commit 706a067 into open-telemetry:main Nov 1, 2024
23 of 25 checks passed
@fraillt fraillt deleted the value-map-interface-change branch November 2, 2024 13:21
5 participants