
240105 code cleanup #47


Merged: 7 commits, Jan 26, 2024

Conversation

luigirizzo
Contributor

No description provided.

Luigi Rizzo added 7 commits January 5, 2024 19:26
There seems to be no use for the numlist object
Percentiles in the CSV file were incorrectly divided by MILLION,
resulting in mostly 0 values. Remove the divisor.

Probably the feature was never used, otherwise it would have
been noticed.
Histogram methods were implemented as virtual functions, but since there
is only one possible implementation this was overkill.
Simplify the code by exposing the actual methods. The implementation
remains opaque.

No functional changes.

Tested with
 ./tcp_rr -c -H 127.0.0.1 -p1,2,10,50,90,999.9999,100 -A/tmp/x.csv -l 4
 and verified that the csv file has the correct data.
 (histograms are only exercised in rr tests)
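The de-virtualization above follows a common C pattern: the type stays opaque (callers see only a forward declaration), while its methods become plain functions instead of per-object function pointers, removing an indirect call on every use. A minimal sketch of the idea, with illustrative names that are not the actual neper API:

```c
// Sketch: opaque type with direct methods instead of a per-object vtable.
// All names here are hypothetical, not the real neper code.
#include <assert.h>
#include <stdlib.h>

struct histo;                              /* opaque in the header */

/* Direct methods replace old  h->ops->add(h, v)  style indirect calls. */
struct histo *histo_create(void);
void histo_add(struct histo *h, double v);
double histo_max(const struct histo *h);
void histo_destroy(struct histo *h);

/* Definition normally lives only in the .c file, so it stays opaque. */
struct histo { double max; };

struct histo *histo_create(void) { return calloc(1, sizeof(struct histo)); }
void histo_add(struct histo *h, double v) { if (v > h->max) h->max = v; }
double histo_max(const struct histo *h) { return h->max; }
void histo_destroy(struct histo *h) { free(h); }
```

Callers keep the same encapsulation as before, but the compiler can now inline the calls.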
Histograms store samples in buckets of pseudo-logarithmic size.
The previous implementation used a table of thresholds, and binary
search to locate the correct bucket.

This patch replaces the thresholds with the fast pseudo-logarithm
algorithm used in lr-cstats and bpftrace so we can locate the bucket in
a handful of instructions.

This gives memory savings, reduced cache thrashing, and better
performance. Tests show that with a hot cache a lookup now takes
less than 2us, compared to 20-25us with the previous approach.
Also, we can remove the now-useless neper_histo_factory.

The actual resolution of the buckets is approximately the same as in
the previous implementation (about 1.5%).
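The pseudo-logarithm trick (as used in lr-cstats and bpftrace) combines the position of the most significant bit (the "exponent") with the next few bits (a linear "mantissa") to form the bucket index, so the lookup is a handful of instructions rather than a binary search. A sketch, with hypothetical names and a 6-bit mantissa matching the ~1.5% resolution mentioned above:

```c
// Sketch of a bit-based pseudo-logarithmic bucket index (names illustrative).
// Small values map linearly; larger values use MSB position plus the next
// FRAC_BITS bits, giving a relative bucket width of ~1/2^FRAC_BITS (~1.5%).
// Uses the GCC/Clang builtin __builtin_clzll for the bit scan.
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 6

static uint32_t bucket_index(uint64_t val)
{
    if (val < (1ULL << FRAC_BITS))
        return (uint32_t)val;               /* linear region: one per value */
    uint32_t exp = 63 - __builtin_clzll(val);       /* position of the MSB */
    uint32_t mant = (val >> (exp - FRAC_BITS)) & ((1u << FRAC_BITS) - 1);
    return ((exp - FRAC_BITS + 1) << FRAC_BITS) | mant;
}
```

The index is monotonic and continuous across the linear/logarithmic boundary, and the full 64-bit range fits in a few thousand buckets.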

In passing, correct a few bugs in the previous implementation:
- resolution was supposed to be 0.25%, but due to an implementation bug
  it was around 1% or even larger at low values, and caused the
  thresholds to become negative
- conversion from double to int for the sample could have unchecked
  overflows.

Tested with tcp_rr, verifying that the distribution and csv files
contain correct values.
Allow arbitrary percentiles to be specified, instead of just integers
plus p99.9 and p99.99.

This also makes the code faster because we can just compute the values
requested instead of all 103 entries.

Any floating point number between 0 and 100 is now accepted, with 999
and 9999 mapped to 99.9 and 99.99 for backward compatibility.

Tested as usual with
 ./tcp_rr -c -H 127.0.0.1 -p1,2,10,50,90,999,9999,100 -A/tmp/x.csv
 and verifying the correct content of the csv file.
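The backward-compatible mapping can be sketched as a small normalization step; the function name and the error convention here are illustrative, not the actual neper code:

```c
// Sketch: accept any percentile in [0, 100], remapping the legacy integer
// spellings 999 -> 99.9 and 9999 -> 99.99. Returns -1.0 for invalid input
// (error convention is hypothetical).
#include <assert.h>

static double parse_percentile(double p)
{
    if (p == 999.0)
        return 99.9;            /* legacy spelling of p99.9 */
    if (p == 9999.0)
        return 99.99;           /* legacy spelling of p99.99 */
    if (p < 0.0 || p > 100.0)
        return -1.0;            /* out of range: caller rejects the flag */
    return p;
}
```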
Computing percentiles is expensive, as it requires scanning all the 4k-8k
buckets used to store samples, and is done for each flow.  Benchmarks show
the original code took an average of 20us per flow, with frequent peaks
in the 60-80us range.

This patch eliminates the cost by not storing samples in buckets if no
percentiles are requested, and otherwise achieves a ~5x reduction by
tracking the range of buckets that contain values in each epoch.

Also change the precision to 6 bits, which halves the cost without much
impact on the results. This value may become a command line flag.

Tested, as usual, by running tcp_rr and verifying the logs and csv files.
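The bucket-range optimization amounts to tracking, per epoch, the lowest and highest bucket that received a sample, so the percentile pass scans only that window instead of all 4k-8k buckets. An illustrative sketch (names and layout are hypothetical):

```c
// Sketch: track the [lo, hi] range of non-empty buckets so percentile
// computation scans only that window. Names are illustrative.
#include <assert.h>
#include <stdint.h>

#define NBUCKETS 4096

struct histo {
    uint64_t count[NBUCKETS];
    uint32_t lo, hi;            /* range of non-empty buckets */
    uint64_t total;             /* lo/hi only valid once total > 0 */
};

static void histo_add(struct histo *h, uint32_t bucket)
{
    if (h->total == 0 || bucket < h->lo)
        h->lo = bucket;
    if (h->total == 0 || bucket > h->hi)
        h->hi = bucket;
    h->count[bucket]++;
    h->total++;
}

/* Return the bucket holding the p-th percentile (0 < p <= 100). */
static uint32_t histo_percentile(const struct histo *h, double p)
{
    uint64_t need = (uint64_t)(h->total * p / 100.0 + 0.5);
    uint64_t seen = 0;

    for (uint32_t i = h->lo; i <= h->hi; i++) {   /* only [lo, hi] */
        seen += h->count[i];
        if (seen >= need)
            return i;
    }
    return h->hi;
}
```

Resetting lo/hi each epoch keeps the scanned window tight even when the overall bucket table is large.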
neper_snaps methods were implemented as virtual functions, but since there
is only one possible implementation this was overkill.  Simplify the code
by exposing the actual methods. The implementation still remains opaque.

No functional changes.

Tested with
./tcp_rr -c -H 127.0.0.1 -p1,2,10,50,90,999.9999,100 -A/tmp/x.csv -l 4
and verified that the csv file has the correct data.
(histograms are only exercised in rr tests)
@luigirizzo
Contributor Author

Short description of the patches:

  • numlist: remove unused component (284fe50): code cleanup

  • rr: remove incorrect division by MILLION in csv printing (313f5e7): same as pull #25

  • the next 4 patches give a 10x performance improvement on stats collection (which can consume >20% of CPU time on rr tests), as well as allowing better analysis at the tails:
    histo: de-virtualize histogram methods. No functional change. (38d1ff1)
    histo: replace threshold table with faster bit-based logarithms (80b171f)
    histograms: allow arbitrary percentiles (8b2f9e1)
    histo: only scan necessary buckets when computing percentiles (7e3efb3)

  • snaps: de-virtualize methods. No functional change. (c412b18): cleanup for upcoming optimizations

@gfantom gfantom self-requested a review January 26, 2024 19:20
@gfantom gfantom merged commit 077c75b into google:master Jan 26, 2024