Optimize UTF8 encoding #259

davidik1 · 2024-11-24T14:23:11Z

We have observed several workloads where UTF_8$Encoder.encodeLoop, called from StatsDProcessor$ProcessingTask.writeBuilderToSendBuffer, is very prominent in the profile. We also noticed that a lot of the time was spent getting and putting individual characters and bytes, i.e., in StringCharBuffer.get and DirectByteBuffer.put (see screenshot).
This looks quite similar to the grey frames in the screenshot in the following comment (although that comment dealt with a completely separate issue): #203 (comment)
After applying the changes suggested in this PR (replacing the CharsetEncoder interface with plain String.getBytes), performance improved dramatically, and writeBuilderToSendBuffer is no longer visible in the profile.
We also found a fairly old blog post benchmarking the two approaches: https://www.evanjones.ca/software/java-string-encoding.html.
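
For readers who want to see the two approaches side by side, here is a minimal sketch. It is not the diff in this PR: the class name `Utf8EncodingSketch` and both method names are made up for illustration, and the real writeBuilderToSendBuffer may handle buffers and errors differently. The first method goes through CharsetEncoder with a wrapped CharSequence and a ByteBuffer target, which is the kind of path that shows up as StringCharBuffer.get and DirectByteBuffer.put when the target is a direct buffer. The second converts to a String, calls getBytes, and copies the result with one bulk put.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; names are not taken from the project.
final class Utf8EncodingSketch {

    // Encoder-based path: CharBuffer.wrap(CharSequence) yields a buffer with no
    // backing array, so the encoder reads and writes through buffer accessors.
    static void encodeWithEncoder(CharSequence text, ByteBuffer out) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CoderResult result = encoder.encode(CharBuffer.wrap(text), out, true);
        if (!result.isUnderflow()) {
            throw new IllegalStateException("encode failed: " + result);
        }
        result = encoder.flush(out);
        if (!result.isUnderflow()) {
            throw new IllegalStateException("flush failed: " + result);
        }
    }

    // getBytes-based path: encode into a temporary byte[] and copy it in bulk.
    // put(byte[]) throws BufferOverflowException if the bytes do not fit.
    static void encodeWithGetBytes(CharSequence text, ByteBuffer out) {
        byte[] bytes = text.toString().getBytes(StandardCharsets.UTF_8);
        out.put(bytes);
    }

    public static void main(String[] args) {
        String message = "example.metric:1|c|#env:prod"; // made-up payload
        ByteBuffer viaEncoder = ByteBuffer.allocateDirect(128);
        ByteBuffer viaGetBytes = ByteBuffer.allocateDirect(128);
        encodeWithEncoder(message, viaEncoder);
        encodeWithGetBytes(message, viaGetBytes);
        viaEncoder.flip();
        viaGetBytes.flip();
        System.out.println("same bytes: " + viaEncoder.equals(viaGetBytes));
    }
}
```

One behavioral difference to keep in mind: a freshly created CharsetEncoder reports malformed input (such as an unpaired surrogate) as an error result, while String.getBytes(Charset) silently substitutes the charset's replacement bytes. The getBytes path also allocates a temporary array per call, which trades some garbage for the faster bulk copy.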

YoniKF · 2024-12-12T14:47:58Z

Hi @vickenty and team,
Is this change something you would be willing to accept into the project?
We (Intel Granulate) have observed this inefficiency in several of our clients' services and think it makes sense to fix it at the source.

Optimize UTF8 encoding

93122d9

davidik1 requested a review from a team as a code owner November 24, 2024 14:23

YoniKF approved these changes Dec 1, 2024
