Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize UTF8 encoding #259

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

davidik1
Copy link

@davidik1 davidik1 commented Nov 24, 2024

We have observed several workloads where UTF_8$Encoder.encodeLoop called from StatsDProcessor$ProcessingTask.writeBuilderToSendBuffer is very prominent in the profile. We also noticed that a lot of time was spent getting and putting the characters/bytes, i.e., StringCharBuffer.get and DirectByteBuffer.put (see screenshot).
This is actually quite similar to the grey frames in the screenshot in the following comment (although it dealt with a completely separate issue): #203 (comment)
After applying the changes suggested in this PR (replacing the CharsetEncoder interface with plain String.getBytes) performance improved dramatically and writeBuilderToSendBuffer is no longer visible in the profile.
We also found a pretty old blogpost benchmarking the two: https://www.evanjones.ca/software/java-string-encoding.html.

Screenshot 2024-11-26 at 16 19 54

@davidik1 davidik1 requested a review from a team as a code owner November 24, 2024 14:23
@YoniKF
Copy link

YoniKF commented Dec 12, 2024

Hi @vickenty and team,
Is this change something you are willing to accept into the project?
We (Intel Granulate) have observed this inefficiency on several services of our clients and think that it makes sense to fix it at the source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants