
Use bytestring instead of string for performance and switch proto libraries. #7

Merged
merged 6 commits into main from intern
Dec 11, 2024

Conversation

@mattdurham (Collaborator) commented Dec 11, 2024

The gogo library, though deprecated, has better performance than Google's and is what Prometheus uses. The bytestring is safe as long as the underlying buffer is not reused, which it is not.
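As a minimal sketch of the zero-copy conversion a bytestring type typically relies on (the helper name `yoloString` is illustrative, not this PR's exact code), assuming Go 1.20+ for `unsafe.String`/`unsafe.SliceData`:

```go
package main

import (
	"fmt"
	"unsafe"
)

// yoloString views a byte slice as a string without copying.
// This is only safe if buf is never modified or reused afterwards,
// which is the guarantee described in the PR.
func yoloString(buf []byte) string {
	return unsafe.String(unsafe.SliceData(buf), len(buf))
}

func main() {
	buf := []byte("series_name")
	s := yoloString(buf)
	fmt.Println(s)
}
```

The trade-off is exactly the one stated above: the conversion saves an allocation and a copy per string, at the cost of a correctness invariant (the buffer must not be recycled while the string is alive).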

PS: the branch name is because I originally tried interning the strings with the new unique package, but it ended up not saving memory and increased CPU significantly. If/when we make the whole Prometheus pipeline interned, this will need to change.

@mattdurham mattdurham requested a review from a team as a code owner December 11, 2024 14:24
func createWriteRequest(series []*types.TimeSeriesBinary, externalLabels map[string]string, data *proto.Buffer) ([]byte, error) {
	wr := &prompb.WriteRequest{Timeseries: make([]prompb.TimeSeries, len(series))}
	...
	wr.Timeseries = wr.Timeseries[:len(series)]
}
mattdurham (Collaborator, Author):

Moving this to inside the func had no impact on CPU but saved memory.
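The allocation pattern discussed above can be sketched as follows; the types here are simplified stand-ins for prompb/gogo types, so the code is illustrative rather than the PR's exact diff:

```go
package main

import "fmt"

// Simplified stand-ins for prompb.TimeSeries / prompb.WriteRequest.
type TimeSeries struct{ Name string }
type WriteRequest struct{ Timeseries []TimeSeries }

// createWriteRequest allocates the request inside the function,
// sized exactly to the batch. Per the comment above, this cost no
// CPU versus reusing a package-level request, but reduced retained
// memory because the slice is released after each call.
func createWriteRequest(series []string) *WriteRequest {
	wr := &WriteRequest{Timeseries: make([]TimeSeries, len(series))}
	for i, s := range series {
		wr.Timeseries[i] = TimeSeries{Name: s}
	}
	return wr
}

func main() {
	wr := createWriteRequest([]string{"up", "go_goroutines"})
	fmt.Println(len(wr.Timeseries))
}
```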

@@ -48,6 +48,72 @@ func TestLabels(t *testing.T) {
}
}

func BenchmarkFill(b *testing.B) {


How do these compare before/after?

mattdurham (Collaborator, Author):

Roughly 20% better, if memory serves and my notes are accurate.

Comment on lines +165 to +167
// Assume each series has roughly 10 labels. We do this because at very large mappings, growing the map took up to 5% of CPU time.
// By pre-allocating it, that disappeared.
strMapToIndex := make(map[string]uint32, len(s.series)*10)


Interesting. I'm kind of surprised it had such a big impact. Maybe we can track the average size of this and autotune it at runtime?

mattdurham (Collaborator, Author):

I was also surprised. This trades memory for CPU, i.e. the map is bigger than we actually need in most cases, since we store only the unique strings.
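The pre-allocation being discussed can be sketched as follows; the `buildIndex` helper and its input shape are illustrative, the one load-bearing line is the sized `make`:

```go
package main

import "fmt"

// buildIndex interns label strings into a dense uint32 index.
// Sizing the map for ~10 labels per series up front means the
// runtime never has to grow (rehash) the map incrementally, which
// is the ~5% CPU cost the comment above describes eliminating.
func buildIndex(series [][]string) map[string]uint32 {
	strMapToIndex := make(map[string]uint32, len(series)*10)
	for _, labels := range series {
		for _, s := range labels {
			if _, ok := strMapToIndex[s]; !ok {
				strMapToIndex[s] = uint32(len(strMapToIndex))
			}
		}
	}
	return strMapToIndex
}

func main() {
	idx := buildIndex([][]string{{"job", "node"}, {"job", "instance"}})
	fmt.Println(len(idx))
}
```

This also illustrates the memory/CPU trade noted above: the capacity hint assumes 10 labels per series, but only unique strings are stored, so the map is usually over-provisioned.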

@mattdurham mattdurham merged commit 2b91b7d into main Dec 11, 2024
2 checks passed
@mattdurham mattdurham deleted the intern branch December 11, 2024 14:43