Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(metrics): Use a priority queue for metrics aggregator #3845

Merged
merged 1 commit into from
Jul 23, 2024

Conversation

Dav1dde
Copy link
Member

@Dav1dde Dav1dde commented Jul 22, 2024

Uses a priority queue for the metrics aggregator:

  • improves worst case performance massively (huge aggregator, small flushes)
  • makes the performance for fast flushes (small aggregators, all flushed) slightly worse
bench_flush_metrics/same metric on different projects
                        time:   [18.261 µs 21.705 µs 25.368 µs]
                        change: [+97.208% +144.62% +196.81%] (p = 0.00 < 0.05)
                        Performance has regressed.
bench_flush_metrics/a lot of metrics from a lot of projects with mix of backdated buckets and non
                        time:   [21.889 ms 22.119 ms 22.365 ms]
                        change: [-67.986% -67.566% -67.127%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)

#skip-changelog

@Dav1dde Dav1dde requested a review from a team as a code owner July 22, 2024 07:48
@Dav1dde Dav1dde self-assigned this Jul 22, 2024
@@ -122,6 +122,7 @@ pest = "2.1.3"
pest_derive = "2.1.0"
pin-project-lite = "0.2.12"
pretty-hex = "0.3.0"
priority-queue = "2.0.3"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting crate, might actually be a good replacement for my homegrown priority map. But do we need it for the aggregator use case? IIUC the std binary heap should be good enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need keyed access and priority separately. Keyed for the aggregation key, priority for the flush time.

@Dav1dde Dav1dde force-pushed the dav1d/pq-aggregator branch from 977facc to f9878c6 Compare July 23, 2024 11:07
@Dav1dde Dav1dde requested a review from jjbayer July 23, 2024 11:29
@Dav1dde
Copy link
Member Author

Dav1dde commented Jul 23, 2024

Looked into not using a dependency for this but instead having a HashMap with a secondary BinaryHeap, performance improved but was still twice as slow as the priority_queue implementation.

I'd just go with the dependency and revisit this again if performance is still not good enough or when we're looking into priority queue alternatives for the spooler.

@Dav1dde Dav1dde requested a review from a team July 23, 2024 11:31
Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See question about change_priority_by vs get_mut.

break;
}

let (key, entry) = self.buckets.pop().expect("pop after peek");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awkward, right? For our own priority map I'm considering writing an interface that lets you do let value = queue.peek().remove() instead of queue.pop().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it is, I proposed an API here.

What could also be nice is also something like in heed where you have extra functionality on the iterator.

namespace = entry.key().namespace().as_str(),
);
let mut error = None;
let updated = self.buckets.change_priority_by(&key, |value| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we not use self.buckets.get_mut() here? IIUC, the closure is only called if the element exists, and we don't actually update any priorities.

Copy link
Member Author

@Dav1dde Dav1dde Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_mut only returns the key as mutable, but we need a mutable value which also causes this unfortunate error handling.

It's kinda double unfortunate API wise, because we never change the priority, maybe worth investigating wrapping the value itself in a RefCell then we can use get() although I am not sure if that makes it better ...

@Dav1dde Dav1dde merged commit b7aedbe into master Jul 23, 2024
23 checks passed
@Dav1dde Dav1dde deleted the dav1d/pq-aggregator branch July 23, 2024 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants