Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset metric name lookup table when another worker deletes a metric #171

Merged
merged 4 commits into from
Apr 22, 2024

Conversation

knyar
Copy link
Owner

@knyar knyar commented Apr 17, 2024

A few commits here that are easier to review individually, adding a test to reproduce #168 and fixing the issue by emtying the per-metric lookup table every time a metric is deleted by one of the other nginx workers. (cc @Ashion who reported this)

Since we already have a way to propagate metric key deletion to other workers via KeyIndex, I am piggy-backing on the same mechanism to trigger lookup table cleanup. (cc @dolik-rce who implemented the KeyIndex)

This also addresses the issue of unbounded memory growth when there is metric churn (#151), so I am removing the static lookup table size limit that was introduced as a mitigation. (cc @kingluo who reported that memory leak issue)

I will wait for a few days before merging this just in case someone has any feedback.

Fixes #168

Make it easier to define multiple tests and run them concurrently.
This test verifies that:
- a gauge metric that has been reset is no longer reported;
- a previously-reset metric that is used again correctly reports its
  values.
The metric key index now accepts a function that gets called every time
a key is deleted from the index, which happens at the next sync after
another worker had deleted a key. The callback function finds the same
metric in the local worker's registry and cleans up its lookup table.

To make sure this happens deterministically for all workers, the key
index sync is now also scheduled to run every `sync_interval` (1s
by default).

The goals are:
- to prevent unbounded growth of the per-metric lookup tables;
- to ensure that previously reset metrics can be used again.
Size based cleanup was added to avoid unbounded growth of lookup tables
as metrics get deleted/reset. This should not be necessary now when lookup
tables are being reset on metric deletion.

This reverts eb1876d
@knyar knyar merged commit 1479fce into main Apr 22, 2024
6 checks passed
@knyar knyar deleted the reset_lookup branch April 22, 2024 07:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

in multi-worker, call reset will cause other worker's lookup different with key_index
1 participant