Reset metric name lookup table when another worker deletes a metric #171

knyar · 2024-04-17T16:47:20Z

This PR consists of a few commits that are easier to review individually. They add a test to reproduce #168 and fix the issue by emptying the per-metric lookup table every time a metric is deleted by one of the other nginx workers. (cc @Ashion, who reported this)

Since we already have a way to propagate metric key deletion to other workers via KeyIndex, I am piggy-backing on the same mechanism to trigger lookup table cleanup. (cc @dolik-rce who implemented the KeyIndex)

This also addresses the issue of unbounded memory growth when there is metric churn (#151), so I am removing the static lookup table size limit that was introduced as a mitigation. (cc @kingluo who reported that memory leak issue)

I will wait for a few days before merging this just in case someone has any feedback.

Fixes #168

The metric key index now accepts a function that gets called every time a key is deleted from the index, which happens at the next sync after another worker has deleted a key. The callback function finds the same metric in the local worker's registry and cleans up its lookup table. To make sure this happens deterministically for all workers, the key index sync is now also scheduled to run every `sync_interval` (1s by default).

The goals are:
- to prevent unbounded growth of the per-metric lookup tables;
- to ensure that previously reset metrics can be used again.
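As a rough illustration of the deletion callback described above, here is a minimal Python model of the idea. It is not the library's Lua code: the `Worker` class, the `observe`, `reset`, `add`, `remove` and `sync` methods, the `host` label, and the list-based stand-in for shared storage are all assumptions made for this sketch.

```python
from typing import Callable, Dict, List, Optional


class KeyIndex:
    """Toy key index; `shared` stands in for storage visible to all workers."""

    def __init__(self, shared: List[Optional[str]],
                 on_delete: Callable[[str], None]) -> None:
        self.shared = shared
        self.on_delete = on_delete       # hypothetical deletion callback
        self.known: Dict[int, str] = {}  # this worker's view: slot -> key

    def add(self, key: str) -> None:
        self.sync()
        if key not in self.known.values():
            self.shared.append(key)
            self.known[len(self.shared) - 1] = key

    def remove(self, key: str) -> None:
        self.sync()
        for slot, known_key in list(self.known.items()):
            if known_key == key:
                self.shared[slot] = None  # other workers notice at their next sync
                del self.known[slot]

    def sync(self) -> None:
        for slot, key in enumerate(self.shared):
            previous = self.known.get(slot)
            if key is None:
                if previous is not None:
                    # Another worker deleted this key since our last sync.
                    del self.known[slot]
                    self.on_delete(previous)
            else:
                self.known[slot] = key


class Worker:
    """Toy worker with a per-metric lookup table (label value -> full key)."""

    def __init__(self, shared: List[Optional[str]]) -> None:
        self.lookup: Dict[str, Dict[str, str]] = {}
        self.index = KeyIndex(shared, self._on_key_deleted)

    def _on_key_deleted(self, key: str) -> None:
        # Drop the whole lookup table of the metric that owned the key.
        self.lookup.pop(key.split("{", 1)[0], None)

    def observe(self, name: str, label: str) -> str:
        table = self.lookup.setdefault(name, {})
        if label not in table:
            # Only a lookup miss registers the key in the index.
            key = f'{name}{{host="{label}"}}'
            self.index.add(key)
            table[label] = key
        return table[label]

    def reset(self, name: str) -> None:
        self.index.sync()
        for key in list(self.index.known.values()):
            if key.split("{", 1)[0] == name:
                self.index.remove(key)
        self.lookup.pop(name, None)
```

In this model, a worker whose lookup table still holds a key deleted elsewhere would never register that key again, because only a lookup miss adds it to the index. Clearing the table from the callback forces the next `observe` call to re-add it. The periodic scheduling of `sync` is left out; a caller would invoke `sync()` on a timer.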

knyar added 4 commits March 13, 2024 07:56

Restructure integration test

549ab5f

Make it easier to define multiple tests and run them concurrently.

Add a metric reset test

abad483

This test verifies that:
- a gauge metric that has been reset is no longer reported;
- a previously-reset metric that is used again correctly reports its values.

Remove size limit for per-metric lookup tables

bee53e4

Size-based cleanup was added to avoid unbounded growth of lookup tables as metrics get deleted or reset. This should no longer be necessary now that lookup tables are reset on metric deletion. This reverts eb1876d.

knyar merged commit 1479fce into main Apr 22, 2024
6 checks passed

knyar deleted the reset_lookup branch April 22, 2024 07:32
