Skip to content

pmdaopenmetrics metric removal qa bug fix #2242

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lmchilton
Copy link
Contributor

addresses multiple qa test failures. all qa tests
now pass, and updated test #1976 checks for metric removal from a modified source that still persists in the config directory.

@lmchilton lmchilton requested a review from wcohen July 22, 2025 12:35
@lmchilton lmchilton force-pushed the openmetrics-qa-bug-fix branch from eebf0b8 to 7267dfe Compare July 23, 2025 19:38
split_metric = name.split(".")
stripped_metric = name.replace("openmetrics.%s." % self.name, "")
self.pmda.debug("stripped metric --> %s" % stripped_metric) if self.pmda.dbg else None
if stripped_metric == "metric2":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two lines for checking for "metric2" look to be development code that should be removed now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmchilton the debug lines are still there BTW, marking 'unresolved' for now.

fname = os.path.join(path, f)
with open(fname, 'r') as name:
f_path = name.readline().strip()
if f_path.startswith("file:///"):
Copy link
Member

@natoscott natoscott Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmchilton @wcohen I think this approach may be incorrect - why keep mtime on these files? We cannot treat files differently to a curl response from an endpoint. The important part is whether the metrics are still there in the response (whether a file or http endpoint was used) - there should be no need to check filesystem state like this, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should suffice for QA (patch below). It doesn't seem to work correctly yet - I think we still have some issues in the PMDA code to resolve - but this removes metric2 (without changing config file timestamps) and reinstates that metric2. Once this works as shown in the .out change, we can have confidence in this PMDA code change.

$ git diff
qa/1976
--- /tmp/git-blob-7FfOb9/1976   2025-08-11 13:57:40.423237960 +1000
+++ qa/1976     2025-08-11 13:57:39.206463438 +1000
@@ -77,6 +77,16 @@ $sudo touch -t 197001010000 $PCP_PMDAS_D
 pminfo openmetrics.simple_metric
 echo
 
+echo "-- remove a metric from an existing source --"
+sed -e '/metric2/d' -i $tmp.simple_metric.txt
+pminfo openmetrics.simple_metric
+echo
+
+echo "-- reinstate metric for an existing source --"
+cp $here/openmetrics/samples/simple_metric.txt $tmp.simple_metric.txt
+pminfo openmetrics.simple_metric
+echo
+
 _pmdaopenmetrics_remove >/dev/null 2>&1
 
 # success, all done
qa/1976.out
--- /tmp/git-blob-B2e7Ur/1976.out       2025-08-11 14:03:04.574975077 +1000
+++ qa/1976.out 2025-08-11 14:03:00.664966764 +1000
@@ -15,3 +15,10 @@ Error: openmetrics.simple_metric: Unknow
 openmetrics.simple_metric.metric2
 openmetrics.simple_metric.metric1
 
+-- remove a metric from an existing source --
+openmetrics.simple_metric.metric1
+
+-- reinstate metric for an existing source --
+openmetrics.simple_metric.metric2
+openmetrics.simple_metric.metric1
+

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@natoscott @wcohen I just pushed an update with a working / simplifed qa #1976 test.

Regarding the added code in function [traverse] keeping track of files mtime, the reasoning behind this is to signal a change in configuration when actual text files are modified and not just the URL holding the path to the file in the config directory.

old traverse function code picks up a change when $PCP_PMDAS_DIR/openmetrics/config.d/metric.url is modified

new traverse function code picks up a change when $PCP_PMDAS_DIR/openmetrics/config.d/metric.url is modified AND picks up a change when the contents under metric.url (say /some/path/metric.txt) so the contents in metric.txt is modified

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see how we can't treat files differently from curl responses, but it seems imperative that the code picks up modifications in the text files?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmchilton hi Lauren, yes that's right, we certainly need to detect modifications (in both the text files and network stream mode). But, looking at the file modification time is not the way to go here - there is no modification time for network traffic - we need a mechanism that can tell from the data itself whether a metric has been removed.

The approach we use in other places (such as the pmdaCache(3) interfaces), is an algorithm more like this:

  • initially mark all metrics as deleted for the source we're currently updating
  • as we scan through and encounter each refreshed metric in the current update, mark that metric as present
  • at the end of the scan, any metrics that remain not yet marked as present are removed from the namespace and no updated values returned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@natoscott OK got it- thank you for clarifying

Copy link
Member

@natoscott natoscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a bit more love, as per comments.

addresses multiple qa test failures. all qa tests
now pass, and updated test performancecopilot#1976 checks for metric
removal from a modified source that still persists
in the config directory.
@lmchilton lmchilton force-pushed the openmetrics-qa-bug-fix branch from 7267dfe to 7b201d7 Compare August 14, 2025 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants