
add complete documentation #1045

Open
calestyo opened this issue Jul 4, 2024 · 7 comments
Comments


calestyo commented Jul 4, 2024

Hey.

It would be nice if there was something like a write_to_stdout() function, or if write_to_textfile() would somehow allow writing to stdout.

For debugging that would seem to be much more helpful. Also, distros may have their own unified means of collecting the .prom files and thus even in production it may make sense to let a program simply write the metrics to stdout.

Cheers,
Chris.

PS: Using /dev/stdout doesn't work, as it tries to write its temporary file there.


calestyo commented Jul 6, 2024

After reading the code I found generate_latest() which seems to do what I want.

So the actual problem seems rather that the whole Python module is basically not documented at all.

https://prometheus.github.io/client_python/ counts at best as a quick how-to, but even that doesn't seem to cover the simplest and probably most regularly used tasks (like clearing time series whose labels no longer exist), let alone general concepts like the registries or proper documentation of all classes, methods, and their parameters. :-(
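For reference, here's roughly what I ended up doing - a minimal sketch (the registry and metric names are made up), using generate_latest() to print the exposition text to stdout instead of write_to_textfile():

```python
import sys

from prometheus_client import CollectorRegistry, Gauge, generate_latest

# A throwaway registry, so we don't pick up the default process/platform
# collectors registered with the global REGISTRY.
registry = CollectorRegistry()
g = Gauge('my_example_metric', 'An example gauge', registry=registry)
g.set(42)

# generate_latest() returns the text exposition format as bytes.
sys.stdout.write(generate_latest(registry).decode('utf-8'))
```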


calestyo commented Jul 6, 2024

Re-dedicating this issue, asking for some more complete documentation ;-)

@calestyo calestyo changed the title write_to_stdout()? add complete documentation Jul 6, 2024
@calestyo

Another area that seems completely undocumented is custom collectors - not only the Collector class itself, but also the concepts: when and for what they're used, and how they work.

The example given at https://prometheus.github.io/client_python/collector/custom/ doesn't even work, as prometheus_client.core no longer exists (I guess it was renamed to metrics_core at some point).

There's no mention that custom collectors are automatically invoked (doing their work in collect()) whenever the exporter is scraped.

In contrast to direct instrumentation, where the metrics are updated "on the fly" in the code, regardless of whether the exporter is even scraped at all.

No word about what "direct instrumentation" even means.

No word about what describe() is used for, which would be handy, if one e.g. writes an exporter that shall support both modes:

  • running continuously and being queryable via http
  • one-shot mode, and e.g. being usable with something like node-exporter-collector

If one wants to do both, then it seems generate_latest() also calls the custom collectors, but with the default registry it does so twice (probably because of auto describe), which is of course rather undesirable in one-shot mode.

(I had to gather all that information from blogs and third-party websites... all doing things a bit differently; some make collect() a generator with yield, others don't.)

Having basically no proper documentation is really a pity, as the Python client actually seems to be quite neat, but if people have to read the whole sources or try their luck on Google with questionable outcome, they probably simply won't use all the nice features in a proper way.

@csmarchbanks
Member

#1021 is another issue focusing on API documentation that would hopefully make things like generate_latest more discoverable. That said, the custom collector documentation would be more than just API documentation and would be nice to have in addition. Perhaps this issue could focus on that?

I hesitate to have an issue for just "complete documentation" since that is a moving target as more features are added.

@calestyo

It's at least difficult, IMO, to have many small issues for single points that seem to be missing from the documentation, because if I added another issue for everything I find presumably missing, I'd spam the issue tracker ;-)

Like the point of yield (as used by the example - though I actually don't understand why that would work if collect() is called only once per scrape) vs. return (as used by the default collectors shipped in the package), which I've already mentioned above.

It would further e.g. be nice to document that:

  • With direct instrumentation - I presume - only single metrics are set atomically. In the sense that if one has:
    gauge_a.set(34)
    gauge_b.set(66)
    
    and these - via labels (used like primary keys) - somehow relate to each other, a scrape in-between may of course see gauge_a set but gauge_b not yet.
    Perhaps also mentioning that it's guaranteed that every single metric is either fully set or not - i.e. it cannot happen that e.g. one label is already set, the value too, but a 2nd label is still empty.
  • With a custom collector OTOH, it seems that one can set multiple different metrics atomically, i.e. any scrapes wait until collect() has run through.

Another thing, which lacks documentation is IMO, what describe() shall actually return. As mentioned above, I have a use-case where I want to prevent collect() being run just for auto describe.
It's also not really clear what happens if there is simply no describe() function and no auto describe... I mean, will it just fail? Will some functionality be missing, and which?

And there are many little bits and pieces.

And at least I'm kinda lacking some more concrete best practices on how to design metrics. I know there are a few documents on that, but they're rather abstract IMO.

I had written a post with some more concrete questions here, and another one in the same place, though that got caught by the spam filter and so far no moderator was willing to approve it.


calestyo commented Jul 11, 2024

What's IMO also confusing is:
For the custom collector metric classes (CounterMetricFamily, etc.).

My understanding was that each object of e.g. CounterMetricFamily is one metric (at least if one goes by the terminology used in the Prometheus documentation), which can have multiple samples for different values of the possible labels.

Here however, MetricFamily seems to be what Prometheus calls a metric, and add_metric seems to set the most recent sample of a time series for a metric+labelset combination.

Edit:
For reference:

Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions.

Every time series is uniquely identified by its metric name and optional key-value pairs called labels.

I would interpret this as e.g. node_filesystem_files is the “metric” (not metric family)... and every metric may have many timeseries, which are identified by their label(values).

So it feels that *MetricFamily and add_metric() are a bit of a misnomer.


Also, it's unclear why/when one would use __init__()’s value= and when .add_metric(…, value=). I haven't followed the code to the end, but would guess that .add_metric(…, value=) needs to be used whenever one has different labelsets per metric, and the constructor’s value= may be used when there are no labels (and perhaps also when one simply uses the empty value for all labels?).
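Experimenting a bit seems to confirm that guess: value= in the constructor is shorthand for a single unlabeled sample, while add_metric() adds one sample per labelset (metric names made up):

```python
from prometheus_client.metrics_core import CounterMetricFamily

# Unlabeled metric: value= in the constructor creates exactly one sample
# with an empty labelset.
c1 = CounterMetricFamily('jobs', 'Jobs processed', value=5)

# Labeled metric: declare label names first, then one add_metric() call
# per labelset.
c2 = CounterMetricFamily('jobs_by_queue', 'Jobs per queue', labels=['queue'])
c2.add_metric(['fast'], 3)
c2.add_metric(['slow'], 2)
```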

@calestyo

Yet another ;-) ... part that should IMO be documented is the following:

With direct instrumentation the typical usage seems to be:

  • first create once the metric object at some place
  • then, modify that object with .set(), .inc(), etc.

With custom collectors, one could do something like the above, too, e.g. as in:

import prometheus_client
import sys
import os
import time

r = prometheus_client.CollectorRegistry()

class MyCollector(prometheus_client.registry.Collector):
    def __init__(self, registry):
        self.foo = 0
        registry.register(self)
        self.tm = prometheus_client.metrics_core.CounterMetricFamily(
            'my_time', 'Current time', labels=['host', 'foo'])

    def collect(self):
        print("collect called", file=sys.stderr)
        self.foo += 1
        tm = self.tm

        tm.add_metric([os.uname()[1]], value=self.foo)
        if self.foo % 2 == 0:
            tm.add_metric(["foo"], value=666)

        return [tm]


MYCOLLECTOR = MyCollector(r)

prometheus_client.start_http_server(8000, registry=r)
time.sleep(100000000)

where I create one CounterMetricFamily in __init__() and always call .add_metric() on it.

But the outcome of that is at best unexpected:

First scrape:

# HELP my_time_total Current time
# TYPE my_time_total counter
my_time_total{host="heisenberg"} 1.0

Second scrape:

# HELP my_time_total Current time
# TYPE my_time_total counter
my_time_total{host="heisenberg"} 1.0
my_time_total{host="heisenberg"} 2.0
my_time_total{host="foo"} 666.0

Third scrape:

# HELP my_time_total Current time
# TYPE my_time_total counter
my_time_total{host="heisenberg"} 1.0
my_time_total{host="heisenberg"} 2.0
my_time_total{host="foo"} 666.0
my_time_total{host="heisenberg"} 3.0

Seems like it just stores all the samples and prints them all... even for the same metric+labelset combinations.

I wasn't even aware that such "re-definitions" are allowed.
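That accumulation would be consistent with add_metric() simply appending to the family's internal samples list, with nothing deduplicating by labelset - a small sketch:

```python
from prometheus_client.metrics_core import CounterMetricFamily

tm = CounterMetricFamily('my_time', 'Current time', labels=['host'])
tm.add_metric(['heisenberg'], 1)
tm.add_metric(['heisenberg'], 2)  # appended, not an update

# Both samples survive: the same metric+labelset combination twice.
print(len(tm.samples))  # 2
```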

Instead, it seems that – unlike with the direct-instrumentation usage – one has to re-create the metric objects on every collect(), like in:

import prometheus_client
import sys
import os
import time

r = prometheus_client.CollectorRegistry()

class MyCollector(prometheus_client.registry.Collector):
    def __init__(self, registry):
        self.foo = 0
        registry.register(self)

    def collect(self):
        print("collect called", file=sys.stderr)
        self.foo += 1
        tm = prometheus_client.metrics_core.CounterMetricFamily(
            'my_time',
            'Current time',
            labels=['host', 'foo'],
        )

        tm.add_metric([os.uname()[1]], value=self.foo)
        if self.foo % 2 == 0:
            tm.add_metric(["foo"], value=666)

        return [tm]


MYCOLLECTOR = MyCollector(r)

prometheus_client.start_http_server(8000, registry=r)
time.sleep(100000000)

in order to get reasonable results.
