Add proposal for created timestamps

Signed-off-by: Arthur Silva Sens <[email protected]>
prometheus · Oct 3, 2023 · ca867f9 · ca867f9
1 parent 6169c90
commit ca867f9
Show file tree

Hide file tree

Showing 3 changed files with 152 additions and 0 deletions.
diff --git a/assets/2023-06-13_created-timestamps/followup-scrape-example.png b/assets/2023-06-13_created-timestamps/followup-scrape-example.png
diff --git a/assets/2023-06-13_created-timestamps/initial-scrape-example.png b/assets/2023-06-13_created-timestamps/initial-scrape-example.png
diff --git a/proposals/2023-06-13_created-timestamp.md b/proposals/2023-06-13_created-timestamp.md
@@ -0,0 +1,152 @@
+## Created Timestamps
+
+* **Owners**
+  * Arthur Silva Sens ([@ArthurSens](https://github.com/ArthurSens))
+  * Bartłomiej Płotka ([@bwplotka](https://github.com/bwplotka))
+  * Max Amin ([@macxamin](https://github.com/macxamin))
+  * Daniel Hrabovcak ([@TheSpiritXIII](https://github.com/TheSpiritXIII))
+
+* **Implementation**: Partially implemented
+
+* **Related Issues and PRs:**
+  * https://github.com/prometheus/prometheus/issues/6541
+
+* **Other docs or links:**
+  * https://docs.google.com/document/d/1kakDVn8aP1JerimLeazuMfy2jaB14W2kZDHMRokPcv4/edit
+
+> We propose that created timestamps to be exposed by common exposition formats. During scrape, created timestamps are processed before normal sample values. They are ingested as a "synthetic" sample with value 0 and timestamp equal to the created timestamp found in the scrape, followed by appending the actual sample with the usual scraped value timestamp.
+>
+> Additionally, created timestamps are added to time series metadata to be propagated via remote-write.
+
+## Why
+
+Prometheus counters are one of the most useful metric types in Prometheus. Thanks to monotonous characteristics and simple semantics it allows cheap and reliable calculations of rate/increase/delta (called “rates” in this document), even in the event of missed scrapes. 
+
+This worked well for years, but certain edge cases started to impact the community:
+* Common problems of uninitialized counters occur. We don’t know when the metric with value 0 started, so the counter starts with the incremented value. This can impact visualization and alerting.
+* Counter resets can go undetected if a counter is quickly incremented after a counter reset so that the next scrape sees a value that is higher than the value during the last scrape before the counter reset. 
+* Finding absolute value for longer metrics is expensive as one has to go sample by sample to detect resets. This impacts ingestors as it requires stateful algorithms that discern between new metrics and long-running ones it did not see before, as well as advanced algorithms like read-level deduplication algorithm in Thanos.
+
+Furthermore, the same problems occur for Summaries and Histograms (old and native ones) since all of them are represented by counters.
+
+### Pitfalls of the current solution
+
+Rate-like functions calculate relative differences between 2 points in time, naturally it requires at least 2 datapoints to be able to perform any calculations. For the most common cases, it works well for rates over long time ranges since it is likely that several data points where ingested.
+
+However, it falls short for rates that englobes the datapoint collected in the first scrape of an application, or when using rates against metrics ingested from push-based applications with long push intervals, because only one data point is present in such occasions. 
+
+## Goals
+
+* Prometheus can collect and store reset timestamps for counters, summaries and histograms on scrape using common exposition formats.
+* Prometheus will persist created timestamp across restarts (WAL).
+* Prometheus can perform meaningful rates-like functions even with a single scrape.
+* Prometheus can inject created timestamps for stateless remote write use.
+  * Stateless means here that every streamed batch of samples for certain counters have its created timestamp known.
+
+### Audience
+
+* Users monitoring push-based applications with long push intervals.
+* Users depending on high-precision rates, specially during the initial scrapes of an application.
+* Users that are sending metrics to ingestors that require created timestamp, e.g. Monarch.
+
+## Non-Goals
+
+* Propose changes to Prometheus text exposition format for now. Advanced functionalities like exemplars, native histograms or created timestamps are not on the short-term roadmap.
+* Elaborate on technical details for rate implementation.
+* Remote Write extensions (should be in different proposals).
+* Support Gauges and Gauge-like metrics (Info, Stateset, Gauge Histograms). None of the exposition formats allows this, thus it’s out of scope.
+
+## How
+
+We propose to append a newly appearing created timestamp as an additional 0-value sample to the TSDB. We call this a “zero injection”.
+
+During a scrape, created timestamps are processed before normal sample values. If a certain metric is not yet present in the TSDB, a “synthetic” sample is appended with value 0 and timestamp equal to the created timestamp found in the scrape. Followed by appending the actual sample with the usual scraped value and timestamp.
+
+![Initial scrape example](../assets/2023-06-13_created-timestamps/initial-scrape-example.png)
+
+On following scrapes, created timestamps are not appended until a scrape finds a created timestamp higher than the latest sample's timestamp, which would trigger the insertion of another sample with a 0-value sample.
+
+![Alt text](../assets/2023-06-13_created-timestamps/followup-scrape-example.png)
+
+This is already enough to improve rate calculations as mentioned in the [Goals](#goals) section, while also persisting such data in WAL (even if not explicit) and although it allows an easier implementation, it comes with few limitations:
+
+* Unable to ingest created timestamps out of TSDB Head range.
+* Unable to help ingestors that require created timestamps as a separate field, as they would need to implement stateful logic to identify injected zeros in a batch of samples.
+
+For those reasons, created timestamps will also be stored as metadata **per series**, following the similar logic used for the zero-injection.
+
+### Collection
+
+We propose created timestamps to be supported when Prometheus is used with Prometheus protobuf format, OpenMetrics proto and text formats. Let’s explore each case:
+
+#### Prometheus Protobuf
+
+Since Prometheus already supports scraping endpoints with Prometheus protobuf and knowing that adding the created timestamp field is trivial, it makes Prometheus protobuf the prefered option for initial implementation:
+
+```proto
+message Counter {
+  optional double   value    = 1;
+  optional Exemplar exemplar = 2;
+  optional google.protobuf.Timestamp    created_timestamp_ms = 3;
+}
+
+message Summary {
+  optional uint64   sample_count = 1;
+  optional double   sample_sum   = 2;
+  repeated Quantile quantile     = 3;
+  optional google.protobuf.Timestamp    created_timestamp_ms = 4;
+}
+
+message Histogram {
+  // …
+  repeated double positive_count    = 14; // Absolute count of each bucket.
+  optional google.protobuf.Timestamp  created_timestamp_ms = 15;
+}
+```
+
+Once the protocol is extended, Prometheus parser can easily identify created timestamps and ingest as mentioned in the [How](#how) section.
+
+#### OpenMetrics Protobuf
+
+Follows the same semantics as mentioned in [Prometheus Protobuf](#prometheus-protobuf), but Prometheus still isn't able to parse it.
+
+The task here is to simply implement the missing parser.
+
+#### OpenMetrics text
+
+For OpenMetrics text the situation is more complex. For OM text format, the _created timestamps series would be normally ingested as a normal series. However, for this solution we propose to capture the timestamp from the series and treat it as the synthetic zero. Then we propose to remove the artificial series to avoid further metric collisions, semantic differences (series vs created timestamp metadata) and consistency with other exposition formats.
+
+The removal of a special metric is a **breaking change**. However, we still propose doing it to stop the spread of this confusing pattern in the community (with adequate warning). The current usage of created metrics is low and users are [actually not adopting OM](https://github.com/prometheus/client_python/tree/v0.17.1#disabling-_created-metrics) for that reason. Thus allowing users to opt-out (users can set if they actually want to preserve those metrics) seems like a good way forward.
+
+Note that, the conversion of  _created series to created timestamp append will not be a trivial, or efficient code. We have to check various metrics, and buffer appends of potentially related metrics. Ideally, this will go away from the next version of OpenMetrics text (outside of the scope of this proposal).
+
+
+## Alternatives
+
+### Inject sentinel values
+
+Instead of injecting zero samples, a sentinel value, e.g. _CreatedNaN_ could be ingested intead to avoid confusion when noticing that not all samples follow the configured scrape interval.
+
+It would also help remote-write ingestors to easily identify created timestamps in a batch of samples.
+
+It would require refactoring of rate-like functions, where sentinel values would need to be replaced with zeros.
+
+### Created timestamps as timeseries
+
+Prometheus can already parse and ingest created timestamps from OpenMetrics text format. However, currently, it is ingested as a new timeseries.
+
+It comes with a stream of downsides, though:
+* A source of friction for adopting OpenMetrics text format for Prometheus users.
+* Inconsistent with OpenTelemetry and OpenMetrics proto format.
+* Confusing and surprising semantics by adding special series that have to be used especially. The link to related series is also poor and prone to errors.
+* Metric name collisions - there might be user’s metrics with _created suffix that means something else.
+* Inefficient storage, leading up to 50% memory increase if the instance ingests mostly counters. This is because each series adds indexing overhead. The created timestamp semantics allows for significantly more efficient storage if we treat them not as a series but tailor its APIs. Additionally, increases network utilization as more data is being scraped and transferred over the wire.
+* Inefficient and error-prone usage. rate(), increase() and delta() functions would need to perform extra queries to TSDB to find restarts and staleness. Metrics with clashing names might provide unexpected results.
+
+
+## Action Plan
+
+* [X] Extend Prometheus protobuf with created timestamps. https://github.com/prometheus/client_model/pull/66
+* [X] Implement created timestamps in client_golang. https://github.com/prometheus/client_golang/pull/1313
+* [ ] Implement parsing and zero-injection of created timestamps in Prometheus.
+* [ ] Implement created timestamps as time series metadata.