Running into spikes in metrics after deduplication because vertical compaction removing a label makes it a new timeseries, is this expected? #4274
Replies: 5 comments 2 replies
-
Also, could this potentially be addressed by keeping the receive_replica label when doing the vertical compaction?
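For context, one-to-one dedup during vertical compaction is configured on the compactor roughly like this (a sketch only; the data dir and bucket config paths are placeholders, and the exact flags should be checked against the Thanos compactor docs):

```
thanos compact \
  --data-dir=/var/thanos/compact \            # placeholder path
  --objstore.config-file=bucket.yml \         # placeholder bucket config
  --compact.enable-vertical-compaction \      # hidden flag enabling vertical compaction
  --deduplication.replica-label=receive_replica  # this label is dropped during dedup
```

The suggestion above amounts to asking whether the compactor could preserve `receive_replica` on the merged output instead of stripping it.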
-
It looks like the vertically compacted data (dedup of 1:1 data) has some artefact (or generates an artefact due to staleness, maybe?), so it's a bug. Can you share the raw data from the Series API in JSON form?
Using: https://github.com/thanos-io/thanos/blob/main/scripts/insecure_grpcurl_series.sh The first thing we need to answer: is this an artefact of query time or of dedup time?
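For reference, each series in the gRPC Series API response carries its full label set alongside the chunk data, so the raw output makes it easy to see whether `receive_replica` is present. A single series looks roughly like this (an abbreviated sketch; the exact JSON shape depends on the grpcurl output):

```json
{
  "series": {
    "labels": [
      { "name": "__name__", "value": "http_requests_total" },
      { "name": "receive_replica", "value": "0" }
    ],
    "chunks": [ "…" ]
  }
}
```

If the series returned by the store node lacks the `receive_replica` label while the receive nodes' series have it, that points at dedup/compaction time rather than query time.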
-
Amazing work @jmichalek132, you essentially solved the problem, let me try to implement this fix quickly 💪🏽
-
These test cases show the issue:
-
WIP fix: #4375. I'm taking this carefully and adding benchmarks to make sure we can improve dedup latency too.
-
Hi, I have a question regarding the use of vertical compaction for the case of one-to-one deduplication (because we use Thanos Receive with a replication factor higher than 1).
When querying the data ingested via Thanos Receive, I noticed weird drops in metrics like this:
After removing the rate function, we can see that the counter truly did stop increasing for a few minutes:
After disabling deduplication and focusing on data from only one Prometheus instance (of the HA pair), we can see this is caused by the fourth timeseries ending after 5 minutes because of staleness.
The other three timeseries come from the Thanos Receive instances; the fourth one comes from Thanos Store.
However, it no longer has the receive_replica label, since vertical compaction compacted the three blocks into one without the label.
The issue for us is the spikes this causes in metrics.
I know we could potentially avoid this by disabling vertical compaction, but that would lead to a significant increase in the amount of data we store in S3.
Is this expected? Is it a known issue (I couldn't find one describing this)? Is there a workaround for it?
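The renaming effect described above can be sketched in a few lines: a timeseries' identity is its exact label set, so the compacted series (without `receive_replica`) is a different series from any of the replicated ones, and the old series simply go stale. A minimal illustration (label names and values are made up for the example):

```python
# Sketch: a series' identity is its full label set, so dropping a label
# produces a *different* series rather than a continuation of the old one.

def series_key(labels: dict) -> frozenset:
    """Identity of a timeseries: the exact set of label name/value pairs."""
    return frozenset(labels.items())

# Three replicated series ingested via receive (hypothetical labels):
replicas = [
    {"__name__": "http_requests_total", "job": "app", "receive_replica": str(i)}
    for i in range(3)
]

# After vertical compaction, the replica label is gone:
compacted = {"__name__": "http_requests_total", "job": "app"}

keys = {series_key(ls) for ls in replicas} | {series_key(compacted)}

# Four distinct identities: the compacted series matches none of the replicas,
# so the querier sees the old series end (staleness marker after ~5 minutes)
# and a "new" series begin, which rate() renders as a drop followed by a spike.
print(len(keys))  # 4
```

This is why the spike shows up exactly at the boundary between data served from the receive nodes and data served from the compacted blocks in object storage.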