Description
Given a query like get <some_cumulative_timeseries> | ..., we automatically convert the cumulative values into deltas. That makes the data easier to operate on, and is required for grouping, alignment, or joining. We currently do the conversion outside the ClickHouse database, after selecting the minimal set of raw data that matches the query's filtering predicates. That has been fine thus far, but to help with #6480, we'll need to figure out how to do it inside the database, so that we can push other operations, like alignment, into the database for cumulative timeseries.
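For reference, here is a minimal sketch of the conversion itself, written in Python purely for illustration. It is not the actual implementation: the function name and the (timestamp, value) tuple layout are made up for this example, and it assumes the samples are already sorted by timestamp and that the counter never resets.

```python
def cumulative_to_deltas(points):
    """Convert sorted (timestamp, cumulative_value) pairs into deltas.

    Hypothetical helper for illustration only. Each delta is stamped with
    the timestamp of the later sample in the pair, so n input points yield
    n - 1 output points. Counter resets are not handled (assumption).
    """
    deltas = []
    for (_, prev_value), (timestamp, value) in zip(points, points[1:]):
        deltas.append((timestamp, value - prev_value))
    return deltas


points = [(0, 10), (1, 12), (2, 12), (3, 17)]
print(cumulative_to_deltas(points))  # [(1, 2), (2, 0), (3, 5)]
```

The point is that each output row depends on the previous input row, which is exactly what makes this awkward to express inside the database.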
This one will be pretty tricky, I think. ClickHouse has some functions for computing adjacent differences, such as runningDifference, but they're explicitly only valid within a single block. There might be a way to force ClickHouse to select the data into exactly one block, but I'm not sure that's possible. Another option might be grouping all the data into an array, and then using arrayDifference. That makes us subject to the 1 million-element array size limit, though. That might be fine for today's data. For example, given a cumulative counter sampled at 1 Hz, that would enable queries selecting up to about 11.5 days of data, which is pretty good.
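To make the array approach and the size estimate concrete, here is a rough Python sketch. The array_difference function only mimics the documented behavior of ClickHouse's arrayDifference (the first element of the result is 0, and each later element is the difference from its predecessor); it is not ClickHouse code. The constant names are made up, and the 1 million-element limit is taken as given from the text above.

```python
MAX_ARRAY_LEN = 1_000_000  # array size limit cited above, taken as given
SAMPLE_RATE_HZ = 1         # example sampling rate from the text
SECONDS_PER_DAY = 86_400


def array_difference(values):
    """Mimic arrayDifference: first element is 0, then adjacent differences."""
    if not values:
        return []
    return [0] + [b - a for a, b in zip(values, values[1:])]


print(array_difference([10, 12, 12, 17]))  # [0, 2, 0, 5]

# How many days of samples fit in one array at the assumed rate and limit.
max_days = MAX_ARRAY_LEN / SAMPLE_RATE_HZ / SECONDS_PER_DAY
print(round(max_days, 2))  # 11.57
```

Note that the leading 0 would need to be dropped or otherwise handled, since the first sample has no predecessor to diff against.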
There might be smarter ways to do it though, such as using a window function or another method entirely. This will need a bit of research, but I think it will bear lots of fruit.