You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have been thinking what makes an expression computable in a remote engine and to be reassembled in the root engine. I think the following might be a sound algorithm that enables us to push down even more expressions.
Problem
Assume we have remote engines that are partitioned by a label P. Imagine an expression like topk(10, sum by (P, instance) (X)) and assume that instance is a high-cardinality label. In the current implementation we would have to evaluate sum by (P, instance) (X) remotely and fetch all values of instance into the root engine to compute the surrounding topk even though we would not need most of them, clearly we can evaluate topk per engine and only coalesce the top 10 values for the inner expression since we dont have any overlap. But the algorithm right now doesnt know that.
Observation
Imagine we are traversing the AST of a PromQL expression from the bottom up. Assume we have expression nodes P and C, where P is the parent of node C. If the expression at node C still has its original P label I want to conjecture that we can evaluate C in the remote engines and how we coalesce in the root engine depends on the parent node P. If P is nil and we are at the root we just join the series together, if its an aggregation we do the aggregation in the root engine, etc.
What is meant with the heuristic that expression node C is having the original P label still? I think we can define this recursively. In the base case we are at a vector selector, by definition it will have P (disregarding constant expressions and absent functions right now). If we are a node that has children and all children have the P label still then we also have it if we are not a relabel_replace function that overwrites it, if we are not aggregating it away (i.e. sum without(P) (X) would lose P) or if we are not a binary expression that doesnt have P in its matching labels (i.e. A * ignoring (P) B would lose P). In all other cases unless im forgetting something we should still have the partition label P and should be computable in a remote engine.
Suggestion
We rewrite the distribution algorithm by checking recursively if we keep the partition labels P, if so we traverse up until we cannot anymore. Once a parent P would lose the label we stop recursion and evaluate the expression henceforth in the root engine, we coalesce depending on the parent node P.
Is there anything obiously wrong with this approach?
Edit: this should also hold if P is multiple labels that form a logical partition of remote engines.
The text was updated successfully, but these errors were encountered:
I have been thinking what makes an expression computable in a remote engine and to be reassembled in the root engine. I think the following might be a sound algorithm that enables us to push down even more expressions.
Problem
Assume we have remote engines that are partitioned by a label
P
. Imagine an expression liketopk(10, sum by (P, instance) (X))
and assume thatinstance
is a high-cardinality label. In the current implementation we would have to evaluatesum by (P, instance) (X)
remotely and fetch all values of instance into the root engine to compute the surroundingtopk
even though we would not need most of them, clearly we can evaluatetopk
per engine and only coalesce the top 10 values for the inner expression since we dont have any overlap. But the algorithm right now doesnt know that.Observation
Imagine we are traversing the AST of a PromQL expression from the bottom up. Assume we have expression nodes P and C, where P is the parent of node C. If the expression at node C still has its original
P
label I want to conjecture that we can evaluate C in the remote engines and how we coalesce in the root engine depends on the parent node P. If P is nil and we are at the root we just join the series together, if its an aggregation we do the aggregation in the root engine, etc.What is meant with the heuristic that expression node C is having the original
P
label still? I think we can define this recursively. In the base case we are at a vector selector, by definition it will haveP
(disregarding constant expressions and absent functions right now). If we are a node that has children and all children have theP
label still then we also have it if we are not arelabel_replace
function that overwrites it, if we are not aggregating it away (i.e.sum without(P) (X)
would lose P) or if we are not a binary expression that doesnt haveP
in its matching labels (i.e.A * ignoring (P) B
would lose P). In all other cases unless im forgetting something we should still have the partition label P and should be computable in a remote engine.Suggestion
We rewrite the distribution algorithm by checking recursively if we keep the partition labels
P
, if so we traverse up until we cannot anymore. Once a parent P would lose the label we stop recursion and evaluate the expression henceforth in the root engine, we coalesce depending on the parent node P.Is there anything obiously wrong with this approach?
Edit: this should also hold if
P
is multiple labels that form a logical partition of remote engines.The text was updated successfully, but these errors were encountered: