Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Add multi-column support to AggFormula #6206

Merged
merged 24 commits into from
Nov 18, 2024

Conversation

lbooker42
Copy link
Contributor

@lbooker42 lbooker42 commented Oct 15, 2024

NOTE: this operator allows you to use key columns (group by columns) as input to the formula. In this case, the key column value is presented to the formula as a scalar and not a vector.

Python examples:

from deephaven import empty_table
from deephaven import agg

source = empty_table(20).update(
    ["X = i", "Y = 2 * i", "Z = 3 * i", "Letter = (X % 2 == 0) ? `A` : `B`"]
)

result = source.agg_by([
    agg.formula(formula="out_a=sqrt(5.0)"),
    agg.formula(formula="out_b=min(X)"),
    agg.formula(formula="out_c=min(X) + max(Y)"),
    agg.formula(formula="out_d=sum(X + Y + Z)"),
], by=["Letter"])

Groovy examples:

source = emptyTable(20).update("X = i", "Y = 2 * i", "Z = 3 * i", "Letter = (X % 2 == 0) ? `A` : `B`")

result = source.aggBy([
    AggFormula("out_a=sqrt(5.0)"),
    AggFormula("out_b=min(X)"),
    AggFormula("out_c=min(X) + max(Y)"),
    AggFormula("out_d=sum(X + Y + Z)"),
], "Letter")

@lbooker42 lbooker42 self-assigned this Oct 16, 2024
@lbooker42 lbooker42 added this to the 0.37.0 milestone Oct 16, 2024
py/server/deephaven/agg.py Outdated Show resolved Hide resolved
Comment on lines 183 to 185
cols (Union[str, List[str]]): If provided, supplies the column(s) to aggregate on, can be renaming expressions,
i.e. "new_col = col". Default is None, which can be valid when the `formula` argument supplies the input
column names or when used in Table agg_all_by operation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting cols as a List[str] seems to suggest that we might want to support a list of formulas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure

formula(['output_col1=(input_col1 + input_col2) * input_col3', 'output_col2=(input_col21 + input_col22) * input_col23'])

is better than

[formula('output_col1=(input_col1 + input_col2) * input_col3'), formula('output_col2=(input_col21 + input_col22) * input_col23')]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our other methods do the second. I'm pointing out the inconsistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a technical reason why we can't do that immediately for Java (already using String... var args to support Pair args). We could do it for Python but then we'd have a mismatch.

When/if we completely sunset the parameterized version, we can implement this without any conflict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created this ticket to capture this request for multiple formula per call. #6392

py/client/pydeephaven/agg.py Outdated Show resolved Hide resolved
Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worthwhile bringing up the gRPC structuring layer given work that Charles want to do there too...

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@lbooker42 lbooker42 requested a review from chipkent November 12, 2024 18:45
Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all we're missing is proper Java-side testing in QueryTableAggregationTest.

rcaudy
rcaudy previously approved these changes Nov 16, 2024
Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong, TestAggBy coverage looks adequate to me.

@lbooker42 lbooker42 merged commit a20cb19 into deephaven:main Nov 18, 2024
17 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Nov 18, 2024
@deephaven-internal
Copy link
Contributor

Labels indicate documentation is required. Issues for documentation have been opened:

Community: deephaven/deephaven-docs-community#363

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants