Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: can't handle aggregates within other value operations #507

Open
gforsyth opened this issue Feb 8, 2023 · 1 comment
Open

bug: can't handle aggregates within other value operations #507

gforsyth opened this issue Feb 8, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@gforsyth
Copy link
Member

gforsyth commented Feb 8, 2023

What happened?

We're handling certain aggregate functions incorrectly.

If the aggregate is within a separate ValueOp, like the below somecol.sum() / sumcol.sum() then we hit

TypeError: Parameter to MergeFrom() must be instance of same class: expected substrait.ibis.Expression got substrait.ibis.AggregateFunction.

because we're recursing down into the expression, but the AggregateRel needs to be handled outside of the current expression (i'm not exactly sure how at this point)

Here's a failing test case:

def test_value_op_aggregate(compiler):
    t = ibis.table([("a", "float"), ("b", "int32")], name="t")
    expr = t.aggregate(mkt_share=t.a.sum() / t.b.sum())

    result = translate(expr, compiler)

    assert result

What version of ibis-substrait are you using?

2.21.0

What substrait consumer(s) are you using, if any?

No response

Relevant log output

No response

@gforsyth gforsyth added the bug Something isn't working label Feb 8, 2023
@gforsyth
Copy link
Member Author

gforsyth commented Apr 7, 2023

The first part of this to handle is the Value dispatch when either/both of left and right are a reduction e.g. t.a.sum() / 2 or t.a.sum() / t.b.sum()

The topological sort of:

    t = ibis.table([("a", "float"), ("b", "int32")], name="t")
    expr = t.aggregate(mkt_share=t.a.sum() / 2 )

will be something like
table -> literal -> col -> sum -> divide -> alias -> aggregate

but we currently fail at divide because we are treating the sum as an input to a "regular" scalar function.

We instead want to translate that into (roughly):

    # The Literal -> Col -> Sum -> Divide should resolve to:
    # project:
    #   common:
    #     emit: ...
    #   input:
    #     aggregate:
    #        input: table
    #        measures: sum
    #   expressions:
    #     scalarFunction:
    #     arguments:
    #       value:
    #         selection:
    #           directReference:
    #             structField
    #               ...
    #       value:
    #         literal:
    #           ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: backlog
Development

No branches or pull requests

1 participant