-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(sql): avoid extraneous operator parenthesis #10007
Conversation
Nice! What does the performance versus DuckDB raw SQL look like? |
a84285c
to
b681554
Compare
from functools import reduce
from operator import add
from timeit import Timer
import pyarrow as pa
import ibis
con = ibis.duckdb.connect()
N = 100
data = pa.Table.from_pydict({f"x{i}": [1, 2, 3] for i in range(N)})
t = con.create_table("test", data)
expr = t.select(sum=reduce(add, map(t.__getitem__, t.columns)))
sql = f"SELECT {'+'.join(t.columns)} AS sum FROM test"
for name, func in [
("duckdb", lambda: con.raw_sql(sql).arrow()),
("ibis-duckdb", lambda: expr.to_pyarrow()),
]:
n, t = Timer(func).autorange()
print(f"{name}: {1000 * t / n:.1f} ms") On this PR:
With another upcoming PR:
|
b681554
to
6ea09d7
Compare
6ea09d7
to
e6f66c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not blocking, but I do think the .associative
approach would be a bit cleaner.
@@ -469,6 +460,19 @@ def impl(self, _, *, _name: str = target_name, **kw): | |||
for op, target_name in cls.SIMPLE_OPS.items(): | |||
setattr(cls, methodname(op), make_impl(op, target_name)) | |||
|
|||
# Define binary op methods, only if BINARY_INFIX_OPS is set on the | |||
# compiler class. | |||
if binops := cls.__dict__.get("BINARY_INFIX_OPS", {}): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this checking specifically for subclass overrides of this attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. If we did cls.BINARY_INFIX_OPS
it would define the methods in every subclass (since no subclasses currently define BINARY_INFIX_OPS
), clobbering any explicit visit_*
methods the subclass defined. We'd need to do this for SIMPLE_OPS
etc... too, except that every compiler subclass currently sets that attribute themselves with at least one override.
ops.Power, | ||
BINARY_INFIX_OPS = { | ||
# Numeric | ||
ops.Add: (sge.Add, True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bool here seems like it should be a property of the operation instead of a flag.
It's hard to tell what the boolean is referring to unless you know where it's being used.
What do you think about sticking a class attribute on the ops for this property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a bad idea. I'm going to merge this now but may refactor that in a follow-up.
# compiler class. | ||
if binops := cls.__dict__.get("BINARY_INFIX_OPS", {}): | ||
|
||
def make_binop(sge_cls, associative): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like most or all of the associative
arguments can go away if you make it an op attribute.
This PR is motivated by #9987, and is one part of decreasing the compilation overhead here.
This PR:
There are still a few cases where we excessively apply parenthesis (mostly in backends where a function is used instead of a binary operator), but this seemed good enough for now and is a nice improvement overall. After this PR the benchmarks now run successfully. I have some follow-ups to further improve performance in other places, but figured I'd split the PRs out by topic.