Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EPIC] ClickBench Improvements (Vanity Benchmark) #14586

Open
alamb opened this issue Feb 10, 2025 · 9 comments
Open

[EPIC] ClickBench Improvements (Vanity Benchmark) #14586

alamb opened this issue Feb 10, 2025 · 9 comments
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Feb 10, 2025

Is your feature request related to a problem or challenge?

The ClickBench Benchmark measures the performance of filtering and aggregation

Being on top of ClickBench is somewhat of a vanity benchmark, as in my opinion I think all the engines within a factor of 2 of likely have similar user experiences (and the exact speed will depends on real user queries, etc)

That being said, the engine at the top of the benchmark is certainly good for publicity and DataFusion has used it as (see see our blog here Apache DataFusion is now the fastest single node engine for querying Apache Parquet files)

So this ticket tracks improving the ClickBench peformance even more

Recently, as @Dandandan has pointed out on #14246 (comment), DuckDB slipped past us in the most recent results

Image

Describe the solution you'd like

Get DataFusion back on top

Describe alternatives you've considered

While we could clearly implement ClickBench specific optimizations, I don't think that is really a valuable exercise for users. I would very much like to focus our efforts on actually useful optimization

Some ideas of real improvements:

What I would like is of people profile queries and try and find ways to improve the queries

Additional context

See related discussions on

@alamb
Copy link
Contributor Author

alamb commented Feb 10, 2025

I took a brief look at some results

Image

Q24 and Q26

I think this is Q24:

SELECT "SearchPhrase" FROM hits WHERE "SearchPhrase" <> '' ORDER BY to_timestamp_seconds("EventTime"), "SearchPhrase" LIMIT 10;

Both have "ORDER BY to_timestamp_seconds("EventTime")` as a part of the query

@alamb
Copy link
Contributor Author

alamb commented Feb 10, 2025

Here is Q24:

SELECT "SearchPhrase" FROM hits WHERE "SearchPhrase" <> '' ORDER BY to_timestamp_seconds("EventTime") LIMIT 10;

Here is 26:

SELECT "SearchPhrase" FROM hits WHERE "SearchPhrase" <> '' ORDER BY to_timestamp_seconds("EventTime"), "SearchPhrase" LIMIT 10;

Both have "ORDER BY to_timestamp_seconds("EventTime")` as a part of the query

@Rachelint
Copy link
Contributor

Rachelint commented Feb 10, 2025

A low hanging fruit #13617, i plan to finish it in this week.

And maybe it is time to push #11943 forward...

I am trying a poc about support block approach by only modifying codes of group values(we also need to modifying codes of GroupAccumulatortoo in #11943).

It is really horrible if we need to implement block approach for all exist and new added GroupAccumulators...

@alamb
Copy link
Contributor Author

alamb commented Feb 10, 2025

I am trying a poc about support block approach by only modifying codes of group values(we also need to modifying codes of GroupAccumulatortoo in #11943).

If the performance gains are worth it I can potentially help organize a larger refactoring effort too (to incrementally port over the code). We are in much better shape test-wise now. If you have a good approach I'll find time to help coordinate

@Rachelint
Copy link
Contributor

Rachelint commented Feb 10, 2025

I am trying a poc about support block approach by only modifying codes of group values(we also need to modifying codes of GroupAccumulatortoo in #11943).

If the performance gains are worth it I can potentially help organize a larger refactoring effort too (to incrementally port over the code). We are in much better shape test-wise now. If you have a good approach I'll find time to help coordinate

I will find a query to measurement the performance in old implementation in #11943 and in the new implementation.
I guess they will have the similar performance but I am still not sure now.

And if approach about supporting this only by GroupValues can work, it may be easy to introduce another optimization #12526 .

@Rachelint
Copy link
Contributor

Rachelint commented Feb 10, 2025

On optimizer side, I am not sure if single_distinct_to_groupby can really improve performance in current version (it is an old rule introduced in long long ago), maybe we can check it again?

@qazxcdswe123
Copy link
Contributor

qazxcdswe123 commented Mar 8, 2025

On optimizer side, I am not sure if single_distinct_to_groupby can really improve performance in current version (it is an old rule introduced in long long ago), maybe we can check it again?

On my m1 mac 16g, removing it shows no differences. So i think we can maybe remove it?

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃   op-main ┃     op-rm ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1     │ 1507.03ms │ 1477.66ms │ no change │
│ QQuery 2     │  245.57ms │  238.16ms │ no change │
│ QQuery 3     │  636.13ms │  641.14ms │ no change │
│ QQuery 4     │  611.28ms │  589.80ms │ no change │
│ QQuery 5     │ 1048.39ms │ 1029.87ms │ no change │
│ QQuery 6     │  224.92ms │  218.53ms │ no change │
│ QQuery 7     │ 1591.94ms │ 1592.04ms │ no change │
│ QQuery 8     │ 1096.93ms │ 1084.95ms │ no change │
│ QQuery 9     │ 1863.67ms │ 1781.91ms │ no change │
│ QQuery 10    │  972.32ms │  939.49ms │ no change │
│ QQuery 11    │  220.96ms │  219.66ms │ no change │
│ QQuery 12    │  521.82ms │  505.83ms │ no change │
│ QQuery 13    │ 1024.18ms │  978.79ms │ no change │
│ QQuery 14    │  415.18ms │  407.40ms │ no change │
│ QQuery 15    │  682.46ms │  662.86ms │ no change │
│ QQuery 16    │  165.75ms │  165.36ms │ no change │
│ QQuery 17    │ 1899.49ms │ 1864.64ms │ no change │
│ QQuery 18    │ 2996.90ms │ 2976.30ms │ no change │
│ QQuery 19    │  712.68ms │  697.34ms │ no change │
│ QQuery 20    │  690.14ms │  670.94ms │ no change │
│ QQuery 21    │ 2388.29ms │ 2345.82ms │ no change │
│ QQuery 22    │  264.88ms │  265.54ms │ no change │
└──────────────┴───────────┴───────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (op-main)   │ 21780.91ms │
│ Total Time (op-rm)     │ 21354.03ms │
│ Average Time (op-main) │   990.04ms │
│ Average Time (op-rm)   │   970.64ms │
│ Queries Faster         │          0 │
│ Queries Slower         │          0 │
│ Queries with No Change │         22 │
└────────────────────────┴────────────┘

@alamb
Copy link
Contributor Author

alamb commented Mar 8, 2025

On my m1 mac 16g, removing it shows no differences. So i think we can maybe remove it?

Sounds good! Can you make a PR and we can get ready to make the change?

Removing optimizer passes in general I think helps planning performance too

@qazxcdswe123
Copy link
Contributor

qazxcdswe123 commented Mar 9, 2025

On my m1 mac 16g, removing it shows no differences. So i think we can maybe remove it?

Sounds good! Can you make a PR and we can get ready to make the change?

Removing optimizer passes in general I think helps planning performance too


Queries like avg(distinct a) rely on this rule, and without it, they cannot be executed anymore.

The following query is available on the main branch, but not on this PR branch.

DataFusion CLI v46.0.0
> select avg(distinct a) from values(1) t(a);
Execution error: avg(DISTINCT) aggregations are not available

#15099 (comment)

I didn't ran the full test at that time, but single_distinct_to_groupby is used somewhere else and after some more investigation I think it's easier to keep it 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants