Run with `SMALL=True` for testing, then `SMALL=False` to run with the original dataset (full size).
Anyone fancy translating this to SQL so we could check DuckDB too? My intuition is that this wouldn't be DuckDB's forte - which is fine, DuckDB is incredibly good at many other things. I think a friendly comparison involving this kind of benchmark would give a more complete picture than "DuckDB scales better than Polars because TPC-H!"
The M5 Forecasting Competition was held on Kaggle in 2020, and top solutions generally featured a lot of heavy feature engineering. Doing that feature engineering in pandas was quite slow, so I'm benchmarking how much faster Polars would have been at that task.
I think this is worth benchmarking, as:

- it reflects the kinds of gains that people doing applied data science can expect from using Polars
Here's a notebook with the queries + data: https://www.kaggle.com/code/marcogorelli/m5-forecasting-feature-engineering-benchmark/notebook