This benchmark compares the performance of DataFusion and DuckDB on the ClickBench queries against the unmodified ClickBench parquet files.
- DataFusion 27.0.0
- DataFusion 28.0.0
- DuckDB 0.8.1
- Single parquet file (hits.parquet)
bash setup.sh
Install from crates.io:
cargo install datafusion-cli --version 28.0.0
Or build from source:
git clone https://github.com/apache/arrow-datafusion.git
cd arrow-datafusion
cargo install --path datafusion-cli
python3 -m venv `pwd`/venv
source venv/bin/activate
pip install duckdb psutil
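Before running anything, it can be useful to confirm that the virtual environment actually has the two packages installed above. The following is a minimal sketch, not part of the benchmark scripts; the helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # duckdb and psutil are the packages installed with pip above.
    missing = missing_packages(["duckdb", "psutil"])
    print("missing packages:", ", ".join(missing) if missing else "none")
```

If the output lists a package, re-activate the virtual environment and repeat the pip install step.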
Queries are run with run-datafusion.sh or run-duckdb.sh.
DuckDB:
CREATE=create-single-duckdb.sql bash run-duckdb.sh
DataFusion:
DATAFUSION_CLI=./datafusion-cli.413eba1 CREATE=create-single-datafusion.sql bash run-datafusion.sh
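Both commands above configure the run scripts through environment variables (`CREATE`, and `DATAFUSION_CLI` for DataFusion). When driving several runs from Python, the same pattern could be sketched as follows. The helper names `build_run` and `run` are hypothetical, and the sketch assumes the scripts read only these two variables, as the commands above suggest.

```python
import os
import subprocess


def build_run(script, create_sql, datafusion_cli=None):
    """Return (command, environment) for one benchmark run."""
    env = dict(os.environ)          # start from the current environment
    env["CREATE"] = create_sql      # SQL file that creates the table
    if datafusion_cli is not None:  # only meaningful for run-datafusion.sh
        env["DATAFUSION_CLI"] = datafusion_cli
    return ["bash", script], env


def run(script, create_sql, datafusion_cli=None):
    """Execute one run; raises CalledProcessError if the script fails."""
    cmd, env = build_run(script, create_sql, datafusion_cli)
    subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    # Print the planned commands without executing them.
    for args in [
        ("run-duckdb.sh", "create-single-duckdb.sql"),
        ("run-datafusion.sh", "create-single-datafusion.sql", "./datafusion-cli.413eba1"),
    ]:
        cmd, env = build_run(*args)
        print(" ".join(cmd), "CREATE=" + env["CREATE"])
```

Calling `run("run-duckdb.sh", "create-single-duckdb.sql")` from the benchmark directory would then be equivalent to the DuckDB command shown above.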
More examples are in benchmark.sh.
Results are written into result.csv
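For a quick sanity check of the output file, something like the sketch below could be used. It makes no assumption about column names; it assumes only that the first line of the CSV is a header row, which is not confirmed here. The function name `summarize_csv` is hypothetical.

```python
import csv
import os


def summarize_csv(path):
    """Return (header, data_row_count) for a CSV file.

    Assumes the first line is a header; an empty file yields ([], 0).
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    # result.csv is the file the run scripts write to.
    if os.path.exists("result.csv"):
        header, rows = summarize_csv("result.csv")
        print("columns:", header)
        print("data rows:", rows)
    else:
        print("result.csv not found; run a benchmark first")
```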
The example Python script is hash.py:
python3 hash.py