Commit

PySpark pivot results
SemyonSinchenko committed Jun 4, 2024
1 parent 79c6186 commit 7a355b9
Showing 7 changed files with 22 additions and 7 deletions.
14 changes: 11 additions & 3 deletions docs/benchmark_results.md
@@ -1,5 +1,7 @@
# Results of Benchmark

![Results](https://raw.githubusercontent.com/SemyonSinchenko/feature-generation-benchmark/main/docs/static/results_overview.png)

## Setup

**EC2 m5.4xlarge**
@@ -38,7 +40,9 @@ See `src/lib.rs` for details of the implementation.
| Polars pivot | 4.54 |
| DuckDB pivot | 4.10 |
| DuckDB case-when | 36.59 |
-| PySpark Comet case-when | 96.93 |
+| PySpark Comet case-when | 94.06 |
+| PySpark-4 polars-udf | 53.06 |
+| PySpark pivot | 104.21 |


## Small Dataset
@@ -57,7 +61,9 @@ See `src/lib.rs` for details of the implementation.
| DuckDB case-when | 304.52 |
| PySpark pandas-udf | 516.38 |
| PySpark case-when | 1808.99 |
-| PySpark Comet case-when | 651.84 |
+| PySpark Comet case-when | 729.75 |
+| PySpark-4 polars-udf | 356.19 |
+| PySpark pivot | 151.60 |



@@ -76,4 +82,6 @@ See `src/lib.rs` for details of the implementation.
| DuckDB pivot | 2181.59 |
| PySpark pandas-udf | 5983.14 |
| PySpark case-when | 17653.46 |
-| PySpark Comet case-when | 4635.37 |
+| PySpark Comet case-when | 4873.54 |
+| PySpark-4 polars-udf | 4704.73 |
+| PySpark pivot | 455.49 |
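The table rows in `docs/benchmark_results.md` appear to be the values from the `results/*.json` files rounded to two decimals (for example, `455.48546409606934` becomes `455.49`). The sketch below illustrates that mapping under this assumption; `render_rows` is a hypothetical helper, not code from the repository, and the repository's actual rendering presumably goes through the `benchmark_results.md.jinja2` template instead.

```python
# Hypothetical sketch: render benchmark timings as Markdown table rows,
# rounding to two decimals the way the rows in docs/benchmark_results.md appear to.
# `render_rows` is an illustrative name, not a function from the repository.


def render_rows(results: dict[str, float]) -> list[str]:
    """Return one '| name | value |' row per entry, preserving dict order."""
    return [f"| {name} | {value:.2f} |" for name, value in results.items()]


if __name__ == "__main__":
    # Subset of results/results_medium.json as visible in this commit.
    medium = {
        "PySpark Comet case-when": 4873.537589073181,
        "PySpark-4 polars-udf": 4704.727216005325,
        "PySpark pivot": 455.48546409606934,
    }
    for row in render_rows(medium):
        print(row)
```

Run against this medium subset, it prints the same three rows that end the medium table in the new version of the file.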
2 changes: 2 additions & 0 deletions docs/benchmark_results.md.jinja2
@@ -1,5 +1,7 @@
# Results of Benchmark

![Results](https://raw.githubusercontent.com/SemyonSinchenko/feature-generation-benchmark/main/docs/static/results_overview.png)

## Setup

**EC2 m5.4xlarge**
Binary file modified docs/static/results_overview.png
4 changes: 3 additions & 1 deletion notebooks/Data Vizualization.ipynb

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion results/results_medium.json
@@ -5,5 +5,6 @@
"PySpark pandas-udf": 5983.137866973877,
"PySpark case-when": 17653.461864709854,
"PySpark Comet case-when": 4873.537589073181,
-"PySpark-4 polars-udf": 4704.727216005325
+"PySpark-4 polars-udf": 4704.727216005325,
+"PySpark pivot": 455.48546409606934
}
3 changes: 2 additions & 1 deletion results/results_small.json
@@ -6,5 +6,6 @@
"PySpark pandas-udf": 516.3818678855896,
"PySpark case-when": 1808.9883704185486,
"PySpark Comet case-when": 729.7482228279114,
-"PySpark-4 polars-udf": 356.1914813518524
+"PySpark-4 polars-udf": 356.1914813518524,
+"PySpark pivot": 151.59994101524353
}
3 changes: 2 additions & 1 deletion results/results_tiny.json
@@ -6,5 +6,6 @@
"DuckDB pivot": 4.098507404327393,
"DuckDB case-when": 36.58749794960022,
"PySpark Comet case-when": 94.0553047657013,
-"PySpark-4 polars-udf": 53.060457944869995
+"PySpark-4 polars-udf": 53.060457944869995,
+"PySpark pivot": 104.20818734169006
}
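Each `results/*.json` file maps an approach name to a single timing value, so the new `PySpark pivot` entry can be compared directly with the other PySpark variants. The sketch below assumes that flat name-to-number layout (as the hunks show) and uses only the entries visible in this commit; `relative_to` is a hypothetical helper, not part of the repository.

```python
import json

# Subsets of results/results_small.json and results/results_medium.json,
# copied from the lines visible in this commit (other entries are omitted).
SMALL = json.loads("""{
  "PySpark pandas-udf": 516.3818678855896,
  "PySpark case-when": 1808.9883704185486,
  "PySpark Comet case-when": 729.7482228279114,
  "PySpark-4 polars-udf": 356.1914813518524,
  "PySpark pivot": 151.59994101524353
}""")

MEDIUM = json.loads("""{
  "PySpark pandas-udf": 5983.137866973877,
  "PySpark case-when": 17653.461864709854,
  "PySpark Comet case-when": 4873.537589073181,
  "PySpark-4 polars-udf": 4704.727216005325,
  "PySpark pivot": 455.48546409606934
}""")


def relative_to(results: dict[str, float], baseline: str) -> dict[str, float]:
    """Hypothetical helper: divide every timing by the baseline approach's timing."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


if __name__ == "__main__":
    for label, results in (("small", SMALL), ("medium", MEDIUM)):
        ratios = relative_to(results, "PySpark pivot")
        # Print fastest first; the baseline itself is always 1.00x.
        for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
            print(f"{label}: {name} = {ratio:.2f}x PySpark pivot")
```

With these numbers, `PySpark case-when` takes roughly 12x (small) and 39x (medium) as long as `PySpark pivot`.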
