Performance differences observed in r-polars (?) #293

Open
eitsupi opened this issue Sep 6, 2024 · 5 comments

@eitsupi (Contributor) commented Sep 6, 2024

I'm not sure whether this stems from the difference between extendr and savvy, so apologies if this is completely unrelated.

When comparing the existing polars binding built on extendr (polars) with the rewritten binding built on savvy (neopolars), I noticed that the latter is substantially slower (roughly 20–40x in the runs below) both when constructing a Series from an R vector and when exporting one back to an R vector.

pola-rs/r-polars#1079 (comment)

# Construct an Arrow array from an R vector
long_vec_1 <- 1:10^6

bench::mark(
  arrow = {
    arrow::as_arrow_array(long_vec_1)
  },
  nanoarrow = {
    nanoarrow::as_nanoarrow_array(long_vec_1)
  },
  polars = {
    polars::as_polars_series(long_vec_1)
  },
  neopolars = {
    neopolars::as_polars_series(long_vec_1)
  },
  check = FALSE,
  min_iterations = 5
)
#> # A tibble: 4 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 arrow        2.62ms   2.92ms     328.    19.82MB     2.04
#> 2 nanoarrow  496.13µs 644.87µs    1252.   458.41KB     2.03
#> 3 polars       2.06ms   2.26ms     405.     6.33MB     0
#> 4 neopolars    84.6ms   90.1ms      10.9    1.59MB     0
# Export Arrow data as an R vector
arrow_array_1 <- arrow::as_arrow_array(long_vec_1)
nanoarrow_array_1 <- nanoarrow::as_nanoarrow_array(long_vec_1)
polars_series_1 <- polars::as_polars_series(long_vec_1)
neopolars_series_1 <- neopolars::as_polars_series(long_vec_1)

bench::mark(
  arrow = {
    as.vector(arrow_array_1)
  },
  nanoarrow = {
    as.vector(nanoarrow_array_1)
  },
  polars = {
    as.vector(polars_series_1)
  },
  neopolars = {
    as.vector(neopolars_series_1)
  },
  check = TRUE,
  min_iterations = 5
)
#> # A tibble: 4 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 arrow       13.94µs  15.84µs  46309.      4.59KB     4.63
#> 2 nanoarrow   559.9µs   1.85ms    513.      3.85MB    72.8
#> 3 polars       6.45ms   8.79ms    112.      5.93MB     9.13
#> 4 neopolars  148.82ms 164.65ms      6.02    5.24MB     0

Created on 2024-09-05 with reprex v2.1.1

I would appreciate any advice on how to improve the performance.

@yutannihilation (Owner)

Indeed neopolars is slower, but it doesn't seem that slow on my Windows machine. Both polars and neopolars were freshly installed from GitHub with pak::pkg_install().
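For reference, here is a minimal sketch of that installation step (the repository path for neopolars is my assumption based on the package name, not something stated in this thread):

# Install both bindings from GitHub; the neopolars repo path below is an assumption for illustration.
pak::pkg_install("pola-rs/r-polars")   # extendr-based polars
pak::pkg_install("eitsupi/neopolars")  # savvy-based rewrite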

long_vec_1 <- 1:10^6

bench::mark(
  polars = {
    polars::as_polars_series(long_vec_1)
  },
  neopolars = {
    neopolars::as_polars_series(long_vec_1)
  },
  check = FALSE,
  min_iterations = 5
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 polars      196.1µs   1.11ms      833.   10.11MB     2.01
#> 2 neopolars    3.13ms   6.43ms      149.    1.03MB     0

polars_series_1 <- polars::as_polars_series(long_vec_1)
neopolars_series_1 <- neopolars::as_polars_series(long_vec_1)

bench::mark(
  polars = {
    as.vector(polars_series_1)
  },
  neopolars = {
    as.vector(neopolars_series_1)
  },
  check = TRUE,
  min_iterations = 5
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 polars       4.08ms   4.62ms      186.    5.85MB     22.0
#> 2 neopolars    7.18ms   8.53ms      117.    4.56MB     22.5

Created on 2024-09-06 with reprex v2.1.1

@eitsupi (Contributor, Author) commented Sep 6, 2024

Thanks for taking a look at this!
Perhaps the difference in my benchmark results was caused by different optimization settings at build time in my installation process...

But even your results show a difference of roughly 5x in construction and 2x in export, so I'm wondering where that comes from.

@yutannihilation (Owner)

This repository is for checking whether savvy is sufficiently fast, not for competing with extendr. I think a few milliseconds is fast enough. Let's worry about performance when we hit a problem in more realistic usage.

yutannihilation transferred this issue from yutannihilation/savvy-benchmark on Sep 9, 2024
@yutannihilation (Owner)

One possible factor that might affect such a benchmark is that savvy always expands ALTREP vectors.

https://yutannihilation.github.io/savvy/guide/key_ideas.html#treating-external-sexp-and-owned-sexp-differently

In the code above, an ALTREP vector is created only once, so this shouldn't affect the result. But future benchmarks might reveal a bottleneck related to this.

# Construct an Arrow array from an R vector
long_vec_1 <- 1:10^6
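
As a small illustration (not from the original benchmark), you can check that 1:10^6 is an ALTREP compact sequence, which savvy would expand on the way in:

# Illustration: 1:10^6 is an ALTREP compact sequence; arithmetic materializes it.
long_vec_1 <- 1:10^6
.Internal(inspect(long_vec_1))  # printed representation includes "(compact)"
long_vec_2 <- long_vec_1 + 0L   # the result is an ordinary, fully allocated integer vector
.Internal(inspect(long_vec_2))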

@daniellga (Contributor)

Wouldn't it be desirable for both projects to have a comparison benchmark, so we would all know if an update caused a performance regression? I remember doing a simple one when switching to savvy, and IIRC savvy was only a bit slower; nothing to worry about, IMO...
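
A rough sketch of what such a shared benchmark could look like (an illustration, not an existing script), sweeping a few input sizes across both bindings:

# Hypothetical cross-binding regression benchmark over several input sizes.
results <- bench::press(
  n = c(10^4, 10^5, 10^6),
  {
    x <- seq_len(n)
    bench::mark(
      polars = polars::as_polars_series(x),
      neopolars = neopolars::as_polars_series(x),
      check = FALSE,
      min_iterations = 5
    )
  }
)
print(results)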
