Performance differences observed in r-polars (?) #293

Open
eitsupi opened this issue Sep 6, 2024 · 5 comments

@eitsupi (Contributor) commented Sep 6, 2024

I'm not sure whether this stems from the difference between extendr and savvy, so apologies if this is completely unrelated.

When comparing the existing polars binding built on extendr (polars) with the rewritten binding built on savvy (neopolars), I noticed that the latter is substantially slower (roughly 20–40x in the runs below) both when constructing a Series from an R vector and when exporting one back to an R vector.

pola-rs/r-polars#1079 (comment)

# Construct an Arrow array from an R vector
long_vec_1 <- 1:10^6

bench::mark(
  arrow = {
    arrow::as_arrow_array(long_vec_1)
  },
  nanoarrow = {
    nanoarrow::as_nanoarrow_array(long_vec_1)
  },
  polars = {
    polars::as_polars_series(long_vec_1)
  },
  neopolars = {
    neopolars::as_polars_series(long_vec_1)
  },
  check = FALSE,
  min_iterations = 5
)
#> # A tibble: 4 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 arrow        2.62ms   2.92ms     328.    19.82MB     2.04
#> 2 nanoarrow  496.13µs 644.87µs    1252.   458.41KB     2.03
#> 3 polars       2.06ms   2.26ms     405.     6.33MB     0
#> 4 neopolars    84.6ms   90.1ms      10.9    1.59MB     0
# Export Arrow data as an R vector
arrow_array_1 <- arrow::as_arrow_array(long_vec_1)
nanoarrow_array_1 <- nanoarrow::as_nanoarrow_array(long_vec_1)
polars_series_1 <- polars::as_polars_series(long_vec_1)
neopolars_series_1 <- neopolars::as_polars_series(long_vec_1)

bench::mark(
  arrow = {
    as.vector(arrow_array_1)
  },
  nanoarrow = {
    as.vector(nanoarrow_array_1)
  },
  polars = {
    as.vector(polars_series_1)
  },
  neopolars = {
    as.vector(neopolars_series_1)
  },
  check = TRUE,
  min_iterations = 5
)
#> # A tibble: 4 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 arrow       13.94µs  15.84µs  46309.      4.59KB     4.63
#> 2 nanoarrow   559.9µs   1.85ms    513.      3.85MB    72.8
#> 3 polars       6.45ms   8.79ms    112.      5.93MB     9.13
#> 4 neopolars  148.82ms 164.65ms      6.02    5.24MB     0

Created on 2024-09-05 with reprex v2.1.1

I would appreciate any advice on how to improve the performance.

@yutannihilation (Owner)

Indeed neopolars is slower, but it doesn't seem that slow on my Windows machine. Both polars and neopolars were freshly installed from GitHub with pak::pkg_install().
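For reference, here is a minimal sketch of that installation step (the repository path for neopolars is my assumption based on the package name, not something stated in this thread):

# Install both bindings from GitHub; the neopolars repo path below is an assumption for illustration.
pak::pkg_install("pola-rs/r-polars")   # extendr-based polars
pak::pkg_install("eitsupi/neopolars")  # savvy-based rewrite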

long_vec_1 <- 1:10^6

bench::mark(
  polars = {
    polars::as_polars_series(long_vec_1)
  },
  neopolars = {
    neopolars::as_polars_series(long_vec_1)
  },
  check = FALSE,
  min_iterations = 5
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 polars      196.1µs   1.11ms      833.   10.11MB     2.01
#> 2 neopolars    3.13ms   6.43ms      149.    1.03MB     0

polars_series_1 <- polars::as_polars_series(long_vec_1)
neopolars_series_1 <- neopolars::as_polars_series(long_vec_1)

bench::mark(
  polars = {
    as.vector(polars_series_1)
  },
  neopolars = {
    as.vector(neopolars_series_1)
  },
  check = TRUE,
  min_iterations = 5
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 polars       4.08ms   4.62ms      186.    5.85MB     22.0
#> 2 neopolars    7.18ms   8.53ms      117.    4.56MB     22.5

Created on 2024-09-06 with reprex v2.1.1

@eitsupi (Contributor, Author) commented Sep 6, 2024

Thanks for taking a look at this!
Perhaps the difference in my benchmark results was caused by different optimization settings at build time in my installation process...

But even your results show a difference of roughly 5x in construction and 2x in export, so I'm wondering where that comes from.

@yutannihilation (Owner)

This repository is for checking whether savvy is sufficiently fast, not for competing with extendr. I think a few milliseconds is fast enough. Let's worry about performance when we hit a problem in more realistic usage.

yutannihilation transferred this issue from yutannihilation/savvy-benchmark on Sep 9, 2024
@yutannihilation (Owner)

One possible factor that might affect such a benchmark is that savvy always expands ALTREP vectors.

https://yutannihilation.github.io/savvy/guide/key_ideas.html#treating-external-sexp-and-owned-sexp-differently

In the code above, an ALTREP vector is created only once, so this shouldn't affect the result. But future benchmarks might reveal a bottleneck related to this.

# Construct an Arrow array from an R vector
long_vec_1 <- 1:10^6
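
As a small illustration (not from the original benchmark), you can check that 1:10^6 is an ALTREP compact sequence, which savvy would expand on the way in:

# Illustration: 1:10^6 is an ALTREP compact sequence; arithmetic materializes it.
long_vec_1 <- 1:10^6
.Internal(inspect(long_vec_1))  # printed representation includes "(compact)"
long_vec_2 <- long_vec_1 + 0L   # the result is an ordinary, fully allocated integer vector
.Internal(inspect(long_vec_2))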

@daniellga (Contributor)

Wouldn't it be desirable for both projects to have a comparison benchmark, so we would all know if an update caused a performance regression? I remember doing a simple one when switching to savvy, and IIRC savvy was only a bit slower; nothing to worry about, IMO...
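
A rough sketch of what such a shared benchmark could look like (an illustration, not an existing script), sweeping a few input sizes across both bindings:

# Hypothetical cross-binding regression benchmark over several input sizes.
results <- bench::press(
  n = c(10^4, 10^5, 10^6),
  {
    x <- seq_len(n)
    bench::mark(
      polars = polars::as_polars_series(x),
      neopolars = neopolars::as_polars_series(x),
      check = FALSE,
      min_iterations = 5
    )
  }
)
print(results)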
