Strange behaviour of filter and == #6920

Closed
Cidree opened this issue Aug 29, 2023 · 4 comments


Cidree commented Aug 29, 2023

Hello,

I found a strange behaviour of the function filter when using the operator == and a vector. In the next example I would expect the function to give me an error because I should use %in%:

library(tidyverse)
mtcars %>% 
    as_tibble() %>% 
    filter(cyl == c(4, 6))

Instead, it runs without error and returns 10 rows of mtcars. However, mtcars has 18 rows with 4 or 6 cylinders. Why is this happening?

Best,
Cidre

@manhnguyen48

It seems cyl == c(6,4) returns a different result from cyl == c(4,6).

library(tidyverse)

as_tibble(mtcars, rownames = "Model") |> 
  mutate(`equal` = cyl == c(4,6), 
         `equal_2` = cyl == c(6,4), 
         `operation_in` = cyl %in% c(4,6)) |> 
  select(Model, cyl, equal:operation_in)
#> # A tibble: 32 × 5
#>    Model               cyl equal equal_2 operation_in
#>    <chr>             <dbl> <lgl> <lgl>   <lgl>       
#>  1 Mazda RX4             6 FALSE TRUE    TRUE        
#>  2 Mazda RX4 Wag         6 TRUE  FALSE   TRUE        
#>  3 Datsun 710            4 TRUE  FALSE   TRUE        
#>  4 Hornet 4 Drive        6 TRUE  FALSE   TRUE        
#>  5 Hornet Sportabout     8 FALSE FALSE   FALSE       
#>  6 Valiant               6 TRUE  FALSE   TRUE        
#>  7 Duster 360            8 FALSE FALSE   FALSE       
#>  8 Merc 240D             4 FALSE TRUE    TRUE        
#>  9 Merc 230              4 TRUE  FALSE   TRUE        
#> 10 Merc 280              6 TRUE  FALSE   TRUE        
#> # ℹ 22 more rows

Created on 2023-08-29 with reprex v2.0.2

Contributor

joranE commented Aug 29, 2023

This is just R's default vector recycling behavior. If you run:

mtcars$cyl == c(4,6)

you'll see that this generates a perfectly "valid" logical vector of the correct length by recycling c(4,6) to the number of rows in mtcars.
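As a minimal base R sketch of the recycling described above (the variable names `cyl` and `recycled` are introduced here for illustration only), the comparison against c(4, 6) is the same as comparing against an explicitly repeated vector of alternating 4s and 6s, which is why only some of the matching rows come back:

```r
# mtcars ships with base R (datasets package), so no extra packages are needed.
cyl <- mtcars$cyl

# What R does implicitly: repeat c(4, 6) until it is as long as cyl (32 values).
recycled <- rep(c(4, 6), length.out = length(cyl))

# The implicit and explicit comparisons give the same logical vector.
identical(cyl == c(4, 6), cyl == recycled)

# Element-wise comparison against the alternating pattern: 10 matches.
sum(cyl == c(4, 6))

# Set membership, which is what the original filter was meant to express: 18 matches.
sum(cyl %in% c(4, 6))
```

Because 32 is a multiple of 2, R recycles silently here; base R only warns when the longer length is not a multiple of the shorter one.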

While it may seem odd, some people probably do rely on this behavior intentionally. In this instance, someone may want to compare cyl to a vector of alternating values of 4 and 6, and use this as a shorthand for that.

Whether dplyr attempts to check for this and either warn or stop the user is a judgement call that the package maintainers would have to weigh in on.


philibe commented Aug 30, 2023

@Cidree

I would expect the function to give me an error because I should use %in%

The filter cyl == c(4, 6) is simply a misuse of base R.

I don't expect dplyr to warn about everything, still less to raise an error for something outside the scope of dplyr.

And as @joranE says, "some people probably do rely on this behavior": another example could be a vector on the left-hand side.

Member

DavisVaughan commented Aug 30, 2023

Thanks @joranE, you've given the right explanation. I don't believe dplyr should error or warn on this, even though it is typically a user mistake. Unfortunately, I think this one is too much of a slippery slope.

If you really did want to try and avoid this, you could do:

`==` <- function(x, y) {
  vctrs::vec_equal(x, y)
}

dplyr::mutate(mtcars, cyl == c(4, 6))
#> Error in `dplyr::mutate()`:
#> ℹ In argument: `cyl == c(4, 6)`.
#> Caused by error in `vctrs::vec_equal()`:
#> ! Can't recycle `..1` (size 32) to match `..2` (size 2).
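A less invasive variant of the same idea, sketched here in base R under the assumption that you would rather not mask `==` globally: a small helper that refuses to recycle anything other than a length-1 value. The name `strict_eq` is hypothetical and is not part of dplyr or vctrs.

```r
# Hypothetical helper: element-wise equality that only allows `y` to be
# a single value or exactly as long as `x`; any other length is an error.
strict_eq <- function(x, y) {
  if (length(y) != 1L && length(y) != length(x)) {
    stop(
      "`y` must have length 1 or length ", length(x),
      ", not ", length(y), ".",
      call. = FALSE
    )
  }
  x == y
}

# A scalar comparison behaves like `==`.
n_four <- sum(strict_eq(mtcars$cyl, 4))

# A length-2 right-hand side is rejected instead of being recycled;
# tryCatch() captures the error message so the script keeps running.
res <- tryCatch(
  strict_eq(mtcars$cyl, c(4, 6)),
  error = function(e) conditionMessage(e)
)
print(res)
```

Since the helper is an ordinary function, it can be called inside filter() or mutate() in place of `==` wherever the stricter check is wanted, leaving the base operator untouched everywhere else.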
