feature request: unmatched="error" for the left table (x) in the left_join() #6898

Closed
idavydov opened this issue Jul 28, 2023 · 1 comment

idavydov commented Jul 28, 2023

When performing a left_join(), it would be nice to have an option to throw an error if any row of x has no match.

Background

A common use case is joining against a large meta-information table. You want to add the meta-information while making sure that every row of the original (left) table has exactly one match.

Currently, unmatched = "error" in left_join() only checks that every row of y has a match. It does not check the rows of x.

Example

library(dplyr, warn.conflicts = FALSE)

meta <- tribble(
  ~ Species, ~ Scientific.Name,
  "setosa", "Iris setosa",
  "virginica", "Iris virginica",
  # versicolor is missing
  "human", "Homo sapiens",
  "dog", "Canis lupus familiaris",
)
with_meta <- left_join(
  iris,
  meta,
  relationship = "many-to-one"
  # here we need something like unmatched = "x_error", which would throw an error because
  # versicolor wasn't matched
)
# workaround: fail if any row of x was left without a match
stopifnot(!anyNA(with_meta$Scientific.Name))
@idavydov idavydov changed the title feature request: unmatched="error" for the right table in the left_join feature request: unmatched="error" for the left table (x) in the left_join Jul 28, 2023
@idavydov idavydov changed the title feature request: unmatched="error" for the left table (x) in the left_join feature request: unmatched="error" for the left table (x) in the left_join() Jul 28, 2023
@DavisVaughan
Member

That's an inner join with unmatched = c("error", "drop") (i.e. error if an x row is unmatched but it is ok if a y row is unmatched).

library(dplyr, warn.conflicts = FALSE)

meta <- tribble(
  ~ Species, ~ Scientific.Name,
  "setosa", "Iris setosa",
  "virginica", "Iris virginica",
  # versicolor is missing
  "human", "Homo sapiens",
  "dog", "Canis lupus familiaris",
)

inner_join(
  iris,
  meta,
  relationship = "many-to-one",
  unmatched = c("error", "drop")
)
#> Joining with `by = join_by(Species)`
#> Error in `inner_join()`:
#> ! Each row of `x` must have a match in `y`.
#> ℹ Row 51 of `x` does not have a match.

It doesn't make sense to add this to left join because the "left" part of left join is a statement about what you do with the rows of x, i.e. the "left" part means: "keep all rows of x whether or not they have a match". In other words, when you use left_join() you always get an implicit unmatched = "keep" for x, and then you get to specify unmatched for y.
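
To make the point about left_join() concrete, here is a minimal sketch using two small made-up tables (`x`, `y`, and their `key` and `value` columns are illustrative and not from the issue). It assumes dplyr 1.1.0 or later, where the `unmatched` argument is available. The unmatched row of x is kept regardless, while unmatched = "error" reacts only to the unmatched row of y:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy tables: "c" exists only in x, "z" exists only in y.
x <- tibble(key = c("a", "b", "c"))
y <- tibble(key = c("a", "b", "z"), value = c(1, 2, 26))

# Unmatched rows of x are always kept; "c" gets an NA value.
kept <- left_join(x, y, by = "key")

# unmatched = "error" only guards y: the "z" row of y has no match in x
# and would be dropped, so this call errors. The error is captured here
# instead of stopping the script.
res <- tryCatch(
  left_join(x, y, by = "key", unmatched = "error"),
  error = function(e) e
)

inherits(res, "error")
```

Note that the error is triggered by the "z" row of y, not by the "c" row of x, which is why this argument cannot express the check requested in the issue.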

For inner join, you don't automatically get any unmatched behavior, so you get to specify unmatched for both x and y separately as needed.
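
As a sketch of the passing case, assume the meta table is completed with a versicolor row (the `meta_full` name and the added row are illustrative, not from the issue). With every row of x matched, the inner join with unmatched = c("error", "drop") should return the same rows a left join would, while the extra "human" row of y is silently dropped:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical completed lookup table: versicolor is now present.
meta_full <- tribble(
  ~Species,     ~Scientific.Name,
  "setosa",     "Iris setosa",
  "versicolor", "Iris versicolor",
  "virginica",  "Iris virginica",
  "human",      "Homo sapiens"
)

with_meta <- inner_join(
  iris,
  meta_full,
  by = "Species",
  relationship = "many-to-one",
  # error on unmatched x rows, drop unmatched y rows
  unmatched = c("error", "drop")
)

# Every row of iris found a match, so no rows were lost.
nrow(with_meta) == nrow(iris)
```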
