[ fails when the condition contains NAs #92

Open
gogonzo opened this issue Nov 5, 2021 · 5 comments

Comments

@gogonzo

gogonzo commented Nov 5, 2021

Hi there,
I have an issue with the [ operator, which is not consistent with base R. Normally, when one passes a condition containing NAs to [, a row of NAs is returned for each NA in the condition.

df <- data.frame(a = c(1, NA, 3), b = 1:3)
df[df$a > 0, ]

#     a  b
# 1   1  1
# NA NA NA
# 3   3  3

With S4Vectors, this is not the case:

DF <- S4Vectors::DataFrame(a = c(1, NA, 3), b = 1:3)
DF[DF$a > 0, ]
# Error: logical subscript contains NAs

Do you have a plan to change this?

Regards,
DK

@hpages
Contributor

hpages commented Nov 5, 2021

Hi,

Right. It seems that row selection of DataFrame objects only supports NAs in numeric or character subscripts at the moment:

DF <- S4Vectors::DataFrame(a = c(1, NA, 3), b = 1:3, row.names=LETTERS[1:3])
DF[c(3, NA), ]
# DataFrame with 2 rows and 2 columns
#              a         b
#      <numeric> <integer>
# C            3         3
# <NA>        NA        NA

DF[c("C", NA), ]
# DataFrame with 2 rows and 2 columns
#              a         b
#      <numeric> <integer>
# C            3         3
# <NA>        NA        NA

but not in logical subscripts:

DF[c(FALSE, NA, TRUE), ]
# Error: logical subscript contains NAs

It looks like when we added support for NA subscripts a few years ago (see commit 85c3a56), the logical case was overlooked. We'll work on this.

In the meantime, an easy workaround is to pass the logical subscript through which():

DF[which(DF$a > 0), ]
# DataFrame with 2 rows and 2 columns
#           a         b
#   <numeric> <integer>
# A         1         1
# C         3         3

Note that this drops the rows corresponding to NAs in the logical subscript, so it does not behave exactly like df[df$a > 0, ], which you could see as a good thing. What are all these rows filled with NAs good for anyway?
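To see concretely why which() drops those rows, here is a base-R-only sketch (no S4Vectors needed) that inspects the condition and its which() conversion for the same values of a used above:

```r
a <- c(1, NA, 3)
cond <- a > 0        # comparing NA with 0 yields NA, so cond is TRUE NA TRUE
idx  <- which(cond)  # which() returns only the positions that are TRUE; the NA is dropped
print(cond)
print(idx)
```

Because idx is a plain integer vector with no NAs, DF[idx, ] never reaches the check that raises "logical subscript contains NAs".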

If you really want to mimic exactly what df[df$a > 0, ] does:

DF[seq_len(nrow(DF))[DF$a > 0], ]
# DataFrame with 3 rows and 2 columns
#              a         b
#      <numeric> <integer>
# A            1         1
# <NA>        NA        NA
# C            3         3

Ouch... ugly! Hopefully this is still somewhat helpful?
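The seq_len() trick works because subsetting an integer sequence with a logical vector keeps each NA as an NA index, turning the unsupported logical-with-NA subscript into an integer-with-NA subscript, which DataFrame already handles. A minimal base-R sketch wrapping this in a helper; the name lgl_to_row_index is hypothetical and not part of S4Vectors:

```r
# Hypothetical helper (not an S4Vectors API): convert a logical row subscript
# into an integer one, keeping an NA index wherever the logical vector has NA.
lgl_to_row_index <- function(i, n = length(i)) {
  stopifnot(is.logical(i))
  seq_len(n)[i]  # logical subsetting: TRUE keeps the position, NA gives NA, FALSE drops it
}

idx <- lgl_to_row_index(c(1, NA, 3) > 0)
print(idx)

# Intended use with S4Vectors (not run here):
# DF[lgl_to_row_index(DF$a > 0, nrow(DF)), ]
```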

Best,
H.

@gogonzo
Author

gogonzo commented Nov 10, 2021

@hpages Thanks,
I'm fine with this for now; I'll implement a (hopefully temporary) workaround on my side.

Regards,
DK

@gogonzo
Author

gogonzo commented Nov 11, 2021

Hi guys,
Just to highlight that the same problem occurs with NaN:

DF <- S4Vectors::DataFrame(a = c(1, NaN))
DF[DF$a == 1, ]
# Error: logical subscript contains NAs

df <- data.frame(a = c(1, NaN))
df[df$a == 1, ]
# [1]  1 NA
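The NaN case is the same problem because in base R any comparison involving NaN evaluates to NA, and is.na() is TRUE for NaN. A short base-R sketch of what the condition actually contains, and why the which() workaround covers this case too:

```r
a <- c(1, NaN)
cond <- a == 1      # NaN == 1 gives NA, not FALSE
print(cond)
print(is.na(cond))  # the second element is NA, matching the "contains NAs" error above
print(which(cond))  # which() keeps only position 1, exactly as it drops a plain NA
```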

Thanks for your attention,
Regards,
DK

@danielinteractive

@hpages @LiNk-NY thx a lot for your work on improving this - and pls let us know if we can help in any way (besides raising issues, thx to @gogonzo for that!)

@LiNk-NY
Contributor

LiNk-NY commented Jul 17, 2023

@hpages any updates on this? Thanks!
