bug in dplyr::near? #6921

Open
kaz462 opened this issue Aug 31, 2023 · 6 comments

@kaz462

kaz462 commented Aug 31, 2023

Is the following example a bug in near?

> near(1.1 * 100 * 10^200, 110 * 10^200)
[1] FALSE

Originally posted by @bundfussr in pharmaverse/admiral#2060 (comment)

The same example works as expected with base::all.equal()

> all.equal(1.1 * 100 * 10^200, 110 * 10^200)
[1] TRUE
@etiennebacher

Slightly simpler reprex:

dplyr::near(1.1 * 100 * 10^5, 110 * 10^5)
#> [1] TRUE

dplyr::near(1.1 * 100 * 10^6, 110 * 10^6)
#> [1] FALSE

all.equal(1.1 * 100 * 10^6, 110 * 10^6)
#> [1] TRUE

@DavisVaughan
Member

I don't think near() is broken, necessarily. This behavior comes mainly from a tiny floating-point representation error in 1.1 * 100 that gets magnified when the value is scaled up by 10^200.

options(digits = 22)

110
#> [1] 110
1.1 * 100
#> [1] 110.0000000000000142109

110 * 10^200
#> [1] 1.099999999999999891922e+202
1.1 * 100 * 10^200
#> [1] 1.100000000000000109476e+202

(1.1 * 100 * 10^200) - (110 * 10^200)
#> [1] 2.175541218577478036233e+186

# are they "near"?
abs((1.1 * 100 * 10^200) - (110 * 10^200)) < sqrt(.Machine$double.eps)
#> [1] FALSE

all.equal() works because it uses a relative difference rather than the absolute difference. Possibly near() should do the same, but it has had the current behavior for a while now, so I'm not sure it should be changed. It might be better to just use all.equal() if you need this built-in relative tolerance feature.
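
To make the distinction concrete, here is a minimal base-R sketch of the two kinds of comparison applied to the example above. It assumes near() amounts to abs(x - y) < tol with tol = sqrt(.Machine$double.eps), which matches the check shown earlier; the variable names are illustrative only.

```r
x <- 1.1 * 100 * 10^200
y <- 110 * 10^200
tol <- sqrt(.Machine$double.eps)

abs_diff <- abs(x - y)          # absolute difference, about 2.2e186
rel_diff <- abs_diff / abs(y)   # relative difference, about 2e-16

abs_near <- abs_diff < tol      # the absolute test fails
rel_near <- rel_diff < tol      # the relative test passes

# all.equal() returns a character string, not FALSE, when values differ,
# so wrap it in isTRUE() before using it in a condition
all_equal <- isTRUE(all.equal(x, y))
```

Here abs_near is FALSE while rel_near and all_equal are both TRUE, which mirrors the near() and all.equal() results reported in this thread.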

@bundfussr

I think near() does what the name suggests: it tests if the absolute difference of two numbers is small.

However, the documentation is misleading.

This is a safe way of comparing if two vectors of floating point numbers are (pairwise) equal. This is safer than using ==, because it has a built in tolerance

The description suggests that the function solves the problem of testing floating point numbers for equality. The example above shows that this is not the case. Another example is:

> near(1 * 10^-8, 2 * 10^-8)
[1] TRUE

Thus I would propose that the description of near() is updated to clarify its purpose.
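
A short sketch of why the last example returns TRUE, assuming near() compares abs(x - y) against its default tolerance of sqrt(.Machine$double.eps), which is about 1.49e-8 (variable names are illustrative):

```r
x <- 1 * 10^-8
y <- 2 * 10^-8
tol <- sqrt(.Machine$double.eps)             # about 1.49e-8

abs_diff <- abs(x - y)                       # 1e-8, which is below tol
rel_diff <- abs_diff / max(abs(x), abs(y))   # 0.5, because y is twice x

within_abs_tol <- abs_diff < tol
```

The absolute check passes even though one value is double the other, because both values are smaller than the tolerance itself.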

@kaz462
Author

kaz462 commented Sep 20, 2023

@etiennebacher @DavisVaughan @bundfussr Thank you so much for the discussion!
Good to know that all.equal() uses relative differences, while near() uses absolute differences.

@StefanThoma

StefanThoma commented Oct 10, 2023

near(1 * 10^-8, 2 * 10^-8)

@bundfussr The same is true of all.equal() in this example:

> all.equal(10^-8, 2 * 10^-8)
[1] TRUE

@NicChr

NicChr commented Oct 20, 2023

It's worth noting that all.equal() is not symmetric in its arguments, so the order in which they are passed matters. See the example below.

all.equal(10^-8, 2 * 10^-8)
#> [1] TRUE
all.equal(2 * 10^-8, 10^-8)
#> [1] "Mean relative difference: 0.5"
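
One possible reading of this asymmetry, stated as an assumption about all.equal()'s numeric method rather than a confirmed account: the difference is scaled by the first argument (the target), and the comparison falls back to an absolute difference when the target is very close to zero relative to the tolerance (1.5e-8 by default). With 10^-8 as the target the absolute difference of 1e-8 is within tolerance, while with 2 * 10^-8 as the target the relative difference of 0.5 is not. A minimal sketch of checking both directions, with illustrative variable names:

```r
a <- 10^-8
b <- 2 * 10^-8

# all.equal() returns TRUE or a character description, so wrap it in isTRUE()
forward  <- isTRUE(all.equal(a, b))   # a is the target
backward <- isTRUE(all.equal(b, a))   # b is the target

# One symmetric option: require the comparison to hold in both directions
both_ways <- forward && backward
```

With these inputs forward is TRUE and backward is FALSE, matching the output above, so both_ways is FALSE.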

I typically use something similar to the function below (my actual version is written in C++).

near2 <- function(x, y, tol = sqrt(.Machine$double.eps)){
  adiff <- abs(x - y)
  ax <- abs(x)
  ay <- abs(y)
  # Use the absolute difference when either value is within tol of zero
  any_close_to_zero <- (ax < tol) | (ay < tol)
  # Infinite inputs are handled separately because rdiff is NaN for them
  both_same_inf <- (x == Inf & y == Inf) | (x == -Inf & y == -Inf)
  different_inf <- (x == Inf & y == -Inf) | (x == -Inf & y == Inf)
  # Difference relative to the larger magnitude, so argument order does not matter
  amax <- pmax(ax, ay)
  rdiff <- adiff / amax
  out <- dplyr::if_else(any_close_to_zero,
                        ( adiff < tol ),
                        ( rdiff < tol ))
  out[both_same_inf] <- TRUE
  out[different_inf] <- FALSE
  out
}
# Lower tolerance for small numbers
near2(10^-8, 2 * 10^-8, tol = sqrt(.Machine$double.eps)/10^4)
#> [1] FALSE
near2(2 * 10^-8, 10^-8, tol = sqrt(.Machine$double.eps)/10^4)
#> [1] FALSE
near2(1.1 * 100 * 10^200, 110 * 10^200)
#> [1] TRUE
near2(110 * 10^200, 1.1 * 100 * 10^200)
#> [1] TRUE

Created on 2023-10-20 with reprex v2.0.2
