bug in `dplyr::near`? #6921

kaz462 · 2023-08-31T00:49:25Z

Is the following example a bug in near?

> near(1.1 * 100 * 10^200, 110 * 10^200)
[1] FALSE

Originally posted by @bundfussr in pharmaverse/admiral#2060 (comment)

The same example works as expected with base::all.equal()

> all.equal(1.1 * 100 * 10^200, 110 * 10^200)
[1] TRUE


etiennebacher · 2023-08-31T08:02:05Z

Slightly simpler reprex:

dplyr::near(1.1 * 100 * 10^5, 110 * 10^5)
#> [1] TRUE

dplyr::near(1.1 * 100 * 10^6, 110 * 10^6)
#> [1] FALSE

all.equal(1.1 * 100 * 10^6, 110 * 10^6)
#> [1] TRUE
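The two reprexes above differ only in the scale factor. The sketch below (variable names are illustrative) compares the absolute and relative differences across several scales. It uses `sqrt(.Machine$double.eps)`, which is `near()`'s default tolerance. The absolute difference grows with the scale, while the relative difference stays at the level of machine precision.

```r
# Compare 1.1 * 100 and 110 after scaling both by the same power of ten
scales <- 10^c(0, 5, 6, 10, 200)
tol <- sqrt(.Machine$double.eps)   # about 1.49e-8, near()'s default tolerance

lhs <- 1.1 * 100 * scales
rhs <- 110 * scales

abs_diff <- abs(lhs - rhs)                        # grows roughly in proportion to the scale
rel_diff <- abs_diff / pmax(abs(lhs), abs(rhs))   # stays around 1e-16

data.frame(scale = scales, abs_diff, rel_diff, near_abs = abs_diff < tol)
```

The `near_abs` column follows the same rule as `near()`, so it flips from TRUE to FALSE between 10^5 and 10^6, as in the reprex.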

DavisVaughan · 2023-08-31T13:31:26Z

I don't think near() is broken, necessarily. This weirdness comes from tiny floating point rounding errors (1.1 * 100 is not exactly 110 in double precision). These errors get magnified when both sides are scaled up by 10^200.

options(digits = 22)

110
#> [1] 110
1.1 * 100
#> [1] 110.0000000000000142109

110 * 10^200
#> [1] 1.099999999999999891922e+202
1.1 * 100 * 10^200
#> [1] 1.100000000000000109476e+202

(1.1 * 100 * 10^200) - (110 * 10^200)
#> [1] 2.175541218577478036233e+186

# are they "near"? (near() checks abs(x - y) < tol)
abs((1.1 * 100 * 10^200) - (110 * 10^200)) < sqrt(.Machine$double.eps)
#> [1] FALSE

all.equal() works because it uses a relative difference rather than an absolute difference. Possibly near() should do the same. However, it has had the current behavior for a while now, so I'm not sure if it should be changed. It might be better to just use all.equal() if you need this built-in relative tolerance.
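Here is a minimal sketch of both kinds of comparison. `near_abs()` applies the same rule as `dplyr::near()`: the absolute difference must be below `tol`. `near_rel()` is a hypothetical relative-difference variant and is not part of dplyr. It scales the tolerance by the larger of the two magnitudes, which also makes it symmetric in its arguments.

```r
# Absolute comparison, the same rule dplyr::near() applies
near_abs <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) < tol
}

# Hypothetical relative comparison: the tolerance scales with the larger magnitude
near_rel <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) <= tol * pmax(abs(x), abs(y))
}

x <- 1.1 * 100 * 10^200
y <- 110 * 10^200
near_abs(x, y)  # FALSE: the absolute difference is about 2e186
near_rel(x, y)  # TRUE: the relative difference is about 2e-16
```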

bundfussr · 2023-09-01T14:08:08Z

I think near() does what the name suggests: it tests if the absolute difference of two numbers is small.

However, the documentation is misleading:

"This is a safe way of comparing if two vectors of floating point numbers are (pairwise) equal. This is safer than using ==, because it has a built in tolerance"

The description suggests that the function solves the problem of testing floating point numbers for equality. The examples above show that this is not the case. Another example is:

> near(1 * 10^-8, 2 * 10^-8)
[1] TRUE

Thus I would propose that the description of near() is updated to clarify its purpose.
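The small-number case follows from the same fixed absolute tolerance of about 1.49e-8. Any two values whose difference is below that are reported as equal, even when one is twice the other. A quick check of the arithmetic:

```r
tol <- sqrt(.Machine$double.eps)  # near()'s default tolerance, about 1.49e-8
x <- 1 * 10^-8
y <- 2 * 10^-8

abs(x - y) < tol                   # TRUE: the absolute difference (1e-8) is below tol
abs(x - y) / pmax(abs(x), abs(y))  # 0.5: y is twice as large as x
```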

kaz462 · 2023-09-20T05:04:56Z

@etiennebacher @DavisVaughan @bundfussr Thank you so much for the discussion!
Good to know that all.equal() uses relative differences, while near() uses absolute differences.

StefanThoma · 2023-10-10T11:04:18Z

@bundfussr, regarding near(1 * 10^-8, 2 * 10^-8): the same is TRUE for all.equal() in this example:

> all.equal(10^-8, 2 * 10^-8)
[1] TRUE

NicChr · 2023-10-20T11:19:29Z

It's worth noting that all.equal() is not symmetric. The difference is scaled by the first argument (target), so the order of arguments matters. See the example below.

all.equal(10^-8, 2 * 10^-8)
#> [1] TRUE
all.equal(2 * 10^-8, 10^-8)
#> [1] "Mean relative difference: 0.5"
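The asymmetry comes from how `all.equal()` scales numeric differences. It divides by the mean absolute value of `target` (the first argument) only. If that mean is not above `tolerance`, it reports an absolute difference instead. The default `tolerance` is `sqrt(.Machine$double.eps)`, about 1.5e-8. The function below is a simplified, hypothetical scalar re-creation of that logic for illustration. It is not the actual `all.equal.numeric()` source.

```r
# Simplified scalar version of the scaling step all.equal() uses for numbers
# (illustrative only; the real method also handles vectors, NA, attributes, etc.)
all_equal_diff <- function(target, current, tolerance = sqrt(.Machine$double.eps)) {
  diff <- abs(target - current)
  scale <- abs(target)              # only the first argument sets the scale
  if (is.finite(scale) && scale > tolerance) {
    list(what = "relative", value = diff / scale)
  } else {
    list(what = "absolute", value = diff)
  }
}

all_equal_diff(10^-8, 2 * 10^-8)  # absolute difference of 1e-8, below tolerance, so TRUE
all_equal_diff(2 * 10^-8, 10^-8)  # relative difference of 0.5, hence the message above
```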

I typically use something similar to the function below (my own version is written in C++).

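near2 <- function(x, y, tol = sqrt(.Machine$double.eps)){
  adiff <- abs(x - y)
  ax <- abs(x)
  ay <- abs(y)
  # Near zero a relative difference is unstable, so use the absolute difference there
  any_close_to_zero <- (ax < tol) | (ay < tol)
  # Infinities need special handling: Inf - Inf and Inf / Inf both give NaN
  both_same_inf <- (x == Inf & y == Inf) | (x == -Inf & y == -Inf)
  different_inf <- (x == Inf & y == -Inf) | (x == -Inf & y == Inf)
  one_inf <- xor(is.infinite(x), is.infinite(y))  # one infinite, one finite
  # Relative difference, scaled by the larger magnitude so the test is symmetric
  amax <- pmax(ax, ay)
  rdiff <- adiff / amax
  out <- dplyr::if_else(any_close_to_zero,
                        ( adiff < tol ),
                        ( rdiff < tol ))
  out[both_same_inf] <- TRUE
  out[different_inf | one_inf] <- FALSE
  out
}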
# Lower tolerance for small numbers
near2(10^-8, 2 * 10^-8, tol = sqrt(.Machine$double.eps)/10^4)
#> [1] FALSE
near2(2 * 10^-8, 10^-8, tol = sqrt(.Machine$double.eps)/10^4)
#> [1] FALSE
near2(1.1 * 100 * 10^200, 110 * 10^200)
#> [1] TRUE
near2(110 * 10^200, 1.1 * 100 * 10^200)
#> [1] TRUE

Created on 2023-10-20 with reprex v2.0.2
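Another common way to get symmetric behavior is a mixed tolerance in the style of Python's `math.isclose()`. Two values are close if their difference is within a relative tolerance of the larger magnitude, or within an absolute floor, whichever is larger. The sketch below is a hypothetical helper: `near_mixed()` is not a dplyr function. The default `rel_tol` reuses `near()`'s tolerance, and `abs_tol = 0` follows `math.isclose()`. Both defaults are assumptions you would tune for your data.

```r
# Hypothetical helper: symmetric closeness test with relative and absolute tolerances
near_mixed <- function(x, y, rel_tol = sqrt(.Machine$double.eps), abs_tol = 0) {
  tol <- pmax(rel_tol * pmax(abs(x), abs(y)), abs_tol)
  close <- abs(x - y) <= tol
  # Infinite values are only close to an identical infinity
  ifelse(is.infinite(x) | is.infinite(y), x == y, close)
}

near_mixed(1.1 * 100 * 10^200, 110 * 10^200)  # TRUE
near_mixed(10^-8, 2 * 10^-8)                  # FALSE
near_mixed(2 * 10^-8, 10^-8)                  # FALSE, same result in either order
near_mixed(0, 1e-20, abs_tol = 1e-12)         # TRUE, thanks to the absolute floor
near_mixed(c(Inf, Inf), c(Inf, -Inf))         # TRUE FALSE
```

A nonzero `abs_tol` matters when values can be exactly zero. Under a purely relative rule, no nonzero value is ever close to zero.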

kaz462 mentioned this issue on Sep 20, 2023, in PredictiveEcology/fpCompare#6 ("relative vs absolute difference", open)

DavisVaughan added the documentation label on Nov 6, 2023
