-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bd tree #19
Comments
Thanks for the bug report. There seems to be an issue with the ANN library's bd method when there are duplicate points in the target matrix. It's not really something I can debug in a hurry and I have previously considers inactivating the bd method for this reason. |
you could simply test for duplicated points if the user chooses "bd" |
This is exactly what is done by another ANN wrapper: https://github.com/cran/yaImpute/blob/master/R/ann.R#L17 but the time spent for the check makes using the bd tree pointless. So that is why I hesitated the last time this bug was reported. My feeling is that if the functionality is either unsafe or uneconomically slow it should just be eliminated. The only other alternative I see is to add an argument like |
bd tree should really be able to handle this. Can we request a patch upstream, or do it ourselves? |
You're right of course @krlmlr. I'm on holiday right now so won't be doing this myself, but if you want to try, I think you could contact the senior author of ANN for input. Best, Greg. Sent from my iPhone
|
@jefferis I wanted to ask if there was any progress on this? The bug seems to persist in the CRAN version. At minimum, there should be a safeguard for the code not to give segfaults. |
Hi, In my case I get a segfault with |
To reiterate, the bug is in the upstream library and I don't have time to chase it down. There must be a comparison logic error somewhere that is revealed with two identical points. I can either remove bd functionality altogether or provide a check for identical points that can be turned off at the user's discretion. But as a reminder, checking for identical points is so expensive that it makes the bd tree pointless. Votes? |
@jefferis then maybe: remove, document it in help and e-mail the author of the algorithm? |
Well a segfault is a big problem! I think you could document this bug. I don't know how |
I made few benchmark with 300k 3D points RANN::nn2(X, X, k = k, treetype = "kd", searchtype = "radius", radius = 2)
Maybe I miss something. Maybe 300k is not big enough to take advantage of |
I haven't had a compelling use case for bd but I expect there will be some situation in which it is faster.
How are you checking for dupes and what is your estimate for the tree build time (can estimate by time taken to search for just one point)?
…Sent from my iPhone
On 2 Oct 2017, at 22:22, Jean-Romain Roussel ***@***.***> wrote:
I made few benchmark with 300k 3D points
RANN::nn2(X, X, k = k, treetype = "kd", searchtype = "radius", radius = 2)
Check for duplicates 22 ms
nn2 with kd 1.0 s for k = 10
nn2 with bd 1.6 s for k = 10
nn2 with kd 1.6 s for k = 60
nn2 with bd 2.2 s for k = 60
Maybe I miss something. Maybe 300k is not big enough to take advantage of bd. In any case checking for duplicates is ways faster than the search. I made other tests and kdout performed bd every time.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
I check dupes with the
|
Hi, Just as some extra info, this issue pops up in another function that uses These are the steps I took
Perhaps I'm searching for duplicates the wrong way? Either way, we've switched to using |
You are search for duplicated over the 15 columns. I don't have a clue of what your data contains but I guess over the 15 columns there are no duplicates i.e not line with twice the same 15 numbers. The question is: is there duplicates over the 2 or 3 columns used as coordinates. |
So I tried a little debugging and it seems that the problem is in the recursive function building the bd tree i.e. library(RANN)
nn2(query=matrix(c(1,1),ncol=2), data=matrix(rep(2, 4),ncol=2), k=1L, treetype="bd")
|
the following line leaves my R process unresponsive, no segfault, no 100% cpu:
treetype="kd"
works fine.The text was updated successfully, but these errors were encountered: