Trees for non-Metrics? #75

Closed
oxinabox opened this issue Sep 28, 2018 · 2 comments

@oxinabox

For NLP it is common to want to use CosineDist,
which is a SemiMetric (it does not satisfy the triangle inequality).

I don't think this is going to be compatible with the BallTree,
but it should be fine with the BruteTree.
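
To make the SemiMetric point concrete, here is a small sketch using only the standard library. The helper `cosine_dist` is a hypothetical stand-in written for this example, not the Distances.jl implementation. It shows three vectors for which the cosine distance breaks the triangle inequality, the property that tree structures such as ball trees typically rely on to prune branches.

```julia
using LinearAlgebra

# Hypothetical helper for illustration: 1 minus the cosine similarity.
cosine_dist(a, b) = 1 - dot(a, b) / (norm(a) * norm(b))

a = [1.0, 0.0]
b = [1.0, 1.0]
c = [0.0, 1.0]

d_ab = cosine_dist(a, b)  # 1 - 1/sqrt(2)
d_bc = cosine_dist(b, c)  # 1 - 1/sqrt(2)
d_ac = cosine_dist(a, c)  # 1, since a and c are orthogonal

# A metric would require d_ac <= d_ab + d_bc; here the detour through b is shorter.
violates_triangle = d_ab + d_bc < d_ac
println("d(a,b) + d(b,c) = ", d_ab + d_bc, ", d(a,c) = ", d_ac)
println("triangle inequality violated: ", violates_triangle)
```

A brute-force search never uses that inequality, which is why it should be unaffected.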

@davidbp

davidbp commented Mar 4, 2023

This would come in handy for Clustering.jl, what do you think @KristofferC? Currently cluster assignment is performed by computing and storing all pairwise distances, which is quite bad in terms of memory (and it ends up being slower as well). It would be nice to use a BruteTree to get cluster assignments. Something similar to

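using NearestNeighbors
using Distances  # provides SemiMetric and SqEuclidean

function get_cluster_assignments_nearest_neighbors(
   X::Matrix{T},
   centers::Matrix{T},
   distance::SemiMetric=SqEuclidean(),       # in: distance used to compare points and centers
   ) where {T}

   # Brute-force search over the centers; no pairwise distance matrix is stored
   brutetree = BruteTree(centers, distance)
   idxs, dists = knn(brutetree, X, 1)

   # knn returns one index vector per point; with k = 1, take its only entry
   return first.(idxs)
end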

I asked for this in PR JuliaStats/Clustering.jl#238, but the idea there was to leverage something like Distances.jl or NearestNeighbors rather than implement this within the package.
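
For comparison, a dependency-free sketch of the same idea: each point is assigned to its nearest center by streaming over the centers, so only one distance is held at a time instead of the full pairwise matrix. The names `cosine_dist` and `assign_nearest` are hypothetical and exist only for this example; this is not how Clustering.jl or NearestNeighbors implement it, and it assumes points and centers are stored as columns.

```julia
using LinearAlgebra

# Hypothetical helper for illustration: 1 minus the cosine similarity.
cosine_dist(a, b) = 1 - dot(a, b) / (norm(a) * norm(b))

# Hypothetical helper: index of the nearest center (column of `centers`)
# for every point (column of `X`). `dist` can be any function of two vectors,
# so it does not need to satisfy the triangle inequality.
function assign_nearest(X::AbstractMatrix, centers::AbstractMatrix, dist=cosine_dist)
    size(X, 1) == size(centers, 1) ||
        throw(DimensionMismatch("points and centers must have the same dimension"))
    assignments = Vector{Int}(undef, size(X, 2))
    for j in axes(X, 2)
        x = view(X, :, j)
        best_idx, best_d = 0, Inf
        for k in axes(centers, 2)
            d = dist(x, view(centers, :, k))
            if d < best_d
                best_idx, best_d = k, d
            end
        end
        assignments[j] = best_idx
    end
    return assignments
end

# Four 2-d points, two centers along the coordinate axes.
X = [1.0 0.9 0.0 0.1;
     0.0 0.1 1.0 0.9]
centers = [1.0 0.0;
           0.0 1.0]

assignments = assign_nearest(X, centers)
println(assignments)
```

Extra memory is proportional to the number of points, while the work is still one distance evaluation per point and center pair, the same as a brute-force tree query.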

@KristofferC
Owner

BruteTree is more or less "useless" and is only there to basically check results vs the other trees. I think this is a bit out of scope of the package.
