Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Querying number of distance evaluations #159

Open
fcdimitr opened this issue Jun 25, 2023 · 3 comments
Open

Querying number of distance evaluations #159

fcdimitr opened this issue Jun 25, 2023 · 3 comments
Labels

Comments

@fcdimitr
Copy link

Is there a way to determine the number of distance calculations performed during the kNN computation? The brute-force approach requires evaluating the distances between every query and corpus point, resulting in $n_q \times n_c$ calculations. However, methods like KDTree or BallTree can potentially reduce this number, especially in low-dimensional spaces.

Does the package provide a mechanism to retrieve the actual number of distance evaluations during the kNN computation?

Thank you!

@KristofferC
Copy link
Owner

That's a good idea!

There used to be a "debug" mode that you could activate that added some stats but it was removed in https://github.com/KristofferC/NearestNeighbors.jl/pull/98/files#diff-3d112593fe19c53f85f19846cc9d0406c17b39736378b65e7d12821d20207302.

With more experienced eyes, I think a better alternative is to allow one to pass something like a

struct EvaluationData
    n_distance_evals
    n_tree_intersection_evals
    ...
end

to inrange and knn and have that struct be populated. By default that argument would just be nothing and unused.

@fcdimitr
Copy link
Author

I made a minor modification on my fork to include a global counter:

master...fcdimitr:NearestNeighbors.jl:master

It works correctly, and some preliminary benchmarking shows that it does not affect the running time.

In any case, I agree that a more structured approach to collecting statistics should be preferred.

@KristofferC
Copy link
Owner

Another thing that can be used is GFlops.jl (requires the https://github.com/charleskawczynski/GFlops.jl/tree/ck/julia1.10 branch).

julia> tree = BallTree(rand(3,10^5));

julia> v = rand(3)
3-element Vector{Float64}:
 0.38248598898980024
 0.6378296367207168
 0.9255061078525778

julia> @count_ops knn(tree, v, 4)
Flop Counter: 9124 flop
┌──────┬─────────┐
│      │ Float64 │
├──────┼─────────┤
│  add │    2184 │
│  sub │    3300 │
│  mul │    2912 │
│ sqrt │     728 │
└──────┴─────────┘

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants