Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute pairwise distances only once instead of each turn. #68

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

AlexandreAbraham
Copy link

This is a proposition of optimization for the ranked batch method.
In the ranked batch methods, the distance between all unlabeled samples is computed at each iteration which is computationally prohibitive on large datasets. This modification simply update the minimum distance vector with each selected sample. Note that I do not remove the labelled sample from the vector to avoid memory reallocation and unnecessary complexity. However, one sample will not be selected twice as the distance to itself is 0.

I observe a difference in the performance on the ranked_batch example though.

Here is the performance history given master:

[0.3333333333333333, 0.8933333333333333, 0.84, 0.9266666666666666, 0.9333333333333333, 0.9466666666666667, 0.9533333333333334]

Here is the performance history on my branch:
[0.3333333333333333, 0.8733333333333333, 0.8733333333333333, 0.9333333333333333, 0.9333333333333333, 0.9466666666666667, 0.9733333333333334]

I am still investigating this. Any insight is welcome.

@cosmic-cortex
Copy link
Member

Hi! Thanks for the PR! I'll try to remove it soon and get back to you.

@cosmic-cortex
Copy link
Member

Just a quick update, I am still working on reviewing it, not sure if I understand everything correctly, so I need to spend more time with it.

I have also noticed that in my previous comment, I wrote "I'll try to remove it soon" :D I actually meant to say I'll try to review it soon. Sorry for the potential confusion :)

@AlexandreAbraham
Copy link
Author

Hey,
No problem, let me know of I can add comments to make it more readable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants