Can MatchFinder/InferenceModel return distances for all classes? #718
Apologies: after some more work, I've realised I misunderstood the InferenceModel, and that it is returning the indices of a
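MatchFinder is only used in the
If I understand correctly, your first snippet of code enables you to find the
```python
from pytorch_metric_learning.utils.inference import FaissKNN, InferenceModel

class_distances = {}
for label in unique_labels:
    curr_subset = dataset[labels == label]
    # embeddings of curr_subset, computed the same way as the query embeddings
    subset_embeddings = ...
    # keep the index built by train() so it can be queried afterwards
    knn = FaissKNN(reset_before=False, reset_after=False)
    knn.train(subset_embeddings)
    # search this class's index; `trunk` is a placeholder for your trained model
    inference_model = InferenceModel(trunk, knn_func=knn)
    distances, _ = inference_model.get_nearest_neighbors(data, k=1)
    class_distances[label] = distances[0][0]
# closest class is the key in class_distances with the smallest distance
closest_label = min(class_distances, key=class_distances.get)
```
As for this issue:
I'm not sure what would cause it to return fewer than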
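The per-class loop builds one FAISS index per class. For moderately sized reference sets, the same quantity (the distance from a query to the nearest sample of each class) can also be computed exactly in one brute-force pass. This is a minimal sketch, assuming the query and reference embeddings are NumPy arrays produced by the same trained model and that plain Euclidean distance is wanted; `per_class_min_distance` is a hypothetical helper, not part of the library.

```python
import numpy as np


def per_class_min_distance(query, ref_embeddings, ref_labels):
    """Euclidean distance from one query to the nearest reference sample of each class.

    query: (D,) array; ref_embeddings: (N, D) array; ref_labels: (N,) int array.
    Returns {label: distance} with one entry for every class present in ref_labels.
    """
    # distance from the query to every reference embedding (brute force, O(N * D))
    dists = np.linalg.norm(ref_embeddings - query, axis=1)
    # map each reference sample to a class slot 0..C-1
    classes, slot = np.unique(ref_labels, return_inverse=True)
    # start every class at +inf, then keep the running minimum per slot
    mins = np.full(len(classes), np.inf)
    np.minimum.at(mins, slot, dists)
    return {int(c): float(m) for c, m in zip(classes, mins)}


# toy embeddings: two samples of class 0, one each of classes 1 and 2
refs = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 3.0]])
labels = np.array([0, 0, 1, 2])
class_distances = per_class_min_distance(np.array([1.0, 1.0]), refs, labels)
closest_label = min(class_distances, key=class_distances.get)  # 0
```

Every class with at least one reference sample gets a distance, which is what a fusion step over all classes needs; the trade-off is a full scan of the reference set for each query.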
Thanks for the reply! I will give your code a shot. In the end, after realising I was misunderstanding the MatchFinder, I wrote something that compares a sample to all embeddings and keeps only the smallest distance:
The big drawback, of course, is that this is very slow for large datasets, but it gives me the same accuracy as in training for the individual modalities, and a distance for every class. I'll see whether your example is faster and whether its accuracy is comparable. As for the fewer-than-k results: I think it was pulling 213 neighbours, but those were individual samples rather than whole classes, so whenever two neighbours shared a class, the later one overwrote that class's distance in the dictionary. The batch size has no impact, so I think it was the former, but I'll double-check.
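The explanation above can be reproduced in a few lines: the k nearest neighbours are individual samples, several of which usually share a label, so keying them by class yields at most k entries and often far fewer. Writing `result[label] = distance` for each neighbour also lets a later, farther sample overwrite a class's closest distance. A minimal sketch with toy values; `collapse_naive` and `collapse_to_classes` are hypothetical helpers.

```python
def collapse_naive(neighbor_dists, neighbor_labels):
    # plain assignment: the last (farthest) neighbour of each class wins
    result = {}
    for d, label in zip(neighbor_dists, neighbor_labels):
        result[label] = d
    return result


def collapse_to_classes(neighbor_dists, neighbor_labels):
    # keep only the smallest distance seen for each class
    result = {}
    for d, label in zip(neighbor_dists, neighbor_labels):
        if label not in result or d < result[label]:
            result[label] = d
    return result


# toy k=5 neighbour list, sorted by distance as a k-NN search returns it
dists = [0.1, 0.2, 0.3, 0.4, 0.5]
labels = ["cat", "cat", "dog", "cat", "dog"]

print(len(collapse_naive(dists, labels)))  # 2 classes from k=5 neighbours
print(collapse_naive(dists, labels))       # {'cat': 0.4, 'dog': 0.5}
print(collapse_to_classes(dists, labels))  # {'cat': 0.1, 'dog': 0.3}
```

Even with the minimum kept, a class with no sample among the k neighbours gets no entry at all, so a k-NN search over samples only guarantees a distance for every class when k covers the whole reference set.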
As always, thanks again for the library and apologies for asking another question!
I was wondering whether the MatchFinder or InferenceModel can return the distances to all classes, or whether it is expected to return only the distances of 'matches'. I am using a late fusion technique that takes the distance from a query image to every class and uses dynamic weighting to produce a more accurate prediction. I train two separate models, one on the text and one on the images only, and use the distances from both to dynamically weight each modality for a given query image.
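The post does not spell out its dynamic weighting scheme, so the sketch below is only one hypothetical way to fuse two modalities: each modality's per-class distances are turned into probabilities with a softmax over negative distances, and each modality is weighted by its own confidence, taken here as the margin between its top two class probabilities. The names `late_fusion`, `softmax_neg`, `margin`, and the `temperature` parameter are illustrative assumptions. It also shows why a distance for every class is needed: the two vectors must be aligned class by class before they can be combined.

```python
import numpy as np


def softmax_neg(distances, temperature=1.0):
    # smaller distance -> larger probability; temperature is a hypothetical knob
    z = -np.asarray(distances, dtype=float) / temperature
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()


def margin(probs):
    # confidence proxy: gap between the best and second-best class (needs >= 2 classes)
    top2 = np.sort(probs)[-2:]
    return top2[1] - top2[0]


def late_fusion(text_dists, image_dists, temperature=1.0):
    """Fuse two per-class distance vectors that list the classes in the same order.

    Each modality is weighted by its own margin, so whichever modality is more
    confident for this query dominates. Returns (predicted class index, fused probs).
    """
    p_text = softmax_neg(text_dists, temperature)
    p_img = softmax_neg(image_dists, temperature)
    w_text, w_img = margin(p_text), margin(p_img)
    total = w_text + w_img
    if total == 0:  # both modalities undecided: fall back to equal weights
        w_text = w_img = 0.5
    else:
        w_text, w_img = w_text / total, w_img / total
    fused = w_text * p_text + w_img * p_img
    return int(np.argmax(fused)), fused


text_dists = [0.2, 5.0, 5.0]   # text model: confidently class 0
image_dists = [1.2, 1.0, 1.1]  # image model: barely prefers class 1
pred, fused = late_fusion(text_dists, image_dists)
print(pred)  # 0
```

Because both weights sum to one and each probability vector sums to one, the fused vector is still a probability distribution over the same classes.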
Using the class centers from this discussion, the dynamic weighting works extremely well, and I can use the InferenceModel to return all the neighbours (i.e., a distance from the query image to every class in the dataset). However, the accuracy of the individual modalities differs from their accuracy in training (on the same dataset), though I expect this, since I did not use the class-center method in training. The code below returns distances for all classes:
The MatchFinder gives me the same accuracy as in training. However, it only returns a limited number of classes and their distances, rather than the distances between the query and all classes, even if I set the number of neighbours to the total number of classes, change the MatchFinder threshold, or remove the MatchFinder entirely. I'm not sure whether this is expected behaviour or whether I'm misunderstanding the InferenceModel, so I wanted to check.
For example:
The above code returns only a few distances (32, as opposed to 213, the total number of classes), even when adjusting or removing the MatchFinder. Usually I get between 20 and 32 distances rather than 213.
I would like to use the InferenceModel as above instead of calculating the class centers, so that I can compare my fusion method to my original results. The InferenceModel method gives results comparable to training, as I expected, but the fusion method needs a distance for every class, which I can't seem to retrieve even when increasing the number of neighbours.
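For comparison, the class-center approach can be sketched as follows. Because every class has a center, it always yields exactly one distance per class, which is why it plugs directly into a fusion step over all classes. This assumes NumPy embeddings from the same trained model and Euclidean distance; `class_center_distances` is a hypothetical helper. A center can lie far from every individual training sample, which is one reason it can rank classes differently from a nearest-sample search.

```python
import numpy as np


def class_center_distances(query, ref_embeddings, ref_labels):
    """Euclidean distance from one query to the mean embedding of every class."""
    classes = np.unique(ref_labels)
    # one center per class: the mean of that class's reference embeddings
    centers = np.stack([ref_embeddings[ref_labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(centers - query, axis=1)
    return {int(c): float(d) for c, d in zip(classes, dists)}


# toy embeddings: class 0 centered at (1, 0), class 1 centered at (11, 10)
refs = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
labels = np.array([0, 0, 1, 1])
print(class_center_distances(np.array([1.0, 1.0]), refs, labels))
```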