I apologize in advance if this is a nonsensical issue. I fear that it might be, as nobody else seems to have run into it. Regardless, I'll ask just in case.
I am working with the `MultiSimilarityMiner` to select hard negative pairs for contrastive learning. While tuning the model's hyperparameters, I noticed that even aggressive increases to epsilon never reduced the number of mined hard pairs to zero. The documentation states the following about the epsilon parameter: "Positive pairs are chosen if they have similarity less than the hardest negative pair, plus this margin (epsilon)." To me, this reads as saying that increasing epsilon will select pairs that are epsilon less similar than the hardest negative pair. The documentation also states that the default distance function for `MultiSimilarityMiner` is `CosineSimilarity`.
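For reference, the setup is just the stock miner with the `epsilon` keyword varied (the specific values here are illustrative):

```python
from pytorch_metric_learning import miners

# Default distance is CosineSimilarity, so "distances" are similarities in [-1, 1].
miner = miners.MultiSimilarityMiner(epsilon=0.1)            # library default
aggressive_miner = miners.MultiSimilarityMiner(epsilon=10)  # the "aggressive" setting discussed below
```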
I debugged the issue into the pytorch-metric-learning source and found the code block responsible for identifying the hard mined samples, in `multi_similarity_miner.py`, lines 43-56.
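For readers who don't want to open the file, the block looks roughly like this (paraphrased from the library source, so treat the exact expressions as approximate and defer to the file itself):

```python
# Approximate paraphrase of multi_similarity_miner.py, lines 43-56.
# pos_sorted / neg_sorted are the per-anchor distance matrices, sorted ascending.
if self.distance.is_inverted:
    hard_pos_idx = torch.where(
        pos_sorted - self.epsilon < neg_sorted[:, -1].unsqueeze(1)
    )
    hard_neg_idx = torch.where(
        neg_sorted + self.epsilon > pos_sorted[:, 0].unsqueeze(1)
    )
else:
    hard_pos_idx = torch.where(
        pos_sorted + self.epsilon > neg_sorted[:, 0].unsqueeze(1)
    )
    hard_neg_idx = torch.where(
        neg_sorted - self.epsilon < pos_sorted[:, -1].unsqueeze(1)
    )
```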
According to my interpretation of the docs (increases to epsilon should result in harder mined pairs), the condition I expected is the one in the else branch of this if-else statement.
"self.distance" is set by default to CosineSimilarity, which inherits from the BaseDistance class. The CosineSimilarity object is required have the is_inverted flag set to True (which makes sense based on the operation of class member functions like "smallest_dist". When using CosineSimilarity as the distance metric we will always drop into the first condition of this if-else statement. In english, the interpretation of the torch.where call is "select any positive sample which is less similar than the least similar negative sample, minus epsilon". This is causing increases to the epsilon parameter to select easier samples rather than harder samples.
I don't have a dog in the fight about whether or not epsilon should be making the mined samples more or less hard. I will also happily admit defeat if I have done this analysis wrong. Just trying to be helpful! Massively grateful for all of the hard work that has been done to make this awesome library. Cheers!
Concretely, here is what I saw when stepping through in the debugger, with epsilon set to 10:

1. The first value in this row of `pos_sorted` is a candidate value (not set to infinity).
2. After subtracting epsilon, the value becomes less similar than any negative sample could possibly be (cosine similarity produces values between -1 and 1).
3. The cosine similarity of the corresponding negative sample is therefore always greater than the epsilon-adjusted positive value.
4. As a result, this sample is selected as a hard positive, despite an epsilon of 10. I believe the correct behavior would be for this hard positive to be filtered out.
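If it helps with triage, a minimal script along these lines reproduces what I'm describing (random embeddings, so exact counts vary between runs, but in my runs the positive count is never zero):

```python
import torch
from pytorch_metric_learning import miners

torch.manual_seed(0)
embeddings = torch.randn(32, 128)
labels = torch.randint(0, 4, (32,))

# With epsilon this large I expected zero mined pairs, but every
# positive pair comes back as "hard".
miner = miners.MultiSimilarityMiner(epsilon=10)  # default distance: CosineSimilarity
a1, p, a2, n = miner(embeddings, labels)
print(f"hard positive pairs: {a1.numel()}, hard negative pairs: {a2.numel()}")
```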