Clarification on Neural Network Architecture with TrainWithClassifier Class #655
-
Hello Kevin and the community! I have a question regarding the neural network architecture when using TrainWithClassifier with triplet margin loss. I understand it takes a triplet of an anchor, positive, and negative image as input during training. However, the library abstracts away a lot of these details so elegantly that I wanted to confirm my understanding :) If I were to draw a diagram of the neural network architecture, would it be correct to show the input as 3 images (anchor, positive, negative) that then pass through the CNN trunk, embedding layers, and classifier? Or does the library handle the triplet sampling under the hood in a way that the neural network itself just sees a batch of images as input? I would greatly appreciate any clarification or confirmation on the architecture. Thank you!
-
During both training and testing, the neural network just sees a batch of images. For each image, the neural network computes an embedding, independently of all the other images. You could have a batch size of 1, and the neural network wouldn't know the difference.

The triplet construction happens only to compute the loss value during training. Given the embeddings of a batch, the loss function is calculated by comparing the distances between the anchor, positive, and negative embeddings. But this happens after the embeddings for the batch are computed.

So the steps are:

1. Pass the batch of images through the network to get one embedding per image.
2. Form (anchor, positive, negative) triplets from those embeddings, using the labels.
3. Compute the triplet margin loss from the distances within each triplet.
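To make this concrete, here is a minimal sketch of that flow using the library's TripletMarginLoss. The trunk model, embedding size, margin, and batch shape are illustrative assumptions, not defaults from the library:

```python
import torch
from torchvision import models
from pytorch_metric_learning import losses

# Illustrative trunk: any CNN that maps a batch of images to embeddings.
trunk = models.resnet18()
trunk.fc = torch.nn.Linear(trunk.fc.in_features, 128)  # 128-dim embeddings (arbitrary choice)

images = torch.randn(32, 3, 224, 224)  # a plain batch of images, no triplet structure
labels = torch.randint(0, 10, (32,))   # one class label per image

# Step 1: embeddings are computed per image, with no notion of triplets.
embeddings = trunk(images)

# Steps 2-3: the loss function forms triplets from the batch (via the labels)
# and compares anchor-positive vs. anchor-negative distances.
loss_func = losses.TripletMarginLoss(margin=0.1)
loss = loss_func(embeddings, labels)
loss.backward()
```

Note that the forward pass never sees triplets; the triplet structure exists only inside the loss computation.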
Adding the classifier layer just means that the triplet loss is computed on the output of the 2nd-to-last layer (the embeddings) instead of the final layer, while the classification loss is computed at the final (classifier) layer.
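As a rough sketch of how the two losses combine in a TrainWithClassifier-style setup (the layer sizes and the unweighted sum below are assumptions for illustration; the actual trainer lets you configure models and loss weights):

```python
import torch
from pytorch_metric_learning import losses

embedder = torch.nn.Linear(512, 128)   # assumed: maps trunk features to embeddings
classifier = torch.nn.Linear(128, 10)  # assumed: maps embeddings to class logits

trunk_output = torch.randn(32, 512)    # stand-in for the CNN trunk's features
labels = torch.randint(0, 10, (32,))

embeddings = embedder(trunk_output)    # 2nd-to-last layer: triplet loss applies here
logits = classifier(embeddings)        # final layer: classification loss applies here

metric_loss = losses.TripletMarginLoss()(embeddings, labels)
classifier_loss = torch.nn.functional.cross_entropy(logits, labels)
total_loss = metric_loss + classifier_loss  # assumed equal weighting
total_loss.backward()
```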
-
Hi Kevin, thanks for your detailed explanation. It is very clear. As I understand it, Triplet Loss and Siamese Neural Networks can also be used to learn a good distance function for the dataset. Do you have any intuitions about the similarities and differences between using Triplet Loss + Siamese Neural Networks versus Triplet Loss + a 'normal' CNN?