grad_norm 0.0 while finetuning using group_by_label batch sampler
I am currently training a Sentence Transformer on my dataset using triplet loss, but I am encountering an issue where the gradient norm (grad_norm) is consistently 0.0 during training. The problem occurs specifically when using the group_by_label batch sampler recommended for triplet loss.

Details

Current Setup:
Model: Alibaba-NLP/gte-base-en-v1.5
Loss Function: Triplet Loss
Batch Sampler: group_by_label (recommended for triplet loss)

Observations
When I switch the batch sampler to either batch_sampler or no_duplicate, I notice an improvement in the training logs, and the grad_norm values become non-zero.
However, I want to use the group_by_label sampler, since it is the one suggested for triplet loss, and I need help understanding why this specific configuration is causing issues.

Below is the sample code:
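The original code block did not come through, so the following is only a minimal sketch of a comparable setup rather than the actual script. The toy dataset, the choice of BatchHardTripletLoss (a triplet-style loss that works with the sentence/label format that GROUP_BY_LABEL expects), and the hyperparameters are all assumptions.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import BatchHardTripletLoss
from sentence_transformers.training_args import BatchSamplers

# gte-base-en-v1.5 ships custom modeling code, hence trust_remote_code=True
model = SentenceTransformer("Alibaba-NLP/gte-base-en-v1.5", trust_remote_code=True)

# Toy (sentence, label) dataset; GROUP_BY_LABEL groups rows that share a label
# into the same batch so that in-batch triplets can be formed.
train_dataset = Dataset.from_dict(
    {
        "sentence": [
            "first example of class A",
            "second example of class A",
            "third example of class A",
            "first example of class B",
            "second example of class B",
            "third example of class B",
        ],
        "label": [0, 0, 0, 1, 1, 1],
    }
)

# A triplet-style loss that mines anchor/positive/negative triplets from labeled batches
loss = BatchHardTripletLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="models/gte-base-en-v1.5-triplet",
    num_train_epochs=1,
    per_device_train_batch_size=6,
    batch_sampler=BatchSamplers.GROUP_BY_LABEL,  # the sampler in question
    logging_steps=1,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```

In this sketch, changing batch_sampler to BatchSamplers.NO_DUPLICATES or BatchSamplers.BATCH_SAMPLER is the only modification needed to reproduce the sampler comparison shown in the Tensorboard plots below.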
Tensorboard visualization of training with the different batch samplers (orange line: no_duplicate, blue line: group_by_label, red line: batch_sampler):

Training logs for the no_duplicate batch sampler:

vs. training logs for the group_by_label batch sampler:

Questions
What could be causing the grad_norm to be 0.0 when using the group_by_label sampler?
Are there any adjustments or configurations you would recommend to resolve this issue while still using the recommended batch sampler?

Thank you!