SimCSE supervised #2

Open
Muennighoff opened this issue Jan 3, 2022 · 4 comments

@Muennighoff

Did you try SimCSE's supervised training objective in-domain on USEB?
Would be interesting to compare to SBERT-supervised...!

@kwang2049
Member

Hi @Muennighoff,

Yeah, we tried that. Actually, what you describe seems to be exactly SBERT-base-nli-v2, SBERT-base-nli-stsb-v2 (zero-shot models) and SBERT-supervised (in-domain supervised) in Table 2. All of them were trained with MultipleNegativesRankingLoss, which is equivalent to SimCSE's supervised objective. The description can be found in Section 5.1 (Baseline Method) in the paper. For the training code, one can refer to https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/nli/training_nli_v2.py.
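
Roughly, that training setup looks like this (a minimal sketch; the base model and the example pairs below are placeholders, not the exact values from training_nli_v2.py):

```python
# Minimal sketch of training with MultipleNegativesRankingLoss,
# along the lines of training_nli_v2.py (placeholder model and data).
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("bert-base-uncased")  # any base checkpoint

# Each example is an (anchor, positive) pair; all other positives in the
# batch act as in-batch negatives, as in SimCSE's supervised objective.
train_examples = [
    InputExample(texts=["A man is playing guitar.", "Someone plays an instrument."]),
    InputExample(texts=["A dog runs in the park.", "An animal is outside."]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)
```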

@Muennighoff
Author


Nice, thanks for this! I hadn't realized SimCSE's supervised objective was equivalent to SBERT-base-nli-stsb-v2's objective.
It also seems to be equivalent to Contrastive Multiview Coding (https://arxiv.org/pdf/1906.05849.pdf), except that they can optionally take hard negatives from anywhere via a memory buffer, not just the current batch.

So, ignoring that difference, all of the below are the same:

SBERT with MultipleNegativesRankingLoss
[screenshot of the MultipleNegativesRankingLoss formula]

SimCSE Supervised
[screenshot of the SimCSE supervised objective]

Multiview Contrastive Coding
[screenshot of the Multiview Contrastive Coding objective]
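
In all three cases the per-example loss has the same in-batch cross-entropy form (my paraphrase, assuming a similarity function sim and temperature τ, not the exact notation of any of the papers):

$$
\ell_i = -\log \frac{\exp\big(\mathrm{sim}(h_i, h_i^{+})/\tau\big)}{\sum_{j=1}^{N} \exp\big(\mathrm{sim}(h_i, h_j^{+})/\tau\big)}
$$

where $h_i$ and $h_j^{+}$ are the encoded anchors and positives in a batch of size $N$, and the other positives serve as negatives (MultipleNegativesRankingLoss uses cosine similarity with a fixed scale factor rather than a tunable temperature).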

@Muennighoff
Author

Could you provide the training code for SBERT-supervised?
(I.e. the training on USEB)

@kwang2049
Member

kwang2049 commented Jan 4, 2022

Hi @Muennighoff,

  • For AskUbuntu, CQADupStack and SciDocs (since all of them have only binary labels), one can follow the SBERT example examples/training/nli/training_nli_v2.py (with a modification at line 79 to load each gold paraphrase pair in USEB); the labeled data can be loaded from data-train/${dataset_name}/supervised/train.org and train.para (every two parallel lines correspond to one gold paraphrase pair).
  • For Twitter (since it has fine-grained labels), one can follow the SBERT example examples/training/sts/training_stsbenchmark.py; the labeled data can be loaded from data-train/twitter/supervised/train.s1, train.s2 and train.lbl (every three parallel lines correspond to sentence 1, sentence 2 and the gold label for those two sentences). A rough sketch is below.
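
A rough sketch of the data loading (the base model is a placeholder, the dataset directory name is illustrative, and casting the Twitter labels to floats is my assumption; the file layout is as described above):

```python
# Sketch of loading the USEB supervised training data described above.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("bert-base-uncased")  # placeholder base model

# AskUbuntu / CQADupStack / SciDocs: binary gold paraphrase pairs
def load_pairs(dataset_name):
    with open(f"data-train/{dataset_name}/supervised/train.org") as f_org, \
         open(f"data-train/{dataset_name}/supervised/train.para") as f_para:
        return [
            InputExample(texts=[org.strip(), para.strip()])
            for org, para in zip(f_org, f_para)
        ]

pair_examples = load_pairs("askubuntu")  # directory name is illustrative
pair_loss = losses.MultipleNegativesRankingLoss(model)

# Twitter: fine-grained labels, following training_stsbenchmark.py
# (which trains with CosineSimilarityLoss on float similarity scores)
def load_twitter():
    with open("data-train/twitter/supervised/train.s1") as f1, \
         open("data-train/twitter/supervised/train.s2") as f2, \
         open("data-train/twitter/supervised/train.lbl") as fl:
        return [
            InputExample(texts=[s1.strip(), s2.strip()], label=float(lbl))
            for s1, s2, lbl in zip(f1, f2, fl)
        ]

twitter_examples = load_twitter()
twitter_loss = losses.CosineSimilarityLoss(model)
```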

For hyperparameters, I trained all these SBERT-supervised models for 10 epochs, with linear warmup over 0.1 × the total number of steps and early stopping on the dev score where possible. All other hyperparameters are the defaults of SentenceTransformer.fit.
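
Continuing the sketch above, the fit call with these hyperparameters would look roughly like this (batch size and output path are illustrative):

```python
# Continues the sketch above: 10 epochs, 10% linear warmup, defaults otherwise.
import math
from torch.utils.data import DataLoader

epochs = 10
train_dataloader = DataLoader(pair_examples, shuffle=True, batch_size=64)  # batch size illustrative
warmup_steps = math.ceil(0.1 * len(train_dataloader) * epochs)

model.fit(
    train_objectives=[(train_dataloader, pair_loss)],
    epochs=epochs,
    warmup_steps=warmup_steps,
    output_path="output/sbert-supervised-askubuntu",  # illustrative path
)
```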

If you have further questions about this, I can give you more hints. :)
