SimCSE supervised #2

Open
Muennighoff opened this issue Jan 3, 2022 · 4 comments

@Muennighoff

Did you try SimCSE's supervised training objective in-domain on USEB?
Would be interesting to compare to SBERT-supervised...!

@kwang2049
Member

Hi @Muennighoff,

Yeah, we tried that. Actually, what you describe seems to be exactly SBERT-base-nli-v2, SBERT-base-nli-stsb-v2 (zero-shot models) and SBERT-supervised (in-domain supervised) in Table 2. All of them were trained with MultipleNegativesRankingLoss, which is equivalent to SimCSE's supervised objective. The description can be found in Section 5.1 (Baseline Method) in the paper. For the training code, one can refer to https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/nli/training_nli_v2.py.
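
Roughly, that training setup looks like this (a minimal sketch; the base model and the example pairs below are placeholders, not the exact values from training_nli_v2.py):

```python
# Minimal sketch of training with MultipleNegativesRankingLoss,
# along the lines of training_nli_v2.py (placeholder model and data).
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("bert-base-uncased")  # any base checkpoint

# Each example is an (anchor, positive) pair; all other positives in the
# batch act as in-batch negatives, as in SimCSE's supervised objective.
train_examples = [
    InputExample(texts=["A man is playing guitar.", "Someone plays an instrument."]),
    InputExample(texts=["A dog runs in the park.", "An animal is outside."]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)
```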

@Muennighoff
Author


Nice, thanks for this! I hadn't realized SimCSE's supervised objective was equivalent to SBERT-base-nli-stsb-v2's objective.
It also seems to be equivalent to Contrastive Multiview Coding (https://arxiv.org/pdf/1906.05849.pdf), except that they can optionally take hard negatives from anywhere via a memory buffer, not just the current batch.

So, ignoring that difference, all of the below are the same:

SBERT with MultipleNegativesRankingLoss
[screenshot of the MultipleNegativesRankingLoss formula]

SimCSE Supervised
[screenshot of the SimCSE supervised objective]

Multiview Contrastive Coding
[screenshot of the Multiview Contrastive Coding objective]
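
In all three cases the per-example loss has the same in-batch cross-entropy form (my paraphrase, assuming a similarity function sim and temperature τ, not the exact notation of any of the papers):

$$
\ell_i = -\log \frac{\exp\big(\mathrm{sim}(h_i, h_i^{+})/\tau\big)}{\sum_{j=1}^{N} \exp\big(\mathrm{sim}(h_i, h_j^{+})/\tau\big)}
$$

where $h_i$ and $h_j^{+}$ are the encoded anchors and positives in a batch of size $N$, and the other positives serve as negatives (MultipleNegativesRankingLoss uses cosine similarity with a fixed scale factor rather than a tunable temperature).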

@Muennighoff
Author

Could you provide the training code for SBERT-supervised?
(I.e. the training on USEB)

@kwang2049
Member

kwang2049 commented Jan 4, 2022

Hi @Muennighoff,

  • For AskUbuntu, CQADupStack and SciDocs (since all of them have only binary labels), one can follow the SBERT example examples/training/nli/training_nli_v2.py (with a modification at line 79 to load each gold paraphrase pair in USEB); the labeled data can be loaded from data-train/${dataset_name}/supervised/train.org and train.para (every two parallel lines correspond to one gold paraphrase pair).
  • For Twitter (since it has fine-grained labels), one can follow the SBERT example examples/training/sts/training_stsbenchmark.py; the labeled data can be loaded from data-train/twitter/supervised/train.s1, train.s2 and train.lbl (every three parallel lines correspond to sentence 1, sentence 2 and the gold label for those two sentences). A rough sketch is below.
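
A rough sketch of the data loading (the base model is a placeholder, the dataset directory name is illustrative, and casting the Twitter labels to floats is my assumption; the file layout is as described above):

```python
# Sketch of loading the USEB supervised training data described above.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("bert-base-uncased")  # placeholder base model

# AskUbuntu / CQADupStack / SciDocs: binary gold paraphrase pairs
def load_pairs(dataset_name):
    with open(f"data-train/{dataset_name}/supervised/train.org") as f_org, \
         open(f"data-train/{dataset_name}/supervised/train.para") as f_para:
        return [
            InputExample(texts=[org.strip(), para.strip()])
            for org, para in zip(f_org, f_para)
        ]

pair_examples = load_pairs("askubuntu")  # directory name is illustrative
pair_loss = losses.MultipleNegativesRankingLoss(model)

# Twitter: fine-grained labels, following training_stsbenchmark.py
# (which trains with CosineSimilarityLoss on float similarity scores)
def load_twitter():
    with open("data-train/twitter/supervised/train.s1") as f1, \
         open("data-train/twitter/supervised/train.s2") as f2, \
         open("data-train/twitter/supervised/train.lbl") as fl:
        return [
            InputExample(texts=[s1.strip(), s2.strip()], label=float(lbl))
            for s1, s2, lbl in zip(f1, f2, fl)
        ]

twitter_examples = load_twitter()
twitter_loss = losses.CosineSimilarityLoss(model)
```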

For hyperparameters, I trained all these SBERT-supervised models for 10 epochs, with linear warmup over 0.1 × the total number of steps and early stopping on the dev score where possible. All other hyperparameters are the defaults of SentenceTransformer.fit.
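
Continuing the sketch above, the fit call with these hyperparameters would look roughly like this (batch size and output path are illustrative):

```python
# Continues the sketch above: 10 epochs, 10% linear warmup, defaults otherwise.
import math
from torch.utils.data import DataLoader

epochs = 10
train_dataloader = DataLoader(pair_examples, shuffle=True, batch_size=64)  # batch size illustrative
warmup_steps = math.ceil(0.1 * len(train_dataloader) * epochs)

model.fit(
    train_objectives=[(train_dataloader, pair_loss)],
    epochs=epochs,
    warmup_steps=warmup_steps,
    output_path="output/sbert-supervised-askubuntu",  # illustrative path
)
```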

If you have further questions about this, I can give you more hints. :)
