
MSMARCO training with SentenceTransformersTrainer instead of deprecated training scripts #3128

sirCamp opened this issue Dec 10, 2024 · 1 comment

sirCamp commented Dec 10, 2024

Hi there, first of all, thanks for your great work!
I'm wondering what the correct approach is to train on MS MARCO in a way similar to train_bi-encoder_margin-mse.py, where the positives and negatives are sampled differently every epoch, but using the SentenceTransformersTrainer instead of the deprecated training method, so that I can use multi-GPU training and a more structured approach.

I'm also wondering what the exact procedure is for using the evaluators together with accelerate or torch.distributed.

Thanks!

@tomaarsen (Collaborator) commented

Hello!
Apologies for the delay, I've been working on a release.

The exact approach from that script is tricky to reproduce, because Sentence Transformers now works with Dataset instances, which makes it harder to fully resample the data every epoch. Instead, you can now train with multiple negatives at a time by creating a column for each negative (see the Loss Overview docs); a small sketch of that format follows below.
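As a minimal sketch of the "one column per negative" format (the data, column names, and base model here are purely illustrative, not from this issue), MultipleNegativesRankingLoss accepts datasets laid out as (anchor, positive, negative_1, ..., negative_n):

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Illustrative base model
model = SentenceTransformer("distilbert-base-uncased")

# One column for the query, one for the positive, and one column per negative
train_dataset = Dataset.from_dict({
    "query": ["what is python", "who wrote hamlet"],
    "positive": ["Python is a programming language.", "Hamlet was written by Shakespeare."],
    "negative_1": ["A python is a large snake.", "Macbeth is a Shakespeare tragedy."],
    "negative_2": ["Monty Python is a comedy group.", "Hamlet is also a small village."],
})

# The loss treats every extra column after the positive as an additional hard negative
loss = MultipleNegativesRankingLoss(model)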

Alternatively, you can create a Dataset with all triplets, e.g. https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-tas-b/viewer/triplet-hard. You can use that one out of the box:

from datasets import load_dataset

train_dataset = load_dataset(
    "sentence-transformers/msmarco-msmarco-distilbert-base-tas-b",
    "triplet-hard",
    split="train",
)
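For completeness, here is a rough sketch of how that triplet dataset could be plugged into the SentenceTransformerTrainer; the base model, hyperparameters, and output directory are illustrative assumptions, not values from this issue. Launching the same script with torchrun or accelerate launch gives multi-GPU data-parallel training.

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Illustrative base model
model = SentenceTransformer("distilbert-base-uncased")

# Triplet dataset from above, with query / positive / negative columns
train_dataset = load_dataset(
    "sentence-transformers/msmarco-msmarco-distilbert-base-tas-b",
    "triplet-hard",
    split="train",
)

# MultipleNegativesRankingLoss works directly on these triplets; MarginMSELoss
# would instead require a dataset that also carries cross-encoder scores as labels
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="output/msmarco-distilbert",  # illustrative
    num_train_epochs=1,
    per_device_train_batch_size=64,
    fp16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()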

Regarding the evaluator instances: sadly, they simply don't work well on multi-GPU right now. During training they only run on process 0, and if you want to run an evaluator prior to training, you can wrap the call in if trainer.is_local_process_zero(): so it only computes on one of the GPUs, but that won't make it any quicker. A small sketch of that pattern follows below.
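As a loose sketch of that pattern: the evaluator choice, the dev split, and the column names are illustrative assumptions, and trainer / model / train_dataset refer to the training sketch above.

from sentence_transformers.evaluation import TripletEvaluator

# Illustrative dev set: a small slice of the triplet data loaded above
dev_dataset = train_dataset.select(range(1_000))

dev_evaluator = TripletEvaluator(
    anchors=dev_dataset["query"],       # assumed column names
    positives=dev_dataset["positive"],
    negatives=dev_dataset["negative"],
    name="msmarco-dev",
)

# Only evaluate on the main process; the other ranks skip straight to training
if trainer.is_local_process_zero():
    dev_evaluator(model)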

  • Tom Aarsen
