[misc][Long Context] feat: support ulysses for long context training #109
Almost finished. What kind of examples should we add? We can add some scripts in the next PR.
Quick question @PeterSH6 - would this Ulysses PR support gradient checkpointing? I'm trying to use the context parallelism implemented here for SFT, but I seem to keep running into a shape mismatch issue during
Never mind! I figured it out: it happens when you do
@xingyaoww Cool! So you implemented Ulysses in the SFT trainer?
@PeterSH6 Yep! Most changes are here (along with a lot of unrelated changes, e.g. LoRA). I'm still testing it :) but so far it seems to work pretty well. I can send a PR later.
@xingyaoww That would be really nice. I've seen your LoRA PR. It looks great.
FSDPUlyssesShardingManager manages the sequence-parallel (SP) states of different models, and we use a device mesh to manage the SP parallel groups.
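To make the device-mesh idea concrete, here is a minimal pure-Python sketch of how ranks could be arranged in a 2D (data-parallel, sequence-parallel) mesh and which ranks end up in the same SP group. It is an illustration only, not the code in this PR: the helper names `build_mesh`, `sp_groups`, `dp_groups`, and `mesh_coords` are hypothetical, and the row-major layout with SP as the innermost dimension is an assumption.

```python
from typing import List, Tuple


def build_mesh(world_size: int, sp_size: int) -> List[List[int]]:
    """Arrange ranks 0..world_size-1 in a row-major (dp, sp) grid.

    Assumption: SP is the innermost mesh dimension, so consecutive
    ranks share an SP group.
    """
    if sp_size <= 0 or world_size % sp_size != 0:
        raise ValueError("world_size must be a positive multiple of sp_size")
    dp_size = world_size // sp_size
    return [
        [dp * sp_size + sp for sp in range(sp_size)]
        for dp in range(dp_size)
    ]


def sp_groups(mesh: List[List[int]]) -> List[List[int]]:
    """Each mesh row is one SP group: its ranks share the same sequences."""
    return [list(row) for row in mesh]


def dp_groups(mesh: List[List[int]]) -> List[List[int]]:
    """Each mesh column is one DP group: its ranks hold the same sequence shard index."""
    return [list(col) for col in zip(*mesh)]


def mesh_coords(rank: int, sp_size: int) -> Tuple[int, int]:
    """Return (dp_index, sp_index) of a rank under the row-major layout."""
    return divmod(rank, sp_size)


if __name__ == "__main__":
    mesh = build_mesh(world_size=8, sp_size=2)
    print("SP groups:", sp_groups(mesh))
    print("DP groups:", dp_groups(mesh))
    print("rank 5 coords:", mesh_coords(5, sp_size=2))
```

With 8 ranks and an SP size of 2, each pair of consecutive ranks forms one SP group, and ranks with the same position inside their pair form a DP group. A sharding manager in this style would switch a model between such groups when entering and leaving the sequence-parallel region.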