GPU Memory Requirement for Stage2 #19


Open

LaVieEnRoseSMZ opened this issue Mar 11, 2025 · 3 comments

Comments

@LaVieEnRoseSMZ

Hi, thank you for your great work! I'm wondering about the GPU memory requirements for the second stage. I'm encountering an OOM issue when training at 1080p resolution on a single 80GB A100. Do you have any recommendations to optimize memory usage?

Also, I’m looking forward to any updates on the training code.

@jshilong
Collaborator

Thank you for your interest in our work.
A single GPU is not sufficient: during training, the model's parameters need to be distributed across multiple GPUs to reduce per-device memory usage.

I will organize and release the training code soon. It currently lives on another branch, but you are welcome to use it as a reference for now:
#12
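For reference, this is the general pattern (a minimal sketch, not the exact code on that branch): shard the stage-2 transformer with PyTorch FSDP so that parameters, gradients, and optimizer states are split across the GPUs instead of replicated on each one. The `VideoTransformer` name is a placeholder, not an actual class in this repo.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, ShardingStrategy

def build_sharded_model(video_transformer: torch.nn.Module) -> FSDP:
    # One process per GPU, e.g. launched with `torchrun --nproc_per_node=8`.
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)
    return FSDP(
        video_transformer,
        # FULL_SHARD splits parameters, gradients, and optimizer state
        # across all ranks; each GPU materializes full layers only
        # transiently during forward/backward.
        sharding_strategy=ShardingStrategy.FULL_SHARD,
        mixed_precision=MixedPrecision(param_dtype=torch.bfloat16),
        device_id=torch.cuda.current_device(),
    )
```

With FULL_SHARD, each of N ranks holds roughly 1/N of the parameters and optimizer states, which is what lets a run fit across several GPUs when a single 80GB card OOMs.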

@LaVieEnRoseSMZ
Author

Hi, thanks for your prompt response! I understand the idea of distributing the different models across multiple GPUs. In that case, would multiple 80GB A100s still be feasible, or are H100s necessary for training? I assume that running 1080p on the transformer, even after offloading the other components to separate GPUs, still demands significant memory.

I’ve already incorporated the latest pull request into my implementation. I really appreciate the continued work and look forward to the upcoming release!
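For anyone hitting the same OOM, here is a minimal sketch of the placement being discussed: keep the trainable transformer on its own GPU(s), park the frozen components on other devices, and enable activation checkpointing to cut the 1080p activation footprint. The module names (`vae`, `text_encoder`, `transformer`) and the `enable_gradient_checkpointing` helper are assumptions in the diffusers style, not this repo's actual API.

```python
import torch

def place_components(vae, text_encoder, transformer):
    # Frozen components only run inference, so they can live on other
    # GPUs and stream their outputs to the training device.
    vae.to("cuda:1")
    text_encoder.to("cuda:2")
    transformer.to("cuda:0")  # trainable; gets a full card to itself

    # Activation checkpointing trades recompute for memory, which is
    # where most of the 1080p footprint comes from. The helper below
    # follows the diffusers convention; otherwise wrap transformer
    # blocks with torch.utils.checkpoint.checkpoint manually.
    if hasattr(transformer, "enable_gradient_checkpointing"):
        transformer.enable_gradient_checkpointing()
    return vae, text_encoder, transformer
```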

@jshilong
Collaborator

There should be no difference in training between the A100 and H100, as both have 80GB of memory.
