Issues: microsoft/Megatron-DeepSpeed

Issues list

Universal checkpoint for megatron
#453 opened Dec 7, 2024 by JiashuWu
Model conversion problem
#449 opened Sep 26, 2024 by yuanzhiyong1999
Async allreduce for tensor-parallel
#447 opened Sep 23, 2024 by drcanchi
llama3 and llama3.1 support
#443 opened Sep 10, 2024 by fmiao2372
MOE TFLOPS calculation
#398 opened Jun 5, 2024 by yingzhao27
why moe can not use zero3
#397 opened Jun 4, 2024 by kuangdao
about the optimizer param group
#387 opened May 17, 2024 by L-hongbin
Expert deepcopy raises PickleError
#380 opened Apr 23, 2024 by sxontheway
Pipeline parallelism + CPU offload?
#369 opened Mar 21, 2024 by webber26232