Issues: pytorch/torchtitan
Issues list
BUG: early_step_in_backward with pipeline parallelism and len(model_parts) > 1
#777
opened Jan 7, 2025 by
cassanof
PP hangs when pipeline_parallel_microbatches < pipeline_parallel_degree
bug
Something isn't working
#775
opened Jan 6, 2025 by
cassanof
PP InterleavedZeroBubble schedule shows low TPS and high memory usage
bug
Something isn't working
release_blocking
Issues that are blocking the milestone / release completion
CP hangs when degree is more than 8
bug
Something isn't working
release_blocking
Issues that are blocking the milestone / release completion
PP-related issues
bug
Something isn't working
release_blocking
Issues that are blocking the milestone / release completion
Can I load from non-FSDP optimizer state with FSDP2?
question
Further information is requested
#765
opened Dec 31, 2024 by
syncdoth
FSDP 2 doesn't pad tensors?
question
Further information is requested
#764
opened Dec 29, 2024 by
cassanof
Memory grows due to keeping losses on device
better_engineering
Repo code quality improvements
good first issue
Good for newcomers
#763
opened Dec 27, 2024 by
carmocca
Checkpoint conversion
question
Further information is requested
#758
opened Dec 20, 2024 by
MaxiBoether
[question]can't disable CP for specific (unsupported) SDPA op
context_parallel
enhancement
New feature or request
#757
opened Dec 20, 2024 by
FindDefinition
Any plans to support DPO training?
enhancement
New feature or request
#756
opened Dec 20, 2024 by
xs1997zju
JobConfig does not support typing
enhancement
New feature or request
#753
opened Dec 18, 2024 by
greeneggsandyaml
Model init with HuggingFace model
bug
Something isn't working
question
Further information is requested
#743
opened Dec 16, 2024 by
neeldani
Low bit Optimizers & FA-3
bug
Something isn't working
question
Further information is requested
#742
opened Dec 16, 2024 by
asahni04
using fsdp2 wrapper Flux(text to image) model , gradient is inconsistent with fsdp1
question
Further information is requested
#734
opened Dec 13, 2024 by
yanmj0601
Issue: Loss Discrepancy Between FSDP1 and FSDP2 with AdamW Optimizer
question
Further information is requested
#724
opened Dec 9, 2024 by
Teng-xu
Context parallelism understanding
context_parallel
question
Further information is requested
#723
opened Dec 9, 2024 by
jinsong-mao
First Shard Group Save and Load Checkpoint for HSDP
question
Further information is requested
#709
opened Nov 29, 2024 by
qsh-zh
[rfc] torchtitan release practices
release_blocking
Issues that are blocking the milestone / release completion
torch.compile(sync_float8_amax_and_scale_history) not working with triton latest main
bug
Something isn't working
#681
opened Nov 19, 2024 by
goldhuang
[Parallelism] Implement vocabulary parallelism
enhancement
New feature or request
#680
opened Nov 15, 2024 by
casper-hansen