@xmfan
Member

@xmfan xmfan commented Nov 8, 2025

Stacked PRs:


Log weight hashes for DSv3 w/ pp vs w/o pp

Intended usage:

> torchrun --nproc-per-node=8 examples/example_ds3_pp.py --rng-seed=42; torchrun --nproc-per-node=4 examples/example_ds3_local_map.py --rng-seed=42

> diff out/0/pp_weights.log  out/1/weights.log 
--- out/0/pp_weights.log        2025-11-07 20:31:34.447960867 -0800
+++ out/1/weights.log   2025-11-07 20:32:52.499859593 -0800
@@ -60,12 +60,9 @@
 name='freqs_cis' hash=DTensor(real=54976837666734080, imag=9351734845035773952))
 name='layers.0.moe.expert_bias' hash=DTensor(0)
 name='layers.0.moe.tokens_per_expert' hash=DTensor(0)
-name='freqs_cis' hash=DTensor(real=54976837666734080, imag=9351734845035773952))
 name='layers.1.moe.expert_bias' hash=DTensor(0)
 name='layers.1.moe.tokens_per_expert' hash=DTensor(0)
-name='freqs_cis' hash=DTensor(real=54976837666734080, imag=9351734845035773952))
 name='layers.2.moe.expert_bias' hash=DTensor(0)
 name='layers.2.moe.tokens_per_expert' hash=DTensor(0)
-name='freqs_cis' hash=DTensor(real=54976837666734080, imag=9351734845035773952))
 name='layers.3.moe.expert_bias' hash=DTensor(0)
 name='layers.3.moe.tokens_per_expert' hash=DTensor(0)

The current difference is due to the model implementation: each pp stage has its own freqs_cis buffer, whereas the non-pp version has a single freqs_cis buffer on the root model class.
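
As a rough sketch (not part of this PR), the two logs could be compared while treating the repeated per-stage freqs_cis entries as equal, so that only real hash mismatches surface. The helper names `parse_entries` and `diff_weight_logs` are hypothetical, and the line format is assumed to match the `name='...' hash=...` lines in the diff above.

```python
import re

# Assumed line format, based on the log excerpt above: name='<name>' hash=<value>
_LINE_RE = re.compile(r"^name='(?P<name>[^']+)' hash=(?P<hash>.*)$")


def parse_entries(lines):
    """Map each logged name to the set of hash strings seen for it."""
    entries = {}
    for line in lines:
        match = _LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or unrelated lines
        entries.setdefault(match["name"], set()).add(match["hash"])
    return entries


def diff_weight_logs(pp_lines, non_pp_lines):
    """Return the sorted names whose hash sets differ between the two logs.

    Repeated identical entries (e.g. one freqs_cis per pp stage) collapse
    into a single set element, so they do not count as a difference.
    """
    pp = parse_entries(pp_lines)
    non_pp = parse_entries(non_pp_lines)
    return sorted(
        name
        for name in pp.keys() | non_pp.keys()
        if pp.get(name) != non_pp.get(name)
    )
```

Passing the lines of out/0/pp_weights.log and out/1/weights.log to `diff_weight_logs` should return an empty list if the repeated freqs_cis lines are the only difference between the two files. Comparing sets rather than raw lines is a deliberate simplification: it ignores ordering and repetition, which is what makes the per-stage buffers harmless here.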

Remove the per_op logging, since the numerics aren't diff-friendly yet.

xmfan added a commit that referenced this pull request Nov 8, 2025
stack-info: PR: #240, branch: xmfan/stack/18
@sanketpurandare
Contributor

> Current difference is due to model implementation, where the pp stages each have freqs_cis, but for the non-pp version there's only 1 freqs_cis buffer on the root model class

Is it difficult to have the same freqs_cis for each pp stage?

@xmfan
Member Author

xmfan commented Nov 11, 2025

Not difficult, but annoying if we want to reuse the exact same model code for both the pp and non-pp versions.
