@xmfan xmfan commented Nov 11, 2025

Stacked PRs:


Log forward intermediates hashes w/pp vs w/o pp

Intended usage:

> torchrun --nproc-per-node=8 examples/example_ds3_pp.py --rng-seed=42; torchrun --nproc-per-node=4 examples/example_ds3_local_map.py --rng-seed 42
> diff out/0/fw_out.log out/1/fw_out.log 
--- out/0/fw_out.log    2025-11-11 21:00:06.005435737 -0800
+++ out/1/fw_out.log    2025-11-11 21:01:18.998207745 -0800
@@ -1,17 +1,17 @@
 hash=4581379079919370240, norm=214698.34375
-hash=18352098112790593536, norm=2016.0
-hash=9269111720570257408, norm=2016.0
-hash=9146071971375611904, norm=2016.0
-hash=9276605991825178624, norm=2016.0
-hash=9142025768585396224, norm=2016.0
-hash=18384115891391430656, norm=2016.0
-hash=18355862840604098560, norm=2016.0
-hash=9360063322419888128, norm=2016.0
-hash=18320150702933934080, norm=2016.0
-hash=9209896422344753152, norm=2016.0
-hash=21075438881210368, norm=2016.0
-hash=18326941286747078656, norm=2024.0
-hash=9134179653609586688, norm=2024.0
-hash=18431966637432242176, norm=2024.0
-hash=18356249868697075712, norm=2016.0
-hash=9121302173425074176, norm=2024.0
+hash=80466658967158784, norm=2016.0
+hash=9326427062702964736, norm=2016.0
+hash=9192198683184070656, norm=2016.0
+hash=9094632419381739520, norm=2016.0
+hash=9277907813592465408, norm=2016.0
+hash=13440430137933824, norm=2016.0
+hash=72902018968059904, norm=2016.0
+hash=18440621992966094848, norm=2016.0
+hash=9148605246166007808, norm=2016.0
+hash=6157265115545600, norm=2016.0
+hash=9358691131908423680, norm=2016.0
+hash=9360731825489575936, norm=2024.0
+hash=9127494622912708608, norm=2024.0
+hash=18348438938093355008, norm=2016.0
+hash=9250604740851531776, norm=2016.0
+hash=18346538982000558080, norm=2016.0

Currently, the forward inputs are the same, but the forward is being run with a different RNG state in each of the two setups, so there are some numerical differences.
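To see which logged entries diverge, and whether only the hash or also the norm changed, the two logs can be compared line by line. The following is a minimal sketch, not code from this PR: the helper names `parse_log` and `compare_logs` are hypothetical, and the only thing taken from the PR is the `hash=<int>, norm=<float>` line format visible in the diff above.

```python
import re
from typing import List, Tuple

# Matches lines such as "hash=4581379079919370240, norm=214698.34375".
# The format is assumed from the diff in this PR description.
LINE_RE = re.compile(r"hash=(\d+), norm=([-+0-9.eE]+)")


def parse_log(text: str) -> List[Tuple[int, float]]:
    """Extract (hash, norm) pairs from log text, skipping lines that do not match."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            entries.append((int(match.group(1)), float(match.group(2))))
    return entries


def compare_logs(text_a: str, text_b: str) -> List[Tuple[int, bool, bool]]:
    """Return (index, hash_differs, norm_differs) for every entry that differs.

    Entries are paired by position; if the logs have different lengths,
    the extra trailing entries of the longer log are ignored.
    """
    report = []
    pairs = zip(parse_log(text_a), parse_log(text_b))
    for index, ((hash_a, norm_a), (hash_b, norm_b)) in enumerate(pairs):
        hash_differs = hash_a != hash_b
        norm_differs = norm_a != norm_b
        if hash_differs or norm_differs:
            report.append((index, hash_differs, norm_differs))
    return report


# First two entries of each log, copied from the diff above.
log_a = (
    "hash=4581379079919370240, norm=214698.34375\n"
    "hash=18352098112790593536, norm=2016.0\n"
)
log_b = (
    "hash=4581379079919370240, norm=214698.34375\n"
    "hash=80466658967158784, norm=2016.0\n"
)
print(compare_logs(log_a, log_b))  # prints [(1, True, False)]
```

To run it on the real output, read `out/0/fw_out.log` and `out/1/fw_out.log` into strings and pass them to `compare_logs`. An entry where only the hash differs points to a small numerical change, while one where the norm differs as well suggests a larger divergence.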

xmfan added a commit that referenced this pull request Nov 11, 2025
stack-info: PR: #246, branch: xmfan/stack/20
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 11, 2025
@xmfan xmfan changed the title Log forward intermediates hashes w/pp vs w/o pp Log forward intermediates/output hashes w/o pp Nov 11, 2025
@xmfan xmfan changed the base branch from xmfan/stack/19 to main November 12, 2025 00:04
xmfan added a commit that referenced this pull request Nov 12, 2025
stack-info: PR: #246, branch: xmfan/stack/20
@xmfan xmfan changed the title Log forward intermediates/output hashes w/o pp Log forward intermediates hashes w/pp vs w/o pp Nov 12, 2025
@xmfan xmfan changed the base branch from main to xmfan/stack/19 November 12, 2025 00:05
@xmfan xmfan changed the base branch from xmfan/stack/19 to main November 12, 2025 05:02
xmfan added a commit that referenced this pull request Nov 12, 2025
stack-info: PR: #246, branch: xmfan/stack/20
@xmfan xmfan changed the base branch from main to xmfan/stack/19 November 12, 2025 05:02
@xmfan xmfan changed the base branch from xmfan/stack/19 to main November 12, 2025 05:09
xmfan added a commit that referenced this pull request Nov 12, 2025
stack-info: PR: #246, branch: xmfan/stack/20
@xmfan xmfan changed the base branch from main to xmfan/stack/19 November 12, 2025 05:09
stack-info: PR: #246, branch: xmfan/stack/20
@xmfan xmfan changed the base branch from xmfan/stack/19 to main November 12, 2025 06:50
@xmfan xmfan changed the base branch from main to xmfan/stack/19 November 12, 2025 06:50
@xmfan xmfan marked this pull request as ready for review November 12, 2025 07:18
