[Cherry-pick] Add doc images for auto-sharding (#6323) and Update spmd.md with SPMD debug tool (#6358) (#6395)

Co-authored-by: Yeounoh Chung <[email protected]>
pytorch · Jan 27, 2024 · 6fef86e · 6fef86e
1 parent 79fd3d9
commit 6fef86e

Showing 7 changed files with 28 additions and 0 deletions.
diff --git a/docs/assets/gpt2_2b_step_time_vs_batch.png b/docs/assets/gpt2_2b_step_time_vs_batch.png
diff --git a/docs/assets/gpt2_v4_8_mfu_batch.png b/docs/assets/gpt2_v4_8_mfu_batch.png
diff --git a/docs/assets/llama2_2b_bsz128.png b/docs/assets/llama2_2b_bsz128.png
diff --git a/docs/assets/perf_auto_vs_manual.png b/docs/assets/perf_auto_vs_manual.png
diff --git a/docs/assets/spmd_debug_1.png b/docs/assets/spmd_debug_1.png
diff --git a/docs/assets/spmd_debug_2.png b/docs/assets/spmd_debug_2.png
diff --git a/docs/spmd.md b/docs/spmd.md
@@ -401,3 +401,31 @@ XLA_USE_SPMD=1 python test/spmd/test_train_spmd_imagenet.py --fake_data --batch_
 ```
 
 Note that I used a batch size 4 times as large since I am running it on a TPU v4, which has 4 TPU devices attached to it. You should see the throughput become roughly 4x that of the non-SPMD run.
+
+### SPMD Debugging Tool
+
+We provide a shard placement visualization debug tool for PyTorch/XLA SPMD users on TPU/GPU/CPU, single-host or multi-host: use `visualize_tensor_sharding` to visualize a sharded tensor, or `visualize_sharding` to visualize a sharding string. Here are two code examples on a single-host TPU (v4-8), one for each function:
+- Code snippet using `visualize_tensor_sharding` and the resulting visualization:
+```python
+import rich
+import torch
+import torch_xla.distributed.spmd as xs
+
+# Here, mesh is a 2x2 mesh with axes 'x' and 'y' (assumed to be created beforehand)
+t = torch.randn(8, 4, device='xla')
+xs.mark_sharding(t, mesh, ('x', 'y'))
+
+# A tensor's sharding can be visualized using the `visualize_tensor_sharding` function
+from torch_xla.distributed.spmd.debugging import visualize_tensor_sharding
+generated_table = visualize_tensor_sharding(t, use_color=False)
+```
+![alt_text](assets/spmd_debug_1.png "visualize_tensor_sharding example on TPU v4-8 (single-host)")
+- Code snippet using `visualize_sharding` and the resulting visualization:
+```python
+from torch_xla.distributed.spmd.debugging import visualize_sharding
+sharding = '{devices=[2,2]0,1,2,3}'
+generated_table = visualize_sharding(sharding, use_color=False)
+```
+![alt_text](assets/spmd_debug_2.png "visualize_sharding example on TPU v4-8 (single-host)")
+
+These examples run on single-host TPU/GPU/CPU and can be modified to run on multi-host. They can also be adapted to the `tiled`, `partial_replication`, and `replicated` sharding styles.
+
+
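To make the three sharding styles mentioned above more concrete, here is a minimal pure-Python sketch of how a sharding string such as `'{devices=[2,2]0,1,2,3}'` maps tiles to devices. It does not use or reproduce the PyTorch/XLA debugging tool: `parse_sharding` and `tile_to_devices` are hypothetical helper names, and the sketch assumes the explicit device-list form of the string, with `last_tile_dim_replicate` marking partial replication and `{replicated}` marking full replication. Any other form is rejected.

```python
import re
from itertools import product


def parse_sharding(sharding):
    """Classify a sharding string; return (style, dims, device_ids).

    Hypothetical helper for illustration only, not part of torch_xla.
    """
    s = sharding.strip()
    if s == '{replicated}':
        return 'replicated', [], []
    m = re.fullmatch(
        r'\{devices=\[([\d,]+)\]([\d,]+)( last_tile_dim_replicate)?\}', s)
    if m is None:
        raise ValueError(f'unsupported sharding string: {sharding!r}')
    dims = [int(d) for d in m.group(1).split(',')]
    devices = [int(d) for d in m.group(2).split(',')]
    style = 'partial_replication' if m.group(3) else 'tiled'
    return style, dims, devices


def tile_to_devices(sharding):
    """Map each tile index to the list of device ids that hold that tile."""
    style, dims, devices = parse_sharding(sharding)
    if style == 'replicated':
        # Every device holds the full tensor, so there are no tiles to map.
        return {}
    # With partial replication, the last dimension counts replicas per tile.
    if style == 'partial_replication':
        tile_dims, replicas = dims[:-1], dims[-1]
    else:
        tile_dims, replicas = dims, 1
    placement = {}
    # Device ids are listed in row-major order over the tile grid.
    for i, idx in enumerate(product(*(range(d) for d in tile_dims))):
        placement[idx] = devices[i * replicas:(i + 1) * replicas]
    return placement


if __name__ == '__main__':
    for s in ('{devices=[2,2]0,1,2,3}',
              '{devices=[2,1,2]0,1,2,3 last_tile_dim_replicate}',
              '{replicated}'):
        print(s, parse_sharding(s)[0], tile_to_devices(s))
```

Under these assumptions, the tiled string from the second example splits the 8x4 tensor from the first example into four 4x2 tiles, one per device, while the partial-replication string splits it into two tiles that are each held by two devices.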