Commit

fixed tables
ASKabalan committed Jul 22, 2024
1 parent 6993ea8 commit 9688447
Showing 1 changed file with 11 additions and 5 deletions.
16 changes: 11 additions & 5 deletions joss-paper/paper.md
@@ -55,13 +55,13 @@ While it is technically feasible to implement distributed FFTs using native JAX,

The following steps outline the distributed FFT algorithm in jaxDecomp, which uses 2D domain decomposition to distribute 3D data across GPUs.

| Steps | Local Operation | Global Operation |
|------------------|-------------------------------------------------|--------------------------------------------------------------------------------|
| FFT along X | Perform batched FFT along the X axis. | - |
| Steps | Local Operation | Global Operation |
|------------------|-------------------------------------------------|--------------------------------------------------------------------------------------------|
| FFT along X | Perform batched FFT along the X axis. | - |
| Transpose X to Y | Local transpose to $Y \times X \times Z$ | All-to-all communication to concatenate $Y$: $Y \times \frac{X}{P_y} \times \frac{Z}{P_z}$ |
| FFT along Y | Perform batched FFT along the Y axis. | - |
| FFT along Y | Perform batched FFT along the Y axis. | - |
| Transpose Y to Z | Local transpose to $Z \times X \times Y$ | All-to-all communication to concatenate $Z$: $Z \times \frac{X}{P_z} \times \frac{Y}{P_y}$ |
| FFT along Z | Perform batched FFT along the Z axis. | - |
| FFT along Z | Perform batched FFT along the Z axis. | - |
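
The local operations in the table can be sketched on a single process. This is a minimal NumPy illustration, not jaxDecomp code: the function name `fft3d_by_pencils` is hypothetical, the array is not distributed, and the all-to-all communication steps are omitted entirely. It only shows that three batched 1D FFTs separated by the two local transposes reproduce a full 3D FFT, up to the final $Z \times X \times Y$ axis order.

```python
import numpy as np


def fft3d_by_pencils(a):
    """Single-process sketch of the FFT/transpose sequence (no communication).

    `a` has axis order (X, Y, Z). Each FFT acts on the leading axis,
    and each transpose brings the next axis to the front.
    """
    a = np.fft.fft(a, axis=0)   # FFT along X, layout (X, Y, Z)
    a = a.transpose(1, 0, 2)    # local transpose to (Y, X, Z)
    a = np.fft.fft(a, axis=0)   # FFT along Y
    a = a.transpose(2, 1, 0)    # local transpose to (Z, X, Y)
    a = np.fft.fft(a, axis=0)   # FFT along Z
    return a                    # result is left in (Z, X, Y) order


# Compare against NumPy's 3D FFT, reordered to the same (Z, X, Y) layout.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 8))
reference = np.fft.fftn(x).transpose(2, 0, 1)
print(np.allclose(fft3d_by_pencils(x), reference))
```

In a distributed setting each transpose would be paired with the all-to-all exchange listed in the Global Operation column, so that the axis being transformed is fully local to each GPU.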



@@ -84,6 +84,12 @@ Using cuDecomp, we can also change the communication backend to `NCCL`, `MPI`, o

For each axis, a slice of data of size equal to the halo extent is exchanged between neighboring subdomains.

| Send | Receive |
|--------------------------------------------------|-----------------------------------------------------|
| $[ \text{Size} - 2 \times \text{Halo} \rightarrow \text{Size} - \text{Halo} ]$ is sent to the next slice | $[ 0 \rightarrow \text{Halo} ]$ is received from the previous slice |
| $[ \text{Halo} \rightarrow 2 \times \text{Halo} ]$ is sent to the previous slice | $[ \text{Size} - \text{Halo} \rightarrow \text{Size} ]$ is received from the next slice |
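
The index ranges in the table can be illustrated with a one-dimensional, single-process sketch. This is not the jaxDecomp API: `halo_exchange_1d` is a hypothetical helper, the subdomains are plain NumPy arrays held in a list, and periodic boundaries are assumed so that the first and last slices are neighbors. Here `Size` is the padded length of each slice, including the halo cells on both sides.

```python
import numpy as np


def halo_exchange_1d(slices, halo):
    """Fill the halo cells of each padded slice from its neighbors.

    Assumes periodic boundaries and equally sized slices (both assumptions
    of this sketch). Index ranges follow the Send/Receive table.
    """
    size = slices[0].shape[0]
    n = len(slices)
    out = [s.copy() for s in slices]
    for r in range(n):
        prev, nxt = (r - 1) % n, (r + 1) % n
        # [0, Halo) is received from the previous slice,
        # which sends its [Size - 2*Halo, Size - Halo) region.
        out[r][:halo] = slices[prev][size - 2 * halo:size - halo]
        # [Size - Halo, Size) is received from the next slice,
        # which sends its [Halo, 2*Halo) region.
        out[r][size - halo:] = slices[nxt][halo:2 * halo]
    return out


# Three subdomains with 4 interior cells each, padded with a halo of 1.
halo = 1
interior = np.arange(12).reshape(3, 4)
padded = [np.pad(block, halo, constant_values=-1) for block in interior]
for s in halo_exchange_1d(padded, halo):
    print(s)
# [11  0  1  2  3  4]
# [3 4 5 6 7 8]
# [ 7  8  9 10 11  0]
```

The value `-1` marks halo cells that have not yet been filled; after the exchange every slice sees one cell of its neighbors' interior on each side.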


![Visualization of the distributed halo exchange process in jaxDecomp](assets/halo-exchange.svg)

### Efficient State Management in jaxDecomp