Logics in Firefly when handling the Alltoall(v) motif #2326

ziyuezzy · 2024-03-04T15:32:26Z

New Issue for sst-elements

1 - Detailed description of problem or enhancement

Hi,

As far as I understand SST-Firefly, the following method defines how Firefly executes EmberAlltoallMotif and EmberAlltoallvMotif:

sst-elements/src/sst/elements/firefly/funcSM/alltoallv.cc, line 58 (commit 54843c2):

`void AlltoallvFuncSM::handleEnterEvent( Retval& retval )`

In this method, the read requests (Irecv) issued by one NIC are chained so that each read request must wait for the previous one to complete. From debug output, I observed that as a consequence the network is often idle during these waits, leaving time gaps in which the NICs do nothing.
However, is this really how the MPI alltoall collective behaves?
According to the official Open MPI documentation (https://docs.open-mpi.org/en/v5.0.x/man-openmpi/man3/MPI_Alltoall.3.html @ 17.2.16.4. DESCRIPTION), "MPI_Alltoall" should consist of independent point-to-point communications among all NICs.

Therefore, I think a NIC should send read requests to as many other NICs as possible at the same time, since those transfers are independent. Do you agree?
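To make the idle-gap argument concrete, here is a toy cost model comparing a NIC that issues its receives one at a time with one that issues them all at once and is limited only by its link bandwidth. All parameters are made-up assumptions, and this is not Firefly's actual timing model.

```python
"""Toy timing model: serialized vs. overlapped receives in an alltoall.

Hypothetical parameters only; this is not Firefly's cost model.
"""


def serialized_time(num_peers: int, latency: float, msg_bytes: int, bandwidth: float) -> float:
    # Each receive is issued only after the previous one completes,
    # so every message pays the full latency plus its transfer time.
    return num_peers * (latency + msg_bytes / bandwidth)


def overlapped_time(num_peers: int, latency: float, msg_bytes: int, bandwidth: float) -> float:
    # All receives are issued up front: the latencies overlap and the
    # transfers are limited only by this NIC's link bandwidth.
    return latency + num_peers * msg_bytes / bandwidth


if __name__ == "__main__":
    peers = 127   # e.g. 128 ranks, one message to every other rank
    lat = 1e-6    # 1 us per-message latency (assumed)
    size = 4096   # bytes per message (assumed)
    bw = 10e9     # 10 GB/s link (assumed)
    print(f"serialized: {serialized_time(peers, lat, size, bw) * 1e6:.1f} us")
    print(f"overlapped: {overlapped_time(peers, lat, size, bw) * 1e6:.1f} us")
```

With these made-up numbers the model gives about 179 us serialized versus 53 us overlapped; the whole difference comes from paying the per-message latency 127 times instead of once.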

Thanks!
Best regards,
Z.

2 - Describe how to reproduce

Run sst with /sst-elements/sst-elements-src/src/sst/elements/ember/tests/dragon_128_allreduce.py, but change the motif to 'Alltoall' or 'Alltoallv'.

3 - What Operating system(s) and versions

4 - What version of external libraries (Boost, MPI)

5 - Provide sha1 of all relevant sst repositories (sst-core, sst-elements, etc)

official latest repos

6 - Fill out Labels, Milestones, and Assignee fields as best possible
SST-Firefly; SST-Ember; enhancement; help_wanted

ziyuezzy · 2024-03-04T20:41:52Z

I found the implementation of 'alltoall' in Open MPI: https://github.com/open-mpi/ompi/blob/main/ompi/mca/coll/basic/coll_basic_alltoall.c

It seems that the requests are indeed posted at the same time, and each rank then waits for all of the replies together. So I think this is a necessary (?) enhancement for SST-Firefly, because some important motifs (such as FFT3D) rely heavily on alltoall and alltoallv.
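The "post all receives, post all sends, then wait for everything" structure can be sketched as a per-rank operation schedule. This is a simplified illustration, not a transcription of coll_basic_alltoall.c: the function name and the peer ordering are assumptions. The ordering is rotated by rank so that, at the same loop position k, rank r's send to r - k lines up with rank (r - k)'s receive from r.

```python
from typing import List, Tuple

# ("irecv" | "isend" | "waitall", peer rank, or -1 for waitall)
Op = Tuple[str, int]


def linear_alltoall_schedule(rank: int, size: int) -> List[Op]:
    """Operation order for one rank in a 'post everything, then wait' alltoall."""
    ops: List[Op] = []
    # Post every receive first; none waits for an earlier one to finish.
    for offset in range(1, size):
        ops.append(("irecv", (rank + offset) % size))
    # Then post every send, also without waiting in between.
    for offset in range(1, size):
        ops.append(("isend", (rank - offset) % size))
    # A single wait covers all 2 * (size - 1) outstanding requests.
    ops.append(("waitall", -1))
    return ops
```

For example, `linear_alltoall_schedule(0, 4)` returns `[('irecv', 1), ('irecv', 2), ('irecv', 3), ('isend', 3), ('isend', 2), ('isend', 1), ('waitall', -1)]`: nothing waits until all six requests are outstanding, which is the opposite of the one-at-a-time behaviour described above for AlltoallvFuncSM::handleEnterEvent.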

ziyuezzy · 2024-05-14T07:51:57Z

An update on this issue:
I tried to measure real Open MPI MPI_Alltoall traffic on four remote servers connected via Ethernet. They communicate over TCP/IP, so I used tcpdump to monitor the traffic among the nodes.
The result is surprisingly similar to what the sst-ember+firefly+merlin simulation produces: the inter-node traffic shifts from the first diagonal to the last.

I wrote a Python script to visualize the inter-node traffic and obtained the following two videos:
The traffic that I monitored from SST simulator:
https://github.com/sstsimulator/sst-elements/assets/102291257/0f194c43-3215-4c5d-8a4f-244d7e52a1f9

The traffic that I monitored from OpenMPI real hardware test:
https://github.com/sstsimulator/sst-elements/assets/102291257/f93ac228-0047-4e5b-a209-ab8d5def5445
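A minimal sketch of the aggregation such a visualization needs: bin captured transfers into one source-by-destination byte matrix per time window, one matrix per animation frame. The record format `(timestamp_s, src_node, dst_node, nbytes)` is hypothetical, assumed to have been extracted from the capture beforehand with timestamps relative to the start of the run; this is not the script used for the videos.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record format: (timestamp_s, src_node, dst_node, nbytes),
# with non-negative timestamps measured from the start of the run.
Record = Tuple[float, int, int, int]
Matrix = List[List[int]]


def _empty(num_nodes: int) -> Matrix:
    return [[0] * num_nodes for _ in range(num_nodes)]


def traffic_matrices(records: Iterable[Record], num_nodes: int, window: float) -> List[Matrix]:
    """Return one src x dst byte-count matrix per time window of length `window`."""
    bins: Dict[int, Matrix] = {}
    for ts, src, dst, nbytes in records:
        idx = int(ts // window)
        if idx not in bins:
            bins[idx] = _empty(num_nodes)
        bins[idx][src][dst] += nbytes
    if not bins:
        return []
    # Fill idle windows with zero matrices so gaps stay visible as blank frames.
    return [bins.get(i, _empty(num_nodes)) for i in range(max(bins) + 1)]
```

Each returned matrix can then be drawn as one heatmap frame; idle periods, like the gaps observed in the simulation, show up as all-zero frames.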

This surprised me. There may be a reason for the shifting traffic pattern in alltoall, but I don't understand it yet.
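One possible explanation, offered as an assumption rather than a confirmed diagnosis, is a pairwise-exchange schedule, in which at step k every rank r sends to (r + k) mod P and receives from (r - k) mod P. Open MPI's tuned collective component includes a pairwise alltoall variant, and which algorithm it picks depends on message and communicator size. Under such a schedule, each step's traffic lies entirely on the k-th cyclic diagonal of the traffic matrix, so an animation would show the active diagonal moving from the first to the last. The sketch below only builds those per-step matrices; `pairwise_step_matrix` is a hypothetical helper.

```python
from typing import List


def pairwise_step_matrix(size: int, step: int) -> List[List[int]]:
    """Traffic matrix for one step of a pairwise-exchange alltoall.

    At step k (1 <= k < size) rank r sends to (r + k) % size, so every
    nonzero entry lies on the k-th cyclic diagonal.
    """
    matrix = [[0] * size for _ in range(size)]
    for src in range(size):
        matrix[src][(src + step) % size] = 1
    return matrix


if __name__ == "__main__":
    for k in range(1, 4):  # four nodes, as in the hardware test
        print(f"step {k}:")
        for row in pairwise_step_matrix(4, k):
            print("  ", row)
```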

TimJZ · 2024-07-09T20:44:55Z


Hi ziyuezzy,

Thank you very much for your comments and for sharing.

I'm new to SST and have been struggling to understand how the allreduce motif is executed and its internal logic. One thing I could not find much documentation for is the "debug output" you mentioned in your earlier post. Is there any documentation on how to use it correctly? I would like to follow the execution trace more systematically, and I hope this can help (so far I've been using gdb). Any help would be greatly appreciated. Thanks!

ziyuezzy · 2024-07-10T07:25:34Z

Hi, the 'debug output' is documented in sst-core. You need to set the debug/verbose output level mask in the Python config file in order to get the corresponding information printed.

TimJZ · 2024-07-11T02:04:40Z

Thank you for the quick response! Using the verbose level, I was able to trace what the NIC does, but not the functionSM or ctrlMsg components, even though I saw that they log debug information.

Here's my setup in the Python driver script:
```python
PlatformDefinition.setCurrentPlatform("firefly-defaults")
cur_platform = PlatformDefinition.getCurrentPlatform()
firefly_params = cur_platform.getParamSet("firefly")
firefly_params['verboseLevel'] = 10

functionsm_params = cur_platform.getParamSet("firefly.functionsm")
functionsm_params['verboseLevel'] = 10

ctrl_params = cur_platform.getParamSet("firefly.ctrl")
ctrl_params['verboseLevel'] = 10

nic_params = cur_platform.getParamSet("nic")
nic_params['verboseLevel'] = 10
```

I was running `sst --debug-file=FILE ember-merlin-example.py`, but I only saw the NIC information, and the FILE text file was empty. Have you run into this issue before?

Thank you so much for your help!

TimJZ · 2024-07-11T21:57:19Z

A little update: I think I figured out what was wrong. When configuring the build, I had initially passed the --enable-debug flag only for sst-elements, and only the NIC messages were showing up. I reconfigured both sst-core and sst-elements with --enable-debug and recompiled the project, and now the debugging feature works fine. Thank you very much for the help!

ziyuezzy mentioned this issue in "Weird behavior of EmberAlltoall(v)Motif #2324" (Closed) on Mar 4, 2024

jwilso assigned mjleven, feldergast and jpkenny Mar 26, 2024
