To test your `.xclbin` on XACC, do the following:
- Copy the required ACCL directories to XACC, keeping the same structure as in the repo:

      ssh alveo0
      mkdir ACCL
      mkdir -p ACCL/demo/build/tcp_u280
      mkdir -p ACCL/demo/host/measurements/accl
      exit
      scp -r ACCL/demo/build/link*/ccl_offload.* alveo0:~/ACCL/demo/build/tcp_u280
      scp -r ACCL/driver alveo0:~/ACCL/
      scp -r ACCL/demo/host alveo0:~/ACCL/demo/
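The copy step can also be wrapped in a small script so that it is repeatable for other nodes. The following is only a sketch: `REMOTE`, `DRY_RUN`, `run` and `stage_accl` are hypothetical names introduced here, and the interactive `ssh` session is replaced by a single non-interactive `ssh ... mkdir -p` call. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: stage the ACCL files on one XACC node.
# REMOTE, DRY_RUN, run and stage_accl are hypothetical names for this example.
REMOTE="${REMOTE:-alveo0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stage_accl() {
    # mkdir -p also creates the parent directories (ACCL, ACCL/demo, ...).
    run ssh "$REMOTE" mkdir -p ACCL/demo/build/tcp_u280 ACCL/demo/host/measurements/accl
    # Copy the build output, the driver and the host code, as in the step above.
    run scp -r ACCL/demo/build/link*/ccl_offload.* "$REMOTE:~/ACCL/demo/build/tcp_u280"
    run scp -r ACCL/driver "$REMOTE:~/ACCL/"
    run scp -r ACCL/demo/host "$REMOTE:~/ACCL/demo/"
}

stage_accl
```

Run it from the directory that contains your local `ACCL` checkout, and set `DRY_RUN=0` once the printed commands look right.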
- Log in to one XACC node and go to `ACCL/demo/host`.
- Modify `run_test_tcp.sh` to target the XACC nodes you want to use:

      #alveo3b 10.1.212.126
      #alveo3c 10.1.212.127
      #alveo4b 10.1.212.129
      #alveo4c 10.1.212.130
      mpiexec --host 10.1.212.123,10.1.212.127,10.1.212.129,10.1.212.130 -- bash mpiscript.sh
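If the set of nodes changes often, the comma-separated `--host` argument can be built from a single list instead of being edited by hand. A minimal sketch, assuming the four addresses from the `mpiexec` line above; `HOSTS` and `HOST_LIST` are hypothetical variable names, and the script only prints the resulting command:

```shell
#!/bin/sh
# Sketch: build the mpiexec --host argument from one space-separated list.
# HOSTS and HOST_LIST are hypothetical names introduced for this example.
HOSTS="10.1.212.123 10.1.212.127 10.1.212.129 10.1.212.130"

# mpiexec expects the hosts separated by commas.
HOST_LIST=$(echo "$HOSTS" | tr ' ' ',')

# Print the command; remove "echo" and the quotes to actually launch it.
echo "mpiexec --host $HOST_LIST -- bash mpiscript.sh"
```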
- Copy your SSH ID to each of those hosts so that you can log in to them:

      ssh-copy-id 10.1.212.123
      ssh-copy-id 10.1.212.127
      ssh-copy-id 10.1.212.129
      ssh-copy-id 10.1.212.130
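The same host list can drive the key copy in a loop. This is a sketch under the same assumptions as the commands above; `HOSTS`, `DRY_RUN` and `copy_ids` are hypothetical names, and by default the script only prints what it would run:

```shell
#!/bin/sh
# Sketch: run ssh-copy-id once per host.
# HOSTS, DRY_RUN and copy_ids are hypothetical names introduced for this example.
HOSTS="10.1.212.123 10.1.212.127 10.1.212.129 10.1.212.130"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

copy_ids() {
    for host in $HOSTS; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh-copy-id $host"
        else
            ssh-copy-id "$host"
        fi
    done
}

copy_ids
```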
- When you run `run_test_tcp.sh`, it executes `mpiscript.sh`, which in turn calls the Python test script with its parameters. Example of `mpiscript.sh`:

      source /opt/xilinx/xrt/setup.sh
      source /opt/tools/Xilinx/Vitis/2020.2/.settings64-Vitis.sh
      source /opt/tools/external/anaconda/bin/activate pynq-dask
      #python test_mpi4py.py
      cd ~/ACCL/demo/host
      python test_tcp_cmac_seq_mpi.py --xclbin ../build/tcp_u280_debug/ccl_offload.xclbin --device 0 --nruns 10 --segment_size 1024 --bsize 512 1024 2048 --send --bcast --gather --allgather --allreduce --reduce
- `test_tcp_cmac_seq_mpi.py` creates a `.csv` file under `./measurements/accl` that logs the execution time of the collectives.
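After a run, the newest log can be located and previewed from `ACCL/demo/host`. A minimal sketch: `latest_csv` and `MEAS_DIR` are hypothetical names, and nothing is assumed about the columns of the file, so only its first lines are printed.

```shell
#!/bin/sh
# Sketch: find the most recently modified .csv in the measurements directory.
# latest_csv and MEAS_DIR are hypothetical names introduced for this example.
MEAS_DIR="${MEAS_DIR:-./measurements/accl}"

latest_csv() {
    # ls -t sorts by modification time, newest first.
    ls -t "$1"/*.csv 2>/dev/null | head -n 1
}

csv=$(latest_csv "$MEAS_DIR")
if [ -n "$csv" ]; then
    echo "Latest measurement file: $csv"
    head -n 5 "$csv"
else
    echo "No .csv files found in $MEAS_DIR"
fi
```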