Random Notes
Benjamin Zaitlen edited this page Dec 16, 2021
Welcome to the ucx-py wiki!
Logging (DEBUG shown; use TRACE for more verbose output):
UCX_PY_LOG_LEVEL=DEBUG # TRACE
UCX_LOG_LEVEL=DEBUG # TRACE
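A minimal bash sketch for raising both log levels for a single command, instead of exporting them into the whole shell. The function name `run_with_ucx_logs` and `my_script.py` are hypothetical.

```shell
# Hypothetical helper: run one command with UCX-Py and UCX logging raised.
# The first argument is the level (DEBUG or TRACE); the rest is the command.
run_with_ucx_logs() {
    local level=$1
    shift
    UCX_PY_LOG_LEVEL="$level" UCX_LOG_LEVEL="$level" "$@"
}

# Example (my_script.py is a placeholder):
# run_with_ucx_logs TRACE python my_script.py
```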
Memory-type cache: UCX_MEMTYPE_CACHE toggles whether the UCX library intercepts CUDA memory allocation calls. This memory optimization has known issues, so UCX-Py regularly sets it to n:
UCX_MEMTYPE_CACHE=n
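A launch script can apply the same default without overriding a value the user has already chosen. This is a bash sketch of that idea, not UCX-Py's own code; `ensure_memtype_cache_default` is a hypothetical name.

```shell
# Hypothetical helper: export UCX_MEMTYPE_CACHE=n unless the caller has
# already set a non-empty value, so an explicit choice is never overridden.
ensure_memtype_cache_default() {
    export UCX_MEMTYPE_CACHE="${UCX_MEMTYPE_CACHE:-n}"
}

ensure_memtype_cache_default
echo "UCX_MEMTYPE_CACHE=${UCX_MEMTYPE_CACHE}"
```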
Rendezvous protocol scheme (zero-copy RDMA put for large messages):
UCX_RNDV_SCHEME=put_zcopy
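Transports that can be listed in UCX_TLS, and the calls or mechanism each relies on:
- rc = InfiniBand reliable connected transport (verbs): ibv_post_send, ibv_post_recv, ibv_poll_cq
- cuda_copy = cuMemHostRegister, cuMemcpyAsync
- cuda_ipc = cuIpcCloseMemHandle, cuIpcOpenMemHandle, cuMemcpyAsync
- sockcm = connection management over sockets
- tcp = communication over TCP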
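UCX_TLS takes a comma-separated list of these transport names. The bash sketch below builds that value and, as an assumption for illustration, picks rc only when Linux lists InfiniBand devices under /sys/class/infiniband. The function names are hypothetical, and the check does not verify that a device is actually usable.

```shell
# Hypothetical helper: join transport names into a UCX_TLS value,
# e.g. `join_tls tcp cuda_copy cuda_ipc` prints "tcp,cuda_copy,cuda_ipc".
join_tls() {
    local IFS=,
    echo "$*"
}

# Hypothetical helper: prefer rc when the kernel lists any InfiniBand
# device, otherwise fall back to tcp. Both keep the CUDA transports.
pick_tls() {
    if [ -n "$(ls -A /sys/class/infiniband 2>/dev/null)" ]; then
        join_tls rc cuda_copy cuda_ipc
    else
        join_tls tcp cuda_copy cuda_ipc
    fi
}

# Example (my_script.py is a placeholder):
# UCX_TLS="$(pick_tls)" python my_script.py
```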
Example combinations (<SCRIPT> is the command to run; the first uses InfiniBand, the other two use TCP):
UCX_RNDV_SCHEME=put_zcopy UCX_MEMTYPE_CACHE=n UCX_TLS=rc,cuda_copy,cuda_ipc <SCRIPT>
UCX_MEMTYPE_CACHE=n UCX_TLS=tcp,cuda_copy,sockcm UCX_SOCKADDR_TLS_PRIORITY=sockcm <SCRIPT>
UCX_MEMTYPE_CACHE=n UCX_TLS=tcp,cuda_copy,cuda_ipc,sockcm UCX_SOCKADDR_TLS_PRIORITY=sockcm <SCRIPT>
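To avoid retyping these prefixes, they can be kept as named profiles. A bash sketch: the profile names ib, tcp and tcp-ipc and both function names are hypothetical, while the variable settings are copied from the combinations listed above.

```shell
# Hypothetical profiles: each prints one of the environment combinations above.
ucx_profile_env() {
    case "$1" in
        ib)      echo "UCX_RNDV_SCHEME=put_zcopy UCX_MEMTYPE_CACHE=n UCX_TLS=rc,cuda_copy,cuda_ipc" ;;
        tcp)     echo "UCX_MEMTYPE_CACHE=n UCX_TLS=tcp,cuda_copy,sockcm UCX_SOCKADDR_TLS_PRIORITY=sockcm" ;;
        tcp-ipc) echo "UCX_MEMTYPE_CACHE=n UCX_TLS=tcp,cuda_copy,cuda_ipc,sockcm UCX_SOCKADDR_TLS_PRIORITY=sockcm" ;;
        *)       echo "unknown profile: $1" >&2; return 1 ;;
    esac
}

# Run a command (the <SCRIPT> part) under a profile, e.g.
#   run_with_profile tcp-ipc python my_script.py
run_with_profile() {
    local line
    line=$(ucx_profile_env "$1") || return 1
    shift
    local -a vars
    read -r -a vars <<< "$line"   # split into NAME=VALUE words
    env "${vars[@]}" "$@"
}
```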
Benchmark send/receive on one machine (UCX < 1.10):
UCX_TLS=tcp,sockcm,cuda_copy,cuda_ipc UCX_SOCKADDR_TLS_PRIORITY=sockcm python \
send-recv-core.py --server-dev 2 --client-dev 1 \
--object_type rmm --reuse-alloc --n-bytes 1GB
Benchmark send/receive on one machine (UCX >= 1.10):
UCX_TLS=tcp,cuda_copy,cuda_ipc python send-recv-core.py \
--server-dev 2 --client-dev 1 --object_type rmm \
--reuse-alloc --n-bytes 1GB
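Which of the two one-machine commands applies depends on the UCX version. A bash sketch that picks one automatically. It assumes `ucp.get_ucx_version()` is available in the installed UCX-Py and returns a tuple such as (1, 11, 2), relies on GNU `sort -V` for the comparison, and uses hypothetical function names. Nothing runs until `bench_one_machine` is called.

```shell
# Succeeds if dotted version $1 >= $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Assumption: UCX-Py exposes get_ucx_version() returning a tuple of ints.
ucx_version() {
    python -c 'import ucp; print(".".join(map(str, ucp.get_ucx_version())))'
}

# Hypothetical helper: run whichever command above matches the UCX version.
bench_one_machine() {
    local v
    v=$(ucx_version) || return 1
    if version_ge "$v" 1.10; then
        UCX_TLS=tcp,cuda_copy,cuda_ipc python send-recv-core.py \
            --server-dev 2 --client-dev 1 --object_type rmm \
            --reuse-alloc --n-bytes 1GB
    else
        UCX_TLS=tcp,sockcm,cuda_copy,cuda_ipc UCX_SOCKADDR_TLS_PRIORITY=sockcm python \
            send-recv-core.py --server-dev 2 --client-dev 1 \
            --object_type rmm --reuse-alloc --n-bytes 1GB
    fi
}
```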
Benchmark send/receive on two machines (IB testing, UCX < 1.10):
# server process
UCX_NET_DEVICES=mlx5_0:1 UCX_TLS=tcp,sockcm,cuda_copy,rc \
UCX_SOCKADDR_TLS_PRIORITY=sockcm python send-recv-core.py \
--server-dev 0 --client-dev 5 --object_type rmm --reuse-alloc \
--n-bytes 1GB --server-only --port 13337 --n-iter 100
# client process
UCX_NET_DEVICES=mlx5_2:1 UCX_TLS=tcp,sockcm,cuda_copy,rc \
UCX_SOCKADDR_TLS_PRIORITY=sockcm python send-recv-core.py \
--server-dev 0 --client-dev 5 --object_type rmm --reuse-alloc \
--n-bytes 1GB --client-only --server-address SERVER_IP --port 13337 \
--n-iter 100
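Because the server and client commands share most of their flags, a small wrapper keeps them in sync. A bash sketch for UCX < 1.10: the function name and the DRY_RUN switch are hypothetical, and the device names, transports and flags are copied from the two commands above.

```shell
# Hypothetical wrapper for the two commands above (UCX < 1.10).
#   ib_bench_legacy server
#   ib_bench_legacy client SERVER_IP
# Set DRY_RUN=1 to print the command instead of running it.
ib_bench_legacy() {
    local role=$1 server_ip=${2:-} dev
    local -a args=(--server-dev 0 --client-dev 5 --object_type rmm --reuse-alloc
                   --n-bytes 1GB --port 13337 --n-iter 100)
    case "$role" in
        server) dev=mlx5_0:1; args+=(--server-only) ;;
        client)
            if [ -z "$server_ip" ]; then
                echo "client role needs the server address" >&2
                return 1
            fi
            dev=mlx5_2:1
            args+=(--client-only --server-address "$server_ip") ;;
        *) echo "role must be server or client" >&2; return 1 ;;
    esac
    # Each side pins its own IB device; transports match on both sides.
    local -a cmd=(env UCX_NET_DEVICES="$dev" UCX_TLS=tcp,sockcm,cuda_copy,rc
                  UCX_SOCKADDR_TLS_PRIORITY=sockcm python send-recv-core.py "${args[@]}")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```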
Benchmark send/receive on two machines (IB testing, UCX >= 1.10):
# server process
UCX_MAX_RNDV_RAILS=1 UCX_TLS=tcp,cuda_copy,rc python send-recv-core.py \
--server-dev 0 --client-dev 5 --object_type rmm --reuse-alloc \
--n-bytes 1GB --server-only --port 13337 --n-iter 100
# client process
UCX_MAX_RNDV_RAILS=1 UCX_TLS=tcp,cuda_copy,rc python send-recv-core.py \
--server-dev 0 --client-dev 5 --object_type rmm --reuse-alloc \
--n-bytes 1GB --client-only --server-address SERVER_IP --port 13337 \
--n-iter 100
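With UCX >= 1.10 both processes use identical settings and only the role flags differ, so the wrapper gets simpler. A bash sketch with a hypothetical function name and DRY_RUN switch; UCX_MAX_RNDV_RAILS=1 limits each rendezvous transfer to a single network device, and the remaining flags are copied from the two commands above.

```shell
# Hypothetical wrapper for the two commands above (UCX >= 1.10).
#   ib_bench server
#   ib_bench client SERVER_IP
# Set DRY_RUN=1 to print the command instead of running it.
ib_bench() {
    local role=$1 server_ip=${2:-}
    local -a args=(--server-dev 0 --client-dev 5 --object_type rmm --reuse-alloc
                   --n-bytes 1GB --port 13337 --n-iter 100)
    case "$role" in
        server) args+=(--server-only) ;;
        client)
            if [ -z "$server_ip" ]; then
                echo "client role needs the server address" >&2
                return 1
            fi
            args+=(--client-only --server-address "$server_ip") ;;
        *) echo "role must be server or client" >&2; return 1 ;;
    esac
    # Same environment on both sides.
    local -a cmd=(env UCX_MAX_RNDV_RAILS=1 UCX_TLS=tcp,cuda_copy,rc
                  python send-recv-core.py "${args[@]}")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```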