Multi-Object (MO) UCX backend implementation (rebased over code restructuring PR) #30

Closed
artpol84 wants to merge 2 commits


artpol84

No description provided.

Isolate header files from compile-time dependencies
(may not be required if a global config.h is distributed
along with the header files)

Signed-off-by: Artem Y. Polyakov <[email protected]>
@artpol84

Backend Unit-test

[artemp@login01 test]$ ./ucx_mo_backend_test


****************************************************
    Inter-agent memory transfer test P-Thr=OFF
         (DRAM -> DRAM)
****************************************************


Synchronous handshake complete

READ test (10) iterations
        READ
                NOTE: Testing non-inline Transfer path!
                Data verification: OK
...
        READ
                NOTE: Testing non-inline Transfer path!
                Data verification: OK

WRITE test (10) iterations
        WRITE
                NOTE: Testing non-inline Transfer path!
                Data verification: OK
...
        WRITE
                NOTE: Testing non-inline Transfer path!
                Data verification: OK

READ/NOTIF test (10) iterations
        READ/NOTIF
                NOTE: Testing non-inline Transfer path!
                Checking notification flow: OK
                Data verification: OK
...
        READ/NOTIF
                NOTE: Testing non-inline Transfer path!
                Checking notification flow: OK
                Data verification: OK

WRITE/NOTIF test (10) iterations
        WRITE/NOTIF
                NOTE: Testing non-inline Transfer path!
                Checking notification flow: OK
                Data verification: OK
...
        WRITE/NOTIF
                NOTE: Testing non-inline Transfer path!
                Checking notification flow: OK
                Data verification: OK

Test genNotif operation
         gnNotif to Agent2
                Checking notification flow: OK
...
         gnNotif to Agent2
                Checking notification flow: OK


****************************************************
    Inter-agent memory transfer test P-Thr=ON
         (DRAM -> DRAM)
****************************************************


Synchronous handshake complete

READ test (10) iterations
        READ
                NOTE: Testing non-inline Transfer path!
                Data verification: OK
...
        READ
                NOTE: Testing non-inline Transfer path!
                Data verification: OK

WRITE test (10) iterations
        WRITE
                NOTE: Testing non-inline Transfer path!
                Data verification: OK
...
        WRITE
                NOTE: Testing non-inline Transfer path!
                Data verification: OK

READ/NOTIF test (10) iterations
        READ/NOTIF
                NOTE: Testing non-inline Transfer path!
                Checking notification flow: OK
                Data verification: OK
...
        READ/NOTIF
                NOTE: Testing non-inline Transfer path!
                Checking notification flow: OK
                Data verification: OK

WRITE/NOTIF test (10) iterations
        WRITE/NOTIF
                NOTE: Testing non-inline Transfer path!
                Checking notification flow: OK
                Data verification: OK
...
        WRITE/NOTIF
                NOTE: Testing non-inline Transfer path!
                Checking notification flow: OK
                Data verification: OK

Test genNotif operation
         gnNotif to Agent2
                Checking notification flow: OK

...
         gnNotif to Agent2
                Checking notification flow: OK

Add a new UCX-based backend that allows associating NIXL
logical "devices" with different UCX workers.
The primary motivation is that UCX v1.18 doesn't
support more than one GPGPU per UCX context.

NOTE: this is expected to be fixed in UCX v1.19,
so this backend may be viewed as a workaround
unless other uses are found.

Signed-off-by: Artem Y. Polyakov <[email protected]>
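
To illustrate the idea in the commit message above, here is a minimal sketch of routing operations by logical device ID to a dedicated per-device worker. It does not use the UCX or NIXL APIs: `WorkerStub`, `MultiObjectEngine`, `workerFor`, and `post` are hypothetical names invented for this illustration, and the actual backend in this PR may organize its workers differently.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a UCX worker. A real backend would hold a
// worker handle here; this stub only counts posted operations.
struct WorkerStub {
    std::size_t id;
    int         postedOps;
};

// Hypothetical "multi-object" engine: one worker per logical device,
// so each device's traffic goes through its own worker.
class MultiObjectEngine {
public:
    explicit MultiObjectEngine(std::size_t numDevices) {
        for (std::size_t i = 0; i < numDevices; ++i)
            workers_.push_back(WorkerStub{i, 0});
    }

    // Look up the worker associated with a logical device ID.
    WorkerStub& workerFor(std::size_t devId) {
        if (devId >= workers_.size())
            throw std::out_of_range("unknown device id");
        return workers_[devId];
    }

    // Record an operation on the worker that owns the given device.
    void post(std::size_t devId) { ++workerFor(devId).postedOps; }

    std::size_t numWorkers() const { return workers_.size(); }

private:
    std::vector<WorkerStub> workers_;
};
```

With two logical devices, posting once to device 0 and twice to device 1 leaves the two workers with independent operation counts, which is the isolation the commit message describes.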
@artpol84

nixl example (hacked)

diff --git a/test/nixl/agent_example.cpp b/test/nixl/agent_example.cpp
index c450814..59cd290 100644
--- a/test/nixl/agent_example.cpp
+++ b/test/nixl/agent_example.cpp
@@ -368,8 +368,8 @@ int main()
     printParams(init2);

     nixlBackendH* ucx1, *ucx2;
-    ret1 = A1.createBackend("UCX", init1, ucx1);
-    ret2 = A2.createBackend("UCX", init2, ucx2);
+    ret1 = A1.createBackend("UCX_MO", init1, ucx1);
+    ret2 = A2.createBackend("UCX_MO", init2, ucx2);

     assert (ret1 == NIXL_SUCCESS);
     assert (ret2 == NIXL_SUCCESS);

[artemp@login01 nixl]$ ./agent_example
Using plugin file: /global/home/users/artemp/nixl/nixl/build0/pluginlist
Found plugin: UCX at path: /global/home/users/artemp/nixl/nixl/build0/src/plugins/ucx/libplugin_UCX.so
Found plugin: UCX_MO at path: /global/home/users/artemp/nixl/nixl/build0/src/plugins/ucx/libplugin_UCX.so
Successfully loaded plugin 'UCX' version 1.0.0 from /global/home/users/artemp/nixl/nixl/build0/src/plugins/ucx/libplugin_UCX.so
Successfully loaded plugin 'UCX_MO' version 1.0.0 from /global/home/users/artemp/nixl/nixl/build0/src/plugins/ucx/libplugin_UCX.so
Registering built-in static plugins...
Available plugins:
UCX
UCX_MO
Params before init:
Parameters:
  ucx_devices =
Parameters:
  ucx_devices =
Plugin already loaded: UCX_MO
Plugin already loaded: UCX_MO
Params after init:
Parameters:
  ucx_devices =
Parameters:
  ucx_devices =
Agent1's Metadata: nixlSerDes|AgenAgent001|Conn|tUCX_MO|cp [binary connection data omitted]|
MemSection|nixlSecElm|bkndUCX_MO|nixlDList
nixlSDList|t|u|s||[binary descriptor data omitted]|
Agent2's Metadata: nixlSerDes|AgenAgent002|Conn|tUCX_MO|cp [binary connection data omitted]|
MemSection|nixlSecElm|bkndUCX_MO|nixlDList
nixlSDList|t|u|s||[binary descriptor data omitted]|
Transfer request from 0x9a6c60 to 0x9a6d70
Transfer was posted
Transfer verified
performing sideXferTest with backends 0x978910 0x9a27a0
Starting sideXferTest
Got backend
perf setup done
prepXferSide, total time for 10 iters: 1s 995447us
time per 2 preps 199545us
 src 0 0x9d3990
 dst 0 0x9d31e0
 src 1 0x9d4940
 dst 1 0x9d4d50
 src 2 0x9d5160
 dst 2 0x9ed2d0
 src 3 0x9ed6e0
 dst 3 0x9edaf0
Ready to prepare side
prep done, starting transfers
transfer 1 done
transfer 2 done
transfer 3 done
Performing local test
Local transfer was posted
Test done

get_plugin_version,
get_backend_options
};
#ifdef STATIC_PLUGIN_UCX

If the plugin name is UCX_MO, the flag here should be STATIC_PLUGIN_UCX_MO.
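
A minimal sketch of the point above, assuming each statically linked plugin is guarded by its own `STATIC_PLUGIN_<NAME>` macro. The function `staticPluginNames` is hypothetical and is not part of the NIXL source; the `#define` at the top stands in for a flag that a build system would normally pass.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a build-system flag such as -DSTATIC_PLUGIN_UCX_MO.
// Only the UCX_MO plugin is enabled in this sketch.
#define STATIC_PLUGIN_UCX_MO 1

// Hypothetical helper: list the plugins compiled in statically.
// Each plugin is guarded by a macro that matches its own name, so
// enabling UCX_MO does not depend on the flag for plain UCX.
std::vector<std::string> staticPluginNames() {
    std::vector<std::string> names;
#ifdef STATIC_PLUGIN_UCX
    names.push_back("UCX");
#endif
#ifdef STATIC_PLUGIN_UCX_MO
    names.push_back("UCX_MO");
#endif
    return names;
}
```

If the UCX_MO registration were guarded by `STATIC_PLUGIN_UCX` instead, a build that enables only `STATIC_PLUGIN_UCX_MO` would silently skip it.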

@mkhazraee

Recreated in PR 58.

@mkhazraee mkhazraee closed this Mar 19, 2025
3 participants