[RFC] add heterogeneous computing capabilities to UADK #638

Closed

Liulongfang
Collaborator

In the current UADK framework, the hardware acceleration function and
the software acceleration function are merged so that instruction-based
software acceleration and hardware offload can run at the same time,
giving users higher performance.

Under the heterogeneous scheduling mode enabled in the current scheduler,
the test performance data is as follows:

1KB blocks:

| Alg | Mode | Sync (MB/s) | Async (MB/s) | Sync CPU | Async CPU |
|---|---|---|---|---|---|
| sm4-ecb | init1(HW) | 454 | 1322 | 100% | 200.00% |
| | init2(HW+CE) | 1445.1 | 1864 | 100% | 195.00% |
| | increase | 218.30% | 41.00% | 0.00% | -2.50% |
| sm3 | init1(HW) | 153.1 | 1481 | 99% | 199.80% |
| | init2(HW+CE) | 431.5 | 508 | 100% | 199.80% |
| | increase | 181.84% | -65.70% | 0.91% | 0.00% |

8KB blocks:

| Alg | Mode | Sync (MB/s) | Async (MB/s) | Sync CPU | Async CPU |
|---|---|---|---|---|---|
| sm4-ecb | init1(HW) | 1407.5 | 9092 | 100% | 198.00% |
| | init2(HW+CE) | 3626.8 | 6021 | 100% | 199.80% |
| | increase | 157.68% | -33.78% | 0.00% | 0.91% |
| sm3 | init1(HW) | 960.4 | 5161.1 | 100% | 183.80% |
| | init2(HW+CE) | 549.6 | 530.1 | 100% | 199.80% |
| | increase | -42.77% | -89.73% | -0.40% | 8.71% |
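
For readers checking the numbers: the throughput figures in the "increase" rows appear consistent with a plain relative change of the HW+CE result over the HW-only result. The helper below is only a sketch of that arithmetic; the function name `increase_pct` is hypothetical and is not part of UADK or its test tools.

```python
def increase_pct(baseline: float, combined: float) -> float:
    """Relative change of `combined` over `baseline`, in percent.

    Assumption: `baseline` is the init1(HW) throughput and `combined`
    is the init2(HW+CE) throughput, both in MB/s.
    """
    return (combined - baseline) / baseline * 100.0


# sm4-ecb, 1KB, sync: 454 MB/s -> 1445.1 MB/s
print(f"{increase_pct(454, 1445.1):.2f}%")    # 218.30%

# sm3, 8KB, async: 5161.1 MB/s -> 530.1 MB/s
print(f"{increase_pct(5161.1, 530.1):.2f}%")  # -89.73%
```

A negative value means the combined HW+CE mode was slower than HW alone for that case.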

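Without increasing CPU usage, synchronous mode improves substantially in
most cases: sm4-ecb gains 218.30% at 1KB and 157.68% at 8KB, and sm3 gains
181.84% at 1KB. The exception is sm3 at 8KB, which drops by 42.77%.
In asynchronous mode, performance drops in every case except sm4-ecb at 1KB
because the CPU is also used for software computation. This can be addressed
later by creating dedicated computation threads.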

Liulongfang commented Nov 27, 2024

Performance test results of the new framework:

SM3 1024B Performance(MB/s)

| tds | init1(HW) | init1(HW + CE) | increase |
|---|---|---|---|
| 1 | 393.3 | 437.1 | 11.14% |
| 2 | 762.1 | 823.4 | 8.04% |
| 4 | 1508.4 | 1564.1 | 3.69% |
| 8 | 3007.4 | 3074.9 | 2.24% |
| 16 | 4851.8 | 5429.2 | 11.90% |
| 32 | 4854.1 | 8698.8 | 79.21% |

SM4 1024B Performance(MB/s)

| tds | init1(HW) | init1(HW + CE) | increase |
|---|---|---|---|
| 1 | 461 | 1482.5 | 221.58% |
| 2 | 914 | 2575.4 | 181.77% |
| 4 | 1699.9 | 4737.6 | 178.70% |
| 8 | 3301.5 | 7327.8 | 121.95% |
| 16 | 5837.5 | 9737.4 | 66.81% |
| 32 | 8897.7 | 10432.4 | 17.25% |
