[RFC] add heterogeneous computing capabilities to UADK #638
In the current UADK framework, the hardware acceleration function and
the software acceleration function are merged so that the software
function (instruction acceleration) and the hardware function (hardware
offload) can run at the same time, providing users with higher performance.
Under the heterogeneous scheduling mode enabled in the current scheduler,
the test performance data is as follows:
Alg      Mode(1KB)     Performance(MB/s)    CPU
                       sync      async      sync     async
sm4-ecb  init1(HW)     454       1322       100%     200.00%
         init2(HW+CE)  1445.1    1864       100%     195.00%
         increase      218.30%   41.00%     0.00%    -2.50%
sm3      init1(HW)     153.1     1481       99%      199.80%
         init2(HW+CE)  431.5     508        100%     199.80%
         increase      181.84%   -65.70%    0.91%    0.00%

Alg      Mode(8KB)     Performance(MB/s)    CPU
                       sync      async      sync     async
sm4-ecb  init1(HW)     1407.5    9092       100%     198.00%
         init2(HW+CE)  3626.8    6021       100%     199.80%
         increase      157.68%   -33.78%    0.00%    0.91%
sm3      init1(HW)     960.4     5161.1     100%     183.80%
         init2(HW+CE)  549.6     530.1      100%     199.80%
         increase      -42.77%   -89.73%    -0.40%   8.71%
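
The throughput "increase" rows appear to be the relative change of init2
(HW+CE) over init1 (HW), i.e. (init2 - init1) / init1. The sketch below
only illustrates that arithmetic. It is not part of UADK, and the helper
name relative_increase is hypothetical.

```python
def relative_increase(init1: float, init2: float) -> float:
    """Relative change of init2 over init1, in percent.

    Assumption: this is how the "increase" rows for the
    Performance(MB/s) columns were derived.
    """
    return (init2 - init1) / init1 * 100.0


# sm4-ecb, 1KB, sync: init1(HW) = 454 MB/s, init2(HW+CE) = 1445.1 MB/s
sm4_sync_1k = relative_increase(454, 1445.1)

# sm3, 1KB, async: init1(HW) = 1481 MB/s, init2(HW+CE) = 508 MB/s
sm3_async_1k = relative_increase(1481, 508)

print(f"sm4-ecb 1KB sync:  {sm4_sync_1k:.2f}%")
print(f"sm3     1KB async: {sm3_async_1k:.2f}%")
```

A negative result means the combined HW+CE mode was slower than hardware
offload alone for that case.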
Without increasing CPU usage, synchronous-mode throughput improves by
roughly 158% to 218% in three of the four cases; the exception is sm3
at 8KB, where it drops by 42.77%.
In asynchronous mode, performance drops in three of the four cases
(sm4-ecb at 1KB is the exception, at +41.00%) because the CPU is also
used for software calculations. This can be addressed later by creating
dedicated calculation threads.