[WIP][Platform] Add support for Ascend 310P #914
+238 −28
What this PR does / why we need it?
This PR modifies pos_encoding_kernels.cpp, utils.h, attention.py, layernorm.py, platform.py, and utils.py to support Ascend 310P devices. It adjusts the loadSize constant based on the Ascend AI Core version and adds conditional compilation directives for bfloat16_t support.

Does this PR introduce any user-facing change?
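The SoC-version gating described under "What this PR does" can be illustrated with a small Python sketch. This is not the PR's actual code: the function name, the SoC-version strings, and the load-size values are illustrative assumptions; the real changes live in the C++ kernels and platform.py.

```python
# Hypothetical sketch of per-SoC capability selection, mirroring the
# loadSize / bfloat16_t gating described above. All names and values
# here are illustrative assumptions, not the PR's exact code.
def select_kernel_config(soc_version: str) -> dict:
    is_310p = soc_version.startswith("Ascend310P")
    return {
        # 310P AI Core uses a smaller load granularity in this sketch
        "load_size": 2048 if is_310p else 4096,
        # bfloat16_t paths are compiled out on 310P in this sketch
        "supports_bf16": not is_310p,
    }

print(select_kernel_config("Ascend310P3"))
```

In the real patch the equivalent decision is made at compile time via preprocessor directives in the C++ kernels, and at runtime in platform.py.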
How was this patch tested?
The patch has been tested locally on Ascend 310P hardware to ensure that the changes do not break existing functionality and that the new features work as intended.
ENV information
CANN, NNAL version: 8.1.RC1
Important
Because the current PTA 2.5.1 version cannot pass parameters in the NZ format as required when calling NNAL operators on 310P, we used a temporary debugging version provided by the PTA team for testing.
Code example
Build vllm-ascend from source code
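A hedged sketch of the build step. The CANN and NNAL environment-script paths are assumptions about a default install of the toolkit versions listed above; adjust them to your environment.

```shell
# Assumed default CANN 8.1.RC1 / NNAL install locations
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh

git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
# Editable install; compiles the C++ kernels touched by this PR
pip install -v -e .
```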
Run offline inference
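A minimal offline-inference sketch, assuming vllm-ascend is installed and an Ascend 310P NPU is visible. The model name is a placeholder assumption; float16 is used because this PR gates bfloat16_t support by SoC version.

```python
# Hypothetical minimal offline-inference script; model choice is an
# assumption, not part of this PR.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

# float16 rather than bfloat16, since bf16 availability depends on the SoC
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", dtype="float16")
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```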