[WIP][Platform] Add support for Ascend 310P #914

Status: Open. Wants to merge 3 commits into base: main.
Conversation


@farawayboat farawayboat commented May 21, 2025

What this PR does / why we need it?

  • This PR introduces changes to adapt the pos_encoding_kernels.cpp, utils.h, attention.py, layernorm.py, platform.py, and utils.py files to support Ascend 310P devices.
  • Specifically, it adjusts the loadSize constant based on the Ascend AI Core version and adds conditional compilation directives for bfloat16_t support.
  • It also includes modifications to handle specific behaviors required for the Ascend 310P, such as tensor alignment and format casting.
  • The purpose of these changes is to ensure compatibility and optimal performance on Ascend 310P devices.
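To illustrate the tensor-alignment point above (a minimal sketch, not code from this PR): the NZ data layout used on Ascend devices generally requires tensor dimensions to be padded to a hardware block size before format casting. The helper names `align_up` and `padded_shape`, and the block size of 16 elements, are assumptions for illustration only.

```python
def align_up(n: int, alignment: int = 16) -> int:
    """Round n up to the nearest multiple of alignment.

    Hypothetical helper: tensors cast to the NZ format typically need
    their trailing dimensions padded to a hardware block size (assumed
    here to be 16 elements).
    """
    return ((n + alignment - 1) // alignment) * alignment


def padded_shape(shape: tuple, alignment: int = 16) -> tuple:
    """Pad the last two dimensions of a shape to the block size."""
    if len(shape) < 2:
        return shape
    *head, h, w = shape
    return (*head, align_up(h, alignment), align_up(w, alignment))


print(align_up(100))               # 112
print(padded_shape((4, 100, 50)))  # (4, 112, 64)
```

The actual padding and format casting in the PR is performed on device tensors; this sketch only shows the round-up arithmetic involved.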

Does this PR introduce any user-facing change?

  • Yes, this PR introduces changes that affect the behavior of the library when running on Ascend 310P devices.
  • Users of Ascend 310P will see improved performance and compatibility due to the added support and optimizations.

How was this patch tested?

The patch has been tested locally on Ascend 310P hardware to ensure that the changes do not break existing functionality and that the new features work as intended.

ENV information

```
npu-smi info
+--------------------------------------------------------------------------------------------------------+
| npu-smi 24.1.0.1                                 Version: 24.1.0.1                                     |
+-------------------------------+-----------------+------------------------------------------------------+
| NPU     Name                  | Health          | Power(W)     Temp(C)           Hugepages-Usage(page) |
| Chip    Device                | Bus-Id          | AICore(%)    Memory-Usage(MB)                        |
+===============================+=================+======================================================+
| 1536    310P3                 | OK              | NA           65                0     / 0             |
| 0       0                     | 0000:06:00.0    | 0            1524 / 44280                            |
+-------------------------------+-----------------+------------------------------------------------------+
| 1536    310P3                 | OK              | NA           64                17452 / 17452         |
| 1       1                     | 0000:06:00.0    | 0            36314 / 43693                           |
+===============================+=================+======================================================+
| 1792    310P3                 | OK              | NA           72                17636 / 17636         |
| 0       2                     | 0000:07:00.0    | 95           36982 / 44280                           |
+-------------------------------+-----------------+------------------------------------------------------+
| 1792    310P3                 | OK              | NA           68                0     / 0             |
| 1       3                     | 0000:07:00.0    | 0            1216 / 43693                            |
+===============================+=================+======================================================+
| 2048    310P3                 | OK              | NA           60                0     / 0             |
| 0       4                     | 0000:08:00.0    | 0            1411 / 44280                            |
+-------------------------------+-----------------+------------------------------------------------------+
| 2048    310P3                 | OK              | NA           57                0     / 0             |
| 1       5                     | 0000:08:00.0    | 0            1494 / 43693                            |
+===============================+=================+======================================================+
| 2304    310P3                 | OK              | NA           53                18200 / 18200         |
| 0       6                     | 0000:09:00.0    | 0            37812 / 44280                           |
+-------------------------------+-----------------+------------------------------------------------------+
| 2304    310P3                 | OK              | NA           49                18194 / 18194         |
| 1       7                     | 0000:09:00.0    | 0            37958 / 43693                           |
+===============================+=================+======================================================+
```

CANN, NNAL version: 8.1.RC1

Important

Because the current PTA 2.5.1 version cannot pass parameters in the NZ format as required when calling NNAL operators on 310P, we used a temporary debugging version provided by the PTA team for testing.

Code example

Build vllm-ascend from source code
```shell
# download source code as vllm-ascend
cd vllm-ascend
export SOC_VERSION=Ascend310P3
pip install -v -e .
cd ..
```
Run offline inference
```python
from vllm import LLM, SamplingParams

# Chinese prompts; in English: "Is the boiling point of water 100 degrees
# Celsius? Answer yes or no." and "If the armpit temperature is 38 degrees
# Celsius, does this person have a fever? Answer yes or no."
prompts = ["水的沸点是100摄氏度吗?请回答是或者否。", "若腋下体温为38摄氏度,请问这人是否发烧?请回答是或者否。",
           "水的沸点是100摄氏度吗?请回答是或者否。", "若腋下体温为38摄氏度,请问这人是否发烧?请回答是或者否。"]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, top_p=0.95, max_tokens=10)

# Create an LLM.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    max_model_len=4096,
    max_num_seqs=4,
    dtype="float16",  # IMPORTANT: some ATB ops do not support bf16 on 310P
    disable_custom_all_reduce=True,
    trust_remote_code=True,
    tensor_parallel_size=2,
    compilation_config={"custom_ops": ["none", "+rms_norm", "+rotary_embedding"]},
)

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

Signed-off-by: Vincent Yuan <[email protected]>

A collaborator left a review comment on the new `communication_adaptation_310p()` helper:

> move patch func to vllm_ascend/patch module
