Commit c8c315a (parent 9fb6568): Create sparse_flash_attention_service_vector_mla.h

48 files changed: +5791, -1391 lines

csrc/dsa/.gitkeep: whitespace-only changes.

csrc/dsa/CMakeLists.txt: 670 additions, 0 deletions (large diff not rendered by default).

csrc/dsa/README.md: 140 additions, 0 deletions
## Overview

This project is a fused-operator library for the Ascend Atlas A3. It currently includes two operators: [SparseFlashAttention](./docs/custom-npu_sparse_flash_attention.md) and [LightningIndexer](./docs/custom-npu_lightning_indexer.md).
## Directory Structure

The fused-operator code is organized as follows:

```
├── cmake                        # Project build files
├── docs                         # Operator usage guides and reference material
├── examples                     # Example code for the operators
├── src                          # Operator source code
│   ├── sparse_flash_attention   # Inference SparseFlashAttention (sfa) operator code
│   │   ├── op_host              # Operator info library, tiling, and InferShape implementations
│   │   └── op_kernel            # Operator kernel code
│   └── lightning_indexer        # Inference LightningIndexer (li) operator code
│       ├── op_host              # Operator info library, tiling, and InferShape implementations
│       └── op_kernel            # Operator kernel code
├── torch_ops_extension          # torch_ops_extension directory
│   ├── custom_ops
│   │   ├── csrc                 # C++ adaptation-layer code for the custom operators
│   │   └── converter            # Python-side converter code for the custom-operator package
│   ├── setup.py                 # wheel package build script
│   └── build_and_install.sh     # Build-and-install script for the custom-operator wheel
├── build.sh                     # Project build script
├── CMakeLists.txt               # Project build configuration
├── README.md
└── version.info                 # Project version information
```

Ascend Community documentation on Ascend C custom-operator development: [Ascend C Custom Operator Development](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha002/devguide/opdevg/ascendcopdevg/atlas_ascendc_10_0001.html)
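After cloning, the layout above can be verified with a short script. This is a convenience sketch, not part of the project; the path lists simply mirror the tree above, and `check_layout` is a hypothetical helper name.

```python
from pathlib import Path

# Subdirectories that, per the tree above, every operator directory must contain.
REQUIRED_OP_SUBDIRS = ("op_host", "op_kernel")
OPERATORS = ("sparse_flash_attention", "lightning_indexer")

def check_layout(root):
    """Return the list of expected operator paths missing under root/src."""
    missing = []
    src = Path(root) / "src"
    for op in OPERATORS:
        for sub in REQUIRED_OP_SUBDIRS:
            path = src / op / sub
            if not path.is_dir():
                missing.append(str(path))
    return missing
```

An empty return value means every operator directory from the tree is present.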
## Environment Setup<a name="1"></a>

### Download the Source Code

Run the following commands to download the cann-recipes-infer source:

```shell
mkdir -p /home/code; cd /home/code/
git clone [email protected]:cann/cann-recipes-infer.git
cd cann-recipes-infer
```
### Get the Docker Image

Download the Docker image from the [ARM image link](https://ascend-cann.obs.cn-north-4.myhuaweicloud.com/cann8.3.rc1.alpha002/pt2.5.1/aarch/ascendc/cann8.3.rc1.alpha002_pt2.5.1_dsv3.2_aarch_image.tar), upload it to the A3 server, and import it with `docker load -i cann8.3.rc1.alpha002_pt2.5.1_dsv3.2_aarch_image.tar`.
### Start the Docker Container

Start the container with the following script; the default container name is cann_recipes_infer.

```shell
docker run -u root -itd --name cann_recipes_infer --ulimit nproc=65535:65535 --ipc=host \
    --device=/dev/davinci0 --device=/dev/davinci1 \
    --device=/dev/davinci2 --device=/dev/davinci3 \
    --device=/dev/davinci4 --device=/dev/davinci5 \
    --device=/dev/davinci6 --device=/dev/davinci7 \
    --device=/dev/davinci8 --device=/dev/davinci9 \
    --device=/dev/davinci10 --device=/dev/davinci11 \
    --device=/dev/davinci12 --device=/dev/davinci13 \
    --device=/dev/davinci14 --device=/dev/davinci15 \
    --device=/dev/davinci_manager --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    -v /home/:/home \
    -v /data:/data \
    -v /etc/localtime:/etc/localtime \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /etc/ascend_install.info:/etc/ascend_install.info -v /var/log/npu/:/usr/slog \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    -v /usr/local/dcmi:/usr/local/dcmi -v /usr/local/sbin:/usr/local/sbin \
    -v /etc/hccn.conf:/etc/hccn.conf -v /root/.pip:/root/.pip -v /etc/hosts:/etc/hosts \
    -v /usr/bin/hostname:/usr/bin/hostname \
    --net=host \
    --shm-size=128g \
    --privileged \
    cann8.3.rc1.alpha002_pt2.5.1_dsv3.2_aarch_image:v0.1 /bin/bash
```

Enter the container with:

```shell
docker attach cann_recipes_infer
```
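Each `--device=/dev/davinci<N>` line above maps one NPU into the container. For nodes with a different device count, the flags can be generated instead of typed out; this is a convenience sketch, and `davinci_device_flags` is a hypothetical helper, not part of the project:

```python
def davinci_device_flags(count):
    """Build one --device flag per NPU, mirroring the docker run command above."""
    return [f"--device=/dev/davinci{i}" for i in range(count)]

# Management devices that are mounted alongside the NPUs in the command above.
MGMT_FLAGS = [
    "--device=/dev/davinci_manager",
    "--device=/dev/devmm_svm",
    "--device=/dev/hisi_hdc",
]

if __name__ == "__main__":
    # For a 16-NPU node this reproduces the device flags of the command above.
    print(" \\\n    ".join(davinci_device_flags(16) + MGMT_FLAGS))
```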
### Set Environment Variables

```bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```
## Build and Run

### Build the Custom Fused Operators

Run the following commands to build all custom operators:

```bash
cd /home/code/cann-recipes-infer/ops/ascendc
bash build.sh
```

**Note:**

If the following message is printed, the build succeeded:

```
Self-extractable archive "CANN-custom_ops-<cann_version>-linux.<arch>.run" successfully created.
```

After a successful build, the custom-operator package `CANN-custom_ops-<cann_version>-linux.<arch>.run` is generated in the `output` directory, where \<cann_version> is the software version and \<arch> is the operating-system architecture.
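When scripting around the build, the version and architecture fields can be recovered from the generated file name. This sketch simply mirrors the `CANN-custom_ops-<cann_version>-linux.<arch>.run` pattern described above; `parse_run_package` is a hypothetical helper, not part of the project:

```python
import re

# Mirrors the CANN-custom_ops-<cann_version>-linux.<arch>.run naming above.
RUN_PKG_RE = re.compile(r"^CANN-custom_ops-(?P<version>.+)-linux\.(?P<arch>[^.]+)\.run$")

def parse_run_package(filename):
    """Split a custom-operator run-package name into version and arch fields."""
    match = RUN_PKG_RE.match(filename)
    if match is None:
        raise ValueError(f"not a custom-ops run package: {filename}")
    return match.groupdict()
```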
### Install the Custom Fused Operators

Before installing, make sure the custom-operator package matches the CPU architecture of the installed CANN development toolkit. Install it with:

```bash
cd /home/code/cann-recipes-infer/ops/ascendc/output
chmod +x CANN-custom_ops-<cann_version>-linux.<arch>.run
./CANN-custom_ops-<cann_version>-linux.<arch>.run --quiet --install-path=/usr/local/Ascend/ascend-toolkit/latest/opp
source /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize/bin/set_env.bash
```

These commands install the custom fused-operator run package into the CANN package directory: `/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/`
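The architecture-match requirement can be checked mechanically before installing. This is a sketch under the assumption that the package name's `<arch>` field uses the same strings as `uname -m` on Linux (e.g. `aarch64`, `x86_64`); `package_matches_host` is a hypothetical helper, not part of the project:

```python
import platform
import re

def package_matches_host(run_filename, host_arch=None):
    """True if the run package's <arch> field equals the host CPU architecture."""
    if host_arch is None:
        host_arch = platform.machine()  # e.g. "aarch64" or "x86_64" on Linux
    match = re.search(r"-linux\.([^.]+)\.run$", run_filename)
    return bool(match) and match.group(1) == host_arch
```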
### Build and Install the torch_ops_extension Package

Build and install it with:

```shell
cd /home/code/cann-recipes-infer/ops/ascendc/torch_ops_extension
bash build_and_install.sh
```

After a successful build, the custom-ops package `custom_ops-1.0-<python_version>-<python_version>-<arch>.whl` is generated in the `dist` directory, where \<python_version> is the Python version and \<arch> is the operating-system architecture.
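Wheel file names follow the PEP 427 convention `name-version-pythontag-abitag-platformtag.whl`, which is why the Python version appears twice in the name above. A quick compatibility check against the running interpreter is possible with the standard library; the file name in the test is illustrative and the helper names are hypothetical:

```python
import sys

def wheel_tags(filename):
    """Split a wheel file name into its PEP 427 components."""
    name, version, py_tag, abi_tag, plat_tag = filename[:-len(".whl")].split("-")
    return {"name": name, "version": version, "python": py_tag,
            "abi": abi_tag, "platform": plat_tag}

def matches_interpreter(filename):
    """True if the wheel's Python tag matches the running CPython, e.g. cp311."""
    this_py = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return wheel_tags(filename)["python"] == this_py
```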
### Run the Examples

Run the example cases with:

```shell
cd /home/code/cann-recipes-infer/ops/ascendc/examples
python3 test_npu_lightning_indexer.py
python3 test_npu_sparse_flash_attention.py
```
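For intuition about the technique the SparseFlashAttention example exercises, here is a schematic, single-query NumPy reference of sparse attention, in which only the key/value rows listed in `indices` participate in the softmax. This illustrates the general idea only; the actual operator's interface, layout, and semantics are defined in the linked docs, and this sketch makes no claims about them.

```python
import numpy as np

def sparse_attention_ref(q, k, v, indices):
    """Single-query sparse attention over the k/v rows selected by `indices`.

    q: (d,) query; k, v: (n, d) keys and values; indices: 1-D array of row ids.
    """
    ks, vs = k[indices], v[indices]          # gather the selected KV subset
    scores = ks @ q / np.sqrt(q.shape[0])    # scaled dot-product scores
    scores -= scores.max()                   # stabilize the softmax numerically
    weights = np.exp(scores)
    weights /= weights.sum()
    return weights @ vs                      # weighted sum of selected values
```

With `indices = np.arange(n)` this reduces to ordinary dense attention, which makes it convenient as a cross-check for sparse index selections.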
