## Overview

This project is a fused-operator library for Ascend Atlas A3. It currently contains two operators: [SparseFlashAttention](./docs/custom-npu_sparse_flash_attention.md) and [LightningIndexer](./docs/custom-npu_lightning_indexer.md).

## Directory Structure

The fused-operator code is organized as follows:

```
├── cmake                          # Project build directory
├── docs                           # Operator usage guides and documentation
├── examples                       # Operator usage examples
├── src                            # Operator source code
│   ├── sparse_flash_attention     # Sample code for the inference operator SparseFlashAttention (sfa)
│   │   ├── op_host                # Operator info library, Tiling, and InferShape implementations
│   │   └── op_kernel              # Operator kernel
│   └── lightning_indexer          # Sample code for the inference operator LightningIndexer (li)
│       ├── op_host                # Operator info library, Tiling, and InferShape implementations
│       └── op_kernel              # Operator kernel
├── torch_ops_extension            # PyTorch extension for the custom operators
│   ├── custom_ops
│   │   ├── csrc                   # C++ adaptation-layer code for the custom operators
│   │   └── converter              # Python-side converter code for the custom operator package
│   ├── setup.py                   # Wheel package build file
│   └── build_and_install.sh       # Builds and installs the custom operator wheel package
├── build.sh                       # Project build script
├── CMakeLists.txt                 # Project build configuration
├── README.md
└── version.info                   # Project version information
```

For Ascend C custom operator development resources, see the Ascend community guide: [Ascend C Custom Operator Development](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha002/devguide/opdevg/ascendcopdevg/atlas_ascendc_10_0001.html)

## Environment Setup<a name="1"></a>

### Download the Source Code

Run the following commands to download the cann-recipes-infer source code:
```shell
mkdir -p /home/code; cd /home/code/
git clone [email protected]:cann/cann-recipes-infer.git
cd cann-recipes-infer
```

### Get the Docker Image

Download the Docker image from the [ARM image link](https://ascend-cann.obs.cn-north-4.myhuaweicloud.com/cann8.3.rc1.alpha002/pt2.5.1/aarch/ascendc/cann8.3.rc1.alpha002_pt2.5.1_dsv3.2_aarch_image.tar), upload it to the A3 server, and import it with `docker load -i cann8.3.rc1.alpha002_pt2.5.1_dsv3.2_aarch_image.tar`.

### Start the Docker Container

Start the container with the following command. The default container name is `cann_recipes_infer`.
```shell
docker run -u root -itd --name cann_recipes_infer --ulimit nproc=65535:65535 --ipc=host \
    --device=/dev/davinci0     --device=/dev/davinci1 \
    --device=/dev/davinci2     --device=/dev/davinci3 \
    --device=/dev/davinci4     --device=/dev/davinci5 \
    --device=/dev/davinci6     --device=/dev/davinci7 \
    --device=/dev/davinci8     --device=/dev/davinci9 \
    --device=/dev/davinci10    --device=/dev/davinci11 \
    --device=/dev/davinci12    --device=/dev/davinci13 \
    --device=/dev/davinci14    --device=/dev/davinci15 \
    --device=/dev/davinci_manager --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    -v /home/:/home \
    -v /data:/data \
    -v /etc/localtime:/etc/localtime \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /etc/ascend_install.info:/etc/ascend_install.info -v /var/log/npu/:/usr/slog \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    -v /usr/local/dcmi:/usr/local/dcmi -v /usr/local/sbin:/usr/local/sbin \
    -v /etc/hccn.conf:/etc/hccn.conf -v /root/.pip:/root/.pip -v /etc/hosts:/etc/hosts \
    -v /usr/bin/hostname:/usr/bin/hostname \
    --net=host \
    --shm-size=128g \
    --privileged \
    cann8.3.rc1.alpha002_pt2.5.1_dsv3.2_aarch_image:v0.1 /bin/bash
```
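The sixteen `--device=/dev/davinciN` options in the command above follow a fixed numbering pattern. As a sketch for hosts with a different NPU count, they could be generated rather than typed out. The function name `davinci_device_flags` is hypothetical, and it assumes the device nodes are numbered contiguously from 0:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print one
# --device=/dev/davinciN option for each N in 0..count-1.
# ASSUMPTION: NPU device nodes are numbered contiguously from 0.
davinci_device_flags() {
  local count="${1:-16}"   # 16 matches the docker run command above
  local i
  for ((i = 0; i < count; i++)); do
    printf '%s ' "--device=/dev/davinci${i}"
  done
}

# Usage sketch: the substitution is left unquoted on purpose so the shell
# splits it into separate docker run arguments.
#   docker run ... $(davinci_device_flags 16) --device=/dev/davinci_manager ...
```

Only the numbered `davinciN` nodes are generated; the management nodes (`davinci_manager`, `devmm_svm`, `hisi_hdc`) still have to be passed explicitly.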
Enter the container with:
```shell
docker attach cann_recipes_infer
```

### Set Environment Variables

```bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```

## Build and Run

### Build the Custom Fused Operators

Run the following commands to build all custom operators:

```bash
cd /home/code/cann-recipes-infer/ops/ascendc
bash build.sh
```

**Note:**

The build succeeded if the following message is printed:

```
Self-extractable archive "CANN-custom_ops-<cann_version>-linux.<arch>.run" successfully created.
```
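Because the `build.sh` output can be long, one way to check for that message is to save the log and search it. A minimal sketch; the log file name `build.log` and the function `build_succeeded` are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a saved build log contains the
# "Self-extractable archive ... successfully created" message.
build_succeeded() {
  grep -Eq 'Self-extractable archive "CANN-custom_ops-[^"]+\.run" successfully created' "$1"
}

# Usage sketch, run from /home/code/cann-recipes-infer/ops/ascendc:
#   bash build.sh 2>&1 | tee build.log
#   build_succeeded build.log && echo "build OK" || echo "build FAILED"
```

Checking the log text rather than the pipeline's exit status matters here: without `set -o pipefail`, the status of `bash build.sh | tee build.log` is that of `tee`, not of `build.sh`.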
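
After a successful build, the custom operator package `CANN-custom_ops-<cann_version>-linux.<arch>.run` is generated in the `output` directory, where \<cann_version> is the software version number and \<arch> is the operating system architecture.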
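The installation step requires the package's \<arch> to match the CPU architecture of the CANN Toolkit. A minimal sketch that locates the generated package and checks that field before installing, so the placeholders do not have to be typed by hand. The function `find_custom_ops_pkg` is hypothetical, and it assumes `uname -m` reports the same architecture as the installed CANN Toolkit:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the path of the
# single CANN-custom_ops-<cann_version>-linux.<arch>.run package in a build
# output directory, failing if there is not exactly one or if its <arch>
# field differs from the expected architecture.
# ASSUMPTION: `uname -m` (the default) matches the CANN Toolkit architecture.
find_custom_ops_pkg() {
  local dir="${1:-output}" expected="${2:-$(uname -m)}"
  local -a pkgs=()
  local f arch
  for f in "$dir"/CANN-custom_ops-*-linux.*.run; do
    [ -e "$f" ] && pkgs+=("$f")   # skip the unexpanded pattern when nothing matches
  done
  if [ "${#pkgs[@]}" -ne 1 ]; then
    echo "expected exactly one package in $dir, found ${#pkgs[@]}" >&2
    return 1
  fi
  arch="${pkgs[0]##*-linux.}"     # drop everything up to and including "-linux."
  arch="${arch%.run}"             # drop the trailing ".run"
  if [ "$arch" != "$expected" ]; then
    echo "architecture mismatch: package is $arch, expected $expected" >&2
    return 1
  fi
  printf '%s\n' "${pkgs[0]}"
}

# Usage sketch, run from /home/code/cann-recipes-infer/ops/ascendc:
#   pkg="$(find_custom_ops_pkg output)" && chmod +x "$pkg" && \
#     "$pkg" --quiet --install-path=/usr/local/Ascend/ascend-toolkit/latest/opp
```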

### Install the Custom Fused Operators

Before installing, make sure the custom operator package has the same CPU architecture as the installed CANN Toolkit package. Install it with:

```bash
cd /home/code/cann-recipes-infer/ops/ascendc/output
chmod +x CANN-custom_ops-<cann_version>-linux.<arch>.run
./CANN-custom_ops-<cann_version>-linux.<arch>.run --quiet --install-path=/usr/local/Ascend/ascend-toolkit/latest/opp
source /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize/bin/set_env.bash
```

After these commands run, the custom fused operators in the .run package are installed into the CANN package directory `/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/`.
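To confirm the installation, one option is to check that the `set_env.bash` script sourced in the last install command exists. A minimal sketch; the function name `check_custom_ops_installed` is hypothetical, and its default root is the `--install-path` used above:

```shell
#!/usr/bin/env bash
# Hypothetical post-install check (not part of this repository).
# The default root is the --install-path passed to the .run package above;
# vendors/customize/bin/set_env.bash is the script sourced after installing.
check_custom_ops_installed() {
  local opp_root="${1:-/usr/local/Ascend/ascend-toolkit/latest/opp}"
  local env_script="$opp_root/vendors/customize/bin/set_env.bash"
  if [ -f "$env_script" ]; then
    echo "found $env_script"
  else
    echo "missing $env_script; was the package installed with --install-path=$opp_root?" >&2
    return 1
  fi
}

# Usage sketch:
#   check_custom_ops_installed && source /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize/bin/set_env.bash
```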

### Build and Install the torch_ops_extension Operator Package

Build and install it with:
```shell
cd /home/code/cann-recipes-infer/ops/ascendc/torch_ops_extension
bash build_and_install.sh
```

After a successful build, the custom_ops wheel package `custom_ops-1.0-<python_version>-<python_version>-<arch>.whl` is generated in the `dist` directory, where \<python_version> is the Python version and \<arch> is the operating system architecture.
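pip rejects a wheel whose Python tag does not match the interpreter. A minimal sketch for comparing the two before installing, assuming the first \<python_version> field is a CPython tag such as `cp311` (typical for wheels containing compiled extensions) and that `dist` is created next to `setup.py`. The function name `wheel_matches_python` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the Python tag of a wheel file name with the
# running python3 (or with an explicitly given tag). Wheel names follow
#   {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl
# ASSUMPTION: the python tag is a CPython tag such as cp311.
wheel_matches_python() {
  local name py_tag want="${2:-}"
  name="${1##*/}"        # drop any directory prefix
  name="${name%.whl}"    # drop the .whl suffix
  IFS=- read -r _ _ py_tag _ <<< "$name"
  if [ -z "$want" ]; then
    # e.g. cp311 for Python 3.11
    want="$(python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])')"
  fi
  if [ "$py_tag" = "$want" ]; then
    echo "OK: wheel tag $py_tag matches $want"
  else
    echo "mismatch: wheel tag $py_tag, expected $want" >&2
    return 1
  fi
}

# Usage sketch, run in torch_ops_extension (assuming dist/ is created there):
#   for whl in dist/custom_ops-*.whl; do wheel_matches_python "$whl"; done
```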

### Run the Examples

Run the examples with:
```shell
cd /home/code/cann-recipes-infer/ops/ascendc/examples
python3 test_npu_lightning_indexer.py
python3 test_npu_sparse_flash_attention.py
```
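To run both examples in one pass and get a per-script summary, a wrapper like the following could be used. It is a sketch that assumes each script reports failure through a non-zero exit status (the scripts' own output is not inspected); `run_examples` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run each script with python3, print PASS/FAIL per
# script, and return non-zero if any script failed.
# ASSUMPTION: a failing example exits with a non-zero status.
run_examples() {
  local script failed=0
  for script in "$@"; do
    if python3 "$script"; then
      echo "PASS $script"
    else
      echo "FAIL $script"
      failed=1
    fi
  done
  return "$failed"
}

# Usage sketch, run from /home/code/cann-recipes-infer/ops/ascendc/examples:
#   run_examples test_npu_lightning_indexer.py test_npu_sparse_flash_attention.py
```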