Commit
Merge pull request #188 from zifeng-radxa/main
docs: add Llama3 docs with zh/en
Showing 6 changed files with 346 additions and 0 deletions.
@@ -0,0 +1,164 @@
Llama3 ChatBot-TPU uses the Sophon SDK to port Meta's open-source [Llama3](https://ai.meta.com/blog/meta-llama-3/) model to the SG2300X series of chips, enabling hardware-accelerated inference on the local TPU. The model is wrapped in a Gradio chatbot interface, so users can ask it practical questions.

## Llama3 Deployment

- Clone the repository

```bash
git clone https://github.com/zifeng-radxa/LLM-TPU.git
```

- Open the Llama3 project directory

```bash
cd LLM-TPU/models/Llama3/python_demo
```

- This example provides the Llama-3-8B-Instruct 4-bit quantized model `llama3-8b_int4_1dev_512.bmodel` and the C++ precompiled files for download.

Users can refer to [Llama3 Model Conversion](#llama3-model-conversion) to convert Llama3 models with different quantization methods themselves.

Users can refer to [Llama3 Cpython File Compilation](#llama3-cpython-file-compilation) to compile the cpython interface binding files themselves.
```bash
# llama3-8b_int4_1dev_512.bmodel
wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/llama3/tar_downloader.sh
bash tar_downloader.sh
tar -xvf llama3-8b_int4_1dev_512.tar.gz
```

- Configure the environment

**A virtual environment must be created, otherwise other applications may be affected.** For virtual environment usage, please refer [here](../ai-tools/virtualenv_usage).

```bash
python3 -m virtualenv .venv
source .venv/bin/activate
```

- Install dependencies

```bash
pip3 install --upgrade pip
pip3 install -r requirements.txt
```
- Import environment variables

Use the `ldd` command to check that `chat.cpython-38-aarch64-linux-gnu.so` links against the `libbmlib.so` at `LLM-TPU/support/lib_soc/libbmlib.so`.
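A minimal way to run this check, assuming you are in `LLM-TPU/models/Llama3/python_demo` next to the prebuilt binding:

```bash
# Print the shared-library resolution of the prebuilt cpython binding;
# the libbmlib.so line should point into LLM-TPU/support/lib_soc/
ldd chat.cpython-38-aarch64-linux-gnu.so | grep libbmlib
```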
If the `libbmlib.so` link path is incorrect, run the following command:

```bash
export LD_LIBRARY_PATH=LLM-TPU/support/lib_soc:$LD_LIBRARY_PATH
```
- Start Llama3

**Terminal Mode**

```bash
python3 pipeline.py -m ./llama3-8b_int4_1dev_512.bmodel -t ../token_config
```

`-m`: Specify the model path

`-t`: Specify the token_config folder path

<img src="/img/general-tutorial/tpu_ai/llama3_pipeline.webp" />

**Gradio Mode**

```bash
python3 web_demo.py -m ./llama3-8b_int4_1dev_512.bmodel -t ../token_config
```

`-m`: Specify the model path

`-t`: Specify the token_config folder path

Access the Airbox IP address at port 8003 in your browser.

{" "}

<img src="/img/general-tutorial/tpu_ai/llama3_web_demo.webp" />
## Llama3 Model Conversion

Users can refer to this section to convert Llama3 models of different quantization types into a bmodel themselves.

- Prepare the environment on an X86 workstation

Please refer to [TPU-MLIR Installation](../../model-compile/tpu_mlir_env) to configure the TPU-MLIR environment.

Clone the repository:

```bash
git clone https://github.com/zifeng-radxa/LLM-TPU.git
```

- Fill out the application form on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/tree/main) to download the open-source Llama3 model; one way to fetch it is sketched below.
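A possible way to fetch the weights once the access request has been approved is the `huggingface-cli` tool (a sketch, not part of the original instructions; the local directory name and `<your_hf_token>` are placeholders):

```bash
# Requires huggingface_hub; the token must belong to an account whose access request was approved
pip3 install -U "huggingface_hub[cli]"
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct \
  --local-dir ./Meta-Llama-3-8B-Instruct \
  --token <your_hf_token>
```

The resulting local directory is what the conversion steps below refer to as `your_model_path`.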
- Create a virtual environment in the working directory `LLM-TPU/models/Llama3`

For virtual environment usage, please refer [here](../ai-tools/virtualenv_usage).

```bash
python3 -m virtualenv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
```
- Align the model environment

Copy `LLM-TPU/models/Llama3/compile/files/Meta-Llama-3-8B-Instruct/modeling_llama.py` into the transformers library; note that the transformers library should be the one inside `.venv` at this point.

```bash
cp ./compile/files/Meta-Llama-3-8B-Instruct/modeling_llama.py .venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py
```
Also use `./compile/files/Meta-Llama-3-8B-Instruct/config.json` to replace the file of the same name in the downloaded Llama-3-8B-Instruct directory, as sketched below.
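A minimal sketch of that replacement, assuming the downloaded model sits at `your_model_path` (the same placeholder used in the export step below):

```bash
# Overwrite the downloaded model's config.json with the one shipped in the compile folder
cp ./compile/files/Meta-Llama-3-8B-Instruct/config.json your_model_path/config.json
```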
- Generate the onnx files

```bash
cd compile
python export_onnx.py --model_path your_model_path --seq_length 512
```

`--model_path`: Path to the downloaded Meta Llama3 folder

`--seq_length`: Fixed sequence length to export; choose 512, 1024, 2048, etc. as needed
- Generate the bmodel file

Exit the virtual environment before generating the bmodel.

```bash
deactivate
```

Compile the model:

```bash
./compile.sh --mode int4 --name llama3-8b --seq_length 512 # same as int8
```

`--mode`: Quantization mode; options are int4 and int8

`--seq_length`: Sequence length; must match the seq_length specified when generating the onnx files

`--name`: Model name; it must be llama3-8b here

Generating the bmodel takes about 2 hours or more. 64G of RAM and more than 200G of disk space are recommended, otherwise OOM or "no space left" errors are likely.
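If in doubt, the available memory and disk space can be checked with standard Linux tools before launching `compile.sh`:

```bash
free -h    # available RAM, ideally 64G or more
df -h .    # free space on the current filesystem, ideally 200G or more
```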
## Llama3 Cpython File Compilation

Compile the executable on the Airbox. The precompiled files are already included in the `llama3-8b_int4_1dev_512.tar.gz` download package; if you have downloaded it, there is no need to compile.

```bash
cd python_demo
mkdir build
cd build
cmake ..
make
cp *chat* ..
```
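If the build succeeds, the copied binding should now sit next to `pipeline.py`; a quick sanity check (the exact file name depends on the Python version):

```bash
ls ../chat.cpython*.so
```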
docs/sophon/airbox/local-ai-deploy/large-model/chatbot_llama3.md
9 changes: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
---
sidebar_position: 11
---

# Llama3 Chatbot-TPU

import Chatbotllama3 from '../../../../common/ai/\_chatbot_llama3.mdx';

<Chatbotllama3 />
i18n/en/docusaurus-plugin-content-docs/current/common/ai/_chatbot_llama3.mdx
164 changes: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
Llama3 ChatBot-TPU uses the Sophon SDK to port Meta's open-source [Llama3](https://ai.meta.com/blog/meta-llama-3/) model to the SG2300X series of chips, enabling hardware-accelerated inference on the local TPU. The model is wrapped in a Gradio chatbot interface, so users can ask it practical questions.
## Llama3 Deployment

- Clone the repository

```bash
git clone https://github.com/zifeng-radxa/LLM-TPU.git
```

- Open the Llama3 project directory

```bash
cd LLM-TPU/models/Llama3/python_demo
```

- This example provides the Llama-3-8B-Instruct 4-bit quantized model `llama3-8b_int4_1dev_512.bmodel` and the C++ precompiled files for download.

Users can refer to [Llama3 Model Conversion](#llama3-model-conversion) to convert Llama3 models with different quantization methods themselves.

Users can refer to [Llama3 Cpython File Compilation](#llama3-cpython-file-compilation) to compile the cpython interface binding files themselves.
```bash
# llama3-8b_int4_1dev_512.bmodel
wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/llama3/tar_downloader.sh
bash tar_downloader.sh
tar -xvf llama3-8b_int4_1dev_512.tar.gz
```

- Configure the environment

**A virtual environment must be created to avoid affecting the normal operation of other applications.** For virtual environment usage, please refer [here](../ai-tools/virtualenv_usage).

```bash
python3 -m virtualenv .venv
source .venv/bin/activate
```

- Install dependencies

```bash
pip3 install --upgrade pip
pip3 install -r requirements.txt
```
- Import environment variables

Use the `ldd` command to check that `chat.cpython-38-aarch64-linux-gnu.so` links against the `libbmlib.so` at `LLM-TPU/support/lib_soc/libbmlib.so`.
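A minimal way to run this check, assuming you are in `LLM-TPU/models/Llama3/python_demo` next to the prebuilt binding:

```bash
# Print the shared-library resolution of the prebuilt cpython binding;
# the libbmlib.so line should point into LLM-TPU/support/lib_soc/
ldd chat.cpython-38-aarch64-linux-gnu.so | grep libbmlib
```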
If the `libbmlib.so` link path is incorrect, run the following command:

```bash
export LD_LIBRARY_PATH=LLM-TPU/support/lib_soc:$LD_LIBRARY_PATH
```
- Start Llama3

**Terminal Mode**

```bash
python3 pipeline.py -m ./llama3-8b_int4_1dev_512.bmodel -t ../token_config
```

`-m`: Specify the model path

`-t`: Specify the token_config folder path

<img src="../../../../img/general-tutorial/tpu_ai/llama3_pipeline.webp" />

**Gradio Mode**

```bash
python3 web_demo.py -m ./llama3-8b_int4_1dev_512.bmodel -t ../token_config
```

`-m`: Specify the model path

`-t`: Specify the token_config folder path

Access the Airbox IP address at port 8003 in your browser.

<img src="../../../../img/general-tutorial/tpu_ai/llama3_web_demo.webp" />
## Llama3 Model Conversion

Users can refer to this section to convert Llama3 models of different quantization types into a bmodel themselves.

- Prepare the environment on an X86 workstation

Please refer to [TPU-MLIR Installation](../../model-compile/tpu_mlir_env) to configure the TPU-MLIR environment.

Clone the repository:

```bash
git clone https://github.com/zifeng-radxa/LLM-TPU.git
```

- Fill out the application form on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/tree/main) to download the open-source Llama3 model; one way to fetch it is sketched below.
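A possible way to fetch the weights once the access request has been approved is the `huggingface-cli` tool (a sketch, not part of the original instructions; the local directory name and `<your_hf_token>` are placeholders):

```bash
# Requires huggingface_hub; the token must belong to an account whose access request was approved
pip3 install -U "huggingface_hub[cli]"
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct \
  --local-dir ./Meta-Llama-3-8B-Instruct \
  --token <your_hf_token>
```

The resulting local directory is what the conversion steps below refer to as `your_model_path`.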
- Create a virtual environment in the `LLM-TPU/models/Llama3` directory.

For virtual environment usage, please refer [here](../ai-tools/virtualenv_usage).

```bash
python3 -m virtualenv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
```
- Align the model environment

Copy `LLM-TPU/models/Llama3/compile/files/Meta-Llama-3-8B-Instruct/modeling_llama.py` to the transformers library, noting that the transformers library should be the one inside `.venv`.

```bash
cp ./compile/files/Meta-Llama-3-8B-Instruct/modeling_llama.py .venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py
```
Also use `./compile/files/Meta-Llama-3-8B-Instruct/config.json` to replace the file of the same name in the downloaded Llama-3-8B-Instruct directory, as sketched below.
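A minimal sketch of that replacement, assuming the downloaded model sits at `your_model_path` (the same placeholder used in the export step below):

```bash
# Overwrite the downloaded model's config.json with the one shipped in the compile folder
cp ./compile/files/Meta-Llama-3-8B-Instruct/config.json your_model_path/config.json
```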
- Generate the onnx file

```bash
cd compile
python export_onnx.py --model_path your_model_path --seq_length 512
```

`--model_path`: Path to the downloaded Meta Llama3 folder

`--seq_length`: Fixed sequence length to export; choose 512, 1024, 2048, etc. as needed
- Generate the bmodel file

Exit the virtual environment before generating the bmodel.

```bash
deactivate
```

Compile the model:

```bash
./compile.sh --mode int4 --name llama3-8b --seq_length 512 # same as int8
```

`--mode`: Quantization mode, options are int4 and int8

`--seq_length`: Sequence length, should match the seq_length specified when generating the onnx file

`--name`: Model name, must be llama3-8b

Generating the bmodel takes about 2 hours or more. It is recommended to have 64G of memory and over 200G of disk space, otherwise OOM or "no space left" errors are likely.
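If in doubt, the available memory and disk space can be checked with standard Linux tools before launching `compile.sh`:

```bash
free -h    # available RAM, ideally 64G or more
df -h .    # free space on the current filesystem, ideally 200G or more
```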
## Llama3 Cpython File Compilation

Compile the executable on the Airbox. The precompiled files are already included in the `llama3-8b_int4_1dev_512.tar.gz` download package; if you have downloaded it, there is no need to compile.

```bash
cd python_demo
mkdir build
cd build
cmake ..
make
cp *chat* ..
```
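If the build succeeds, the copied binding should now sit next to `pipeline.py`; a quick sanity check (the exact file name depends on the Python version):

```bash
ls ../chat.cpython*.so
```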
...ontent-docs/current/sophon/airbox/local-ai-deploy/large-model/chatbot_llama3.md
9 changes: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
---
sidebar_position: 11
---

# Llama3 Chatbot-TPU

import Chatbotllama3 from '../../../../common/ai/\_chatbot_llama3.mdx';

<Chatbotllama3 />
Binary file not shown.
Binary file not shown.