Merge pull request #188 from zifeng-radxa/main
docs: add Llama3 docs with zh/en
akgnah authored May 22, 2024
2 parents c1a0285 + a4dd21c commit 82b0c31
Showing 6 changed files with 346 additions and 0 deletions.
164 changes: 164 additions & 0 deletions docs/common/ai/_chatbot_llama3.mdx
@@ -0,0 +1,164 @@
Llama3 ChatBot-TPU uses the Sophon SDK to port Meta's open-source [Llama3](https://ai.meta.com/blog/meta-llama-3/) model to SG2300X-series chips, enabling hardware-accelerated inference on the local TPU. The model is wrapped in a Gradio chatbot interface so users can ask it practical questions.

## Llama3 Deployment

- Clone the repository

```bash
git clone https://github.com/zifeng-radxa/LLM-TPU.git
```

- Open the Llama3 project directory

```bash
cd LLM-TPU/models/Llama3/python_demo
```

- This example provides the Llama-3-8B-Instruct 4-bit quantized model `llama3-8b_int4_1dev_512.bmodel` and precompiled C++ files for download.

Users can refer to [Llama3 Model Conversion](#llama3-model-conversion) to convert Llama3 models with other quantization settings themselves.

Users can refer to [Llama3 cpython File Compilation](#llama3-cpython-file-compilation) to compile the cpython interface binding files themselves.

```bash
# llama3-8b_int4_1dev_512.bmodel
wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/llama3/tar_downloader.sh
bash tar_downloader.sh
tar -xvf llama3-8b_int4_1dev_512.tar.gz
```
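
After extraction, you can sanity-check that the model file landed where the later commands expect it (a quick look, not part of the official flow):

```bash
# The -m flag used later assumes the bmodel sits in the current directory
ls -lh ./llama3-8b_int4_1dev_512.bmodel
```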

- Configure the environment

**A virtual environment must be created, otherwise other applications may be affected.** For virtual environment usage, please refer [here](../ai-tools/virtualenv_usage).

```bash
python3 -m virtualenv .venv
source .venv/bin/activate
```

- Install dependencies

```bash
pip3 install --upgrade pip
pip3 install -r requirements.txt
```

- Set environment variables

Use the `ldd` command to check whether `chat.cpython-38-aarch64-linux-gnu.so` links `libbmlib.so` from `LLM-TPU/support/lib_soc/libbmlib.so`.

If the `libbmlib.so` link path is incorrect, run the following command:

```bash
export LD_LIBRARY_PATH=LLM-TPU/support/lib_soc:$LD_LIBRARY_PATH
```
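
The check itself might look like this, assuming you run it from `python_demo` next to the compiled `.so`; re-run it after the `export` to confirm the resolved path now points into `LLM-TPU/support/lib_soc`:

```bash
# Show which libbmlib.so the cpython binding resolves to at load time
ldd chat.cpython-38-aarch64-linux-gnu.so | grep libbmlib
```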

- Start Llama3

**Terminal Mode**

```bash
python3 pipeline.py -m ./llama3-8b_int4_1dev_512.bmodel -t ../token_config
```

`-m`: Specify the model path

`-t`: Specify the token_config folder path

<img src="/img/general-tutorial/tpu_ai/llama3_pipeline.webp" />

**Gradio Mode**

```bash
python3 web_demo.py -m ./llama3-8b_int4_1dev_512.bmodel -t ../token_config
```

`-m`: Specify the model path

`-t`: Specify the token_config folder path

Access port 8003 of the Airbox IP address in your browser.

{" "}

<img src="/img/general-tutorial/tpu_ai/llama3_web_demo.webp" />

## Llama3 Model Conversion

Users can follow this section to convert Llama3 models of different quantization types to bmodel themselves.

- Prepare the environment on an x86 workstation

Please refer to [TPU-MLIR Installation](../../model-compile/tpu_mlir_env) to configure the TPU-MLIR environment.
Clone the repository:

```bash
git clone https://github.com/zifeng-radxa/LLM-TPU.git
```

- Fill out the request form on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/tree/main) to download the open-source Llama3 model, for example with the CLI sketch below.
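
One possible way to fetch the weights once your request is approved is the `huggingface_hub` CLI; this is a sketch that assumes the CLI is installed and you have a valid access token:

```bash
pip3 install -U "huggingface_hub[cli]"
huggingface-cli login   # paste your Hugging Face access token when prompted
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --local-dir ./Meta-Llama-3-8B-Instruct
```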

- Create a virtual environment in the `LLM-TPU/models/Llama3` working directory

For virtual environment usage, please refer [here](../ai-tools/virtualenv_usage).

```bash
python3 -m virtualenv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
```

- Align the model environment

Copy `LLM-TPU/models/Llama3/compile/files/Meta-Llama-3-8B-Instruct/modeling_llama.py` into the transformers library; note that the transformers library here should be the one inside `.venv`.

```bash
cp ./compile/files/Meta-Llama-3-8B-Instruct/modeling_llama.py .venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py
```

Also replace the file of the same name in the downloaded Llama-3-8B-Instruct directory with `./compile/files/Meta-Llama-3-8B-Instruct/config.json`.
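
A sketch of that replacement, where `your_model_path` stands in for the directory of the downloaded weights:

```bash
# Overwrite the model's config.json with the one shipped in this repo
cp ./compile/files/Meta-Llama-3-8B-Instruct/config.json your_model_path/config.json
```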

- Generate the onnx file

```bash
cd compile
python export_onnx.py --model_path your_model_path --seq_length 512
```

`--model_path`: Path to the downloaded Meta Llama3 folder

`--seq_length`: Fixed sequence length to export; choose 512, 1024, 2048, etc. as needed

- Generate the bmodel file

Exit the virtual environment before generating the bmodel:

```bash
deactivate
```

Compile the model:

```bash
./compile.sh --mode int4 --name llama3-8b --seq_length 512 # for int8, use --mode int8
```

`--mode`: Quantization mode; options are int4 and int8

`--seq_length`: Sequence length; must match the seq_length specified when generating the onnx file

`--name`: Model name; must be llama3-8b here

Generating the bmodel takes roughly 2 hours or more. 64 GB of RAM and more than 200 GB of disk space are recommended; otherwise OOM or "no space left" errors are likely.
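
A quick pre-flight check of both resources with standard tools (not part of the official flow):

```bash
free -g    # available RAM in GB
df -h .    # free disk space on the current filesystem
```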

## Llama3 cpython File Compilation

Compile the executable on the Airbox. The precompiled files are already included in the `llama3-8b_int4_1dev_512.tar.gz` package; if you have already downloaded it, there is no need to compile.

```bash
cd python_demo
mkdir build
cd build
cmake ..
make
cp *chat* ..
```
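
To confirm the binding built and loads, you can try importing it from `python_demo` with the virtual environment active; this assumes the module is named `chat`, as the `chat.cpython-*.so` filename suggests:

```bash
# LD_LIBRARY_PATH must include LLM-TPU/support/lib_soc, as set above
cd .. && python3 -c "import chat; print('chat binding OK')"
```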
@@ -0,0 +1,9 @@
---
sidebar_position: 11
---

# Llama3 Chatbot-TPU

import Chatbotllama3 from '../../../../common/ai/\_chatbot_llama3.mdx';

<Chatbotllama3 />
@@ -0,0 +1,164 @@
Llama3 ChatBot-TPU uses the Sophon SDK to port Meta's open-source [Llama3](https://ai.meta.com/blog/meta-llama-3/) model to SG2300X-series chips, enabling hardware-accelerated inference on the local TPU. The model is wrapped in a Gradio chatbot interface so users can ask it practical questions.

## Llama3 Deployment

- Clone the repository

```bash
git clone https://github.com/zifeng-radxa/LLM-TPU.git
```

- Open the Llama3 project directory

```bash
cd LLM-TPU/models/Llama3/python_demo
```

- This example provides the Llama-3-8B-Instruct 4-bit quantized model `llama3-8b_int4_1dev_512.bmodel` and precompiled C++ files for download.

Users can refer to [Llama3 Model Conversion](#llama3-model-conversion) to convert Llama3 models with other quantization settings themselves.

Users can refer to [Llama3 Cpython File Compilation](#llama3-cpython-file-compilation) to compile the cpython interface binding files themselves.

```bash
# llama3-8b_int4_1dev_512.bmodel
wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/llama3/tar_downloader.sh
bash tar_downloader.sh
tar -xvf llama3-8b_int4_1dev_512.tar.gz
```
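
After extraction, you can sanity-check that the model file landed where the later commands expect it (a quick look, not part of the official flow):

```bash
# The -m flag used later assumes the bmodel sits in the current directory
ls -lh ./llama3-8b_int4_1dev_512.bmodel
```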

- Configure the environment

**A virtual environment must be created to avoid affecting the normal operation of other applications.** For virtual environment usage, please refer [here](../ai-tools/virtualenv_usage).

```bash
python3 -m virtualenv .venv
source .venv/bin/activate
```

- Install dependencies

```bash
pip3 install --upgrade pip
pip3 install -r requirements.txt
```

- Set environment variables

Use the `ldd` command to check whether `chat.cpython-38-aarch64-linux-gnu.so` links `libbmlib.so` from `LLM-TPU/support/lib_soc/libbmlib.so`.

If the `libbmlib.so` link path is incorrect, run the following command:

```bash
export LD_LIBRARY_PATH=LLM-TPU/support/lib_soc:$LD_LIBRARY_PATH
```
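
The check itself might look like this, assuming you run it from `python_demo` next to the compiled `.so`; re-run it after the `export` to confirm the resolved path now points into `LLM-TPU/support/lib_soc`:

```bash
# Show which libbmlib.so the cpython binding resolves to at load time
ldd chat.cpython-38-aarch64-linux-gnu.so | grep libbmlib
```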

- Start Llama3

**Terminal Mode**

```bash
python3 pipeline.py -m ./llama3-8b_int4_1dev_512.bmodel -t ../token_config
```

`-m`: Specify the model path

`-t`: Specify the token_config folder path

<img src="../../../../img/general-tutorial/tpu_ai/llama3_pipeline.webp" />

**Gradio Mode**

```bash
python3 web_demo.py -m ./llama3-8b_int4_1dev_512.bmodel -t ../token_config
```

`-m`: Specify the model path

`-t`: Specify the token_config folder path

Access the Airbox IP address at port 8003 in your browser.

<img src="../../../../img/general-tutorial/tpu_ai/llama3_web_demo.webp" />

## Llama3 Model Conversion

Users can follow this section to convert Llama3 models of different quantization types to bmodel themselves.

- Prepare the environment on an X86 workstation

Please refer to [TPU-MLIR Installation](../../model-compile/tpu_mlir_env) to configure the TPU-MLIR environment.
Clone the repository

```bash
git clone https://github.com/zifeng-radxa/LLM-TPU.git
```

- Fill out the request form on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/tree/main) to download the open-source Llama3 model, for example with the CLI sketch below.
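
One possible way to fetch the weights once your request is approved is the `huggingface_hub` CLI; this is a sketch that assumes the CLI is installed and you have a valid access token:

```bash
pip3 install -U "huggingface_hub[cli]"
huggingface-cli login   # paste your Hugging Face access token when prompted
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --local-dir ./Meta-Llama-3-8B-Instruct
```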

- Create a virtual environment in the `LLM-TPU/models/Llama3` directory.

For virtual environment usage, please refer [here](../ai-tools/virtualenv_usage).

```bash
python3 -m virtualenv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
```

- Align the model environment

Copy `LLM-TPU/models/Llama3/compile/files/Meta-Llama-3-8B-Instruct/modeling_llama.py` into the transformers library; note that the transformers library here should be the one inside `.venv`.

```bash
cp ./compile/files/Meta-Llama-3-8B-Instruct/modeling_llama.py .venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py
```

Also replace the file of the same name in the downloaded Llama-3-8B-Instruct directory with `./compile/files/Meta-Llama-3-8B-Instruct/config.json`.
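
A sketch of that replacement, where `your_model_path` stands in for the directory of the downloaded weights:

```bash
# Overwrite the model's config.json with the one shipped in this repo
cp ./compile/files/Meta-Llama-3-8B-Instruct/config.json your_model_path/config.json
```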

- Generate the onnx file

```bash
cd compile
python export_onnx.py --model_path your_model_path --seq_length 512
```

`--model_path`: Path to the downloaded Meta Llama3 folder

`--seq_length`: Fixed sequence length to export; choose 512, 1024, 2048, etc. as needed

- Generate the bmodel file

Exit the virtual environment before generating the bmodel

```bash
deactivate
```

Compile the model

```bash
./compile.sh --mode int4 --name llama3-8b --seq_length 512 # for int8, use --mode int8
```

`--mode`: Quantization mode, options are int4, int8

`--seq_length`: Sequence length, should match the seq_length specified when generating the onnx file

`--name`: Model name, must be llama3-8b

Generating the bmodel takes roughly 2 hours or more. 64 GB of RAM and more than 200 GB of disk space are recommended; otherwise OOM or "no space left" errors are likely.
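
A quick pre-flight check of both resources with standard tools (not part of the official flow):

```bash
free -g    # available RAM in GB
df -h .    # free disk space on the current filesystem
```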

## Llama3 Cpython File Compilation

Compile the executable on the Airbox. The precompiled files are already included in the `llama3-8b_int4_1dev_512.tar.gz` package; if you have already downloaded it, there is no need to compile.

```bash
cd python_demo
mkdir build
cd build
cmake ..
make
cp *chat* ..
```
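
To confirm the binding built and loads, you can try importing it from `python_demo` with the virtual environment active; this assumes the module is named `chat`, as the `chat.cpython-*.so` filename suggests:

```bash
# LD_LIBRARY_PATH must include LLM-TPU/support/lib_soc, as set above
cd .. && python3 -c "import chat; print('chat binding OK')"
```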
@@ -0,0 +1,9 @@
---
sidebar_position: 11
---

# Llama3 Chatbot-TPU

import Chatbotllama3 from '../../../../common/ai/\_chatbot_llama3.mdx';

<Chatbotllama3 />
Binary file not shown.
Binary file not shown.
