LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale

Build LLMServingSim

1. Git Clone

git clone --recurse-submodules https://github.com/casys-kaist/LLMServingSim.git
cd LLMServingSim

2. `Conda` install (Optional)

Conda can be downloaded and installed with the following commands.

curl -O https://repo.anaconda.com/archive/Anaconda3-2024.06-1-Linux-x86_64.sh
bash Anaconda3-2024.06-1-Linux-x86_64.sh

3. Install Dependencies (tested with Python 3.9 and GCC/G++ 7.5.0)

Using `conda` environment.yml (Recommended)

conda env create -p ./env -f ./environment.yml
conda activate ./env

Clean `conda` Install

conda create -n env_name python=3.9
conda activate env_name
conda install conda-forge::libprotobuf=3.6.1
conda install conda-forge::cmake=3.15
conda install cctbx202208::boost-cpp=1.74.0

pip install -r requirements.txt
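Before building, it can help to confirm that the active environment roughly matches the tested setup. The sketch below is not part of LLMServingSim: `check_toolchain` is a hypothetical helper, and the list of executables it looks for (`cmake`, `protoc`, `gcc`, `g++`) is an assumption about what the build relies on.

```python
import shutil
import sys


def check_toolchain(tools=("cmake", "protoc", "gcc", "g++")):
    """Report the running Python version and where each build tool is found.

    Hypothetical helper, not part of LLMServingSim.
    """
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for tool in tools:
        # shutil.which returns the executable's path, or None if it is not on PATH.
        report[tool] = shutil.which(tool)
    return report


if __name__ == "__main__":
    for name, value in check_toolchain().items():
        print(f"{name}: {value if value else 'NOT FOUND'}")
```

Run it inside the activated conda environment; a tool reported as `NOT FOUND`, or one resolved outside the environment directory, is worth checking before the build.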

4. Build ASTRA-Sim, Chakra, Polymath

A common issue while building ASTRA-Sim is a protoc version error. If it occurs, see the Common Errors section below.

cd astra-sim
./build/astra_analytical/build.sh
cd extern/graph_frontend/chakra
pip install .
cd ../../../../execution_engine/polymath
pip install .
cd ../..

Run LLMServingSim

1. Set Input Configurations

Config & Dataset Path:

Network config path: astra-sim/inputs/network/analytical/{config_name}.json
NPU config path: execution_engine/codelets_src/codelets/examples/genesys/configs/{config_name}.json
Dataset path: astra-sim/dataset/{dataset_name}.tsv
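The path layout above can be checked before a run. This is a minimal sketch, assuming the three paths are relative to the repository root; `input_paths` and `missing_inputs` are hypothetical helpers, not functions provided by LLMServingSim.

```python
from pathlib import Path


def input_paths(repo_root, network_config, npu_config, dataset_name):
    """Build the three input paths from the layout listed above."""
    root = Path(repo_root)
    return {
        "network": root / "astra-sim/inputs/network/analytical" / f"{network_config}.json",
        "npu": root / "execution_engine/codelets_src/codelets/examples/genesys/configs" / f"{npu_config}.json",
        "dataset": root / "astra-sim/dataset" / f"{dataset_name}.tsv",
    }


def missing_inputs(paths):
    """Return the keys whose files do not exist on disk."""
    return [key for key, path in paths.items() if not path.is_file()]
```

For example, `missing_inputs(input_paths(".", "my_network", "my_npu", "share-gpt-req100-rate10"))` returns the names of the inputs that still need to be created; `my_network` and `my_npu` are placeholder config names.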

2. Run LLMServingSim

Test Run

python3 -u main.py --model_name 'gpt3-6.7b' --npu_num 1 --npu_group 1 --npu_mem 24 --dataset 'dataset/share-gpt-req100-rate10.tsv'

python3 -u main.py --model_name 'llama-7b' --npu_num 1 --npu_group 1 --npu_mem 24 --dataset 'dataset/share-gpt-req100-rate10.tsv'
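The two test runs differ only in the model name, so they can be scripted. The following is a sketch rather than a tool shipped with the repository: `build_command` and `run_models` are hypothetical names, and only the flags shown in the test run above are used.

```python
import shlex
import subprocess
import sys


def build_command(model_name, dataset, npu_num=1, npu_group=1, npu_mem=24):
    """Assemble the argument list for main.py using the flags from the test run."""
    return [
        sys.executable, "-u", "main.py",
        "--model_name", model_name,
        "--npu_num", str(npu_num),
        "--npu_group", str(npu_group),
        "--npu_mem", str(npu_mem),
        "--dataset", dataset,
    ]


def run_models(models, dataset, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for model in models:
        cmd = build_command(model, dataset)
        print(shlex.join(cmd))
        if not dry_run:
            # Assumes the working directory is the one that contains main.py.
            subprocess.run(cmd, check=True)
```

Calling `run_models(["gpt3-6.7b", "llama-7b"], "dataset/share-gpt-req100-rate10.tsv")` prints both commands; pass `dry_run=False` to launch them one after another.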

Parameters of `main.py`

| Parameters | Supporting Options | Default Value | Notes |
| --- | --- | --- | --- |
| model_name | 'gpt2', 'gpt3-125m', 'gpt3-350m', 'gpt3-760m', 'gpt3-1.3bm', 'gpt3-2.7b', 'gpt3-6.7b', 'gpt3-13b', 'gpt3-30b', 'gpt3-175b', 'opt-125m', 'opt-350m', 'opt-1.3b', 'opt-2.7b', 'opt-6.7b', 'opt-13b', 'opt-30b', 'opt-66b', 'opt-175b', 'llama-7b', 'llama-30b', 'llama-70b' | 'gpt2' | |
| npu_num | Integer | 16 | |
| max_batch | Integer | 0 | 0: no limit |
| batch_delay | Integer | 0 | |
| scheduling | 'none', 'orca' | 'orca' | |
| parallel | 'pipeline', 'tensor', 'hybrid' | 'hybrid' | |
| npu_group | Integer | 1 | |
| npu_mem | Integer | 40 | |
| kv_manage | 'max', 'pow2', 'oracle', 'vllm' | 'vllm' | |
| block_size | Integer | 8 | |
| pim_type | 'none', 'local', 'pool' | 'none' | |
| sub_batch | Flag | False | Sub-batch Scheduling On/Off |
| dataset | Dataset Path | None | None: manually add requests in main.py |
| network | JSON File Name | None | None: following convention "fully_connected_{network_dim}d_{number_of_NPUs}d.json" |
| output | Output TSV Path | None | None: no TSV output, only stdout |
| gen | Flag | False | Skip initiation phase On/Off |
| fast_run | Flag | False | Skip all compilation and force use of the cached trace for fast simulation |
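When launching many runs from a script, the string-valued options can be checked up front. The sketch below copies the choices and defaults for four options from the parameter table above; `check_options` is a hypothetical helper and does not reflect how `main.py` itself parses its arguments.

```python
# Choices and defaults copied from the parameter table; the checker is hypothetical.
ALLOWED = {
    "scheduling": {"none", "orca"},
    "parallel": {"pipeline", "tensor", "hybrid"},
    "kv_manage": {"max", "pow2", "oracle", "vllm"},
    "pim_type": {"none", "local", "pool"},
}

DEFAULTS = {
    "scheduling": "orca",
    "parallel": "hybrid",
    "kv_manage": "vllm",
    "pim_type": "none",
}


def check_options(**options):
    """Fill in table defaults and raise ValueError on an unsupported choice."""
    resolved = {**DEFAULTS, **options}
    for name, value in resolved.items():
        if name in ALLOWED and value not in ALLOWED[name]:
            choices = ", ".join(sorted(ALLOWED[name]))
            raise ValueError(f"{name}={value!r} is not one of: {choices}")
    return resolved
```

For instance, `check_options(kv_manage="pow2")` returns the full set of four options with the remaining three at their defaults, while `check_options(scheduling="fifo")` raises `ValueError`.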

Outputs of `main.py`

In all outputs, the unit of throughput is tokens/second, and the unit of simulation time is milliseconds.
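Because throughput is reported per second and time in milliseconds, combining the two needs a factor of 1000. A small sketch of the conversion follows; the function names are illustrative and not part of the simulator.

```python
def tokens_processed(throughput_tokens_per_s, simulation_time_ms):
    """Tokens handled in a window: throughput (tokens/s) x time (ms) / 1000."""
    return throughput_tokens_per_s * simulation_time_ms / 1000.0


def throughput_from_counts(num_tokens, simulation_time_ms):
    """Inverse conversion: tokens/second from a token count and a time in ms."""
    if simulation_time_ms <= 0:
        raise ValueError("simulation_time_ms must be positive")
    return num_tokens * 1000.0 / simulation_time_ms
```

As a worked case, a throughput of 250 tokens/second sustained for 4000 ms corresponds to 250 × 4000 / 1000 = 1000 tokens.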

1. Standard Output

The standard output shows which requests are being processed in each iteration of the simulator and displays the measured throughput at regular intervals. Additionally, it provides a summary of throughput and simulation time at the end.

2. Throughput TSV File

{output_filename}-throughput.tsv contains the prompt and generation throughput at each interval.

3. Simulation Time TSV File

{output_filename}-simulation-time.tsv contains the simulation time of each component.
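Both TSV files can be loaded for post-processing with the standard library. This is a minimal sketch under two assumptions: each file starts with a header row, and `output_prefix` is exactly the `{output_filename}` part of the names above. The column names are not documented here, so the code reads whatever header it finds. `load_tsv` and `load_outputs` are hypothetical helpers.

```python
import csv


def load_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def load_outputs(output_prefix):
    """Load both result files written for a given output prefix."""
    return {
        "throughput": load_tsv(f"{output_prefix}-throughput.tsv"),
        "simulation_time": load_tsv(f"{output_prefix}-simulation-time.tsv"),
    }
```

Each row comes back as a dictionary keyed by the header fields, with all values as strings, so numeric columns need an explicit `float(...)` before plotting or aggregation.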

Evaluation

Move to Evaluation Folder

cd evaluation

Run Each Evaluation Script

./evaluation1.sh
./evaluation2.sh
...
./evaluation5.sh
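The numbered scripts can also be discovered and run in order from Python. The sketch below assumes the scripts follow the `evaluation<N>.sh` naming shown above and are executable; `find_evaluation_scripts` and `run_scripts` are hypothetical helpers, not part of the evaluation folder.

```python
import re
import subprocess
from pathlib import Path


def find_evaluation_scripts(folder="."):
    """Return evaluation<N>.sh scripts in the folder, sorted by N."""
    numbered = []
    for path in Path(folder).glob("evaluation*.sh"):
        match = re.fullmatch(r"evaluation(\d+)\.sh", path.name)
        if match:  # skips evaluation_all.sh and other non-numbered names
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


def run_scripts(scripts, dry_run=True):
    """Print each script and, unless dry_run is set, run it and stop on failure."""
    for script in scripts:
        print(f"running {script}")
        if not dry_run:
            # Run from the script's own folder, as the manual steps do.
            subprocess.run([str(script.resolve())], cwd=script.parent, check=True)
```

Sorting by the numeric suffix keeps a script numbered 10 after one numbered 2, which a plain alphabetical sort would not.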

Run All Evaluation Scripts

./evaluation_all.sh

For detailed information about the evaluation, please refer to the README file in the evaluation folder.

Common Errors

Error Example

If you see an error similar to the following, use one of the methods below.

/home/<user>/LLMServingSim/astra-sim/extern/graph_frontend/chakra/et_def/et_def.pb.h:17:2: error: #error This file was generated by an older version of protoc which is
   17 | #error This file was generated by an older version of protoc which is
      |  ^~~~~
/home/<user>/LLMServingSim/astra-sim/extern/graph_frontend/chakra/et_def/et_def.pb.h:18:2: error: #error incompatible with your Protocol Buffer headers. Please
   18 | #error incompatible with your Protocol Buffer headers.  Please
      |  ^~~~~
/home/<user>/LLMServingSim/astra-sim/extern/graph_frontend/chakra/et_def/et_def.pb.h:19:2: error: #error regenerate this file with a newer version of protoc.
   19 | #error regenerate this file with a newer version of protoc.
      |  ^~~~~

Method 1: Setting Environment Variables

This method explicitly sets the conda environment for CMake to use.

Activate the Conda Environment: First, activate the desired conda environment.
```
conda activate your_env_name
```
Set the CMAKE_PREFIX_PATH Environment Variable: Add the path of the activated conda environment to the CMAKE_PREFIX_PATH environment variable.
```
export CMAKE_PREFIX_PATH=$CONDA_PREFIX:$CMAKE_PREFIX_PATH
```
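If the build is driven from a Python script instead of an interactive shell, the same value can be computed there. This is a sketch, not part of the project: `prepend_prefix` is a hypothetical helper that places the conda prefix first, drops empty entries (so an unset variable does not leave a trailing separator), and avoids duplicates.

```python
import os


def prepend_prefix(conda_prefix, current=""):
    """Return a CMAKE_PREFIX_PATH value with conda_prefix first and no duplicates."""
    entries = [entry for entry in current.split(os.pathsep) if entry]
    entries = [entry for entry in entries if entry != conda_prefix]
    return os.pathsep.join([conda_prefix, *entries])


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        value = prepend_prefix(prefix, os.environ.get("CMAKE_PREFIX_PATH", ""))
        print(f"export CMAKE_PREFIX_PATH={value}")
    else:
        print("CONDA_PREFIX is not set; activate the environment first")
```

Setting `os.environ["CMAKE_PREFIX_PATH"]` to the returned value affects only the current Python process and the child processes it starts, not the parent shell.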

Method 2: Setting the Activation Script

Activate the Conda Environment: First, activate the conda environment you want to modify.
```
conda activate your_env_name
```
Navigate to the Environment's Activation Script Directory: The activation scripts are located in the etc/conda/activate.d directory within your conda environment. If this directory does not exist, create it along with the deactivation directory.
```
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d
```

Create and Edit the Activation Script: Create a script named set_cmake_prefix.sh to set the CMAKE_PREFIX_PATH when the environment is activated.

nano $CONDA_PREFIX/etc/conda/activate.d/set_cmake_prefix.sh

Add the following content to this file:

#!/bin/bash
export OLD_CMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$CONDA_PREFIX:$CMAKE_PREFIX_PATH

Create and Edit the Deactivation Script: Create a script named unset_cmake_prefix.sh to reset the CMAKE_PREFIX_PATH when the environment is deactivated.
```
nano $CONDA_PREFIX/etc/conda/deactivate.d/unset_cmake_prefix.sh
```
Add the following content to this file:
```
#!/bin/bash
export CMAKE_PREFIX_PATH=$OLD_CMAKE_PREFIX_PATH
unset OLD_CMAKE_PREFIX_PATH
```

Set Script Permissions: Ensure the scripts are executable.

chmod +x $CONDA_PREFIX/etc/conda/activate.d/set_cmake_prefix.sh
chmod +x $CONDA_PREFIX/etc/conda/deactivate.d/unset_cmake_prefix.sh
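The steps of Method 2 can be collected into one script. The following sketch writes the same two files with the same contents as above and marks them executable; `install_hooks` is a hypothetical helper name, and the caller is assumed to pass the path of the target conda environment.

```python
import stat
from pathlib import Path

ACTIVATE = """#!/bin/bash
export OLD_CMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$CONDA_PREFIX:$CMAKE_PREFIX_PATH
"""

DEACTIVATE = """#!/bin/bash
export CMAKE_PREFIX_PATH=$OLD_CMAKE_PREFIX_PATH
unset OLD_CMAKE_PREFIX_PATH
"""


def install_hooks(conda_prefix):
    """Write the activate/deactivate scripts under conda_prefix and mark them executable."""
    prefix = Path(conda_prefix)
    targets = {
        prefix / "etc/conda/activate.d/set_cmake_prefix.sh": ACTIVATE,
        prefix / "etc/conda/deactivate.d/unset_cmake_prefix.sh": DEACTIVATE,
    }
    for path, body in targets.items():
        path.parent.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
        path.write_text(body)
        # Add execute bits for user, group and others, similar to chmod +x.
        path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return list(targets)
```

With the environment active, `install_hooks(os.environ["CONDA_PREFIX"])` (after `import os`) installs both scripts; deactivate and reactivate the environment for the change to take effect.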
