update README
YongjunHe committed Apr 26, 2024
1 parent 807bbd8 commit b942e0d
Showing 5 changed files with 35 additions and 11 deletions.
30 changes: 29 additions & 1 deletion README.md
@@ -1 +1,29 @@
# FineInfer
<h1 align="center">
FineInfer
</h1>

<p align="center">
| <a href="https://dl.acm.org/doi/10.1145/3642970.3655835"><b>Paper</b></a> |
</p>

FineInfer is a research prototype for fine-tuning and serving large language models.

FineInfer supports concurrent parameter-efficient fine-tuning and inference through the following features:
* Deferred continuous batching
* Hybrid system architecture
* Heterogeneous batching
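
The core idea, deferred continuous batching, can be illustrated with a toy scheduler: inference requests join the running batch only at iteration boundaries, and fine-tuning steps fill the gaps when no inference work is pending. This is a minimal conceptual sketch, not FineInfer's actual API; the class and method names are hypothetical:

```python
from collections import deque

class DeferredScheduler:
    """Toy sketch of deferred continuous batching: new requests are
    deferred to the next iteration boundary instead of preempting the
    running step, and fine-tuning runs when no inference is pending."""

    def __init__(self):
        self.pending = deque()  # requests waiting to join the batch
        self.active = []        # requests in the current running batch

    def submit(self, request):
        # Deferred: the request joins at the next iteration boundary.
        self.pending.append(request)

    def next_step(self):
        # At each iteration boundary, merge deferred requests into the
        # active batch (continuous batching).
        while self.pending:
            self.active.append(self.pending.popleft())
        if self.active:
            return ("inference", list(self.active))
        # No inference work pending: yield the GPU to a fine-tuning step.
        return ("finetune", None)

    def complete(self, request):
        # A finished request leaves the batch; remaining ones keep running.
        self.active.remove(request)
```

For example, with no submitted requests the scheduler yields fine-tuning steps; once requests arrive, they are batched together at the next boundary and fine-tuning resumes only after the batch drains.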

## Installation and examples
[See here](https://github.com/llm-db/FineInfer/blob/main/benchmarks/fineinfer/README.md)

## Citation
```
@inproceedings{FineInfer,
author = {He, Yongjun and Lu, Yao and Alonso, Gustavo},
title = {Deferred Continuous Batching in Resource-Efficient Large Language Model Serving},
year = {2024},
booktitle = {Proceedings of the 4th Workshop on Machine Learning and Systems},
pages = {98--106},
series = {EuroMLSys '24}
}
```
2 changes: 1 addition & 1 deletion benchmarks/colossalai/README.md
@@ -23,7 +23,7 @@ CUDA_VISIBLE_DEVICES=0 python colossalai-offload-peft-gen.py -m meta-llama/Llama
CUDA_VISIBLE_DEVICES=0 python colossalai-offload-peft.py -m meta-llama/Llama-2-7b-hf --batch_size 1 --cpu_offload
```

ColossalAI-Heterogeneous
ColossalAI-heterogeneous
```
CUDA_VISIBLE_DEVICES=0 python colossalai-ht.py -m meta-llama/Llama-2-7b-hf --batch_size 1
CUDA_VISIBLE_DEVICES=0 python colossalai-offload-ht.py -m meta-llama/Llama-2-13b-hf --batch_size 1 --cpu_offload
4 changes: 2 additions & 2 deletions benchmarks/deepspeed/README.md
@@ -16,14 +16,14 @@ deepspeed --num_gpus 1 zero-peft-gen.py -m meta-llama/Llama-2-7b-hf --batch_size
deepspeed --num_gpus 1 zero-peft.py -m meta-llama/Llama-2-7b-hf --batch_size 1
```

ZeRO-Offload
ZeRO-offload
```
deepspeed --num_gpus 1 zero-offload-gen.py -m meta-llama/Llama-2-7b-hf --batch_size 1 --cpu_offload
deepspeed --num_gpus 1 zero-offload-peft-gen.py -m meta-llama/Llama-2-7b-hf --batch_size 1 --cpu_offload
deepspeed --num_gpus 1 zero-offload-peft.py -m meta-llama/Llama-2-7b-hf --batch_size 1 --cpu_offload
```

ZeRO-Heterogeneous
ZeRO-heterogeneous
```
deepspeed --num_gpus 1 zero-ht.py -m meta-llama/Llama-2-7b-hf --batch_size 1
deepspeed --num_gpus 1 zero-offload-ht.py -m meta-llama/Llama-2-13b-hf --batch_size 1 --cpu_offload
8 changes: 2 additions & 6 deletions benchmarks/fineinfer/README.md
@@ -9,16 +9,12 @@ pip install bitsandbytes peft
conda deactivate
```

FineInfer
FineInfer-inference
```
CUDA_VISIBLE_DEVICES=0 python fi-gen.py -m meta-llama/Llama-2-7b-hf --batch_size 1
```

FineInfer-Offload
```
```

FineInfer-Heterogeneous
FineInfer-heterogeneous
```
CUDA_VISIBLE_DEVICES=0 python baseline-ht.py -m meta-llama/Llama-2-7b-hf --batch_size 1
CUDA_VISIBLE_DEVICES=0 python fi-ht.py -m meta-llama/Llama-2-7b-hf --batch_size 1
2 changes: 1 addition & 1 deletion benchmarks/huggingface/README.md
@@ -16,7 +16,7 @@ CUDA_VISIBLE_DEVICES=0 python hf-peft-gen.py -m meta-llama/Llama-2-7b-hf --batch
CUDA_VISIBLE_DEVICES=0 python hf-peft.py -m meta-llama/Llama-2-7b-hf --batch_size 1
```

HuggingFace-Offload
HuggingFace-offload
```
CUDA_VISIBLE_DEVICES=0 python hf-offload-gen.py -m meta-llama/Llama-2-7b-hf --batch_size 1 --cpu_offload
CUDA_VISIBLE_DEVICES=0 python hf-offload-gen.py -m meta-llama/Llama-2-7b-hf --batch_size 1 --disk_offload
