update README
YongjunHe committed Apr 26, 2024
1 parent 807bbd8 commit b942e0d
Showing 5 changed files with 35 additions and 11 deletions.
30 changes: 29 additions & 1 deletion README.md
@@ -1 +1,29 @@
# FineInfer
<h1 align="center">
FineInfer
</h1>

<p align="center">
| <a href="https://dl.acm.org/doi/10.1145/3642970.3655835"><b>Paper</b></a> |
</p>

FineInfer is a research prototype for fine-tuning and serving large language models.

FineInfer supports concurrent parameter-efficient fine-tuning and inference through the following features:
* Deferred continuous batching
* Hybrid system architecture
* Heterogeneous batching
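
The core idea, deferred continuous batching, can be illustrated with a toy scheduler: inference requests join the running batch only at iteration boundaries, and fine-tuning steps fill the gaps when no inference work is pending. This is a minimal conceptual sketch, not FineInfer's actual API; the class and method names are hypothetical:

```python
from collections import deque

class DeferredScheduler:
    """Toy sketch of deferred continuous batching: new requests are
    deferred to the next iteration boundary instead of preempting the
    running step, and fine-tuning runs when no inference is pending."""

    def __init__(self):
        self.pending = deque()  # requests waiting to join the batch
        self.active = []        # requests in the current running batch

    def submit(self, request):
        # Deferred: the request joins at the next iteration boundary.
        self.pending.append(request)

    def next_step(self):
        # At each iteration boundary, merge deferred requests into the
        # active batch (continuous batching).
        while self.pending:
            self.active.append(self.pending.popleft())
        if self.active:
            return ("inference", list(self.active))
        # No inference work pending: yield the GPU to a fine-tuning step.
        return ("finetune", None)

    def complete(self, request):
        # A finished request leaves the batch; remaining ones keep running.
        self.active.remove(request)
```

For example, with no submitted requests the scheduler yields fine-tuning steps; once requests arrive, they are batched together at the next boundary and fine-tuning resumes only after the batch drains.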

## Installation and examples
[See here](https://github.com/llm-db/FineInfer/blob/main/benchmarks/fineinfer/README.md)

## Citation
```
@inproceedings{FineInfer,
author = {He, Yongjun and Lu, Yao and Alonso, Gustavo},
title = {Deferred Continuous Batching in Resource-Efficient Large Language Model Serving},
year = {2024},
booktitle = {Proceedings of the 4th Workshop on Machine Learning and Systems},
pages = {98--106},
series = {EuroMLSys '24}
}
```
2 changes: 1 addition & 1 deletion benchmarks/colossalai/README.md
@@ -23,7 +23,7 @@ CUDA_VISIBLE_DEVICES=0 python colossalai-offload-peft-gen.py -m meta-llama/Llama
CUDA_VISIBLE_DEVICES=0 python colossalai-offload-peft.py -m meta-llama/Llama-2-7b-hf --batch_size 1 --cpu_offload
```

ColossalAI-Heterogeneous
ColossalAI-heterogeneous
```
CUDA_VISIBLE_DEVICES=0 python colossalai-ht.py -m meta-llama/Llama-2-7b-hf --batch_size 1
CUDA_VISIBLE_DEVICES=0 python colossalai-offload-ht.py -m meta-llama/Llama-2-13b-hf --batch_size 1 --cpu_offload
4 changes: 2 additions & 2 deletions benchmarks/deepspeed/README.md
@@ -16,14 +16,14 @@ deepspeed --num_gpus 1 zero-peft-gen.py -m meta-llama/Llama-2-7b-hf --batch_size
deepspeed --num_gpus 1 zero-peft.py -m meta-llama/Llama-2-7b-hf --batch_size 1
```

ZeRO-Offload
ZeRO-offload
```
deepspeed --num_gpus 1 zero-offload-gen.py -m meta-llama/Llama-2-7b-hf --batch_size 1 --cpu_offload
deepspeed --num_gpus 1 zero-offload-peft-gen.py -m meta-llama/Llama-2-7b-hf --batch_size 1 --cpu_offload
deepspeed --num_gpus 1 zero-offload-peft.py -m meta-llama/Llama-2-7b-hf --batch_size 1 --cpu_offload
```

ZeRO-Heterogeneous
ZeRO-heterogeneous
```
deepspeed --num_gpus 1 zero-ht.py -m meta-llama/Llama-2-7b-hf --batch_size 1
deepspeed --num_gpus 1 zero-offload-ht.py -m meta-llama/Llama-2-13b-hf --batch_size 1 --cpu_offload
8 changes: 2 additions & 6 deletions benchmarks/fineinfer/README.md
@@ -9,16 +9,12 @@ pip install bitsandbytes peft
conda deactivate
```

FineInfer
FineInfer-inference
```
CUDA_VISIBLE_DEVICES=0 python fi-gen.py -m meta-llama/Llama-2-7b-hf --batch_size 1
```

FineInfer-Offload
```
```

FineInfer-Heterogeneous
FineInfer-heterogeneous
```
CUDA_VISIBLE_DEVICES=0 python baseline-ht.py -m meta-llama/Llama-2-7b-hf --batch_size 1
CUDA_VISIBLE_DEVICES=0 python fi-ht.py -m meta-llama/Llama-2-7b-hf --batch_size 1
2 changes: 1 addition & 1 deletion benchmarks/huggingface/README.md
@@ -16,7 +16,7 @@ CUDA_VISIBLE_DEVICES=0 python hf-peft-gen.py -m meta-llama/Llama-2-7b-hf --batch
CUDA_VISIBLE_DEVICES=0 python hf-peft.py -m meta-llama/Llama-2-7b-hf --batch_size 1
```

HuggingFace-Offload
HuggingFace-offload
```
CUDA_VISIBLE_DEVICES=0 python hf-offload-gen.py -m meta-llama/Llama-2-7b-hf --batch_size 1 --cpu_offload
CUDA_VISIBLE_DEVICES=0 python hf-offload-gen.py -m meta-llama/Llama-2-7b-hf --batch_size 1 --disk_offload
