diff --git a/.gitignore b/.gitignore
index 5955b349..f1b4d33f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -7,3 +7,4 @@ build
 slurm*
 logs
 .vscode
+.DS_Store
diff --git a/README.md b/README.md
index 70b22c42..0c9f5a61 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,8 @@
 # README
 
-EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs), which is built upon [Megatron-LM](https://github.com/NVIDIA/Megatron-LM).
+[EE-LLM](https://arxiv.org/abs/2312.04916) is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs), which is built upon [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) and currently under active development.
 
+![](images/ee_architecture.png)
 
 ## Installation
@@ -92,4 +93,20 @@ Below are some parameters for early-exit LLM inference, which can be found in `t
 
 - `early_exit_thres`: The confidence threshold used to determine whether to execute early exiting, ranging from 0.0 to 1.0.
 
-- `print_max_prob`: If set, the inference server will print the token with the highest confidence and the confidence values at all exits.
\ No newline at end of file
+- `print_max_prob`: If set, the inference server will print the token with the highest confidence and the confidence values at all exits.
+
+
+## BibTeX
+
+```
+@misc{chen2023eellm,
+  title={EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism},
+  author={Yanxi Chen and Xuchen Pan and Yaliang Li and Bolin Ding and Jingren Zhou},
+  year={2023},
+  eprint={2312.04916},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG}
+}
+```
+
+
diff --git a/images/ee_architecture.png b/images/ee_architecture.png
new file mode 100644
index 00000000..20d9fe30
Binary files /dev/null and b/images/ee_architecture.png differ
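
The README text touched by this diff describes `early_exit_thres` as a confidence threshold in [0.0, 1.0] for deciding whether to exit early. A minimal sketch of that decision rule, assuming (hypothetically) that "confidence" means the top token's softmax probability at an exit; the helper name `should_exit_early` is illustrative and not part of EE-LLM's API:

```python
import math

def should_exit_early(logits, early_exit_thres):
    """Hypothetical sketch of the early-exit decision described in the README.

    Computes the softmax probability of the most likely token at an exit
    and returns True when it meets `early_exit_thres` (in [0.0, 1.0]).
    """
    # Numerically stable softmax: shift by the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    confidence = max(exps) / sum(exps)
    return confidence >= early_exit_thres
```

With a very peaked logit vector the top-token confidence approaches 1.0 and the exit fires at most thresholds; with near-uniform logits the confidence stays near 1/vocab_size and inference continues to deeper layers.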