Dynamic Early Exit

This repository implements a dynamic early exit strategy that aims to improve the computational efficiency of large language models (LLMs) while maintaining prediction quality. The framework extends the LayerSkip methodology with five heuristics (Repeated Tokens, Cosine Similarity, Token Confidence Convergence, Entropy-Based Threshold, and Max Probability) that detect when token predictions have stabilized, so that inference can exit before the final layer.

This repository builds on the repository that implements LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding.

Files added or changed relative to the original repository:

- self_speculation/early_exit_utils.py
- self_speculation/llama_model_utils.py
- self_speculation/generator_base.py
- self_speculation/autoregressive_generator.py

Authors:

Juan D. Castano ([email protected])
Amir Voloshin ([email protected])
Daniel Carrera ([email protected])

Getting Started

Clone repo:

$ git clone https://github.com/Amir-Voloshin/DynamicEarlyExit.git
$ cd DynamicEarlyExit

Set up environment:

$ conda create --name layer_skip python=3.10
$ conda activate layer_skip

$ pip install -r requirements.txt

Access models: To observe a speedup, you need LLMs that have been trained using the LayerSkip recipe. Six checkpoints of different Llama models, continually pretrained using the LayerSkip recipe, are available on HuggingFace, including facebook/layerskip-llama3.2-1B, which the commands below use.

To access each model:

1. Visit the model's page on HuggingFace, and make sure you are logged in to the HuggingFace website with your account.
2. Fill in the request form and submit it. Approval may take a while; you should receive an email notification once access to the model is granted.
3. Follow the HuggingFace instructions to obtain a user access token.
4. On the command line, run huggingface-cli login and, when prompted, provide the token you obtained in Step 3.

Once you have completed these steps, the commands below for running the LayerSkip checkpoints should work.

Generate

To run a model in interactive mode using regular autoregressive decoding:

$ torchrun generate.py --model facebook/layerskip-llama3.2-1B \
    --sample True \
    --max_steps 512

To perform dynamic early exit, you need to specify --criteria. Criteria options are: "cosine_similarity", "token_repeat", "entropy_based", "max_probability", or "convergence".

$ torchrun generate.py --model facebook/layerskip-llama3.2-1B \
    --sample True \
    --max_steps 512 \
    --generation_strategy autoregressive \
    --criteria "cosine_similarity"
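To give a feel for what the --criteria options decide, here is a minimal pure-Python sketch of how such stabilization checks could be written. It is not the code in self_speculation/early_exit_utils.py: the function names, the "hidden" and "probs" dictionary keys, the thresholds, and in particular the reading of "convergence" as a small change in top-token confidence between consecutive layers are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def should_exit(criteria, prev, curr, threshold):
    """Decide whether the prediction looks stable at the current layer.

    prev and curr describe two consecutive layers as dicts with the
    hypothetical keys "hidden" (hidden-state vector) and "probs"
    (next-token probability distribution).
    """
    if criteria == "cosine_similarity":
        return cosine_similarity(prev["hidden"], curr["hidden"]) >= threshold
    if criteria == "token_repeat":
        # Threshold is unused: exit when two layers agree on the top token.
        return argmax(prev["probs"]) == argmax(curr["probs"])
    if criteria == "entropy_based":
        return entropy(curr["probs"]) <= threshold
    if criteria == "max_probability":
        return max(curr["probs"]) >= threshold
    if criteria == "convergence":
        # Assumed interpretation: top-token confidence has stopped changing.
        return abs(max(curr["probs"]) - max(prev["probs"])) <= threshold
    raise ValueError(f"unknown criteria: {criteria}")


def first_exit_layer(layers, criteria, threshold):
    """Return the index of the first layer that satisfies the criterion.

    Falls back to the last layer when no earlier layer qualifies.
    """
    for i in range(1, len(layers)):
        if should_exit(criteria, layers[i - 1], layers[i], threshold):
            return i
    return len(layers) - 1
```

A lower exit index means fewer layers are evaluated for that token, which is where the speedup would come from; stricter thresholds trade that speedup for predictions closer to those of the full model.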

Tips:

- You may change --model to any HuggingFace model.
- By default, sampling is enabled. You may change the sampling behaviour using the --sample, --temperature, --top_p, and --top_k arguments.
- You may run python generate.py --help for details on the different command-line arguments.
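As a rough illustration of what the sampling arguments control, the sketch below applies temperature scaling, then top-k and top-p (nucleus) filtering, to a list of logits before drawing a token. It follows the common definitions of these parameters and is an assumption about their general meaning, not a description of how generate.py implements them; all function names are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Keep the top_k most likely tokens, then the smallest prefix whose
    cumulative probability reaches top_p, and renormalize.

    top_k=0 disables top-k filtering; top_p=1.0 disables top-p filtering.
    Returns a dict mapping token id to renormalized probability.
    """
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token_id: p / total for token_id, p in kept}


def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    """Draw one token id from the filtered distribution."""
    kept = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    token_ids = list(kept)
    weights = [kept[token_id] for token_id in token_ids]
    return rng.choices(token_ids, weights=weights, k=1)[0]
```

With top_k=1 the draw always returns the most likely token, which is equivalent to greedy decoding.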

Benchmark

To benchmark on a dataset:

$ torchrun benchmark.py --model facebook/layerskip-llama3.2-1B \
    --dataset cnn_dm_summarization \
    --num_samples 100 \
    --generation_strategy autoregressive \
    --output_dir ./logs

Tips:

- You can specify different tasks by modifying the --dataset argument:
    - cnn_dm_summarization: CNN/DM Summarization
    - xsum_summarization: XSUM Summarization
    - cnn_dm_lm: CNN/DM Language Modeling (given the first few words of an article, generate the rest of the article)
    - human_eval: HumanEval Coding
- By default, the tasks run as 0-shot. You can switch to n-shot by specifying the --n_shot argument.
- By default, sampling is enabled, whereas the results reported in the paper used greedy decoding without sampling. You may change the sampling behaviour using the --sample, --temperature, --top_p, and --top_k arguments.
- You may run python benchmark.py --help for details on the different command-line arguments.
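To benchmark all four tasks in one go, a small helper can generate one command per dataset. The sketch below only builds and prints the command strings, using the flags shown in the benchmark example above; the helper name and its default values are illustrative assumptions, and the commands still need to be run in a shell.

```python
import shlex

# Dataset names as listed in the tips above.
DATASETS = [
    "cnn_dm_summarization",
    "xsum_summarization",
    "cnn_dm_lm",
    "human_eval",
]


def build_benchmark_command(
    dataset,
    model="facebook/layerskip-llama3.2-1B",
    num_samples=100,
    output_dir="./logs",
):
    """Return a shell-safe benchmark command string for one dataset."""
    args = [
        "torchrun", "benchmark.py",
        "--model", model,
        "--dataset", dataset,
        "--num_samples", str(num_samples),
        "--generation_strategy", "autoregressive",
        "--output_dir", output_dir,
    ]
    return shlex.join(args)


if __name__ == "__main__":
    for dataset in DATASETS:
        print(build_benchmark_command(dataset))
```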

Using Docker

See DOCKER.md to set up the project using Docker.
