📙About • 🔥Installation • 🚀Commands • 📜Citation • 🙏Acknowledgement
## 📙 About

🚀 We propose Blazedit, an extremely simple yet general speculative decoding method that accelerates whole-file code editing by up to 7.7x across a comprehensive set of editing scenarios.

This README provides a quick overview and usage guide for our technique. A more detailed introduction to Blazedit can be found in our blog post.

We start with the limitations of existing methods:
- High Overhead in Assisted Decoding: the draft model can generate meaningful draft tokens during real edits instead of simply copying, leading to higher acceptance rates. Nonetheless, draft generation is still autoregressive and thus incurs non-negligible overhead, especially when the draft length is long.
- Low Acceptance Rate in Prompt Lookup Decoding (PLD): PLD is efficient because the cost of drafting is negligible. However, its "copying" mechanism (sketched below) can lead to a very low acceptance rate in the validation step when the target model is making real edits.
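For intuition, PLD drafts tokens by looking up the last few generated tokens (an n-gram) earlier in the context and copying whatever followed that earlier occurrence. Below is a minimal, illustrative sketch of this lookup; it is not Blazedit's implementation, and the function name and default parameters are purely for exposition.

```python
def pld_draft(token_ids, ngram_size=3, max_draft_len=10):
    """Propose draft tokens by copying from the existing context (prompt lookup).

    token_ids: the full sequence so far (prompt + generated output).
    Returns a (possibly empty) list of draft tokens copied from right after the
    most recent earlier occurrence of the trailing n-gram.
    """
    if len(token_ids) <= ngram_size:
        return []
    tail = token_ids[-ngram_size:]  # the n-gram to look up earlier in the context
    # Search backwards so the most recent earlier match wins.
    for start in range(len(token_ids) - ngram_size - 1, -1, -1):
        if token_ids[start:start + ngram_size] == tail:
            copy_from = start + ngram_size
            return token_ids[copy_from:copy_from + max_draft_len]
    return []  # no match: nothing cheap to copy at this step
```

When the model is copying unchanged code, this lookup yields long, mostly-correct drafts for nearly free; when the model is writing genuinely new code, the copied continuation quickly diverges and most of the draft is rejected.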
Blazedit addresses these limitations with an elegant multi-layer speculative decoding strategy. At a high level, similar to assisted decoding, Blazedit uses a draft model to propose draft tokens that are validated by the target model, yielding good acceptance rates. Meanwhile, Blazedit uses PLD to accelerate the draft model itself, reducing the overhead of draft-token generation. Specifically, the PLD step is performed multiple times to accumulate draft tokens before invoking a target-model forward pass. This lets the draft model propose an adaptive number of draft tokens, which optimizes the target-model acceptance rate (a minimal sketch of this loop follows the list below):
- When the PLD layer gets a high acceptance rate, Blazedit detects a copy-intensive scenario, so the draft model proposes more draft tokens.
- When the PLD layer gets a low acceptance rate, Blazedit detects an edit-intensive scenario, so the draft model proposes fewer draft tokens.
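To make the control flow concrete, here is a rough sketch of one Blazedit iteration under this two-layer scheme. It reuses the `pld_draft` sketch above and is not the actual implementation; `verify` here stands in for speculative verification, and a real system checks all draft tokens in a single batched forward pass rather than one toy call per token.

```python
def verify(model, context, draft):
    """Greedy speculative verification: accept the longest prefix of `draft`
    that `model` would itself generate, replacing the first mismatch with the
    model's own token (or appending one bonus token if everything is accepted).
    `model` is a toy callable mapping a token sequence to its next token."""
    accepted = []
    for tok in draft:
        pred = model(context + accepted)
        if pred != tok:
            accepted.append(pred)  # fix the first mismatch and stop
            return accepted
        accepted.append(tok)
    accepted.append(model(context + accepted))  # bonus token
    return accepted


def blazedit_step(context, draft_model, target_model, max_draft_budget=64):
    """One Blazedit iteration: cheaply accumulate draft tokens with PLD checked
    by the draft model, then validate them all with the target model."""
    draft = []
    while len(draft) < max_draft_budget:
        proposal = pld_draft(context + draft)                       # layer 1: copy-based proposal
        accepted = verify(draft_model, context + draft, proposal)   # layer 2: draft model checks the copy
        draft += accepted
        if len(accepted) <= max(1, len(proposal) // 2):
            # Low PLD acceptance => edit-intensive region: stop early and hand
            # a short draft to the target model.
            break
    # A single target-model verification pass over the whole accumulated draft.
    return verify(target_model, context, draft)
```

The early stop is what makes the draft length adaptive: copy-heavy regions keep accumulating nearly-free PLD tokens, while edit-heavy regions fall back to short drafts instead of spending draft-model time on copies the target would reject.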
We evaluated Blazedit and baselines on A100 GPUs under a comprehensive set of editing scenarios.
| Target Model | Metric | Regular | Assisted | PLD | Ours | Speedup (Worst) | Speedup (SOTA) |
|---|---|---|---|---|---|---|---|
| Qwen2.5-Coder-32B | Avg. | 74.6 | 134.2 | 379.3 | 434.8 | 5.8x | 1.15x |
| Qwen2.5-Coder-32B | P90 | 60.7 | 100.3 | 130.7 | 169.0 | 2.8x | 1.29x |
| DeepSeekCoder-33B | Avg. | 55.3 | 123.4 | 364.2 | 424.5 | 7.7x | 1.17x |
| DeepSeekCoder-33B | P90 | 45.1 | 97.8 | 120.9 | 173.4 | 3.8x | 1.43x |

"Speedup (Worst)" compares Blazedit against the slowest baseline (regular decoding); "Speedup (SOTA)" compares it against the strongest baseline.
## 🔥 Installation

```bash
git clone [email protected]:ise-uiuc/blazedit.git --recurse-submodules
cd blazedit

conda create -n spec-edit python=3.12
conda activate spec-edit

pip install -e submodules/transformers
pip install -r requirements.txt
```
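As an optional sanity check, you can confirm that Python picks up the editable `transformers` fork (assuming the submodule installs as the standard `transformers` package; the printed path should point into `submodules/transformers`):

```python
import transformers

print(transformers.__version__)
print(transformers.__file__)  # expected to resolve inside submodules/transformers
```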
## 🚀 Commands

Generate experiment configurations for the grid search:
```bash
export PYTHONPATH=$(pwd)

# Grid search over Blazedit configurations
python eval/configs/gen_2layer_controlled_experiment.py \
    --draft-model "deepseek-ai/deepseek-coder-1.3b-instruct" \
    --target-model "deepseek-ai/deepseek-coder-33b-instruct"

# Grid search over baseline configurations (PLD, regular, assisted)
python eval/configs/gen_baseline_controlled_experiment.py \
    --draft-model "deepseek-ai/deepseek-coder-1.3b-instruct" \
    --target-model "deepseek-ai/deepseek-coder-33b-instruct"
```
The commands above generate per-GPU bash scripts so that experiments run in a batched, parallelized, and load-balanced manner:
```bash
# Blazedit experiments
bash ./eval/configs/deepseek-coder-33b-instruct-deepseek-coder-1.3b-instruct/control_2layer_g0.sh  # GPU 0
bash ./eval/configs/deepseek-coder-33b-instruct-deepseek-coder-1.3b-instruct/control_2layer_g1.sh  # GPU 1
# ...

# Baseline experiments
bash ./eval/configs/deepseek-coder-33b-instruct-deepseek-coder-1.3b-instruct/control_baseline_g0.sh  # GPU 0
bash ./eval/configs/deepseek-coder-33b-instruct-deepseek-coder-1.3b-instruct/control_baseline_g1.sh  # GPU 1
# ...
```
Visualize the results:

```bash
python eval/controlled_experiment.py results/deepseek-coder-33b-instruct-deepseek-coder-1.3b-instruct
```
## 📜 Citation

```bibtex
@misc{blazedit,
  author = {Daita, Vijay and Lian, Xinyu and Zhang, Lingming and Liu, Jiawei},
  title = {Blazing-Fast Code Editing via Multi-Layer Speculation},
  year = {2025},
  howpublished = {\url{https://github.com/ise-uiuc/blazedit}}
}
```
## 🙏 Acknowledgement

The following resources have been helpful in developing this project:

We thank Jiankun Wang (UIUC) and Zhihao Zhang (CMU) for their insightful discussions.