
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing


Mixture-of-Experts (MoE) Large Language Models (LLMs) suffer from severely sub-optimal expert pathways: our study reveals that the naive expert selection learned during pretraining leaves a surprising 10-20% accuracy gap open for improvement. Motivated by this observation, we develop a novel class of test-time optimization methods that re-weight, or "re-mix", the experts in different layers jointly for each test sample. Since the test sample's ground truth is unknown, we propose to optimize a surrogate objective defined by the sample's "successful neighbors" from a reference set of samples. We introduce three surrogates and algorithms based on mode-finding, kernel regression, and the average loss of similar reference samples/tasks. To reduce the cost of optimizing whole pathways, we apply our algorithms only to the core experts' mixing weights in critical layers, which achieves similar performance while saving significant computation. This leads to "Critical-Layer, Core-Expert, Collaborative Pathway Optimization (C3PO)". We apply C3PO to two recent MoE LLMs and evaluate it on six widely used benchmarks. It consistently improves the base model by 7-15% in accuracy and outperforms widely used test-time learning baselines, e.g., in-context learning and prompt/prefix tuning, by a large margin. Moreover, C3PO enables MoE LLMs with 1-3B active parameters to outperform LLMs with 7-9B parameters, thereby strengthening MoE's efficiency advantage. Our thorough ablation study further provides novel insights into achieving test-time improvement on MoE.
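As an illustration of the idea only (not the repository's implementation), the sketch below shows the kernel-regression surrogate: a test sample's core-expert mixing weights in the critical layers are estimated as a similarity-weighted average of the pathways of its successful neighbors in the reference set. All names, shapes, and hyperparameters here are assumptions.

# Minimal sketch of the kernel-regression surrogate (illustrative, not the repo's code).
import numpy as np

def kernel_remix(test_emb, neighbor_embs, neighbor_pathways, bandwidth=1.0, top_k=8):
    """Predict re-mixed expert weights for one test sample.

    test_emb:          (d,)      embedding of the test sample
    neighbor_embs:     (n, d)    embeddings of successful reference samples
    neighbor_pathways: (n, L, E) their core-expert mixing weights in the L critical layers
    returns:           (L, E)    re-mixed weights for the test sample
    """
    # Gaussian kernel over embedding distances.
    d2 = np.sum((neighbor_embs - test_emb) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))

    # Keep only the top-k most similar neighbors and normalize their kernel weights.
    idx = np.argsort(-w)[:top_k]
    w = w[idx] / (w[idx].sum() + 1e-12)

    # Kernel-regression estimate of the pathway, renormalized per critical layer.
    mixed = np.einsum("n,nle->le", w, neighbor_pathways[idx])
    return mixed / mixed.sum(axis=-1, keepdims=True)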


Setup and Installation

1. Create Conda Environment

Create a new conda environment named C3PO and install the required packages:

# Create conda environment
conda create -n C3PO python=3.10 -y
conda activate C3PO

# Install PyTorch (for CUDA 12.1)
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia -y

# Install required packages
pip install numpy transformers fvcore tqdm

2. Download Reference Cases

Download the reference cases from this link: Reference Cases

# Extract the downloaded reference.zip
unzip reference.zip -d reference_data

3. Download Datasets

Run the download.sh script to get the necessary datasets:

# Execute download script
bash download.sh

4. Replace the original modeling_olmoe.py with the customized olmoe_modeling.py to allow customizing the routing weights

# Replace transformers' modeling_olmoe.py with the customized olmoe_modeling.py
# (run with the C3PO environment activated; adjust the path if your setup differs)
cp olmoe_modeling.py "$CONDA_PREFIX/lib/python3.10/site-packages/transformers/models/olmoe/modeling_olmoe.py"
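For context, the sketch below shows one way a patched MoE gate can expose its routing weights for external re-mixing. It illustrates the mechanism only; it is not the actual contents of olmoe_modeling.py, and the ROUTING_SCALES hook, function name, and tensor shapes are assumptions.

# Illustrative routing-weight override inside an MoE gate (not the repo's actual code).
import torch
import torch.nn.functional as F

ROUTING_SCALES = {}  # hypothetical per-sample hook, e.g. {layer_idx: tensor of shape (num_experts,)}

def route_with_override(router_logits, layer_idx, top_k=8):
    """Compute top-k expert mixing weights, optionally rescaled by externally set weights."""
    weights = F.softmax(router_logits, dim=-1)            # (tokens, num_experts)
    scale = ROUTING_SCALES.get(layer_idx)
    if scale is not None:
        weights = weights * scale                          # re-mix the experts
        weights = weights / weights.sum(dim=-1, keepdim=True)
    topk_w, topk_idx = torch.topk(weights, top_k, dim=-1)
    topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)     # renormalize over the selected experts
    return topk_w, topk_idx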

5. Run the optimizer code

# Run the main script
python olmoe_optimizer.py
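At a high level, the optimizer performs a per-sample test-time optimization. The sketch below illustrates, under assumptions, the surrogate based on the average loss of similar reference samples: gradient steps on core-expert scales in the critical layers, driven by the loss on the test sample's nearest reference neighbors. The function names (model_loss, find_neighbors) and hyperparameters are illustrative, not the actual API of olmoe_optimizer.py.

# Minimal sketch of a per-sample test-time optimization loop (illustrative only).
import torch

def c3po_step(model_loss, find_neighbors, test_sample, critical_layers, num_experts,
              steps=10, lr=1e-2, top_k_neighbors=8):
    # One learnable scale vector per critical layer, initialized to the identity re-mix.
    scales = {l: torch.ones(num_experts, requires_grad=True) for l in critical_layers}
    optimizer = torch.optim.Adam(scales.values(), lr=lr)

    neighbors = find_neighbors(test_sample, k=top_k_neighbors)
    for _ in range(steps):
        optimizer.zero_grad()
        # Surrogate objective: average loss on the successful neighbors
        # when the model runs with the current expert re-mixing applied.
        loss = torch.stack([model_loss(n, scales) for n in neighbors]).mean()
        loss.backward()
        optimizer.step()
    return {l: s.detach() for l, s in scales.items()}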
