Recent advancements have illuminated the efficacy of tensorization-decomposition Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA and FacT in the context of Vision Transformers (ViT). However, these methods inadequately address inner- and cross-layer redundancy. To tackle this issue, we introduce EFfective Factor-Tuning (EFFT), a simple yet effective fine-tuning method. On the VTAB-1K benchmark, EFFT surpasses all baselines, attaining state-of-the-art performance with a categorical average of 75.9% top-1 accuracy while training only 0.28% of the parameters used in full fine-tuning. Given its simplicity and efficacy, EFFT holds the potential to serve as a foundational benchmark.
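For intuition, the sketch below illustrates (in PyTorch) the cross-layer factor-sharing idea behind such tensorization-decomposition methods: the weight updates of all layers are routed through one shared pair of factors plus a tiny per-layer core, so redundancy shared across layers is parameterized once. This is an illustrative sketch of ours, not the actual EFFT implementation; all names (`SharedFactorLinear`, `core`, etc.) are hypothetical.

```python
import torch
import torch.nn as nn

class SharedFactorLinear(nn.Module):
    """Hypothetical FacT-style adapter sketch, NOT the actual EFFT code.

    Every adapted layer reuses the same down-projection U and up-projection V;
    only a small per-layer core matrix is unique, so cross-layer redundancy
    is captured by parameters that are learned once and shared.
    """

    def __init__(self, base: nn.Linear, U: nn.Parameter, V: nn.Parameter,
                 rank: int, scale: float = 1.0):
        super().__init__()
        self.base = base  # frozen pretrained layer
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.U = U                                         # shared: (dim, rank)
        self.V = V                                         # shared: (rank, dim)
        self.core = nn.Parameter(torch.zeros(rank, rank))  # per-layer: (rank, rank)
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus a low-rank update routed through the shared factors.
        # core starts at zero, so training begins from the pretrained model.
        return self.base(x) + self.scale * (x @ self.U @ self.core @ self.V)

# One shared factor pair serves all transformer blocks.
dim, rank, n_layers = 768, 8, 12
U = nn.Parameter(torch.randn(dim, rank) * 0.02)
V = nn.Parameter(torch.randn(rank, dim) * 0.02)
blocks = [SharedFactorLinear(nn.Linear(dim, dim), U, V, rank) for _ in range(n_layers)]
```

Compared to vanilla LoRA, which learns an independent factor pair per layer, sharing `U` and `V` across blocks is what lets the trainable-parameter count drop to a fraction of a percent of the full model.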
The code requires:
- Python 3.9
- timm 0.5.4
- avalanche-lib 0.4.0
- Other dependencies as specified in `requirements.txt`
To set up your environment to run the code, follow these steps:
- Clone the Repository:

```bash
git clone https://github.com/Dongping-Chen/EFFT-EFfective-Factor-Tuning.git
cd EFFT-EFfective-Factor-Tuning
```
- Create and Activate a Virtual Environment (optional but recommended) and Install the Required Packages:

```bash
conda create --name EFFT python=3.9
conda activate EFFT
pip install -r requirements.txt
```
- Download Datasets: To download the datasets, please refer to https://github.com/ZhangYuanhan-AI/NOAH/#data-preparation. Then move the dataset folders to `<YOUR PATH>/EFFT-EFfective-Factor-Tuning/data/`.
- Download Checkpoints of ViT and Swin Transformers: For ViT-B, download the pretrained ViT-B/16 checkpoint to `<YOUR PATH>/EFFT-EFfective-Factor-Tuning/ViT-B_16.npz`. For other ViT sizes and for Swin Transformers, please refer to the ViT and Swin Transformer repositories. A quick sanity check for the environment and checkpoint is sketched after this list.
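Once the checkpoint is in place, you can verify the environment and the downloaded weights from Python. This is an optional sanity check of ours, not part of the repository; it uses only the standard library and NumPy, and simply lists whatever arrays the JAX-style `.npz` checkpoint contains.

```python
from importlib.metadata import version

import numpy as np

# Confirm the pinned dependency versions are installed.
print("timm:", version("timm"))                    # expected: 0.5.4
print("avalanche-lib:", version("avalanche-lib"))  # expected: 0.4.0

# Open the JAX-style ViT-B/16 checkpoint and list a few weight arrays.
ckpt = np.load("ViT-B_16.npz")
for name in sorted(ckpt.files)[:5]:
    print(name, ckpt[name].shape)
```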
To reproduce the experiments, run:

```bash
./run.sh
```
You can also run the experiments one by one:

```bash
python execute.py --model "ViT" --size "B" --dataset "cifar"
```
You can customize the execution by specifying various parameters:
- `--model`: Choose between 'ViT' and 'Swin'.
- `--size`: For 'ViT', options include 'B', 'L', and 'H'. For 'Swin', options include 'T', 'S', 'B', and 'L'.
- `--dataset`: Select from a wide range of datasets, including 'cifar', 'caltech101', 'dtd', and the others listed in the introduction.
Example:

```bash
python execute.py --model "ViT" --size "B" --dataset "cifar"
```
Note: When using the 'ViT-B' model, optimal hyperparameters for replication will be imported automatically.
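If you prefer a Python driver over `./run.sh` for sweeping several datasets, a loop like the one below works. This is our sketch, not a script shipped with the repository; the dataset list is an illustrative subset of VTAB-1K, and the flags are exactly those documented above.

```python
import subprocess

# Illustrative subset of the VTAB-1K tasks; extend with the datasets you need.
datasets = ["cifar", "caltech101", "dtd"]

for ds in datasets:
    # Invoke execute.py with the documented flags, one dataset at a time.
    subprocess.run(
        ["python", "execute.py", "--model", "ViT", "--size", "B", "--dataset", ds],
        check=True,  # abort the sweep if any run fails
    )
```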
Contributions to this project are welcome. Please consider the following ways to contribute:
- Reporting issues
- Improving documentation
- Proposing new features or improvements
This project is based on the findings and methodologies presented in the paper "Effective Factor Tuning". We would like to express our sincere appreciation to Tong Yao from Peking University (PKU) and Professor Yao Wan from Huazhong University of Science and Technology (HUST) for their invaluable contributions and guidance in this research. Part of the code is borrowed from FacT and timm.
```bibtex
@article{chen2023aggregate,
  title={Aggregate, Decompose, and Fine-Tune: A Simple Yet Effective Factor-Tuning Method for Vision Transformer},
  author={Chen, Dongping},
  journal={arXiv preprint arXiv:2311.06749},
  year={2023}
}
```