
Canonical Rank Adaptation (CaRA): An Efficient Fine-Tuning Strategy for Vision Transformers

Lokesh Veeramacheneni¹, Moritz Wolter¹, Hilde Kuehne², and Juergen Gall¹˒³

1. University of Bonn
2. University of Tübingen, MIT-IBM Watson AI Lab
3. Lamarr Institute for Machine Learning and Artificial Intelligence


Keywords: CaRA, Canonical Polyadic Decomposition, CPD, Tensor methods, ViT, LoRA

Abstract: Modern methods for fine-tuning a Vision Transformer (ViT), such as Low-Rank Adaptation (LoRA) and its variants, demonstrate impressive performance. However, these methods ignore the high-dimensional nature of Multi-Head Attention (MHA) weight tensors. To address this limitation, we propose Canonical Rank Adaptation (CaRA). CaRA leverages tensor mathematics: first, the transformer is tensorised into two tensors, one for the projection layers in MHA and the other for the feed-forward layers; second, the tensorised formulation is fine-tuned using a low-rank adaptation in Canonical Polyadic Decomposition (CPD) form. Employing CaRA efficiently minimizes the number of trainable parameters. Experimentally, CaRA outperforms existing Parameter-Efficient Fine-Tuning (PEFT) methods on visual classification benchmarks such as the Visual Task Adaptation Benchmark (VTAB)-1k and Fine-Grained Visual Categorization (FGVC).
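
To make the idea concrete, the following is a minimal, hypothetical sketch of a CP-style low-rank update applied to a stacked attention-projection weight tensor. It is not the authors' implementation; the tensor shapes, rank, and variable names are assumptions chosen purely for illustration.

import torch

# Hypothetical sizes: 12 heads, 64-dim heads, 768-dim model, CP rank 8.
heads, head_dim, model_dim, rank = 12, 64, 768, 8

# Frozen, pretrained projection weights stacked into a 3-way tensor.
W = torch.randn(heads, head_dim, model_dim)
W.requires_grad_(False)

# Trainable CP factors: one matrix per tensor mode, all sharing the same rank.
A = torch.nn.Parameter(torch.randn(heads, rank) * 0.01)
B = torch.nn.Parameter(torch.randn(head_dim, rank) * 0.01)
C = torch.nn.Parameter(torch.zeros(model_dim, rank))  # zero init: no update at the start

# CP reconstruction: delta[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
delta = torch.einsum("ir,jr,kr->ijk", A, B, C)

# Adapted weights: frozen tensor plus the low-rank CP update.
W_adapted = W + delta
print(W_adapted.shape)  # torch.Size([12, 64, 768])

Only the three factor matrices are trained, so the number of trainable parameters grows with (heads + head_dim + model_dim) × rank rather than with the size of the full weight tensor.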


Installation

Use uv to install the requirements.

For CPU-based PyTorch:

uv sync --extra cpu

For CUDA-based PyTorch:

uv sync --extra cu118

Datasets

For the VTAB-1k benchmark, refer to the dataset download instructions from NOAH. The datasets for the FGVC benchmark are downloaded from their respective sources.

Note: Create a data folder in the repository root and place the datasets inside it.

Pretrained models

Please refer to the download links provided in the paper.

Training

To fine-tune a ViT, use the following command:

export PYTHONPATH=.
python image_classification/vit_cp.py --dataset=<choice_of_dataset> --dim=<rank>
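
For example, a run with CP rank 16 might look like the following; the dataset name below is only a placeholder, so pass whichever dataset key the script supports.

export PYTHONPATH=.
python image_classification/vit_cp.py --dataset=cifar --dim=16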

Evaluation

We provide links to the fine-tuned models for each dataset in the VTAB-1k benchmark here. To reproduce the results from the paper, download the corresponding model and execute the following command:

export PYTHONPATH=.
python image_classification/vit_cp.py --dataset=<choice_of_dataset> --dim=<rank> --evaluate=<path_to_model>

Acknowledgments

The code is built on the implementation of FacT. Thanks to Zahra Ganji for reimplementing the VeRA baseline.

Citation

If you use this work, please cite it using the following BibTeX entry:

@inproceedings{veeramacheneni2025canonical,
  title={Canonical Rank Adaptation: An Efficient Fine-Tuning Strategy for Vision Transformers},
  author={Lokesh Veeramacheneni and Moritz Wolter and Hilde Kuehne and Juergen Gall},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}