
Canonical Rank Adaptation (CaRA): An Efficient Fine-Tuning Strategy for Vision Transformers

Lokesh Veeramacheneni¹, Moritz Wolter¹, Hilde Kuehne², and Juergen Gall¹˒³

1. University of Bonn
2. University of Tübingen, MIT-IBM Watson AI Lab
3. Lamarr Institute for Machine Learning and Artificial Intelligence


Keywords: CaRA, Canonical Polyadic Decomposition, CPD, Tensor methods, ViT, LoRA

Abstract: Modern methods for fine-tuning a Vision Transformer (ViT), such as Low-Rank Adaptation (LoRA) and its variants, demonstrate impressive performance. However, these methods ignore the high-dimensional nature of Multi-Head Attention (MHA) weight tensors. To address this limitation, we propose Canonical Rank Adaptation (CaRA). CaRA leverages tensor mathematics: first, the transformer is tensorised into two tensors, one for the projection layers in MHA and the other for the feed-forward layers; second, the tensorised formulation is fine-tuned using a low-rank adaptation in Canonical Polyadic Decomposition (CPD) form. Employing CaRA efficiently minimizes the number of trainable parameters. Experimentally, CaRA outperforms existing Parameter-Efficient Fine-Tuning (PEFT) methods on visual classification benchmarks such as the Visual Task Adaptation Benchmark (VTAB)-1k and Fine-Grained Visual Categorization (FGVC).
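
To make the idea concrete, the following is a minimal, hypothetical sketch of a CP-style low-rank update applied to a stacked attention-projection weight tensor. It is not the authors' implementation; the tensor shapes, rank, and variable names are assumptions chosen purely for illustration.

import torch

# Hypothetical sizes: 12 heads, 64-dim heads, 768-dim model, CP rank 8.
heads, head_dim, model_dim, rank = 12, 64, 768, 8

# Frozen, pretrained projection weights stacked into a 3-way tensor.
W = torch.randn(heads, head_dim, model_dim)
W.requires_grad_(False)

# Trainable CP factors: one matrix per tensor mode, all sharing the same rank.
A = torch.nn.Parameter(torch.randn(heads, rank) * 0.01)
B = torch.nn.Parameter(torch.randn(head_dim, rank) * 0.01)
C = torch.nn.Parameter(torch.zeros(model_dim, rank))  # zero init: no update at the start

# CP reconstruction: delta[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
delta = torch.einsum("ir,jr,kr->ijk", A, B, C)

# Adapted weights: frozen tensor plus the low-rank CP update.
W_adapted = W + delta
print(W_adapted.shape)  # torch.Size([12, 64, 768])

Only the three factor matrices are trained, so the number of trainable parameters grows with (heads + head_dim + model_dim) × rank rather than with the size of the full weight tensor.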


Installation

Use uv to install the requirements.

For CPU-based PyTorch:

uv sync --extra cpu

For CUDA-based PyTorch:

uv sync --extra cu118

Datasets

For the VTAB-1k benchmark, refer to the dataset download instructions from NOAH. The datasets for the FGVC benchmark are downloaded from their respective sources.

Note: Create a data folder in the repository root and place the datasets inside it.

Pretrained models

Please refer to the download links provided in the paper.

Training

To fine-tune a ViT, use the following command:

export PYTHONPATH=.
python image_classification/vit_cp.py --dataset=<choice_of_dataset> --dim=<rank>
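
For example, a run with CP rank 16 might look like the following; the dataset name below is only a placeholder, so pass whichever dataset key the script supports.

export PYTHONPATH=.
python image_classification/vit_cp.py --dataset=cifar --dim=16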

Evaluation

We provide links to the fine-tuned models for each dataset in the VTAB-1k benchmark here. To reproduce the results from the paper, download the corresponding model and execute the following command:

export PYTHONPATH=.
python image_classification/vit_cp.py --dataset=<choice_of_dataset> --dim=<rank> --evaluate=<path_to_model>

Acknowledgments

The code is built on the implementation of FacT. Thanks to Zahra Ganji for reimplementing the VeRA baseline.

Citation

If you use this work, please cite it using the following BibTeX entry:

@inproceedings{veeramacheneni2025canonical,
  title={Canonical Rank Adaptation: An Efficient Fine-Tuning Strategy for Vision Transformers},
  author={Lokesh Veeramacheneni and Moritz Wolter and Hilde Kuehne and Juergen Gall},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}