# AdaptCLIP

HuggingFace Space

Official PyTorch implementation of *AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection* (2025).

## Introduction

Universal visual anomaly detection aims to identify anomalies in novel or unseen visual domains without additional fine-tuning, which is critical in open-world scenarios. To this end, we present AdaptCLIP, a simple yet effective method built on two key insights (illustrated in the sketch after this list):

- Adaptive visual and textual representations should be learned alternately rather than jointly.
- Comparative learning should incorporate both contextual and aligned residual features, rather than relying on residual features alone.
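The sketch below illustrates these two insights with hypothetical PyTorch modules. All module names, shapes, and the alternating schedule are illustrative assumptions, not the official AdaptCLIP implementation.

```python
# Minimal sketch (assumptions, not the official code) of the two insights above.
import torch
import torch.nn as nn

class VisualAdapter(nn.Module):
    """Small residual MLP on top of frozen CLIP patch features (assumption)."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim // 4), nn.GELU(),
                                 nn.Linear(dim // 4, dim))

    def forward(self, feats):
        return feats + self.mlp(feats)

class TextualAdapter(nn.Module):
    """Learnable offsets on frozen normal/anomaly text embeddings (assumption)."""
    def __init__(self, dim: int):
        super().__init__()
        self.offset = nn.Parameter(torch.zeros(2, dim))  # [normal, anomaly]

    def forward(self, text_emb):
        return text_emb + self.offset

class ComparisonHead(nn.Module):
    """Scores each query patch from contextual + aligned residual features,
    rather than from the residual alone (second insight)."""
    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(),
                                  nn.Linear(dim, 1))

    def forward(self, query_feats, prompt_feats):
        # query_feats, prompt_feats: (B, N, D); assume the normal-prompt
        # patches are already aligned to the query patches.
        residual = query_feats - prompt_feats   # aligned residual features
        context = query_feats                   # contextual features of the query
        return self.head(torch.cat([context, residual], dim=-1)).squeeze(-1)

def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad_(flag)

# First insight: alternate which adapter is updated instead of training jointly.
def alternating_schedule(step, visual_adapter, textual_adapter):
    train_visual = step % 2 == 0
    set_trainable(visual_adapter, train_visual)
    set_trainable(textual_adapter, not train_visual)
```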

## AdaptCLIP Framework

*(Figure: overview of the AdaptCLIP framework.)*

## Ablation Studies

| No. | Methods | Shots | TA | VA | PQA | MVTec (AUROC / AUPR) | VisA (AUROC / AUPR) |
|-----|---------|-------|----|----|-----|----------------------|---------------------|
| 0 | baselines | 0 | | | | 91.1 / 33.0 | 82.1 / 18.0 |
| 1 | baselines | 0 | | | | 92.2 / 31.4 | 82.9 / 19.7 |
| 2 | baselines | 0 | | | | 90.5 / 39.4 | 81.0 / 22.1 |
| 3 | joint | 0 | | | | 89.3 / 36.2 | 81.6 / 21.5 |
| 4 | alternating | 0 | | | | 93.5 / 38.3 | 84.8 / 26.1 |
| 5 | w/o context | 1 | | | | 62.6 / 7.0 | 85.3 / 28.7 |
| 6 | w/ context | 1 | | | | 88.1 / 50.2 | 88.9 / 38.1 |
| 7 | AdaptCLIP | 1 | | | | 94.2 / 52.5 | 92.0 / 38.8 |

Note: Following previous works, we use AUROC for image-level anomaly classification and AUPR for pixel-level anomaly segmentation in the main paper. We emphasize that AUPR is better suited to anomaly segmentation, where the class imbalance between normal and anomalous pixels is extreme, as pointed out in the VisA paper (ECCV 2022). The Appendix provides detailed comparisons using all metrics, including AUROC, AUPR, and F1-max.
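For reference, the following is a minimal sketch of how these metrics can be computed with scikit-learn (AUPR is computed here as average precision). The data is synthetic and the variable names are illustrative, not taken from this repository.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Image-level anomaly classification: one score per image, evaluated with AUROC.
image_labels = rng.integers(0, 2, size=100)           # 0 = normal, 1 = anomaly
image_scores = rng.random(100) + 0.3 * image_labels   # toy anomaly scores
image_auroc = roc_auc_score(image_labels, image_scores)

# Pixel-level anomaly segmentation: AUPR (average precision) is preferred
# because anomalous pixels are a tiny fraction of all pixels, which makes
# AUROC overly optimistic under such extreme imbalance.
pixel_masks = rng.random((10, 64, 64)) > 0.98          # ~2% anomalous pixels
pixel_maps = rng.random((10, 64, 64)) + 0.3 * pixel_masks
pixel_aupr = average_precision_score(pixel_masks.reshape(-1),
                                     pixel_maps.reshape(-1))

print(f"image AUROC: {image_auroc:.3f}, pixel AUPR: {pixel_aupr:.3f}")
```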

## Complexity and Efficiency Comparisons

| Shots | Methods | CLIP Models | Input Size | # F + L Params (M) | Inf. Time (ms) |
|-------|---------|-------------|------------|--------------------|----------------|
| 0 | WinCLIP [16] | ViT-B-16+240 | 240×240 | 208.4 + 0.0 | 201.3 |
| 0 | WinCLIP [16] | ViT-B-16+240 | 512×512 | 208.4 + 0.0 | 3912.6 |
| 0 | AdaCLIP [6] | ViT-L/14@336px | 518×518 | 428.8 + 10.7 | 212.0 |
| 0 | AnomalyCLIP [53] | ViT-L/14@336px | 518×518 | 427.9 + 5.6 | 154.9 |
| 0 | AdaptCLIP-Zero | ViT-B-16+240 | 512×512 | 208.4 + 0.4 | 49.9 |
| 0 | AdaptCLIP-Zero | ViT-L/14@336px | 518×518 | 427.9 + 0.6 | 162.2 |
| 1 | WinCLIP+ [16] | ViT-B-16+240 | 240×240 | 208.4 + 0.0 | 339.5 |
| 1 | WinCLIP+ [16] | ViT-B-16+240 | 512×512 | 208.4 + 0.0 | 7434.9 |
| 1 | InCtrl [54] | ViT-B-16+240 | 240×240 | 208.4 + 0.3 | 337.0 |
| 1 | AnomalyCLIP+ [53] | ViT-L/14@336px | 518×518 | 427.9 + 5.6 | 158.6 |
| 1 | AdaptCLIP | ViT-B-16+240 | 512×512 | 208.4 + 1.4 | 54.0 |
| 1 | AdaptCLIP | ViT-L/14@336px | 518×518 | 427.9 + 1.8 | 168.2 |

Note: F denotes frozen parameters and L denotes learnable parameters, both in millions (M).
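The following is a minimal sketch of how frozen/learnable parameter counts and per-image inference times like those in the table can be measured for any `torch.nn.Module`. The helper names and the timing protocol (single image, GPU, averaged over repeated runs) are assumptions, not the protocol used to produce the numbers above.

```python
import time
import torch

def count_params_millions(model: torch.nn.Module):
    """Return (frozen, learnable) parameter counts in millions."""
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    learnable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return frozen / 1e6, learnable / 1e6

@torch.no_grad()
def inference_time_ms(model, input_size, device="cuda", warmup=10, runs=50):
    """Average forward-pass latency (ms) for a single image of the given size."""
    model = model.to(device).eval()
    x = torch.randn(1, 3, input_size, input_size, device=device)
    for _ in range(warmup):           # warm-up iterations are not timed
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs * 1e3
```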

## ToDo List

- Release pre-trained AdaptCLIP models
- Deploy an online AdaptCLIP demo on HuggingFace Space
- Release testing code
- Release training code

## Star History

*(Star history chart)*
