Commit 694bd77 (parent: add1921): 37 changed files with 3,427 additions and 276 deletions.

**Planning notes**
1. Follow the Heatmap-based OOD experiment framework.
2. Explain why FlowCon is better: it uses a density-based approach and is class-specific.
3. Penultimate layer only.
4. RAF <-> AFF evaluation. What counts as OOD for facial expressions?
5. Timeline.

# [ECCV'24] FlowCon: Out-of-Distribution Detection using Flow-Based Contrastive Learning

### :bulb: Contributions:

![main-method](figures/intuition_orig.png)
- A new density-based OOD detection technique called FlowCon is proposed. We introduce a new loss function $L_{con}$ that contrastively learns class separability in the probability-distribution space (a schematic sketch of this idea is given after the related-literature list below). This learning occurs without any external OOD dataset and operates on fixed classifiers.

- The proposed method is evaluated on four metrics - FPR95, AUROC, AUPR-Success, and AUPR-Error - and compared against the state of the art. We observe that FlowCon is competitive with or outperforms most methods under different OOD conditions. Additionally, FlowCon remains stable even for a large number of classes and shows improvement for high-dimensional features.

- Histogram plots are presented, along with uniform manifold approximation and projection (UMAP) embeddings of the trained FlowCon model, to showcase its OOD detection and class-preserving capabilities, respectively. We also demonstrate FlowCon's discriminative capabilities.

### Related literature (density-based detection and OOD evaluation)
+ [A Simple Unified Framework for Detecting Out-of-Distribution](https://proceedings.neurips.cc/paper/2018/file/abdeb6f575ac5c6676b747bca8d09cc2-Paper.pdf)
+ [Boosting Out-of-distribution Detection with Typical Features](https://proceedings.neurips.cc/paper_files/paper/2022/file/82b0c1b954b6ef9f3cfb664a82b201bb-Paper-Conference.pdf)
+ [Heatmap-based Out-of-Distribution Detection](https://openaccess.thecvf.com/content/WACV2023/papers/Hornauer_Heatmap-Based_Out-of-Distribution_Detection_WACV_2023_paper.pdf)
+ [Beyond AUROC & co. for evaluating out-of-distribution detection performance](https://openaccess.thecvf.com/content/CVPR2023W/SAIAD/papers/Humblot-Renaux_Beyond_AUROC__Co._for_Evaluating_Out-of-Distribution_Detection_Performance_CVPRW_2023_paper.pdf)
+ [Out-of-Distribution Detection with Deep Nearest Neighbors](https://proceedings.mlr.press/v162/sun22d/sun22d.pdf)

<!-- ### Method
![main-method](figures/new_arch.png)
### FAR-OOD likelihood plots when $D_{in}=CIFAR10$ on ResNet-18 and WideResNet models.
![cifar-10RN](figures/cifar10_3.jpg)
![cifar-10WRN](figures/cifar10_7.jpg) -->
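
The loss implementation itself is not part of this diff, so the snippet below is only a minimal sketch of what a flow-based, class-contrastive objective of this kind could look like: per-class log-likelihoods from a normalizing flow over penultimate features are contrasted so that each feature scores highest under its own class's density. The names `flow_contrastive_loss` and the temperature `tau` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def flow_contrastive_loss(log_probs: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Hypothetical contrastive loss over class-conditional flow log-likelihoods.

    log_probs: (B, C) tensor with log p_c(z_i) of each feature z_i under each class-c density.
    labels:    (B,) ground-truth class indices.
    Encourages each feature to be most likely under its own class's density.
    """
    # Treat the per-class log-likelihoods as logits of a softmax over classes,
    # scaled by a temperature tau (an illustrative choice, not taken from the paper).
    return F.cross_entropy(log_probs / tau, labels)

if __name__ == "__main__":
    # Toy usage with random "log-likelihoods" standing in for real flow outputs.
    B, C = 8, 10
    log_probs = torch.randn(B, C)        # would come from the normalizing flow
    labels = torch.randint(0, C, (B,))
    print(float(flow_contrastive_loss(log_probs, labels)))
```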

### Mahalanobis baseline (ResNet backbone, RAF dataset, 7 classes)

python OOD_Generate_Mahalanobis_exp2.py --dataset raf --net_type resnet --gpu 1 --num_classes 7 --batch 64 --net_c 2
python OOD_Regression_Mahalanobis.py --net_type resnet

Input noise: Mahalanobis_0.001

| out_distribution | TNR | AUROC | DTACC | AUIN | AUOUT |
| --- | --- | --- | --- | --- | --- |
| svhn | 93.82 | 98.38 | 94.92 | 93.02 | 99.79 |
| imagenet_resize | 93.34 | 98.33 | 94.31 | 95.42 | 99.49 |
| lsun_resize | 95.37 | 98.57 | 95.39 | 96.80 | 99.49 |
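
`OOD_Generate_Mahalanobis_exp2.py` itself is not shown in this diff. For context, the standard Mahalanobis detector from the "Simple Unified Framework" paper linked above fits class-conditional Gaussians with a shared covariance on penultimate features and scores a test feature by its distance to the closest class mean. A minimal sketch follows; the function names and layout are illustrative, and the small input perturbation indicated by "Input noise" above is omitted here.

```python
import numpy as np

def fit_gaussians(feats: np.ndarray, labels: np.ndarray):
    """Fit per-class means and a shared (tied) covariance on in-distribution features.

    feats: (N, D) penultimate-layer features, labels: (N,) integer class ids.
    Returns class means (C, D) and the precision matrix (D, D).
    """
    classes = np.unique(labels)
    means = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    centered = feats - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(feats)
    return means, np.linalg.pinv(cov)

def mahalanobis_score(x: np.ndarray, means: np.ndarray, prec: np.ndarray) -> float:
    """Negative squared Mahalanobis distance to the closest class mean (higher = more ID-like)."""
    diffs = means - x                                  # (C, D)
    d2 = np.einsum("cd,de,ce->c", diffs, prec, diffs)  # per-class squared distance
    return float(-d2.min())
```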

### Mahalanobis baseline (EfficientNet backbone)

python OOD_Generate_Mahalanobis_exp2.py --dataset raf --net_type effnet --gpu 1 --num_classes 7 --batch 64 --net_c 1
<p float="left"> | ||
<img src="figs/k400-mask-vis-1.gif" width="410" /> | ||
<img src="figs/k400-mask-vis-2.gif" width="410" /> | ||
</p> | ||
<p float="left"> | ||
<img src="figs/k400-mask-vis-3.gif" width="410" /> | ||
<img src="figs/k400-mask-vis-4.gif" width="410" /> | ||
</p> | ||
<p float="left"> | ||
<img src="figs/k400-mask-vis-5.gif" width="410" /> | ||
<img src="figs/k400-mask-vis-6.gif" width="410" /> | ||
</p> | ||
<p float="left"> | ||
<img src="figs/k400-mask-vis-7.gif" width="410" /> | ||
<img src="figs/k400-mask-vis-8.gif" width="410" /> | ||
</p> | ||
<p float="left"> | ||
<img src="figs/k400-mask-vis-9.gif" width="410" /> | ||
<img src="figs/k400-mask-vis-10.gif" width="410" /> | ||
</p> | ||
<p float="left"> | ||
<img src="figs/k400-mask-vis-11.gif" width="410" /> | ||
<img src="figs/k400-mask-vis-12.gif" width="410" /> | ||
</p> | ||
### A comparision | ||

Input noise: Mahalanobis_0.0

| out_distribution | TNR | AUROC | DTACC | AUIN | AUOUT |
| --- | --- | --- | --- | --- | --- |
| svhn | 100.00 | 100.00 | 99.80 | 99.95 | 100.00 |
| imagenet_resize | 99.89 | 99.86 | 98.71 | 99.60 | 99.96 |
| lsun_resize | 99.94 | 99.87 | 99.00 | 99.65 | 99.96 |
| cifar10 | 99.72 | 99.58 | 98.08 | 98.28 | 99.88 |

Raw per-dataset scores at input noise magnitude 0.002:

{'0.002': [{'lsun_resize': {'AUIN': 0.9159968772185179,
                            'AUOUT': 0.9718827769132757,
                            'AUROC': 0.9412050521512386,
                            'DTACC': 0.8951655801825293,
                            'TNR': 0.5205}},
           {'imagenet_resize': {'AUIN': 0.7446115293245945,
                                'AUOUT': 0.924432614947508,
                                'AUROC': 0.8266008474576271,
                                'DTACC': 0.7656550195567144,
                                'TNR': 0.26639999999999997}},
           {'svhn': {'AUIN': 0.3521282700002417,
                     'AUOUT': 0.9356399479733257,
                     'AUROC': 0.6517361993142128,
                     'DTACC': 0.6172117217681737,
                     'TNR': 0.15550092194222498}}]}

Raw per-dataset scores for the Mahalanobis (MAHA) baseline:

{'lsun_resize': {'AUIN': 0.14683431702363461,
                 'AUOUT': 0.634156268287239,
                 'AUROC': 0.2199678617992177,
                 'DTACC': 0.5001370273794004,
                 'TNR': 0.016700000000000048}}
{'imagenet_resize': {'AUIN': 0.17565336244423915,
                     'AUOUT': 0.7322783484630648,
                     'AUROC': 0.37478556062581486,
                     'DTACC': 0.5123157105606259,
                     'TNR': 0.06899999999999995}}
{'svhn': {'AUIN': 0.1357192935527043,
          'AUOUT': 0.9339393085827903,
          'AUROC': 0.6118344015869747,
          'DTACC': 0.5894820856328467,
          'TNR': 0.1849646588813768}}
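
The metrics reported above follow the usual Mahalanobis-detector evaluation convention: TNR at 95% TPR, area under the ROC curve (AUROC), the best detection accuracy over thresholds (DTACC), and area under the precision-recall curve with in-distribution (AUIN) or out-of-distribution (AUOUT) samples treated as positives. A small sketch of how these can be computed from ID/OOD score arrays with scikit-learn, assuming higher scores mean "more in-distribution"; the function name and tie-breaking details are illustrative:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve, average_precision_score

def ood_metrics(scores_in: np.ndarray, scores_out: np.ndarray) -> dict:
    """Compute TNR@95%TPR, AUROC, DTACC, AUIN, and AUOUT from detector scores."""
    y = np.concatenate([np.ones_like(scores_in), np.zeros_like(scores_out)])
    s = np.concatenate([scores_in, scores_out])

    fpr, tpr, _ = roc_curve(y, s)                      # ID treated as the positive class
    tnr_at_tpr95 = 1.0 - fpr[np.argmax(tpr >= 0.95)]   # first threshold reaching 95% TPR
    dtacc = 0.5 * (tpr + (1.0 - fpr)).max()            # best balanced detection accuracy

    return {
        "TNR": tnr_at_tpr95,
        "AUROC": roc_auc_score(y, s),
        "DTACC": dtacc,
        "AUIN": average_precision_score(y, s),          # AUPR with ID as positive
        "AUOUT": average_precision_score(1 - y, -s),    # AUPR with OOD as positive
    }
```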

### Repository access notes (SSH setup)

git remote set-url origin git@github.com:saandeepa93/FlowCon_OOD.git
ssh -vT git@github.com

ssh-keygen -t ed25519 -C "[email protected]"

Key fingerprint: SHA256:ggc57KKdIdFJSUm8Ol7yuFyY8ZAoTrOJF6q9rKl/W4o

***

**New config file: CIFAR-10 with WideResNet backbone**

PATHS:
  DATA_ROOT: ./data
  VIS_PATH: ./assets/loader/

FLOW:
  N_FLOW: 1
  N_BLOCK: 8
  IN_FEAT: 128
  MLP_DIM: 256
  INIT_ZEROS: False
  DROPOUT: True

DATASET:
  IN_DIST: cifar10
  N_CLASS: 10
  IMG_SIZE: 32
  NUM_WORKERS: 2
  AUG: True
  W_SAMPLER: True

TRAINING:
  ITER: 701
  BATCH: 64
  LR: 1e-5
  WT_DECAY: 1e-5
  MOMENTUM: 0.9
  DROPOUT: False
  PRETRAINED: wideresnet
  PRT_CONFIG: 5
  PRT_LAYER: 3

LR:
  WARM: False
  ADJUST: False
  WARM_ITER: 50
  WARMUP_FROM: 1e-6
  DECAY_RATE: 0.1
  MIN_LR: 1e-6
  T_MAX: 100

TEST:
  EMP_PARAMS: True
  SCORE: True
  MAGNITUDE: 0.0024
  IN_FEATS: [64, 128, 128, 512]

COMMENTS:
  RESNET CIFAR TRAINING with cosine scheduler
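
The diff does not show how these YAML files are consumed by the training scripts. As a rough illustration only, the hypothetical snippet below loads such a config with plain PyYAML (the repo may well use a dedicated config library instead) and reads a few of the fields defined above; the file path is made up for the example.

```python
import yaml

# Illustrative path; the actual file name in the repository is not visible in this diff.
with open("configs/cifar10_wideresnet.yaml", "r") as f:
    cfg = yaml.safe_load(f)

# Access nested sections exactly as laid out in the YAML above.
print(cfg["FLOW"]["N_BLOCK"])         # number of flow blocks (8)
print(cfg["TRAINING"]["PRETRAINED"])  # backbone providing the features ("wideresnet")
print(cfg["TEST"]["MAGNITUDE"])       # test-time MAGNITUDE setting (0.0024)

# Note: PyYAML's default (YAML 1.1) resolver loads values like "1e-5" as strings,
# so learning rates written that way may need an explicit float() cast.
lr = float(cfg["TRAINING"]["LR"])
```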
***

**New config file: CIFAR-10 with ResNet-18 backbone**

PATHS:
  DATA_ROOT: ./data
  VIS_PATH: ./assets/loader/

FLOW:
  N_FLOW: 1
  N_BLOCK: 8
  IN_FEAT: 512
  MLP_DIM: 256
  INIT_ZEROS: False
  DROPOUT: True

DATASET:
  IN_DIST: cifar10
  N_CLASS: 10
  IMG_SIZE: 32
  NUM_WORKERS: 2
  AUG: True
  W_SAMPLER: True

TRAINING:
  ITER: 701
  BATCH: 128
  LR: 1e-5
  WT_DECAY: 1e-5
  MOMENTUM: 0.9
  DROPOUT: False
  PRETRAINED: resnet18
  PRT_CONFIG: 9
  PRT_LAYER: 4

LR:
  WARM: False
  ADJUST: False
  WARM_ITER: 50
  WARMUP_FROM: 1e-6
  DECAY_RATE: 0.1
  MIN_LR: 1e-6
  T_MAX: 100

TEST:
  EMP_PARAMS: True
  SCORE: True
  MAGNITUDE: 0.00
  IN_FEATS: [64, 128, 256, 512]

COMMENTS:
  RESNET CIFAR TRAINING with cosine scheduler
***

**New config file: CIFAR-100 with WideResNet backbone**

PATHS:
  DATA_ROOT: ./data
  VIS_PATH: ./assets/loader/

FLOW:
  N_FLOW: 1
  N_BLOCK: 12
  IN_FEAT: 128
  MLP_DIM: 256
  INIT_ZEROS: False
  DROPOUT: True

DATASET:
  IN_DIST: cifar100
  N_CLASS: 100
  IMG_SIZE: 32
  NUM_WORKERS: 2
  AUG: True
  W_SAMPLER: True

TRAINING:
  ITER: 701
  BATCH: 64
  LR: 1e-5
  WT_DECAY: 1e-5
  MOMENTUM: 0.9
  DROPOUT: False
  PRETRAINED: wideresnet
  PRT_CONFIG: 10
  PRT_LAYER: 3

LR:
  WARM: False
  ADJUST: False
  WARM_ITER: 50
  WARMUP_FROM: 1e-6
  DECAY_RATE: 0.1
  MIN_LR: 1e-6
  T_MAX: 100

LOSS:
  LMBDA_MIN: 0.37

TEST:
  EMP_PARAMS: True
  SCORE: True
  MAGNITUDE: 0.005
  IN_FEATS: [64, 128, 128, 512]

COMMENTS:
  RESNET CIFAR TRAINING with cosine scheduler
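
The `TEST.MAGNITUDE` field above appears to play the role of the input-perturbation magnitude used by ODIN/Mahalanobis-style detectors, where the input is nudged along the gradient that increases the in-distribution score before re-scoring. The sketch below illustrates only that general idea, under the assumption that a differentiable `score_fn` is available; both names are illustrative and this is not the repository's actual test code.

```python
import torch

def perturb_input(x: torch.Tensor, score_fn, magnitude: float) -> torch.Tensor:
    """ODIN/Mahalanobis-style input preprocessing.

    Nudges the input by `magnitude` in the direction that increases the
    (differentiable) in-distribution score, then returns the perturbed input.
    """
    x = x.clone().detach().requires_grad_(True)
    score = score_fn(x).sum()     # reduce to a scalar so we can backpropagate
    score.backward()
    with torch.no_grad():
        x_hat = x + magnitude * x.grad.sign()
    return x_hat.detach()

# Example: magnitude would be taken from the config above (TEST.MAGNITUDE = 0.005);
# score_fn would wrap the backbone plus the density model and is left abstract here.
```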