Harmonic-NAS

Harmonic-NAS is a novel framework for the design of multimodal neural networks on resource-constrained devices. It employs a two-tier optimization strategy with a first-stage evolutionary search for the unimodal backbone networks and a second-stage differentiable search for the multimodal fusion network architecture. Harmonic-NAS also includes the hardware dimension within its optimization procedure by including the inference latency and energy consumption as optimization objectives for an optimal deployment on resource-constrained devices.

Figure: overview of the Harmonic-NAS framework.

Paper and Supplementary

The arXiv version of the paper, available here, contains the full text with additional results. The paper has been accepted for publication at the 15th Asian Conference on Machine Learning (ACML 2023).

Requirements

  • Python: tested with version 3.8.10
  • Create the software environment from the YAML file environment.yml

Code Structure

Harmonic-NAS/
 ├── backbones/
 |    ├── maxout/ --- Our Maxout network configuration
 |    └── ofa/ --- Essential scripts from once-for-all for supernet specifications
 | 
 ├── configs/ --- Running configs for Harmonic-NAS search
 ├── data/ --- Essential scripts for data loading for our various datasets
 ├── evaluate/
 |    ├── backbone_eval/
 |    |    ├── accuracy/ --- Essential scripts for evaluating the accuracy of the explored uni/multi-modal models
 |    |    └── efficiency/ --- LUTs for evaluating the efficiency of our modality specific supernets on the targeted Edge devices
 |    └── fusion_eval/ --- LUTs for evaluating the efficiency of our fusion operators on the targeted Edge devices
 | 
 ├── fusion_search/ --- Scripts for the second-stage of optimization (fusion search)
 ├── saved_supernets/ --- Pretrained supernets for different modalities/datasets
 ├── utils/ --- Essential scripts for managing distributed training/evaluation across multiple GPUs
 ├── best_mm_model.py --- Script for the fusion micro-architecture search for our best found multimodal models
 └── search_algo.py --- Main script for Harmonic-NAS search
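
The efficiency folders in the tree above hold look-up tables (LUTs). One common way to use such tables is to add up pre-measured per-block costs for the blocks that a candidate network activates. The block below is a minimal sketch of that idea only: the LUT layout, the `(stage, kernel, expand)` key format, the placeholder numbers, and the `estimate_cost` helper are all hypothetical and are not taken from this repository.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical LUT: (stage, kernel_size, expand_ratio) -> (latency, energy).
# The values are placeholders, not measurements from any device.
BlockKey = Tuple[int, int, int]
LUT: Dict[BlockKey, Tuple[float, float]] = {
    (0, 3, 3): (0.5, 1.0),
    (0, 5, 4): (0.75, 1.5),
    (1, 3, 6): (1.25, 2.0),
}


def estimate_cost(
    blocks: Iterable[BlockKey],
    lut: Dict[BlockKey, Tuple[float, float]] = LUT,
) -> Tuple[float, float]:
    """Sum the per-block latency and energy entries for the given blocks."""
    latency = 0.0
    energy = 0.0
    for key in blocks:
        lat, enr = lut[key]  # raises KeyError if a block was never profiled
        latency += lat
        energy += enr
    return latency, energy


if __name__ == "__main__":
    print(estimate_cost([(0, 3, 3), (1, 3, 6)]))
```

A table-based estimate of this kind avoids running every candidate on the target device during the search, at the cost of ignoring interactions between blocks.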

Pretrained Supernets on Multimodal Datasets:

The following table provides a list of the employed backbones and supernets with their weights:

| Dataset | Modality | Baseline Model Architecture | Max subnet Accuracy | Pretrained weights |
|---|---|---|---|---|
| AV-MNIST | Image | ofa_mbv3_d234_e346_k357_w1.0 | TOP1-Acc: 86.44% | Link |
| AV-MNIST | Audio | ofa_mbv3_d234_e346_k357_w1.0 | TOP1-Acc: 88.22% | Link |
| MM-IMDB | Image | ofa_mbv3_d234_e346_k357_w1.2 | F1-W: 46.26% | Link |
| MM-IMDB | Text | Maxout | F1-W: 61.21% | Link |
| Memes_Politics | Image | ofa_mbv3_d234_e346_k357_w1.0 | TOP1-Acc: 84.78% | Link |
| Memes_Politics | Text | Maxout | TOP1-Acc: 83.38% | Link |
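
The supernet names in the table appear to follow the once-for-all naming pattern, in which the digits after `d`, `e`, and `k` list the available depth, expand-ratio, and kernel-size choices, and `w` gives the width multiplier. Assuming that reading holds, the sketch below splits a name into its search-space choices. The `parse_ofa_name` helper and the dictionary keys are hypothetical and do not come from this repository.

```python
import re
from typing import Dict, List, Union

# Pattern for names such as "ofa_mbv3_d234_e346_k357_w1.0" (assumed format).
_NAME_RE = re.compile(r"ofa_mbv3_d(\d+)_e(\d+)_k(\d+)_w(\d+(?:\.\d+)?)")


def parse_ofa_name(name: str) -> Dict[str, Union[List[int], float]]:
    """Return the per-dimension choices encoded in a supernet name."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised supernet name: {name!r}")
    depth, expand, kernel, width = match.groups()
    return {
        "depth": [int(c) for c in depth],          # one digit per choice
        "expand_ratio": [int(c) for c in expand],
        "kernel_size": [int(c) for c in kernel],
        "width_mult": float(width),
    }


if __name__ == "__main__":
    print(parse_ofa_name("ofa_mbv3_d234_e346_k357_w1.0"))
```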

Dataset Pre-processing

AV-MNIST dataset:

Download the AV-MNIST dataset by following the instructions provided in SMIL, or download it directly from here.

MM-IMDB dataset:

Download the multimodal_imdb.hdf5 file from the original repo of MM-IMDB using the Link.
Use the pre-processing script to split the dataset.

$ python data/mmimdb/prepare_mmimdb.py

Memes-Politics dataset:

Download the meme images and annotation files from the following links:

Harm-P: Link

Entity features: Link

ROI features: Link

To download the required vocabulary file:

$ wget https://openaipublic.azureedge.net/clip/bpe_simple_vocab_16e6.txt.gz -O bpe_simple_vocab_16e6.txt.gz
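
After the download, it can be useful to confirm that the archive is readable before starting a run. The sketch below is a generic check that decompresses a gzip text file and returns its lines. The `read_gzip_lines` helper is hypothetical, and nothing here reflects how the repository itself consumes the vocabulary.

```python
import gzip
import os
from typing import List


def read_gzip_lines(path: str) -> List[str]:
    """Decompress a UTF-8 gzip text file and return its lines without newlines."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


if __name__ == "__main__":
    vocab_path = "bpe_simple_vocab_16e6.txt.gz"  # file name used by the wget command
    if os.path.exists(vocab_path):
        lines = read_gzip_lines(vocab_path)
        print(f"{len(lines)} lines; first line: {lines[0]!r}")
    else:
        print(f"{vocab_path} not found; run the wget command first")
```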

Run Experiments

We ran the Harmonic-NAS experiments in a distributed environment (i.e., clusters of GPUs). To replicate these experiments, follow these steps:
Modify the configuration file located in ./configs to match your customized settings.
Run the following command to start the Harmonic-NAS search, where DATASET is a placeholder for the target dataset's name.

$ python search_algo_DATASET.py

To reproduce the results of our top-performing multimodal models without running the entire Harmonic-NAS search, specify the desired backbone architectures and the fusion macro-architecture (as detailed in Best Models Configuration) in the following script:

$ python best_mm_model_DATASET.py

Best Models Configuration

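The tables in this section give the architectural configurations of our top-performing multimodal models and their efficiency on the NVIDIA Jetson TX2, as described in our paper Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices. K, E, and D denote the backbone's kernel sizes, expand ratios, and depths; Lat and Enr denote inference latency and energy consumption.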

AV-MNIST

| Image Acc | Image K | Image E | Image D | Audio Acc | Audio K | Audio E | Audio D | Fusion Cells | Fusion Nodes | Multimodal Acc | Lat | Enr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 79.77 | [5,5,5,5] | [3,3,4,3] | [2] | 85.55 | [3,3,7,3] | [3,3,3,6] | [2] | 2 | 1 | 92.88 | 8.96 | 13.93 |
| 77.55 | [3,5,7,3] | [3,3,3,6] | [2] | 85.77 | [3,5,5,5] | [3,3,3,3] | [2] | 3 | 4 | 95.55 | 14.41 | 25.49 |
| 82.66 | [5,5,5,7] | [3,6,4,3] | [2] | 85.55 | [3,3,7,5] | [3,3,3,6] | [2] | 2 | 1 | 95.33 | 9.11 | 13.88 |
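
Because the search treats accuracy, latency, and energy as joint objectives, the three AV-MNIST models above can be compared by Pareto dominance. The sketch below is an illustrative check using the multimodal Acc, Lat, and Enr values from the table. It is not the selection code used by Harmonic-NAS, and the `Candidate` type and function names are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    acc: float  # multimodal accuracy, higher is better
    lat: float  # latency, lower is better
    enr: float  # energy, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    no_worse = a.acc >= b.acc and a.lat <= b.lat and a.enr <= b.enr
    better = a.acc > b.acc or a.lat < b.lat or a.enr < b.enr
    return no_worse and better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Multimodal evaluation columns of the three AV-MNIST rows.
AV_MNIST = [
    Candidate("row 1", 92.88, 8.96, 13.93),
    Candidate("row 2", 95.55, 14.41, 25.49),
    Candidate("row 3", 95.33, 9.11, 13.88),
]

if __name__ == "__main__":
    for cand in pareto_front(AV_MNIST):
        print(cand)
```

None of the three rows dominates another: row 1 has the lowest latency, row 2 the highest accuracy, and row 3 the lowest energy, so each represents a different trade-off.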

MM-IMDB

| Image F1-W | Image K | Image E | Image D | Text F1-W | Text Maxout | Fusion Cells | Fusion Nodes | Multimodal F1-W | Lat | Enr |
|---|---|---|---|---|---|---|---|---|---|---|
| 44.69 | [3,3,5,7,3,7,7,5,7,7,7,7,5,3,3,5,5,5,3,5] | [3,3,6,6,4,4,4,3,3,4,6,6,4,3,6,3,6,4,3,3] | [2,2,3,2,2] | 61.18 | hidden_features: 128, n_blocks: 2, factor_multiplier: 2 | 2 | 1 | 63.61 | 21.37 | 113.99 |
| 45.22 | [5,5,5,3,7,7,7,3,7,7,5,7,5,3,5,7,7,5,7,5] | [6,4,4,3,4,4,3,6,4,3,3,4,6,3,4,3,6,4,4,6] | [4,2,3,2,3] | (as above) | (as above) | 1 | 1 | 64.36 | 28.68 | 163.04 |
| 44.96 | [3,3,3,5,5,7,5,3,3,5,7,7,5,3,3,5,7,5,5,5] | [4,3,3,4,6,4,3,3,6,4,3,3,4,4,6,6,6,4,4,6] | [2,2,3,2,3] | (as above) | (as above) | 1 | 1 | 64.27 | 23.67 | 121.75 |
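
In every row of these configurations, K and E hold four entries for each entry of D. This matches the once-for-all convention in which each stage reserves slots up to the maximum depth and only the first D[i] blocks of stage i are active. The README does not state this, so the sketch below rests on that assumption; the `active_blocks` helper and the `max_depth=4` default are hypothetical.

```python
from typing import List, Tuple


def active_blocks(
    k: List[int], e: List[int], d: List[int], max_depth: int = 4
) -> List[Tuple[int, int, int]]:
    """List (stage, kernel_size, expand_ratio) for the blocks that D keeps active."""
    if len(k) != len(e) or len(k) != max_depth * len(d):
        raise ValueError("K and E must each hold max_depth entries per stage in D")
    blocks = []
    for stage, depth in enumerate(d):
        start = stage * max_depth  # first slot reserved for this stage
        for idx in range(start, start + depth):
            blocks.append((stage, k[idx], e[idx]))
    return blocks


# Image backbone of the first MM-IMDB row.
K = [3, 3, 5, 7, 3, 7, 7, 5, 7, 7, 7, 7, 5, 3, 3, 5, 5, 5, 3, 5]
E = [3, 3, 6, 6, 4, 4, 4, 3, 3, 4, 6, 6, 4, 3, 6, 3, 6, 4, 3, 3]
D = [2, 2, 3, 2, 2]

if __name__ == "__main__":
    for block in active_blocks(K, E, D):
        print(block)
```

Under this assumption the number of active blocks equals the sum of D, so the remaining K and E entries have no effect on the extracted subnet.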

Memes-P

| Image Acc | Image K | Image E | Image D | Text Acc | Text Maxout | Fusion Cells | Fusion Nodes | Multimodal Acc | Lat | Enr |
|---|---|---|---|---|---|---|---|---|---|---|
| 86.19 | [3,3,3,3,3,5,3,3,3,7,3,5] | [4,3,4,6,6,6,3,6,3,6,6,6] | [2,2,2] | 83.38 | hidden_features: 128, n_blocks: 2, factor_multiplier: 2 | 1 | 2 | 88.45 | 10.51 | 25.63 |
| 85.91 | [3,3,3,3,5,3,3,3,5,5,3,5] | [4,3,4,6,4,4,3,6,6,6,3,4] | [2,3,2] | (as above) | (as above) | 2 | 3 | 90.42 | 12.47 | 31.92 |
| 85.91 | [3,3,3,7,5,5,3,3,7,7,3,3] | [4,4,3,4,6,3,4,3,4,6,3,6] | [2,2,2] | (as above) | (as above) | 2 | 2 | 90.14 | 11.11 | 26.63 |

Results Visualization

To visualize our multimodal models, we use the BM-NAS plotter tool.
To visualize the found fusion architectures, set plot_arch=True when calling train_darts_model().

AV-MNIST Architecture

Citation

If you find this implementation helpful, please consider citing our work:

@inproceedings{ghebriout2024harmonic,
  title={Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices},
  author={Ghebriout, Mohamed Imed Eddine and Bouzidi, Halima and Niar, Smail and Ouarnoughi, Hamza},
  booktitle={Asian Conference on Machine Learning},
  pages={374--389},
  year={2024},
  organization={PMLR}
}