Harmonic-NAS is a novel framework for designing multimodal neural networks on resource-constrained devices. It employs a two-tier optimization strategy: a first-stage evolutionary search for the unimodal backbone networks and a second-stage differentiable search for the multimodal fusion network architecture. Harmonic-NAS also brings the hardware dimension into the optimization procedure by treating inference latency and energy consumption as optimization objectives, enabling optimal deployment on the targeted edge devices.
Please find our arXiv version here for the full paper with additional results. Our paper has been accepted for publication at the 15th Asian Conference on Machine Learning (ACML 2023).
- Python version: tested with Python 3.8.10
- Install the software environment from the YAML file environment.yml
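For example, with conda (a minimal sketch; the environment name is whichever one is declared inside environment.yml):

$ conda env create -f environment.yml

$ conda activate <env-name-from-environment.yml>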
Harmonic-NAS/
├── backbones/
| ├── maxout/ --- Our Maxout network configuration
| └── ofa/ --- Essential scripts from once-for-all for supernet specifications
|
├── configs/ --- Running configs for Harmonic-NAS search
├── data/ --- Essential scripts for data loading for our various datasets
├── evaluate/
| ├── backbone_eval/
| | ├── accuracy/ --- Essential scripts for evaluating the accuracy of the explored uni/multi-modal models
| | └── efficiency/ --- LUTs for evaluating the efficiency of our modality-specific supernets on the targeted Edge devices
| └── fusion_eval/ --- LUTs for evaluating the efficiency of our fusion operators on the targeted Edge devices
|
├── fusion_search/ --- Scripts for the second-stage of optimization (fusion search)
├── saved_supernets/ --- Pretrained supernets for different modalities/datasets
├── utils/ --- Essential scripts for managing distributed training/evaluation across multiple GPUs
├── best_mm_model.py --- Script for the fusion micro-architecture search for our best-found multimodal models
└── search_algo.py --- Main script for Harmonic-NAS search
The following table provides a list of the employed backbones and supernets with their weights:
Dataset | Modality | Baseline Model Architecture | Max subnet Accuracy | Pretrained weights |
---|---|---|---|---|
AV-MNIST | Image | ofa_mbv3_d234_e346_k357_w1.0 | TOP1-Acc: 86.44% | Link |
AV-MNIST | Audio | ofa_mbv3_d234_e346_k357_w1.0 | TOP1-Acc: 88.22% | Link |
MM-IMDB | Image | ofa_mbv3_d234_e346_k357_w1.2 | F1-W: 46.26% | Link |
MM-IMDB | Text | Maxout | F1-W: 61.21% | Link |
Memes_Politics | Image | ofa_mbv3_d234_e346_k357_w1.0 | TOP1-Acc: 84.78% | Link |
Memes_Politics | Text | Maxout | TOP1-Acc: 83.38% | Link |
Download the AV-MNIST dataset by following the instructions provided in SMIL, or download it directly from Here.
Download the multimodal_imdb.hdf5 file from the original repo of MM-IMDB using the Link.
Use the pre-processing script to split the dataset.
$ python data/mmimdb/prepare_mmimdb.py
Download the following files for the meme images and annotations:
Harm-P: Link
Entity features: Link
ROI features: Link
To download the required vocabulary file:
$ wget https://openaipublic.azureedge.net/clip/bpe_simple_vocab_16e6.txt.gz -O bpe_simple_vocab_16e6.txt.gz
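Optionally, a quick sanity check (a minimal Python sketch, not part of the Harmonic-NAS pipeline) that the downloaded BPE vocabulary file decompresses correctly:

```python
import gzip

# The vocabulary is a gzip-compressed text file of BPE merges.
with gzip.open("bpe_simple_vocab_16e6.txt.gz", "rt", encoding="utf-8") as f:
    num_lines = sum(1 for _ in f)
print(f"Read {num_lines} lines from the BPE vocabulary file")
```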
In Harmonic-NAS, we conducted experiments within a distributed environment (i.e., clusters of GPUs). To replicate these experiments, follow these steps:
1. Modify the configuration file located in ./configs to match your custom settings.
2. Run the following command to initiate the Harmonic-NAS search:
$ python search_algo_DATASET.py
To reproduce the results achieved by our top-performing multimodal models without undergoing the entire Harmonic-NAS search process, simply specify the desired backbone architectures and the fusion macro-architecture (as detailed in Best Models Configuration) within the following script:
$ python best_mm_model_DATASET.py
The following tables give the architectural configurations of our top-performing multimodal models and their efficiency on the NVIDIA Jetson TX2, as described in our paper Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices. In the backbone columns, K, E, and D denote the kernel-size, expansion-ratio, and depth settings of the selected subnets (following once-for-all's subnet encoding); Cells and Nodes describe the fusion macro-architecture; Latency and Energy report the measured inference latency and energy consumption.
Best multimodal models on AV-MNIST (image + audio):

Image Acc | Image K | Image E | Image D | Audio Acc | Audio K | Audio E | Audio D | Fusion Cells | Fusion Nodes | Multimodal Acc | Latency | Energy |
---|---|---|---|---|---|---|---|---|---|---|---|---|
79.77 | [5,5,5,5] | [3,3,4,3] | [2] | 85.55 | [3,3,7,3] | [3,3,3,6] | [2] | 2 | 1 | 92.88 | 8.96 | 13.93
77.55 | [3,5,7,3] | [3,3,3,6] | [2] | 85.77 | [3,5,5,5] | [3,3,3,3] | [2] | 3 | 4 | 95.55 | 14.41 | 25.49
82.66 | [5,5,5,7] | [3,6,4,3] | [2] | 85.55 | [3,3,7,5] | [3,3,3,6] | [2] | 2 | 1 | 95.33 | 9.11 | 13.88
Best multimodal models on MM-IMDB (image + text):

Image F1-W | Image K | Image E | Image D | Text F1-W | Text Maxout Config | Fusion Cells | Fusion Nodes | Multimodal F1-W | Latency | Energy |
---|---|---|---|---|---|---|---|---|---|---|
44.69 | [3,3,5,7,3,7,7,5,7,7,7,7,5,3,3,5,5,5,3,5] | [3,3,6,6,4,4,4,3,3,4,6,6,4,3,6,3,6,4,3,3] | [2,2,3,2,2] | 61.18 | hidden_features: 128, n_blocks: 2, factor_multiplier: 2 | 2 | 1 | 63.61 | 21.37 | 113.99
45.22 | [5,5,5,3,7,7,7,3,7,7,5,7,5,3,5,7,7,5,7,5] | [6,4,4,3,4,4,3,6,4,3,3,4,6,3,4,3,6,4,4,6] | [4,2,3,2,3] | 61.18 | hidden_features: 128, n_blocks: 2, factor_multiplier: 2 | 1 | 1 | 64.36 | 28.68 | 163.04
44.96 | [3,3,3,5,5,7,5,3,3,5,7,7,5,3,3,5,7,5,5,5] | [4,3,3,4,6,4,3,3,6,4,3,3,4,4,6,6,6,4,4,6] | [2,2,3,2,3] | 61.18 | hidden_features: 128, n_blocks: 2, factor_multiplier: 2 | 1 | 1 | 64.27 | 23.67 | 121.75
Best multimodal models on Memes_Politics (image + text):

Image Acc | Image K | Image E | Image D | Text Acc | Text Maxout Config | Fusion Cells | Fusion Nodes | Multimodal Acc | Latency | Energy |
---|---|---|---|---|---|---|---|---|---|---|
86.19 | [3,3,3,3,3,5,3,3,3,7,3,5] | [4,3,4,6,6,6,3,6,3,6,6,6] | [2,2,2] | 83.38 | hidden_features: 128, n_blocks: 2, factor_multiplier: 2 | 1 | 2 | 88.45 | 10.51 | 25.63
85.91 | [3,3,3,3,5,3,3,3,5,5,3,5] | [4,3,4,6,4,4,3,6,6,6,3,4] | [2,3,2] | 83.38 | hidden_features: 128, n_blocks: 2, factor_multiplier: 2 | 2 | 3 | 90.42 | 12.47 | 31.92
85.91 | [3,3,3,7,5,5,3,3,7,7,3,3] | [4,4,3,4,6,3,4,3,4,6,3,6] | [2,2,2] | 83.38 | hidden_features: 128, n_blocks: 2, factor_multiplier: 2 | 2 | 2 | 90.14 | 11.11 | 26.63
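The K, E, and D lists above follow the once-for-all subnet encoding (per-layer kernel sizes, per-layer expansion ratios, and per-block depths); their lengths depend on the supernet used for each dataset. As an illustrative sketch only, assuming the public once-for-all model zoo API (ofa_net, set_active_subnet, get_active_subnet) rather than the weights shipped in saved_supernets/, the image backbone from the first MM-IMDB row could be extracted as a standalone model like this:

```python
from ofa.model_zoo import ofa_net

# Load the width-1.2 MobileNetV3 supernet from the once-for-all model zoo
# (illustration only; Harmonic-NAS loads its own supernets from saved_supernets/).
supernet = ofa_net("ofa_mbv3_d234_e346_k357_w1.2", pretrained=True)

# K and E are per-layer (5 stages x 4 layers = 20 entries), D is per-stage (5 entries).
supernet.set_active_subnet(
    ks=[3, 3, 5, 7, 3, 7, 7, 5, 7, 7, 7, 7, 5, 3, 3, 5, 5, 5, 3, 5],
    e=[3, 3, 6, 6, 4, 4, 4, 3, 3, 4, 6, 6, 4, 3, 6, 3, 6, 4, 3, 3],
    d=[2, 2, 3, 2, 2],
)
subnet = supernet.get_active_subnet(preserve_weight=True)  # standalone PyTorch model
```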
To visualize our multimodal models, we employ the BM-NAS plotter tool. You can visualize the found fusion architectures by setting plot_arch=True when calling train_darts_model().
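A minimal sketch of enabling the plot (the remaining arguments of train_darts_model() are elided here because they depend on your dataset and search configuration; check the fusion search code under fusion_search/ for the actual signature):

```python
# Only the plot_arch flag below is documented in this README; the other
# arguments are placeholders to be filled in from your own setup.
train_darts_model(
    ...,             # dataloaders, backbone features, and search hyperparameters
    plot_arch=True,  # also plot the fusion architecture found by the search
)
```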
If you find this implementation helpful, please consider citing our work:
@inproceedings{ghebriout2024harmonic,
title={Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices},
author={Ghebriout, Mohamed Imed Eddine and Bouzidi, Halima and Niar, Smail and Ouarnoughi, Hamza},
booktitle={Asian Conference on Machine Learning},
pages={374--389},
year={2024},
organization={PMLR}
}