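m5-multimodal-encoder-decoder

Two-branch RGB-D scene classification network
- RGB branch: AlexNet pretrained on ImageNet (optionally fine-tuned)
- Depth branch: AlexNet pretrained on ImageNet (optionally fine-tuned). The depth map is encoded as an HHA image (horizontal disparity, height above ground, angle with gravity), which also has 3 channels, so it fits the same input layout as an RGB image.
- Classifier: one or several fully connected layers applied to the features of both branches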
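A minimal NumPy sketch of the forward pass. It assumes late fusion by concatenating the two branches' feature vectors before the fully connected classifier, because the notes do not say how the features are combined. The names `make_branch`, `make_classifier` and `two_branch_forward` are hypothetical. Each branch is a fixed random linear projection standing in for an ImageNet-pretrained AlexNet truncated at fc7 (4096-d features in the real network), and all sizes are shrunk so the sketch runs instantly.

```python
import numpy as np


def make_branch(rng, in_shape, out_dim):
    """Hypothetical stand-in for an ImageNet-pretrained AlexNet cut at fc7.

    A random linear projection followed by ReLU, used only so the sketch
    runs without pretrained weights.
    """
    in_dim = int(np.prod(in_shape))
    w = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)

    def branch(x):
        # Flatten each (C, H, W) image and project it to a feature vector.
        return np.maximum(x.reshape(x.shape[0], -1) @ w, 0.0)

    return branch


def make_classifier(rng, in_dim, hidden_dims, num_classes):
    """One or several fully connected layers ending in a softmax."""
    dims = [in_dim, *hidden_dims, num_classes]
    layers = [(rng.standard_normal((a, b)) / np.sqrt(a), np.zeros(b))
              for a, b in zip(dims[:-1], dims[1:])]

    def classify(h):
        for i, (w, b) in enumerate(layers):
            h = h @ w + b
            if i < len(layers) - 1:  # ReLU on hidden layers only
                h = np.maximum(h, 0.0)
        # Numerically stable softmax over scene classes.
        h = h - h.max(axis=1, keepdims=True)
        e = np.exp(h)
        return e / e.sum(axis=1, keepdims=True)

    return classify


def two_branch_forward(rgb, hha, rgb_branch, depth_branch, classifier):
    # HHA has 3 channels, so it shares the RGB input layout.
    if rgb.shape != hha.shape:
        raise ValueError("RGB and HHA inputs must have the same shape")
    # Assumption: late fusion by concatenating the two feature vectors.
    fused = np.concatenate([rgb_branch(rgb), depth_branch(hha)], axis=1)
    return classifier(fused)


rng = np.random.default_rng(0)
shape = (3, 32, 32)  # shrunk stand-in for AlexNet's 3 x 227 x 227 input
rgb_net = make_branch(rng, shape, 64)
depth_net = make_branch(rng, shape, 64)  # own weights, can be fine-tuned separately
clf = make_classifier(rng, 128, [32], 5)  # 5 hypothetical scene classes

rgb = rng.random((2, *shape))
hha = rng.random((2, *shape))
probs = two_branch_forward(rgb, hha, rgb_net, depth_net, clf)
print(probs.shape)  # (2, 5)
```

Passing an empty `hidden_dims` list gives a single fully connected layer; adding entries gives the "several fully connected layers" variant.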