multimodal-vqa: Developing multimodal models for visual question answering (VQA) on datasets such as NLVR2 and GQA.