ExeCuteRunrunrun/Multi-Modal-Dialogue-System-Paperlist

Multi-Modal-Dialogue-System-Paperlist

This is a paper list for the multimodal dialogue systems topic.

Keywords: multi-modal, dialogue system, visual, conversation

Paperlist

Dataset & Challenges

Images

(1) Visual QA: VQA datasets (CVPR 2021, 2020, 2019, ...) containing open-ended questions about images. Answering these questions requires an understanding of vision, language, and commonsense knowledge.

(2) Visual Dialog, CVPR 2017. Open-domain dialogs: given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.

(3) CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog NAACL2019, [code]

(4) Open-domain:

(?) sentiment

(5) Task/Goal-oriented:

(6) evaluation

  • A Revised Generative Evaluation of Visual Dialogue [Code] arXiv2020
  • Evaluating Visual Conversational Agents via Cooperative Human-AI Games [Code for GuessWhich] 2017
  • The Interplay of Task Success and Dialogue Quality: An in-depth Evaluation in Task-Oriented Visual Dialogues EACL2021

(7) classification

(?) Others

(8) [Image caption] generating a natural language description of an image

(9) Navigation task

(10) retrieval task

(11) image editing / text-to-image

  • [Sequential Attention GAN for Interactive Image Editing] ACM2020
  • [Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction] ICCV2019
  • [ChatPainter: Improving Text to Image Generation using Dialogue] ICLR2018
  • [Adversarial Text-to-Image Synthesis: A Review] 2021
  • [A Multimodal Dialogue System for Conversational Image Editing] 2020

(12) Fashion 🌟🌟🌟 ----F-a-s-h-i-o-n----

Video

(13) video

Charts / figures

(14) LEAF-QA: Locate, Encode & Attend for Figure Question Answering

Meme

(15) MOD (Meme incorporated Open Dialogue): WeChat conversations with memes / stickers, in Chinese.

  • A Multimodal Memes Classification: A Survey and Open Research Issues
  • [Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog] WWW2020
  • [Learning to Respond with Your Favorite Stickers: A Framework of Unifying Multi-Modality and User Preference in Multi-Turn Dialog] 2020
  • [The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes] NeurIPS2020

Survey

Other github paperlists

In general

  • Tasks
    • Visual Question Answering
    • Visual Dialog
    • Visual Commonsense Reasoning
    • Image-Text Retrieval
    • Referring Expression Comprehension
    • Visual Entailment
    • NL+V representation ==> multimodal pretraining
  • Issues / topics:
    • text and image bias
    • VL or LV BERTology
    • visual understanding / reasoning / object relation
    • cross-modal text-image relation (attention on interaction)
    • incorporate knowledge / common sense (attention on knowledge)
  • Often used model elements:
  • Often mentioned approaches:
    • adversarial training
    • reinforcement learning
    • graph neural network
    • joint learning / parallel / dual encoder / dual attention
  • my questions
    • What does "adaptive" mean? Why does everyone like this specific word?
    • "ground" is a mysterious word too...
    • Code is often hard to find for papers with "graph" or "reinforcement learning" in the title ???
