GupShup: Summarizing Open-Domain Code-Switched Conversations EMNLP 2021

Gupshup

GupShup: Summarizing Open-Domain Code-Switched Conversations (EMNLP 2021). Paper: https://aclanthology.org/2021.emnlp-main.499.pdf

Dataset

Please request the GupShup dataset using this Google form.

The dataset covers two tasks: Hinglish dialogues to English summaries (h2e) and English dialogues to English summaries (e2e). For each task, dialogues are stored in files with the .source extension (e.g. train.source) and summaries in files with the .target extension (e.g. train.target). In the scripts, pass the .source file to the input_path argument and the .target file to the reference_path argument.
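As a rough illustration of how the two files pair up, the sketch below reads a .source file and a .target file side by side. It assumes one dialogue per line in the .source file and the matching summary on the same line of the .target file; that layout is an assumption about the data, not something stated here, and load_pairs is a hypothetical helper rather than part of this repository.

```python
def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_pairs(source_path, target_path):
    """Return (dialogue, summary) tuples from two line-aligned files.

    Assumption: line i of the .source file is summarized by line i of the
    .target file. This is a hypothetical helper, not code from the repo.
    """
    sources = read_lines(source_path)
    targets = read_lines(target_path)
    if len(sources) != len(targets):
        raise ValueError(
            f"{source_path} has {len(sources)} lines but "
            f"{target_path} has {len(targets)}"
        )
    return list(zip(sources, targets))
```

For example, load_pairs("data/h2e/test.source", "data/h2e/test.target") would return the test dialogues paired with their reference summaries, provided the files follow the assumed layout.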

Models

All model weights are available on the Hugging Face model hub. You can either download the weights to a local directory and pass that path to the model_name argument, or pass one of the aliases listed below to model_name directly, in which case the scripts download the weights automatically.
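For readers who prefer to call a model from Python, the following is a minimal sketch of loading a checkpoint by local path or alias with the Hugging Face transformers library. It assumes the sequence-to-sequence checkpoints load through the generic AutoTokenizer and AutoModelForSeq2SeqLM classes (the GPT-2 checkpoints would need a causal-LM class instead) and that default generation settings are acceptable; the repository's own scripts may load and decode differently. The names summarize and batched are hypothetical.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def summarize(dialogues, model_name="midas/gupshup_h2e_mbart", batch_size=8, device="cpu"):
    """Summarize a list of dialogue strings (hypothetical helper, not repo code).

    model_name may be a Hub alias (weights are downloaded on first use) or a
    local directory that already holds the downloaded weights.
    """
    # Imported lazily so that batched() stays usable without these packages.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device).eval()

    summaries = []
    for batch in batched(dialogues, batch_size):
        # Pad to the longest dialogue in the batch; truncate to the model limit.
        inputs = tokenizer(
            batch, return_tensors="pt", padding=True, truncation=True
        ).to(device)
        with torch.no_grad():
            output_ids = model.generate(**inputs)  # default generation settings (assumption)
        summaries.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return summaries
```

Passing a local directory instead of the alias, e.g. summarize(dialogues, model_name="/path/to/weights"), corresponds to the first option described above.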

Aliases follow the pattern "midas/gupshup_TASK_MODEL", where TASK is h2e or e2e and MODEL is mbart, pegasus, etc., as listed below.

1. Hinglish Dialogues to English Summary (h2e)

Model | Hugging Face alias
mBART | midas/gupshup_h2e_mbart
PEGASUS | midas/gupshup_h2e_pegasus
T5 MTL | midas/gupshup_h2e_t5_mtl
T5 | midas/gupshup_h2e_t5
BART | midas/gupshup_h2e_bart
GPT-2 | midas/gupshup_h2e_gpt

2. English Dialogues to English Summary (e2e)

Model | Hugging Face alias
mBART | midas/gupshup_e2e_mbart
PEGASUS | midas/gupshup_e2e_pegasus
T5 MTL | midas/gupshup_e2e_t5_mtl
T5 | midas/gupshup_e2e_t5
BART | midas/gupshup_e2e_bart
GPT-2 | midas/gupshup_e2e_gpt
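The naming scheme in the two tables above can be captured in a few lines. This is only an illustrative sketch: model_alias is a hypothetical helper, and the allowed task and model suffixes are copied from the tables rather than queried from the Hugging Face hub.

```python
TASKS = ("h2e", "e2e")
MODELS = ("mbart", "pegasus", "t5_mtl", "t5", "bart", "gpt")


def model_alias(task, model):
    """Build the alias for a task/model pair listed in the tables above."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    return f"midas/gupshup_{task}_{model}"
```

For instance, model_alias("h2e", "pegasus") gives the alias from the second row of the first table.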

Inference

Using command line

  1. Clone this repo, create a Python virtual environment (https://docs.python.org/3/library/venv.html), and install the required packages:
git clone https://github.com/midas-research/gupshup.git
cd gupshup
pip install -r requirements.txt
  2. The run_eval.py script takes the following arguments:
  • model_name : Path or alias to one of our models available on Huggingface as listed above.
  • input_path : Source file or path to file containing conversations, which will be summarized.
  • save_path : File path where to save summaries generated by the model.
  • reference_path : Target file or path to file containing reference summaries, used to calculate metrics.
  • score_path : File path where to save scores.
  • bs : Batch size.
  • device : CUDA device(s) to use.
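To make the interface above concrete, here is a hedged sketch of an argparse parser with the same argument names. It is not the parser from run_eval.py: which arguments are required, their types, and their defaults are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Return a parser mirroring the documented run_eval arguments.

    Required flags, types, and defaults below are assumptions, not taken
    from the repository's run_eval.py.
    """
    parser = argparse.ArgumentParser(description="Sketch of the run_eval arguments")
    parser.add_argument("--model_name", required=True,
                        help="Local path or Hugging Face alias of the model")
    parser.add_argument("--input_path", required=True,
                        help=".source file with the conversations to summarize")
    parser.add_argument("--save_path", required=True,
                        help="File in which to save the generated summaries")
    parser.add_argument("--reference_path", default=None,
                        help=".target file with reference summaries")
    parser.add_argument("--score_path", default=None,
                        help="File in which to save the scores")
    parser.add_argument("--bs", type=int, default=8, help="Batch size")
    parser.add_argument("--device", default=None, help="CUDA device(s) to use")
    return parser
```

Calling build_parser().parse_args([...]) with a list of flags is a quick way to check an invocation before launching a long evaluation run.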

Please make sure you have downloaded the GupShup dataset using the Google form above, and pass the correct paths to these files in the input_path and reference_path arguments. Alternatively, place test.source and test.target in the data/h2e/ (Hinglish to English) or data/e2e/ (English to English) folder. For example, to generate English summaries from Hinglish dialogues using the mBART model, run the following command:

python run_eval.py \
    --model_name midas/gupshup_h2e_mbart \
    --input_path  data/h2e/test.source \
    --save_path generated_summary.txt \
    --reference_path data/h2e/test.target \
    --score_path scores.txt \
    --bs 8

As another example, to generate English summaries from English dialogues using the PEGASUS model, run:

python run_eval.py \
    --model_name midas/gupshup_e2e_pegasus \
    --input_path  data/e2e/test.source \
    --save_path generated_summary.txt \
    --reference_path data/e2e/test.target \
    --score_path scores.txt \
    --bs 8
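When evaluating several models in a row, the two commands above can be generated programmatically. The sketch below builds the same argument list for any task/model pair and runs the commands one after another. It assumes it is run from the repository root with the data in data/h2e/ and data/e2e/; the per-model output file names and the helper names eval_command and run_all are hypothetical.

```python
import subprocess
import sys


def eval_command(task, model, batch_size=8):
    """Return the run_eval.py argument list for one task/model pair."""
    return [
        sys.executable, "run_eval.py",
        "--model_name", f"midas/gupshup_{task}_{model}",
        "--input_path", f"data/{task}/test.source",
        # Hypothetical per-model file names so that runs do not overwrite each other.
        "--save_path", f"generated_summary_{task}_{model}.txt",
        "--reference_path", f"data/{task}/test.target",
        "--score_path", f"scores_{task}_{model}.txt",
        "--bs", str(batch_size),
    ]


def run_all(task, models, batch_size=8):
    """Run run_eval.py once per model, stopping at the first failure."""
    for model in models:
        subprocess.run(eval_command(task, model, batch_size), check=True)
```

For example, run_all("h2e", ["mbart", "pegasus"]) would evaluate both models on the Hinglish test set in sequence.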

In Google Colaboratory

Please create a copy of this Notebook in Google Colab, or upload gupshup_notebook.ipynb to Google Colab, and follow the instructions in it.

Streamlit UI

  1. Clone this repo, create a Python virtual environment (https://docs.python.org/3/library/venv.html), and install the required packages:
git clone https://github.com/midas-research/gupshup.git
cd gupshup
pip install -r requirements.txt
  2. Use the Streamlit UI to run inference with the model and task of your choice. To start the Streamlit server:
streamlit run app.py

Please create an issue if you are facing any difficulties in replicating the results.

References

Please cite [1] if you find the resources in this repository useful.

[1] Mehnaz, Laiba, Debanjan Mahata, Rakesh Gosangi, Uma Sushmitha Gunturi, Riya Jain, Gauri Gupta, Amardeep Kumar, Isabelle G. Lee, Anish Acharya, and Rajiv Shah. GupShup: Summarizing Open-Domain Code-Switched Conversations. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6177-6192, 2021.

@inproceedings{mehnaz2021gupshup,
  title={GupShup: Summarizing Open-Domain Code-Switched Conversations},
  author={Mehnaz, Laiba and Mahata, Debanjan and Gosangi, Rakesh and Gunturi, Uma Sushmitha and Jain, Riya and Gupta, Gauri and Kumar, Amardeep and Lee, Isabelle G and Acharya, Anish and Shah, Rajiv},
  booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
  pages={6177--6192},
  year={2021}
}
