Conversational Robot

Robotics Club Summer Project 2020

Mentors

Team Members

Group A - Varun Khatri, Prateek Jain, Adit Khokhar, Atharva Umbarkar, Ishir Roongta
- Group Repo - [https://github.com/isro01/Conv_bot]
Group B - Shiven Tripathi, Prakhar Pradhan, Mohd Muzzammil, Sidhartha Watsa, Azhar Tanweer
- Group Repo - [https://github.com/conversational-robot]
Group C - Ambuja Budakoti, Devansh Mishra, Hem Shah, Kavya Agarwal, Preeti Kumari
- Group Repo - [https://github.com/AmbujaBudakoti27/ConversationalRobot]
Group D - Abhay Dayal Mathur, Amitesh Singh Sisodia, Anchal Gupta, Arpit Verma, Manit Ajmera, Sanskar Mittal
- Group Repo - [https://github.com/Amitesh163/ConvBot_group]

Aim

The aim of this project was to build a talking bot that listens to the user's voice and generates meaningful, contextual responses according to the user's intent, much as in a human conversation.

Ideation

The project was divided into three parts: speech recognition, response generation, and text-to-speech conversion.

Overall Pipeline of the Project

Speech Recognition

We used the Google Speech-to-Text (gstt) API to convert speech into text transcripts, with a WER (word error rate) of 4.7%.
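For reference, WER is the word-level edit distance between a reference transcript and the recognized transcript, divided by the number of words in the reference. The sketch below shows that calculation; the function name `word_error_rate` is illustrative and is not part of this project or of the Google API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

A transcript that drops one word from a six-word reference would score 1/6, or about 16.7%.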

Response Generation

We trained our response generation model on a subset of the OpenSubtitles [4] dataset. The model combines context-based and topic-based attention.
It has three stages: an encoder network that produces a context vector for the input sentence, an attention mechanism that decides how much attention to pay to each word in the sentence, and a decoder network that uses the attention weights and context vectors to generate the words of the output sentence, i.e. the response.
We also added an AIML pipeline for responding to specific input patterns, such as greetings, emotions, and jokes, along with weather forecasting and Google search capabilities.
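The attention step described above can be illustrated with a small, framework-free sketch. This is not the project's trained model: it assumes plain dot-product scoring between the decoder state and each encoder state, whereas the actual model may use a learned scoring function, and the names `softmax` and `attend` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    decoder_state: Sequence[float],
    encoder_states: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoding step."""
    # Assumption: dot-product scoring between decoder and encoder states.
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [
        sum(w * state[k] for w, state in zip(weights, encoder_states))
        for k in range(dim)
    ]
    return weights, context


# Toy example: three input words with two-dimensional hidden states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], encoder_states)
```

In the toy example the first and third words align with the decoder state, so they receive equal weights that are larger than the weight of the second word.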
Some of the output examples that we've produced with our model are:

Text to speech conversion

We used the Google Text-to-Speech (gtts) API to convert the text of each response back to speech.
The response is written to a temporary mp3 file, which is then played with playsound.
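A minimal sketch of this step, assuming the `gTTS` and `playsound` packages are installed, could look like the following. The function name `speak` and its optional `synthesize` and `play` parameters are illustrative; they are not taken from the project's code and exist only so the function can be exercised without network or audio access.

```python
import os
import tempfile


def speak(text, synthesize=None, play=None):
    """Write text to a temporary mp3 file, play it, then delete the file."""
    if synthesize is None:
        from gtts import gTTS  # third-party package: gTTS

        def synthesize(message, path):
            gTTS(text=message, lang="en").save(path)

    if play is None:
        from playsound import playsound  # third-party package: playsound

        play = playsound

    # Reserve a file name and close the handle so that the synthesizer
    # and the player can open the file themselves.
    handle, path = tempfile.mkstemp(suffix=".mp3")
    os.close(handle)
    try:
        synthesize(text, path)
        play(path)
    finally:
        os.remove(path)
```

Calling `speak("Hello, how can I help?")` with the defaults needs an internet connection, because gTTS sends the text to Google's service.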

Usage

Install the required dependencies:

$ pip install -r requirements.txt
$ sudo apt-get install gstreamer-1.0

Training checkpoints, LDA model weights and tokens can be found here

Required File Structure:

Response Generation
├── bin
│   ├── LDA
│   ├── Tokens.txt
│   ├── topic_dict.dict
│   ├── training_checkpoints
│   └── glove.42B.300d.txt
└── ...
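Before starting the bot, it can help to confirm that the downloaded files are in the expected place. The following is a small, hypothetical helper (it is not part of the repository) that checks the entries listed in the structure above.

```python
from pathlib import Path

# Entries copied from the required file structure listed above.
REQUIRED_ENTRIES = [
    "LDA",
    "Tokens.txt",
    "topic_dict.dict",
    "training_checkpoints",
    "glove.42B.300d.txt",
]


def missing_entries(bin_dir):
    """Return the required entries that do not exist under bin_dir."""
    root = Path(bin_dir)
    return [name for name in REQUIRED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(Path("Response Generation") / "bin")
    if missing:
        print("Missing from Response Generation/bin:", ", ".join(missing))
    else:
        print("All required files are in place.")
```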

Running the bot

usage: bot.py [-h] [-m {msg,trigger}]

The bot.

optional arguments:
  -h, --help            show this help message and exit
  -m {msg,trigger}, --mode {msg,trigger}
                        Mode of execution : Message box/ Trigger word
                        detection
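For readers who want to see how such an interface is typically declared, here is a sketch of an `argparse` parser that would produce options like the ones above. It is a reconstruction from the help text, not the actual contents of bot.py; the help text does not state a default mode, so the sketch leaves it unset.

```python
import argparse


def build_parser():
    """Build a parser whose options match the help text shown above."""
    parser = argparse.ArgumentParser(prog="bot.py", description="The bot.")
    parser.add_argument(
        "-m",
        "--mode",
        choices=["msg", "trigger"],
        help="Mode of execution : Message box/ Trigger word detection",
    )
    return parser


args = build_parser().parse_args(["--mode", "trigger"])
print(args.mode)  # trigger
```

On the command line, the equivalent call would be `python bot.py -m trigger`.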

Modes

Message Box - Provides a GUI for the user to start the conversation at the click of a button.
Trigger Word Detection - The program listens in the background and starts the conversation upon hearing the trigger word.
- Commencement Trigger - Hello
- Concluding Trigger - Bye
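The trigger word behaviour can be pictured as a small state machine over recognized transcripts. The sketch below is an illustration under assumptions, not the project's implementation: the name `run_conversation` is hypothetical, the utterance containing the commencement trigger is not itself answered, and the loop stops for good at the concluding trigger.

```python
def run_conversation(transcripts, respond, start_word="hello", stop_word="bye"):
    """Yield replies for utterances heard between the two trigger words."""
    active = False
    for transcript in transcripts:
        # Lowercase and strip basic punctuation so "Bye!" matches "bye".
        words = [w.strip(".,!?") for w in transcript.lower().split()]
        if not active:
            # Ignore everything until the commencement trigger is heard.
            active = start_word in words
            continue
        if stop_word in words:
            # The concluding trigger ends the conversation.
            break
        yield respond(transcript)
```

Here `transcripts` stands for any iterable of recognized utterances and `respond` for the response generation step, so the same loop can be tried with plain strings before wiring in speech recognition.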

Functionality

Casual Conversations
Google search along with an explicit search feature for images
Weather Information

Demonstration

The video demonstration of this project can be found here.

References

1. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- Link : [https://arxiv.org/abs/1512.02595]
- Author(s)/Organization : Baidu Research – Silicon Valley AI Lab
- Tags : Speech Recognition
- Published : 8 Dec, 2015
2. Topic Aware Neural Response Generation
- Link : [https://arxiv.org/abs/1606.08340]
- Authors : Chen Xing, Wei Wu, Yu Wu, Jie Liu, Yalou Huang, Ming Zhou, Wei-Ying Ma
- Tags : Neural response generation; Sequence to sequence model; Topic aware conversation model; Joint attention; Biased response generation
- Published : 21 Jun 2016 (v1), 19 Sep 2016 (v2)
3. Topic Modelling and Event Identification from Twitter Textual Data
- Link : [https://arxiv.org/abs/1608.02519]
- Authors : Marina Sokolova, Kanyi Huang, Stan Matwin, Joshua Ramisch, Vera Sazonova, Renee Black, Chris Orwa, Sidney Ochieng, Nanjira Sambuli
- Tags : Latent Dirichlet Allocation; Topic Models; Statistical machine translation
- Published : 8 Aug 2016
4. OpenSubtitles (Dataset)
- Link : [http://opus.nlpl.eu/OpenSubtitles-v2018.php]
