Updated main.ipynb and readme.md

pra-dan · Mar 1, 2020 · b68dd0a
1 parent 73aa242
commit b68dd0a
Showing 3 changed files with 79 additions and 1,721 deletions.
diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-# Speech Censor Bot
+# Speech Censor Bot :speak_no_evil:
 
 ![Intro](https://github.com/PrashantDandriyal/Speech-Censor-Bot/blob/master/DocsResources/logo.PNG)
 ## Introduction:
@@ -7,18 +7,41 @@ The need for content monitoring has been the prevailing need ever since the birt
 ## Purpose: 
 To use OpenVINO to deploy a speech censor bot at the edge for censoring unwanted words such as cuss words in video or audio speech.
 
-
-## Installation
+## Usage
+The current version includes a preconfigured Google Colab notebook, which saves users from installing dependencies manually. Just open main.ipynb with your Google account and you are ready to go. The project has been tested on the _WAV_ files provided in the [Sample WAV files](https://github.com/PrashantDandriyal/Speech-Censor-Bot/tree/master/Sample%20WAV%20files) directory. To use custom files instead, edit the _TODO_ section in the notebook. The result files (censored audio for speech censoring, censored video for video censoring) are exported to the ```/content``` directory in the notebook workspace.
 
 ### Dependencies
-SOX: [Referene](https://explainshell.com/explain?cmd=sox+-r+48000+-b+16+-e+unsigned-integer+IMG_5367.raw+image.ogg+)
+* OpenVINO toolkit: The Open Visual Inference and Neural Network Optimization toolkit improves inference on edge devices by converting models into intermediate model files (_.bin + .xml_). The toolkit is pulled and installed by [OpenDevLibrary](https://github.com/alihussainia/OpenDevLibrary), an open-source installer for OpenVINO on Google Colab.
+* Sound eXchange ([SoX](http://sox.sourceforge.net/)): Cross-platform command-line utility, independent of Python, used for all the sound processing.
+* [wave](https://docs.python.org/2/library/wave.html): Python standard-library module for reading and writing WAV files.
+* [ffmpeg](https://www.ffmpeg.org/): Cross-platform tool for processing audio and video data, used to overwrite the original audio of the video file (when using video censoring) with the censored output audio.
+
 
 ## Understanding the process
+### Section 1: FOR AUDIO AND VIDEO SPEECH
 ![Methodology](https://github.com/PrashantDandriyal/Speech-Censor-Bot/blob/master/CussWordBot.png)
-
-##Results
+* Set up the OpenVINO environment (handled automatically by OpenDevLibrary).
+* Generate the configuration and executable files for inference by running ```demo_speech_recognition.sh```. On Google Colab, this file needs to be replaced or edited to comment out the online speech demo tests, because Colab doesn't support GUI-based applications and execution may otherwise not proceed to the next cell. For custom audio, either edit this file again or simply rename the custom file to the audio file name used in the shell script.
+* Pre-process the audio file to match the format accepted by OpenVINO.
+* Run inference and export the recognized speech text to a text file.
+* Obtain the syncmap in _json_ format using [Gentle](https://github.com/lowerquality/gentle) or any other forced aligner. **(Still needs to be configured)**
+* Detect any inappropriate utterance using the [profanity_words_list](https://raw.githubusercontent.com/PrashantDandriyal/Google-profanity-words/master/list.txt) and censor the corresponding audio portion.
+_NOTE: Extra words may be added to the list to flag them as inappropriate._
+### Section 2: ONLY FOR VIDEO SPEECH 
+* Overwrite the original audio of the video with the censored version.
+
+## Results
 [![Actual Clip](https://i.imgur.com/JnAamnUm.png)](https://youtu.be/FYM8NWKDqMU)
-[![Actual Clip](https://i.imgur.com/LowQIgsm.png)](https://youtu.be/FYM8NWKDqMU)
+[![Actual Clip](https://i.imgur.com/LowQIgsm.png)](https://youtu.be/MlbJCKB1LNM)
+
+The two thumbnails show the original and censored (output) files.
+Note: The pre-trained model used here appears to have been trained on an audiobook corpus and was therefore not accurate at detecting cuss utterances. Custom words have been appended to the _profanity_words_list_ for our test case.
 
 ## Future Work
-* The audio files (WAV format) should be accessed in raw format. For eg, the "trump_cuss.wav" is to be accessed as "https://raw.githubusercontent.com/PrashantDandriyal/Speech-Censor-Bot/master/trump_cussing.wav"
+* Train a custom model specific to profane words.
+* Use an automatic forced aligner.
+* Add support for video input.
+
+## References
+* [SoX illustrations](https://explainshell.com/explain?cmd=sox+-r+48000+-b+16+-e+unsigned-integer+IMG_5367.raw+image.ogg+)
+* [SoX to manipulate amplitude using shell](https://stackoverflow.com/questions/20127095/using-sox-to-change-the-volume-level-of-a-range-of-time-in-an-audio-file)
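
The last step of Section 1 in the README above (matching aligned words against the profanity list and censoring the corresponding audio) can be sketched as follows. This is only an illustration and is not the notebook's code: the README points to SoX for the audio processing, whereas this sketch swaps in plain muting with Python's standard `wave` module. The function names `find_censor_spans` and `mute_spans` are hypothetical, and the syncmap layout (a `words` list whose entries carry `word`, `start`, and `end` in seconds) is an assumption about the forced aligner's JSON output.

```python
import wave


def find_censor_spans(syncmap, banned_words):
    """Return (start, end) times in seconds for aligned words found in banned_words.

    Assumed syncmap layout: {"words": [{"word": ..., "start": ..., "end": ...}, ...]}.
    """
    banned = {w.lower() for w in banned_words}
    spans = []
    for entry in syncmap.get("words", []):
        # Skip words the aligner could not place in the audio.
        if "start" not in entry or "end" not in entry:
            continue
        if entry.get("word", "").lower() in banned:
            spans.append((entry["start"], entry["end"]))
    return spans


def mute_spans(in_path, out_path, spans):
    """Copy a PCM WAV file, replacing the audio inside each span with silence."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        frames = bytearray(src.readframes(src.getnframes()))

    frame_size = params.sampwidth * params.nchannels
    n_frames = len(frames) // frame_size
    # 8-bit WAV samples are unsigned (silence is 0x80); wider samples are signed (silence is 0).
    silence = b"\x80" if params.sampwidth == 1 else b"\x00"

    for start, end in spans:
        # Convert seconds to frame indices and clamp them to the file length.
        first = max(0, min(n_frames, int(start * params.framerate)))
        last = max(first, min(n_frames, int(end * params.framerate)))
        frames[first * frame_size:last * frame_size] = silence * ((last - first) * frame_size)

    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(bytes(frames))
```

With a syncmap loaded via `json.load` and the profanity list read into a Python list, `mute_spans(in_path, out_path, find_censor_spans(syncmap, banned_words))` would write a censored copy of the input file. Replacing the muted region with a beep tone, or attenuating it with SoX instead, are variations on the same span lookup.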
diff --git a/Sample WAV files/blowup.wav b/Sample WAV files/blowup.wav