- Introduction
- Pipeline and Tech Stack
- Installation Instructions
- Pipeline Overview
- Running the Pipeline
- Unit Testing
- How to Contribute
Real-Time End-to-End IoT Edge Translation (REIET) is an IoT Edge translation pipeline designed to allow speakers of different languages to communicate seamlessly in real time.
- Deploying on IoT Edge
- Rather than relying on cloud APIs, we use small Language Models to run the entire Automatic Speech Recognition (ASR), Translation, and Text-to-Speech (TTS) pipeline on device (see the sketch below)
- Rather than choosing general-purpose large models, we use small, domain-specific models (e.g., Whisper-Base and a Seq2Seq translation model) to meet RAM and latency requirements
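As an illustration of the on-device model chain, here is a minimal sketch using Hugging Face pipelines. The checkpoint names follow the stack listed below, but the `translate_utterance` helper is an illustrative assumption, not the repo's actual interface:

```python
# Minimal sketch of the on-device chain (ASR -> Translation) with Hugging Face
# pipelines. Checkpoint names follow the README's stack; the helper function is
# illustrative only. The TTS stage (ChatTTS) would consume the returned text.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")

def translate_utterance(wav_path: str) -> str:
    """Transcribe a recorded utterance, then translate it (en -> zh here)."""
    text = asr(wav_path)["text"]
    return translator(text)[0]["translation_text"]
```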
- Wifi No More
- Most conversations happen spontaneously and face to face, where a strong Wi-Fi connection may not be possible
- In combination with on-device ML inference, we utilize Bluetooth as our data link to provide a Wi-Fi-less mode, so REIET can translate where you communicate (see the Bluetooth sketch below)
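For reference, a minimal sketch of a Bluetooth text link with bluedot's `btcomm` module is shown below; the callback body is a placeholder assumption, since the real pipeline exchanges transcribed and translated text between the paired RPis:

```python
# Minimal bluedot sketch of the Bluetooth (RFCOMM) text link between two nodes.
# The callback body is a placeholder for the real message handling.
from signal import pause
from bluedot.btcomm import BluetoothServer

def data_received(data):
    print(f"received: {data}")  # e.g., hand the text to translation + TTS

server = BluetoothServer(data_received)
pause()  # keep the process alive while the server listens

# On the other node, a BluetoothClient connects using the server's MAC
# address (which is why MAC addresses must be known beforehand):
#   from bluedot.btcomm import BluetoothClient
#   client = BluetoothClient("XX:XX:XX:XX:XX:XX", data_received)
#   client.send("hello over Bluetooth")
```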
- Distributed Topology (Hardware Agnostic)
- REIET was originally designed to run on Linux devices, but the Bluetooth data link requires knowing MAC addresses beforehand
- To allow REIET to support an arbitrary number of nodes across different localities, we also support an experimental MQTT (OSI Application Layer) Pub/Sub Architecture mode (see the paho-mqtt sketch below)
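A minimal pub/sub sketch with paho-mqtt follows; the broker host and topic name are placeholder assumptions, and the callback signatures follow the paho-mqtt 1.x API (paho-mqtt 2.0 changed them):

```python
# Minimal paho-mqtt pub/sub sketch (paho-mqtt 1.x callback API assumed).
# Broker host and topic are placeholders; nodes in the same "chat room"
# simply publish and subscribe to a shared topic, at any distance.
import paho.mqtt.client as mqtt

TOPIC = "reiet/demo-room"  # placeholder "chat room" topic name

def on_connect(client, userdata, flags, rc):
    client.subscribe(TOPIC)
    client.publish(TOPIC, "hello from another locality")

def on_message(client, userdata, msg):
    print(f"{msg.topic}: {msg.payload.decode()}")  # e.g., text to synthesize

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("test.mosquitto.org", 1883)  # placeholder public broker
client.loop_forever()
```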
With recent innovations in Natural Language Processing (NLP), we seek to bridge the multilingual communication gap with a simple mission:
Help people have natural, meaningful conversations in their Native Language
- Machine Learning Stack
- PyTorch (Inference)
- HuggingFace (Inference)
- ChatTTS (Text to Speech)
- OpenAI Whisper-Base
- Audio Processing Stack
- Wavio
- Pydub
- Communication Protocol Stack
- Bluedot
- Paho-MQTT
- Wavio + Sounddevice
For the full writeup, check out: tinyurl.com/REIET-writeup
The flowchart above shows the pipeline from the Listener RPi to the Speaker RPi. Since each RPi acts as both Listener and Speaker, the pipeline is replicated symmetrically and bidirectionally.
| Step | Task | IoT Device | Protocol | Processing Technique |
|---|---|---|---|---|
| 1 | Record Audio | Microphone | USB | SoundDevice + Wavio |
| 2 | Automatic Speech Recognition (ASR) | Raspberry Pi (Speaker) | Bluetooth/MQTT | OpenAI Whisper, a deep learning Transformer-architecture model |
| 3 | Translation + Text-to-Speech (TTS) | Raspberry Pi (Listener) | ~ | Helsinki-NLP for translation and ChatTTS for text-to-speech |
| 4 | Play Audio | Headphones | Audio Jack | Pydub audio playback library |
| 5 | Visualize Transcription | Raspberry Pi or Laptop | HTTPS | Flask API serving a JSON log file to a web server |
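To make step 1 concrete, here is a minimal recording sketch with sounddevice and wavio; the sample rate, duration, and output path are assumptions for illustration:

```python
# Minimal sketch of step 1: record from the USB microphone and save a WAV.
# Sample rate, duration, and output path are illustrative assumptions.
import sounddevice as sd
import wavio

SAMPLE_RATE = 16_000   # Hz; Whisper models expect 16 kHz input
DURATION = 5           # seconds of audio to capture

# Record mono 16-bit audio from the default input device (the USB microphone)
recording = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="int16")
sd.wait()  # block until the recording is finished

# Persist as a 16-bit WAV for the ASR stage
wavio.write("utterance.wav", recording, SAMPLE_RATE, sampwidth=2)
```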
- 64-bit Operating System
  - The default RPi image has a 32-bit OS; a 64-bit OS can be found under "Raspberry Pi OS with desktop and recommended software"
- Python 3.10 (required by PyTorch and ChatTTS)
  - Note: Python 3.12 comes with the RPi image linked above
- Clone the repository

```bash
git clone git@github.com:Ky-Ng/REIET.git
cd REIET
```

- Run the install script

```bash
source install.sh
pip3 install -r requirements-local.txt
```
There are two main ways to run the pipeline:

- Bluetooth Mode: Server-Client mode
  - Run Bluetooth mode if your IoT devices are in close proximity or you wish to support Wi-Fi-less conversational translation
- MQTT Mode: Distributed mode
  - Run MQTT mode if you would like to support IoT devices which do not support Bluetooth, or you wish to support translation from arbitrary distances
  - Note: MQTT nodes only need Wi-Fi and need not be in close physical range of one another
  - MQTT may also be helpful if pairing between hardware devices is difficult (e.g., Mac and Windows)

Additionally, we also support a visualization of the English transcript of the conversation through a simple Flask API web app.
- Run the server device by specifying the `-s` flag

```bash
python3 e2e/e2e_client_server.py -s -l <LANGUAGE_ISO_639_CODE>
```
- Run the client device by specifying the `-c` flag
  - The `MAC_ADDRESS` is the MAC address of your server node

```bash
python3 e2e/e2e_client_server.py -c <MAC_ADDRESS> -l <LANGUAGE_ISO_639_CODE>
```
- Since MQTT mode is distributed, there are no `servers` and `clients`
- Instead, nodes publish and subscribe to a `topic_name`, which is similar to a "chat room" name

```bash
python3 e2e/e2e_client_server_mqtt.py -t <TOPIC_NAME> -l <LANGUAGE_ISO_639_CODE>
```
- View the transcript on either the server or client through a Flask server (a minimal sketch follows below)
- Note: you will need to refresh the web browser when a new audio message is received

```bash
python3 gui/flask-server.py
```
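For reference, a minimal sketch of such a transcript viewer is shown below; the JSON log path, record fields, and route are assumptions, not the exact layout of `gui/flask-server.py`:

```python
# Minimal sketch of a transcript viewer: serve entries from a JSON log file.
# The log path and record fields ("speaker", "text") are illustrative
# assumptions, not the exact schema used by gui/flask-server.py.
import json
from flask import Flask

app = Flask(__name__)
LOG_PATH = "gui/transcript_log.json"  # placeholder path

@app.route("/")
def transcript():
    with open(LOG_PATH) as f:
        entries = json.load(f)
    # Render one line per utterance; refresh the page to see new messages
    lines = [f"<p>{e['speaker']}: {e['text']}</p>" for e in entries]
    return "\n".join(lines)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```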
- Note: the `-l` language options are `en` for English and `zh` for Chinese
- If you run into errors above, running the unit tests below will help verify that each part of the pipeline is working correctly
- Each testing script will also help confirm that your objects/imports/dependencies are nominal with the message: `Initializing Class...success!`
- Each file that takes arguments has a `-h` flag for user friendliness as well
- Note: these examples assume we are running from the root of the repository
```bash
# Outputs saved to "./testing/testing_outputs/test_audio_handler.wav" by default
python testing/testAudioHandler.py
```
```bash
# python testing/testASRHandler.py -a {AUDIO_FILE}
# ex:
python testing/testASRHandler.py -a testing/testing_outputs/test_audio_handler.wav
```
```bash
# python testing/testTranslationHandler.py -t {TEXT_TO_TRANSLATE} -l {ISO 639 Language Code}
# Chinese ex:
python testing/testTranslationHandler.py -t "hi there, what's your name" -l zh
# output:
# Output Translation in zh: 嗨,你叫什么来着?
```
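The translation stage can also be exercised standalone with the Hugging Face MarianMT classes; in this minimal sketch, the en-to-zh checkpoint name is an assumption (the repo may select checkpoints per language):

```python
# Minimal sketch of what the translation handler exercises: a Helsinki-NLP
# MarianMT checkpoint run directly. The checkpoint name is an assumption.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-zh"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize, generate the translation, and decode it back to text
inputs = tokenizer(["hi there, what's your name"], return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```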
```bash
# python testing/testTTSHandler.py -t {TEXT_TO_SYNTHESIZE} {-p flag for playback}
# ex: to synthesize and play the audio
python testing/testTTSHandler.py -t "what is the meaning of life" -p
# ex: to synthesize without audio playback
python testing/testTTSHandler.py -t "what is the meaning of life"
```
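A minimal standalone sketch of the TTS stage is shown below. Note the ChatTTS API has shifted across releases; `load()`/`infer()` here follow recent versions (older releases used `load_models()`), and the 24 kHz rate is ChatTTS's output sample rate:

```python
# Minimal sketch of the TTS stage with ChatTTS; API details vary by release.
import numpy as np
import ChatTTS
import wavio

chat = ChatTTS.Chat()
chat.load()  # older ChatTTS releases used chat.load_models()

# infer() returns a list of float waveforms sampled at 24 kHz
wavs = chat.infer(["what is the meaning of life"])

# Convert the float waveform to 16-bit PCM and save it for playback
audio = (np.clip(wavs[0].squeeze(), -1.0, 1.0) * 32767).astype(np.int16)
wavio.write("tts_output.wav", audio, 24_000, sampwidth=2)
```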
- Note: the input language is not specified because we support any input language Whisper can perform ASR translation on (see the sketch after the example below)
```bash
# python e2e/test_self_loop.py -l {OUTPUT_TTS_LANGUAGE}
# ex:
python e2e/test_self_loop.py -l zh
```
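Whisper's built-in `task="translate"` mode is what makes the input language unconstrained. A minimal sketch with the openai-whisper package is below; whether the repo loads Whisper this way or via Hugging Face is not shown here, and the audio path is a placeholder:

```python
# Minimal sketch of Whisper's any-language-to-English ASR translation,
# using the openai-whisper package; the audio path is a placeholder.
import whisper

model = whisper.load_model("base")  # Whisper-Base, per the README

# task="translate" transcribes speech in any supported language directly
# into English text; the source language is auto-detected.
result = model.transcribe("utterance.wav", task="translate")
print(result["text"])
```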
If you're interested in contributing to this repository, feel free to make a Pull Request or contact Jonathan Ong (ongjd [at] usc [dot] edu) and Kyle Ng (kgng [at] usc [dot] edu).