- Introduction
- Pipeline and Tech Stack
- Installation Instructions
- Pipeline Overview
- Running the Pipeline
- Unit Testing
- How to Contribute
Real-Time End-to-End IoT Edge Translation (REIET) is an IoT Edge translation pipeline designed to allow speakers of different languages to communicate seamlessly in real time.
- Deploying on IoT Edge
- Rather than relying on cloud APIs, we use small Language Models to run the entire Automatic Speech Recognition (ASR), Translation, and Text-to-Speech (TTS) pipeline on device (see the sketch below)
- Rather than choosing general-purpose large models, we use small, domain-specific models (e.g., Whisper-Base and a Seq2Seq translation model) to meet RAM and latency requirements
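As an illustration of the on-device model chain, here is a minimal sketch using Hugging Face pipelines. The checkpoint names follow the stack listed below, but the `translate_utterance` helper is an illustrative assumption, not the repo's actual interface:

```python
# Minimal sketch of the on-device chain (ASR -> Translation) with Hugging Face
# pipelines. Checkpoint names follow the README's stack; the helper function is
# illustrative only. The TTS stage (ChatTTS) would consume the returned text.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")

def translate_utterance(wav_path: str) -> str:
    """Transcribe a recorded utterance, then translate it (en -> zh here)."""
    text = asr(wav_path)["text"]
    return translator(text)[0]["translation_text"]
```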
- Wifi No More
- Most conversations happen spontaneously and face to face, where a strong Wi-Fi connection may not be possible
- In combination with on-device ML inference, we utilize Bluetooth as our data link to provide a Wi-Fi-less mode, so REIET can translate where you communicate (see the Bluetooth sketch below)
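For reference, a minimal sketch of a Bluetooth text link with bluedot's `btcomm` module is shown below; the callback body is a placeholder assumption, since the real pipeline exchanges transcribed and translated text between the paired RPis:

```python
# Minimal bluedot sketch of the Bluetooth (RFCOMM) text link between two nodes.
# The callback body is a placeholder for the real message handling.
from signal import pause
from bluedot.btcomm import BluetoothServer

def data_received(data):
    print(f"received: {data}")  # e.g., hand the text to translation + TTS

server = BluetoothServer(data_received)
pause()  # keep the process alive while the server listens

# On the other node, a BluetoothClient connects using the server's MAC
# address (which is why MAC addresses must be known beforehand):
#   from bluedot.btcomm import BluetoothClient
#   client = BluetoothClient("XX:XX:XX:XX:XX:XX", data_received)
#   client.send("hello over Bluetooth")
```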
- Distributed Topology (Hardware Agnostic)
- REIET was originally designed to run on Linux devices, but the Bluetooth data link requires knowing MAC addresses beforehand
- To allow REIET to support an arbitrary number of nodes across different localities, we also support an experimental MQTT (OSI Application Layer) Pub/Sub Architecture mode (see the paho-mqtt sketch below)
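A minimal pub/sub sketch with paho-mqtt follows; the broker host and topic name are placeholder assumptions, and the callback signatures follow the paho-mqtt 1.x API (paho-mqtt 2.0 changed them):

```python
# Minimal paho-mqtt pub/sub sketch (paho-mqtt 1.x callback API assumed).
# Broker host and topic are placeholders; nodes in the same "chat room"
# simply publish and subscribe to a shared topic, at any distance.
import paho.mqtt.client as mqtt

TOPIC = "reiet/demo-room"  # placeholder "chat room" topic name

def on_connect(client, userdata, flags, rc):
    client.subscribe(TOPIC)
    client.publish(TOPIC, "hello from another locality")

def on_message(client, userdata, msg):
    print(f"{msg.topic}: {msg.payload.decode()}")  # e.g., text to synthesize

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("test.mosquitto.org", 1883)  # placeholder public broker
client.loop_forever()
```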
With recent innovations in Natural Language Processing (NLP), we seek to bridge the multilingual communication gap with a simple mission:
Help people have natural, meaningful conversations in their Native Language
- Machine Learning Stack
- PyTorch (Inference)
- HuggingFace (Inference)
- ChatTTS (Text to Speech)
- OpenAI Whisper-Base
- Audio Processing Stack
- Wavio
- Pydub
- Communication Protocol Stack
- Bluedot
- Paho-MQTT
- Wavio + Sounddevice
For the full writeup, check out: tinyurl.com/REIET-writeup
The flowchart above shows the pipeline from the Listener RPi to the Speaker RPi. Since each RPi acts as both Listener and Speaker, the pipeline is replicated symmetrically and bidirectionally.
| Step | Task | IoT Device | Protocol | Processing Technique |
|---|---|---|---|---|
| 1 | Record Audio | Microphone | USB | SoundDevice + Wavio |
| 2 | Automatic Speech Recognition (ASR) | Raspberry Pi (Speaker) | Bluetooth/MQTT | OpenAI Whisper, a deep learning Transformer-architecture model |
| 3 | Translation + Text-to-Speech (TTS) | Raspberry Pi (Listener) | ~ | Helsinki-NLP for translation and ChatTTS for text-to-speech |
| 4 | Play Audio | Headphones | Audio Jack | Pydub audio playback library |
| 5 | Visualize Transcription | Raspberry Pi or Laptop | HTTPS | Flask API serving a JSON log file to a web server |
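To make step 1 concrete, here is a minimal recording sketch with sounddevice and wavio; the sample rate, duration, and output path are assumptions for illustration:

```python
# Minimal sketch of step 1: record from the USB microphone and save a WAV.
# Sample rate, duration, and output path are illustrative assumptions.
import sounddevice as sd
import wavio

SAMPLE_RATE = 16_000   # Hz; Whisper models expect 16 kHz input
DURATION = 5           # seconds of audio to capture

# Record mono 16-bit audio from the default input device (the USB microphone)
recording = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="int16")
sd.wait()  # block until the recording is finished

# Persist as a 16-bit WAV for the ASR stage
wavio.write("utterance.wav", recording, SAMPLE_RATE, sampwidth=2)
```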
- 64-bit Operating System
  - The default RPi image has a 32-bit OS; a 64-bit OS can be found under "Raspberry Pi OS with desktop and recommended software"
- Python 3.10 (required by PyTorch and ChatTTS)
  - Note: Python 3.12 comes with the RPi image linked above
- Clone the repository

```bash
git clone git@github.com:Ky-Ng/REIET.git
cd REIET
```

- Run the install script

```bash
source install.sh
pip3 install -r requirements-local.txt
```
There are two main ways to run the pipeline:

- Bluetooth Mode: Server-Client mode
  - Run Bluetooth mode if your IoT devices are in close proximity or you wish to support Wi-Fi-less conversational translation
- MQTT Mode: Distributed mode
  - Run MQTT mode if you would like to support IoT devices which do not support Bluetooth, or you wish to support translation from arbitrary distances
  - Note: MQTT nodes only need Wi-Fi and need not be in close physical range of one another
  - MQTT may also be helpful if pairing between hardware devices is difficult (e.g., Mac and Windows)

Additionally, we also support a visualization of the English transcript of the conversation through a simple Flask API web app.
- Run the server device by specifying the `-s` flag

```bash
python3 e2e/e2e_client_server.py -s -l <LANGUAGE_ISO_639_CODE>
```
- Run the client device by specifying the `-c` flag
  - The `MAC_ADDRESS` is the MAC address of your server node

```bash
python3 e2e/e2e_client_server.py -c <MAC_ADDRESS> -l <LANGUAGE_ISO_639_CODE>
```
- Since MQTT mode is distributed, there are no `servers` and `clients`
- Instead, nodes publish and subscribe to a `topic_name`, which is similar to a "chat room" name

```bash
python3 e2e/e2e_client_server_mqtt.py -t <TOPIC_NAME> -l <LANGUAGE_ISO_639_CODE>
```
- View the transcript on either the server or client through a Flask server (a minimal sketch follows below)
- Note: you will need to refresh the web browser when a new audio message is received

```bash
python3 gui/flask-server.py
```
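For reference, a minimal sketch of such a transcript viewer is shown below; the JSON log path, record fields, and route are assumptions, not the exact layout of `gui/flask-server.py`:

```python
# Minimal sketch of a transcript viewer: serve entries from a JSON log file.
# The log path and record fields ("speaker", "text") are illustrative
# assumptions, not the exact schema used by gui/flask-server.py.
import json
from flask import Flask

app = Flask(__name__)
LOG_PATH = "gui/transcript_log.json"  # placeholder path

@app.route("/")
def transcript():
    with open(LOG_PATH) as f:
        entries = json.load(f)
    # Render one line per utterance; refresh the page to see new messages
    lines = [f"<p>{e['speaker']}: {e['text']}</p>" for e in entries]
    return "\n".join(lines)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```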
- Note: the `-l` language options are `en` for English and `zh` for Chinese
- If you run into errors above, running the unit tests below will help verify that each part of the pipeline is working correctly
- Each testing script will also help confirm that your objects/imports/dependencies are nominal with the message: `Initializing Class...success!`
- Each file that takes arguments has a `-h` flag for user friendliness as well
- Note: these examples assume we are running from the root of the repository
```bash
# Outputs saved to "./testing/testing_outputs/test_audio_handler.wav" by default
python testing/testAudioHandler.py
```
```bash
# python testing/testASRHandler.py -a {AUDIO_FILE}
# ex:
python testing/testASRHandler.py -a testing/testing_outputs/test_audio_handler.wav
```
```bash
# python testing/testTranslationHandler.py -t {TEXT_TO_TRANSLATE} -l {ISO 639 Language Code}
# Chinese ex:
python testing/testTranslationHandler.py -t "hi there, what's your name" -l zh
# output:
# Output Translation in zh: 嗨,你叫什么来着?
```
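The translation stage can also be exercised standalone with the Hugging Face MarianMT classes; in this minimal sketch, the en-to-zh checkpoint name is an assumption (the repo may select checkpoints per language):

```python
# Minimal sketch of what the translation handler exercises: a Helsinki-NLP
# MarianMT checkpoint run directly. The checkpoint name is an assumption.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-zh"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize, generate the translation, and decode it back to text
inputs = tokenizer(["hi there, what's your name"], return_tensors="pt", padding=True)
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```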
```bash
# python testing/testTTSHandler.py -t {TEXT_TO_SYNTHESIZE} {-p flag for playback}
# ex: to synthesize and play the audio
python testing/testTTSHandler.py -t "what is the meaning of life" -p
# ex: to synthesize without audio playback
python testing/testTTSHandler.py -t "what is the meaning of life"
```
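A minimal standalone sketch of the TTS stage is shown below. Note the ChatTTS API has shifted across releases; `load()`/`infer()` here follow recent versions (older releases used `load_models()`), and the 24 kHz rate is ChatTTS's output sample rate:

```python
# Minimal sketch of the TTS stage with ChatTTS; API details vary by release.
import numpy as np
import ChatTTS
import wavio

chat = ChatTTS.Chat()
chat.load()  # older ChatTTS releases used chat.load_models()

# infer() returns a list of float waveforms sampled at 24 kHz
wavs = chat.infer(["what is the meaning of life"])

# Convert the float waveform to 16-bit PCM and save it for playback
audio = (np.clip(wavs[0].squeeze(), -1.0, 1.0) * 32767).astype(np.int16)
wavio.write("tts_output.wav", audio, 24_000, sampwidth=2)
```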
- Note: the input language is not specified because we support any input language Whisper can perform ASR translation on (see the sketch after the example below)
```bash
# python e2e/test_self_loop.py -l {OUTPUT_TTS_LANGUAGE}
# ex:
python e2e/test_self_loop.py -l zh
```
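Whisper's built-in `task="translate"` mode is what makes the input language unconstrained. A minimal sketch with the openai-whisper package is below; whether the repo loads Whisper this way or via Hugging Face is not shown here, and the audio path is a placeholder:

```python
# Minimal sketch of Whisper's any-language-to-English ASR translation,
# using the openai-whisper package; the audio path is a placeholder.
import whisper

model = whisper.load_model("base")  # Whisper-Base, per the README

# task="translate" transcribes speech in any supported language directly
# into English text; the source language is auto-detected.
result = model.transcribe("utterance.wav", task="translate")
print(result["text"])
```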
If you're interested in contributing to this repository, feel free to make a Pull Request or contact Jonathan Ong (ongjd [at] usc [dot] edu) and Kyle Ng (kgng [at] usc [dot] edu).