Proof of concept for a desktop application that translates on-screen elements based on optical character recognition (OCR).
While web pages can be easily translated using browser extensions, the text in some applications cannot be easily extracted (e.g. video games, images, scanned documents). The goal of this application is to 1) capture a portion of the screen, 2) recognize the text it contains, 3) translate it, and finally 4) paint it back on the screen, all in real time. Currently, the application takes the form of an overlay and uses SuryaOCR for layout understanding, EasyOCR for optical character recognition, and either Argos Translate for local translation or MyMemory for online translation.
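The four steps can be read as a simple pipeline. The sketch below is illustrative only: `TextRegion`, `run_pipeline`, and the stage callables are hypothetical names rather than the project's actual code, and the stages are stand-ins so that the example runs without any OCR or translation backend.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (left, top, width, height) in screen pixels; a hypothetical convention.
Box = Tuple[int, int, int, int]


@dataclass
class TextRegion:
    box: Box
    text: str


def run_pipeline(
    region: Box,
    capture: Callable[[Box], object],
    recognize: Callable[[object], List[TextRegion]],
    translate: Callable[[str], str],
    paint: Callable[[List[TextRegion]], None],
) -> List[TextRegion]:
    """Run one capture -> recognize -> translate -> paint pass."""
    image = capture(region)    # 1) capture a portion of the screen
    found = recognize(image)   # 2) recognize the text it contains
    # 3) translate each recognized region, keeping its position
    translated = [TextRegion(r.box, translate(r.text)) for r in found]
    paint(translated)          # 4) paint it back on the screen
    return translated


# Stand-in stages (assumptions, not real backends).
painted: List[TextRegion] = []
result = run_pipeline(
    region=(0, 0, 800, 600),
    capture=lambda box: {"region": box},
    recognize=lambda image: [TextRegion((10, 20, 120, 18), "Hello world")],
    translate=lambda text: {"Hello world": "Bonjour le monde"}.get(text, text),
    paint=painted.extend,
)
```

Passing the stages in as callables is one way to keep the OCR and translation backends swappable, for example a local versus an online translator.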
[Figure: example of translation from English to French, with the application overlaid on top of a PDF.]
The main limitation of this application is its ability to correctly understand the layout (i.e. whether the recognized words should be concatenated into a sentence or not). SuryaOCR, the layout understanding model currently in use, was trained on structured documents (mainly PDFs and newspapers) and does not work well on content such as algorithms, tables, game footage, or captions.
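To make the layout problem concrete, here is a naive geometric heuristic for deciding whether recognized words belong to the same line. It is not how SuryaOCR works; the `Word` class, the `group_into_lines` function, and the `max_gap_ratio` threshold are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def group_into_lines(words: List[Word], max_gap_ratio: float = 1.0) -> List[str]:
    """Join words that share a row and sit close together horizontally.

    A word is appended to a line when its vertical center is within half the
    taller height of the line's last word, and the horizontal gap between them
    is at most max_gap_ratio times that height.
    """
    lines: List[List[Word]] = []
    # Process words left to right so each line grows in reading order.
    for word in sorted(words, key=lambda w: w.x):
        for line in lines:
            last = line[-1]
            tallest = max(last.height, word.height)
            same_row = abs((last.y + last.height / 2) - (word.y + word.height / 2)) < tallest / 2
            gap = word.x - (last.x + last.width)
            if same_row and gap <= max_gap_ratio * tallest:
                line.append(word)
                break
        else:
            lines.append([word])
    # Order the lines top to bottom, then left to right.
    lines.sort(key=lambda line: (line[0].y, line[0].x))
    return [" ".join(w.text for w in line) for line in lines]


words = [
    Word("Press", 0, 0, 50, 10),
    Word("Start", 55, 1, 50, 10),
    Word("Score", 300, 0, 50, 10),   # same row, but far away: kept separate
    Word("Options", 0, 40, 70, 10),  # different row
]
lines = group_into_lines(words)
```

A fixed threshold like this works on clean rows of text but breaks down on tables, multi-column layouts, or game HUDs, which is where a learned layout model becomes necessary.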
- Modal for adding/removing other input/output languages
- Application packaging
- Parameters save button
- Set updateIgnoreMouseEvents using coordinates instead of the alpha value
- Identify sub-tasks for specialized layout detection models (e.g. video game, outdoor, PDF, etc.)
cd electron_gui
# Install the required packages
npm install
# Launch the Electron app
npm start -- --enable-logging
Download Anaconda.
For Windows users, if conda is not recognized as a command by the terminal, add C:\ProgramData\anaconda3\Scripts to the user's Path environment variable.
cd python_server
# Create the virtual environment and install the packages with conda
conda env create --file environment.yml --prefix ./ldtvenv
# Activate the virtual environment
conda activate .\ldtvenv
Alternatively, to use pip instead of Anaconda, download Python 3.12.7 (don't forget to add it to the PATH during installation).
cd python_server
# Create the empty virtual environment
py -3.12 -m venv ldtvenv
# Activate the virtual environment
# On Windows:
.\ldtvenv\Scripts\activate
# On Linux:
source ldtvenv/bin/activate
# Install PyTorch (CUDA 11.8 build)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# Install PaddlePaddle, the framework PaddleOCR runs on (GPU build for CUDA 11.8)
pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
# Install the rest of the packages
pip install -r requirements.txt
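After installation, a quick sanity check can confirm that the GPU packages are importable from the active environment. This is a minimal sketch and not part of the repository: `check_environment` is a hypothetical helper, and the default package list is an assumption based on the commands above (`paddle` is the import name of the paddlepaddle package).

```python
import importlib.util
from typing import Dict, Iterable


def check_environment(
    packages: Iterable[str] = ("torch", "torchvision", "paddle"),
) -> Dict[str, bool]:
    """Report which packages are importable and whether PyTorch sees a GPU."""
    report = {name: importlib.util.find_spec(name) is not None for name in packages}
    if report.get("torch"):
        try:
            import torch
            # True only if a CUDA-capable device and matching driver are found.
            report["cuda"] = torch.cuda.is_available()
        except ImportError:
            # Package is present on disk but fails to load (e.g. missing DLLs).
            report["torch"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If `cuda` is reported as missing while `torch` is present, the CPU-only wheel was probably installed, or the installed driver does not support CUDA 11.8.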
Start by bundling the Python application and its dependencies into a single executable that can be run by the user without installing Python. We'll use PyInstaller:
cd python_server
pyinstaller --onefile server.py
Currently, you need to copy the generated file python_server\dist\server.exe into electron_gui\assets\.
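Until this step is automated, the copy could be scripted. The following is a sketch under the assumption that it is run with the repository root as `repo_root`; `copy_server_binary` is a hypothetical helper, not part of the project.

```python
import shutil
from pathlib import Path


def copy_server_binary(repo_root: Path, name: str = "server.exe") -> Path:
    """Copy the PyInstaller output into the Electron assets folder."""
    source = repo_root / "python_server" / "dist" / name
    target_dir = repo_root / "electron_gui" / "assets"
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; run PyInstaller first")
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file metadata such as the modification time.
    return Path(shutil.copy2(source, target_dir / name))
```

For example, `copy_server_binary(Path("."))` from the repository root would place the executable in electron_gui\assets\.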
Then, we'll create the Electron executable using electron-forge:
cd electron_gui
npm run make
You'll find the resulting application in a path similar to electron_gui\out\live_desktop_translator-win32-x64 (the last folder depends on your system's architecture).
This code is released under the MIT license. See the LICENSE file for more information.