This is a toolkit for LLM translation. It is currently command-line only, but could later be imported as a Node module.
It is evolving from older NMT (Neural Machine Translation) models to using Large Language Models (LLMs) as the translator.
We think LLM translation is the way to go, because it allows semantic and syntactic examples to be added to a translation request. Read our post about using a Vector TMX Database.
The supporting research is also available in this IKI.ai Collection.
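As a rough illustration of adding examples to a translation request, the sketch below places a few translation-memory pairs ahead of the text to translate. The `TmExample` type, the `buildPrompt` function, and the prompt wording are all hypothetical and not taken from this repository; how the examples are retrieved (for instance from a vector database) is left out.

```typescript
// Hypothetical shape of a translation-memory entry (not the project's actual type).
interface TmExample {
  source: string;
  target: string;
}

// Build a few-shot prompt: TM pairs are listed before the text to translate,
// so the model can pick up terminology and phrasing from them.
function buildPrompt(
  text: string,
  sourceLang: string,
  targetLang: string,
  examples: TmExample[],
): string {
  const shots = examples
    .map(
      (ex, i) =>
        `Example ${i + 1}\n${sourceLang}: ${ex.source}\n${targetLang}: ${ex.target}`,
    )
    .join("\n\n");
  const header = `Translate from ${sourceLang} to ${targetLang}.`;
  // Only mention examples when at least one was supplied.
  const guidance = shots
    ? `Follow the terminology and style of these examples:\n\n${shots}\n\n`
    : "";
  return `${header}\n${guidance}${sourceLang}: ${text}\n${targetLang}:`;
}
```

Calling `buildPrompt("Bonjour le monde", "French", "English", [{ source: "Bonjour", target: "Hello" }])` returns a single string that can be sent as the user message to whichever LLM is configured.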
Each tool is in its own folder and most are command-line interactive tools.
Make sure to run yarn install to have the libraries installed.
- TMX Generator - Generates TMX files for use with translations or the upcoming TMX Vector/Hybrid Database
- docx xlsx pptx Translator - Generates XLIFF files then translates them using LLM translation
- [LLM Translator] - The main translator tool using a mix of LLMs and the concept of TMX Vector/Hybrid Databases
- ... more coming soon
git clone https://github.com/baobab-tech/llm-translator.git
yarn install
OPEN_AI_KEY = your-openai-key
TMX_INPUT_PAIRS_DIR = data/input/pairs
TMX_INPUT_SINGLES_DIR = data/input/singles
TMX_OUTPUT_DIR = data/output
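A missing variable is easier to catch at startup than halfway through a run. The sketch below checks the four keys listed above; `loadConfig` is a hypothetical helper, and treating all four keys as required is an assumption rather than documented behavior of the tools.

```typescript
// Keys taken from the .env listing above; treating all of them as required is an assumption.
const REQUIRED_KEYS = [
  "OPEN_AI_KEY",
  "TMX_INPUT_PAIRS_DIR",
  "TMX_INPUT_SINGLES_DIR",
  "TMX_OUTPUT_DIR",
] as const;

type ConfigKey = (typeof REQUIRED_KEYS)[number];
type Config = Record<ConfigKey, string>;

// Hypothetical helper: validates an env-like object and returns trimmed values.
function loadConfig(env: Record<string, string | undefined>): Config {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const config = {} as Config;
  for (const key of REQUIRED_KEYS) {
    config[key] = env[key]!.trim();
  }
  return config;
}
```

In a Node script this could be called as `loadConfig(process.env)` once the .env file has been loaded.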
The TMX generator relies on a decent enough LLM to run basic phrase matching for the "pairs" method and to produce a good basic translation into English for the "singles" method. So the LLM needs to understand the source language well enough to translate it into English.
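Once phrases have been matched or translated, each source/target pair becomes one translation unit in a TMX file. The sketch below shows what a minimal TMX 1.4 document looks like for a list of pairs. `Pair`, `escapeXml`, and `toTmx` are hypothetical names, the header attribute values are placeholders, and the generator in this repository may structure its output differently.

```typescript
// Hypothetical pair type: one matched or translated segment.
interface Pair {
  source: string;
  target: string;
}

// Escape the characters that are not allowed raw inside XML text.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serialize pairs as a minimal TMX 1.4 document (header values are placeholders).
function toTmx(pairs: Pair[], srcLang: string, tgtLang: string): string {
  const units = pairs.map((pair) =>
    [
      "    <tu>",
      `      <tuv xml:lang="${srcLang}"><seg>${escapeXml(pair.source)}</seg></tuv>`,
      `      <tuv xml:lang="${tgtLang}"><seg>${escapeXml(pair.target)}</seg></tuv>`,
      "    </tu>",
    ].join("\n"),
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<tmx version="1.4">',
    `  <header creationtool="example" creationtoolversion="0.0.0" segtype="sentence" o-tmf="none" adminlang="en" srclang="${srcLang}" datatype="plaintext"/>`,
    "  <body>",
    ...units,
    "  </body>",
    "</tmx>",
  ].join("\n");
}
```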
The XLIFF Translator LLM needs to preserve the placeholders from the XLIFF. After a lot of testing, the models that do this best with few errors (balancing cost, so excluding GPT-4 and Claude-3-Opus) are GPT-3.5-turbo, Claude-3-Haiku, and a Hermes-2-Mixtral-8x7b variant (served through Fireworks AI).
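One way to catch placeholder errors is to compare the inline tags in the source segment with those in the translated segment. The sketch below assumes XLIFF 1.2 style inline elements (`x`, `g`, `bx`, `ex`, `ph`) and requires each tag to come back byte-for-byte; `inlineTags` and `placeholderDiff` are hypothetical helpers, not functions from this repository.

```typescript
// Collect inline tags such as <x id="1"/>, <g id="2"> and </g> in order of appearance.
function inlineTags(segment: string): string[] {
  return segment.match(/<\/?(?:x|g|bx|ex|ph)\b[^>]*>/g) ?? [];
}

// Compare the tags of a source segment and its translation as multisets.
// `missing` lists tags the model dropped, `extra` lists tags it introduced.
function placeholderDiff(
  source: string,
  translated: string,
): { missing: string[]; extra: string[] } {
  const remaining = inlineTags(translated);
  const missing: string[] = [];
  for (const tag of inlineTags(source)) {
    const index = remaining.indexOf(tag);
    if (index === -1) {
      missing.push(tag);
    } else {
      remaining.splice(index, 1); // consume one matching occurrence
    }
  }
  return { missing, extra: remaining };
}
```

A segment whose diff is not empty could be retried or flagged for review. Tag order is deliberately not checked, since word order often changes between languages.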
You can pick the LLM by setting the environment variable in .env
LLM="gpt3.5"
Here is the list of options:
- OpenAI: gpt3.5
- Anthropic/Claude-3: haiku, sonnet
- Cohere: cmdr, cmdrplus
- Open models: mixtral, h2mixtral (served by Fireworks AI)
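A typo in the LLM value is easy to make, so it may help to validate it against the options listed above. In this sketch the option keys and group names come from that list; `resolveLlm` and the error messages are hypothetical, and the sketch does not assume any default model.

```typescript
type LlmGroup = "OpenAI" | "Anthropic/Claude-3" | "Cohere" | "Open models";

// Option keys and groups copied from the list above.
const LLM_OPTIONS = new Map<string, LlmGroup>([
  ["gpt3.5", "OpenAI"],
  ["haiku", "Anthropic/Claude-3"],
  ["sonnet", "Anthropic/Claude-3"],
  ["cmdr", "Cohere"],
  ["cmdrplus", "Cohere"],
  ["mixtral", "Open models"],
  ["h2mixtral", "Open models"],
]);

// Hypothetical helper: validate the LLM setting and report which group it belongs to.
function resolveLlm(value: string | undefined): { option: string; group: LlmGroup } {
  const option = (value ?? "").trim();
  if (option === "") {
    throw new Error("LLM is not set");
  }
  const group = LLM_OPTIONS.get(option);
  if (group === undefined) {
    const known = Array.from(LLM_OPTIONS.keys()).join(", ");
    throw new Error(`Unknown LLM "${option}". Options: ${known}`);
  }
  return { option, group };
}
```

In a Node script this could be called as `resolveLlm(process.env.LLM)` before any translation request is made.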
Why not? Python can go rest a bit.
- Spin up a TMX Vector Database
- Build more tools
- Spin-up a self-hosted service to translate
We welcome contributions to improve this project. Please follow these steps to contribute:
- Fork the repository.
- Create a new branch for your feature (git checkout -b feature/AmazingFeature).
- Commit your changes (git commit -am 'Add some AmazingFeature').
- Push to the branch (git push origin feature/AmazingFeature).
- Open a Pull Request.
This project is licensed under the CC-BY License - see the LICENSE file for details.