This project implements a neural machine translation system that translates English sentences to Portuguese using a sequence-to-sequence model with an attention mechanism.
- Overview
- Model Architecture
- Installation
- Usage
- Files Description
- Training
- Inference
- Minimum Bayes Risk Decoding
- Results
This neural machine translation system is built with TensorFlow and implements a sequence-to-sequence model with attention for translating English sentences to Portuguese. The model uses an encoder-decoder architecture with cross-attention and applies Minimum Bayes Risk (MBR) decoding at inference time to improve translation quality.
The model consists of the following main components (a code sketch follows the list):
- Encoder: Bidirectional LSTM that processes the input English sentence.
- Decoder: LSTM-based decoder with cross-attention mechanism.
- Cross-Attention: Multi-head attention layer for attending to relevant parts of the encoded input.
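A minimal sketch of how these components could be wired together in Keras is shown below. It is an illustration only: the layer sizes, head count, and exact composition used in `encoder.py`, `decoder.py`, and `cross_attention.py` may differ, and `UNITS` here is an assumed placeholder.

```python
import tensorflow as tf

UNITS = 256  # assumed hidden size; the real value lives in the project's config

class CrossAttention(tf.keras.layers.Layer):
    """Multi-head attention from decoder states (query) to encoder output (key/value)."""
    def __init__(self, units):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=1, key_dim=units)
        self.add = tf.keras.layers.Add()
        self.norm = tf.keras.layers.LayerNormalization()

    def call(self, decoder_seq, encoder_seq):
        attn_out = self.mha(query=decoder_seq, key=encoder_seq, value=encoder_seq)
        return self.norm(self.add([decoder_seq, attn_out]))  # residual + layer norm

class Encoder(tf.keras.layers.Layer):
    """Embeds the English tokens and runs them through a bidirectional LSTM."""
    def __init__(self, vocab_size, units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, units, mask_zero=True)
        self.rnn = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(units, return_sequences=True), merge_mode="sum")

    def call(self, tokens):
        return self.rnn(self.embedding(tokens))

class Decoder(tf.keras.layers.Layer):
    """Embeds the Portuguese tokens, runs an LSTM, and attends to the encoder output."""
    def __init__(self, vocab_size, units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, units, mask_zero=True)
        self.rnn = tf.keras.layers.LSTM(units, return_sequences=True)
        self.attention = CrossAttention(units)
        self.out = tf.keras.layers.Dense(vocab_size)  # logits over the target vocabulary

    def call(self, target_tokens, encoder_seq):
        x = self.rnn(self.embedding(target_tokens))
        x = self.attention(x, encoder_seq)
        return self.out(x)
```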
To set up the project, follow these steps:
- Clone the repository:
git clone https://github.com/TrishamBP/neural-machine-translation-lstm-attention.git
- Install the required dependencies:
pip install -r requirements.txt
To translate an English sentence to Portuguese:
- Ensure you have a trained model saved as 'translator_model'.
- Run the main script:
python main.py
- The script will output multiple translation candidates and the selected best translation.
- `main.py`: The main script for running translations and MBR decoding.
- `utils.py`: Utility functions for data loading, preprocessing, and evaluation metrics.
- `training.py`: Script for training the translator model.
- `translator.py`: Defines the main Translator model.
- `encoder.py`: Implementation of the Encoder class.
- `decoder.py`: Implementation of the Decoder class.
- `cross_attention.py`: Implementation of the CrossAttention layer.
The model is trained using the following process:
- Data is loaded and preprocessed from a Portuguese-English parallel corpus.
- The model is compiled with the Adam optimizer and custom loss and accuracy functions.
- Training is performed with early stopping based on validation loss (sketched below).
To train the model:
python training.py
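Below is a hedged sketch of what the compile-and-fit step could look like, assuming the custom loss and accuracy are masked variants that ignore padding tokens (token id 0) and that early stopping uses the standard Keras callback; the actual definitions live in `training.py` and `utils.py` and may differ.

```python
import tensorflow as tf

def masked_loss(y_true, y_pred):
    # Sparse cross-entropy over logits, ignoring padded positions (token id 0).
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction="none")
    loss = loss_fn(y_true, y_pred)
    mask = tf.cast(y_true != 0, loss.dtype)
    return tf.reduce_sum(loss * mask) / tf.reduce_sum(mask)

def masked_acc(y_true, y_pred):
    # Token-level accuracy, again excluding padding.
    y_hat = tf.cast(tf.argmax(y_pred, axis=-1), y_true.dtype)
    match = tf.cast(y_true == y_hat, tf.float32)
    mask = tf.cast(y_true != 0, tf.float32)
    return tf.reduce_sum(match * mask) / tf.reduce_sum(mask)

# Assumed usage with the Translator model and prepared tf.data datasets:
# model.compile(optimizer=tf.keras.optimizers.Adam(),
#               loss=masked_loss, metrics=[masked_acc])
# model.fit(train_ds, validation_data=val_ds, epochs=20,
#           callbacks=[tf.keras.callbacks.EarlyStopping(
#               monitor="val_loss", patience=3, restore_best_weights=True)])
```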
The trained model can be used for inference as follows:
- Load the trained model.
- Use the `translate` function to generate a translation for an input English sentence (see the sketch below).
- Optionally, use MBR decoding for improved translation quality.
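Assuming the model was exported in TensorFlow's SavedModel format under the name used above, loading it and translating a sentence could look roughly like this; the exact loading call and the signature of `translate` depend on how the model is saved in this project.

```python
import tensorflow as tf

# Load the trained model from disk (path from the usage section above).
translator = tf.saved_model.load("translator_model")

sentence = "How are you today?"
# `translate` is assumed here to take a batch of raw strings and return decoded
# Portuguese text; adjust to the actual output structure exposed by the model.
result = translator.translate(tf.constant([sentence]))
print(result)
```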
This project implements Minimum Bayes Risk (MBR) decoding to improve translation quality:
- Multiple translation candidates are generated.
- Candidates are scored based on their similarity to other candidates.
- The candidate with the highest average similarity is selected as the final translation, as sketched below.
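The sketch below illustrates the selection step with a deliberately simple unigram-overlap similarity; the candidate generation (e.g. sampling the decoder several times) and the similarity metric actually used in this project may differ.

```python
from collections import Counter

def unigram_f1(a: str, b: str) -> float:
    """Illustrative similarity: F1 over unigram overlap between two candidates."""
    ca, cb = Counter(a.split()), Counter(b.split())
    overlap = sum((ca & cb).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(ca.values())
    recall = overlap / sum(cb.values())
    return 2 * precision * recall / (precision + recall)

def mbr_decode(candidates: list[str]) -> str:
    """Pick the candidate with the highest average similarity to all the others."""
    scores = []
    for i, cand in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        scores.append(sum(unigram_f1(cand, o) for o in others) / len(others))
    return candidates[scores.index(max(scores))]

# Example: in practice the candidates would come from sampling the translator.
cands = ["eu gosto de café", "gosto de café", "eu adoro chá"]
print(mbr_decode(cands))  # -> "eu gosto de café"
```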