neural-machine-translation

1- the main goal

We'll be building a machine learning model to go from one sequence to another, using PyTorch and torchtext. This will be done on German to English translations, but the models can be applied to any problem that involves going from one sequence to another, such as summarization, i.e. going from a sequence to a shorter sequence in the same language.

2- the dataset

spaCy has a model for each language ("de_core_news_sm" for German and "en_core_web_sm" for English) which needs to be loaded so we can access each model's tokenizer.
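
For example, the two tokenizers can be loaded and wrapped like this (a minimal sketch; it assumes the models are already downloaded, e.g. via `python -m spacy download de_core_news_sm`, and the helper names `tokenize_de`/`tokenize_en` are just for illustration):

```python
import spacy

# Load both pipelines so we can use their tokenizers.
spacy_de = spacy.load("de_core_news_sm")
spacy_en = spacy.load("en_core_web_sm")

def tokenize_de(text):
    # Split a German sentence into a list of token strings.
    return [tok.text for tok in spacy_de.tokenizer(text)]

def tokenize_en(text):
    # Split an English sentence into a list of token strings.
    return [tok.text for tok in spacy_en.tokenizer(text)]

print(tokenize_de("Guten Morgen!"))  # e.g. ['Guten', 'Morgen', '!']
```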

3- the model

The most common sequence-to-sequence (seq2seq) models are encoder-decoder models, which commonly use a recurrent neural network (RNN) to encode the source (input) sentence into a single vector. In this notebook, we'll refer to this single vector as a context vector. We can think of the context vector as being an abstract representation of the entire input sentence. This vector is then decoded by a second RNN which learns to output the target (output) sentence by generating it one word at a time.
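A minimal sketch of that idea (a plain unidirectional GRU encoder-decoder, not yet the attention model described below; all module and hyperparameter names here are illustrative):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Toy encoder-decoder: encode the source into one context vector,
    then decode the target one token at a time."""
    def __init__(self, src_vocab, trg_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.trg_emb = nn.Embedding(trg_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim)
        self.decoder = nn.GRU(emb_dim, hid_dim)
        self.out = nn.Linear(hid_dim, trg_vocab)

    def forward(self, src, trg):
        # src: [src_len, batch], trg: [trg_len, batch]
        _, context = self.encoder(self.src_emb(src))        # context: [1, batch, hid_dim]
        hidden = context
        outputs = []
        for t in range(trg.shape[0]):                        # one target word at a time
            step_input = self.trg_emb(trg[t].unsqueeze(0))   # [1, batch, emb_dim]
            output, hidden = self.decoder(step_input, hidden)
            outputs.append(self.out(output.squeeze(0)))      # logits over target vocab
        return torch.stack(outputs)                          # [trg_len, batch, trg_vocab]
```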

We now use a bidirectional RNN. With a bidirectional RNN, we have two RNNs in each layer: a forward RNN going over the embedded sentence from left to right, and a backward RNN going over it from right to left. All we need to do in code is set bidirectional = True and then pass the embedded sentence to the RNN as before, as in the sketch below.
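A sketch of such an encoder (the hidden sizes and the fc layer that merges the two final hidden states are illustrative, not the repository's exact code):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, enc_hid_dim=512, dec_hid_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, enc_hid_dim, bidirectional=True)
        # Maps the concatenated final forward/backward states to the decoder's size.
        self.fc = nn.Linear(enc_hid_dim * 2, dec_hid_dim)

    def forward(self, src):
        # src: [src_len, batch]
        embedded = self.embedding(src)           # [src_len, batch, emb_dim]
        outputs, hidden = self.rnn(embedded)
        # outputs: [src_len, batch, enc_hid_dim * 2] -> the stacked states H
        # hidden[-2] is the final forward state, hidden[-1] the final backward state.
        hidden = torch.tanh(self.fc(torch.cat((hidden[-2], hidden[-1]), dim=1)))
        return outputs, hidden                   # H and the decoder's initial hidden state
```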

Next up is the attention layer. This takes in the previous hidden state of the decoder, s_{t-1}, and all of the stacked forward and backward hidden states from the encoder, H. The layer outputs an attention vector, a_t, that is the length of the source sentence; each element is between 0 and 1 and the entire vector sums to 1.
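A sketch of such an attention layer (dimension names follow the encoder sketch above; the exact layer sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    def __init__(self, enc_hid_dim=512, dec_hid_dim=512):
        super().__init__()
        self.attn = nn.Linear(enc_hid_dim * 2 + dec_hid_dim, dec_hid_dim)
        self.v = nn.Linear(dec_hid_dim, 1, bias=False)

    def forward(self, hidden, encoder_outputs):
        # hidden: [batch, dec_hid_dim]                        -> s_{t-1}
        # encoder_outputs: [src_len, batch, enc_hid_dim * 2]  -> H
        src_len = encoder_outputs.shape[0]
        hidden = hidden.unsqueeze(1).repeat(1, src_len, 1)    # [batch, src_len, dec_hid_dim]
        enc = encoder_outputs.permute(1, 0, 2)                # [batch, src_len, enc_hid_dim * 2]
        energy = torch.tanh(self.attn(torch.cat((hidden, enc), dim=2)))
        scores = self.v(energy).squeeze(2)                    # [batch, src_len]
        # softmax makes every element lie in (0, 1) and the whole vector sum to 1
        return F.softmax(scores, dim=1)                       # a_t
```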

4- the results

We got a train PPL of 4.493 and a valid PPL of 11.493,

and a test PPL of 10.752.

About

Neural machine translation using PyTorch and torchtext, implementing the model from Neural Machine Translation by Jointly Learning to Align and Translate.
