A TensorFlow implementation of CopyNet based on the tf.contrib.seq2seq APIs.
Refer to Incorporating Copying Mechanism in Sequence-to-Sequence Learning for the details of CopyNet.
This implementation also borrows ideas and code from CopyNet Implementation with Tensorflow and nmt.
The main idea is to create a new RNN cell wrapper (CopyNetWrapper) that calculates the copy and generate probabilities and updates the decoder state. Since we don't need the attention information, we don't have to change the decoder class; everything can be done inside an RNN cell.
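A minimal sketch of such a wrapper, assuming TF 1.x; the names and shapes are illustrative, not the exact ones in this repo, and the real wrapper also carries the selective read and the last predicted id in its state, which is omitted here:

```python
import tensorflow as tf

class CopyNetWrapper(tf.nn.rnn_cell.RNNCell):
    """Sketch: mixes generate-mode and copy-mode scores in one shared softmax."""

    def __init__(self, cell, encoder_outputs, source_extend_ids,
                 gen_vocab_size, max_oov):
        super(CopyNetWrapper, self).__init__()
        self._cell = cell
        self._encoder_outputs = encoder_outputs      # [batch, src_len, hidden]
        self._source_extend_ids = source_extend_ids  # [batch, src_len]
        self._gen_vocab_size = gen_vocab_size
        self._extended_vocab_size = gen_vocab_size + max_oov

    def call(self, inputs, state):
        cell_output, cell_state = self._cell(inputs, state)
        state_size = cell_output.get_shape()[-1].value

        # Generate-mode scores: psi_g(v_i) = v_i^T W_o s_t.
        generate_score = tf.layers.dense(
            cell_output, self._gen_vocab_size, use_bias=False, name="psi_g")
        # Copy-mode scores: psi_c(x_j) = tanh(h_j^T W_c) s_t.
        copy_proj = tf.layers.dense(
            self._encoder_outputs, state_size,
            activation=tf.tanh, use_bias=False, name="psi_c")
        copy_score = tf.einsum("bjd,bd->bj", copy_proj, cell_output)

        # One softmax over [generate scores ; copy scores] gives the shared Z.
        mixed = tf.nn.softmax(tf.concat([generate_score, copy_score], axis=-1))
        gen_probs = mixed[:, :self._gen_vocab_size]
        copy_probs = mixed[:, self._gen_vocab_size:]

        # Scatter copy probabilities onto extended-vocab ids; positions holding
        # the same token have their probabilities summed.
        scatter = tf.one_hot(self._source_extend_ids, self._extended_vocab_size)
        copy_dist = tf.einsum("bj,bjv->bv", copy_probs, scatter)
        gen_dist = tf.pad(
            gen_probs,
            [[0, 0], [0, self._extended_vocab_size - self._gen_vocab_size]])
        return gen_dist + copy_dist, cell_state

    @property
    def state_size(self):
        return self._cell.state_size

    @property
    def output_size(self):
        return self._extended_vocab_size
```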
This part is different from the pointer-generator (PG). In PG, we use a placeholder to feed the maximum number of OOV words in a batch (max_oov for short). Here we can instead fix max_oov in advance, say 100; in that case the extended source input tokens only have ids in the range 0 to (vocab size + 100), and any OOV words beyond that are set to UNK. You can analyze the training data to pick a proper max_oov; a helper like the sketch below can build the extended ids.
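A hypothetical preprocessing helper (the function and argument names are made up for illustration, not part of this repo) that caps the number of per-example OOV ids at max_oov:

```python
def source_to_extend_ids(tokens, word2id, vocab_size, max_oov=100, unk_id=0):
    """Map source tokens to ids in [0, vocab_size + max_oov).

    In-vocabulary tokens keep their normal ids. Each new OOV token gets a
    temporary id vocab_size + k; once max_oov distinct OOVs have been seen
    in this example, any further OOV falls back to UNK.
    """
    ids, oov2id = [], {}
    for tok in tokens:
        if tok in word2id:
            ids.append(word2id[tok])
        elif tok in oov2id:
            ids.append(oov2id[tok])
        elif len(oov2id) < max_oov:
            oov2id[tok] = vocab_size + len(oov2id)
            ids.append(oov2id[tok])
        else:
            ids.append(unk_id)
    return ids, oov2id
```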
We use the formulas in the paper to get the probabilities of the two modes and combine them into the final vocabulary distribution. Then we use this distribution to calculate the loss.
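A sketch of the loss, assuming the wrapper outputs normalized probabilities (not logits) over the extended vocabulary; final_dist, target_extend_ids, and target_mask are illustrative names:

```python
import tensorflow as tf

def copynet_loss(final_dist, target_extend_ids, target_mask, extended_vocab_size):
    """Masked NLL over the extended vocabulary.

    final_dist:        [batch, time, vocab_size + max_oov] probabilities
    target_extend_ids: [batch, time] targets using the extended ids
    target_mask:       [batch, time] 1.0 for real tokens, 0.0 for padding
    """
    target_probs = tf.reduce_sum(
        final_dist * tf.one_hot(target_extend_ids, extended_vocab_size), axis=-1)
    nll = -tf.log(tf.clip_by_value(target_probs, 1e-10, 1.0))  # guard log(0)
    return tf.reduce_sum(nll * target_mask) / tf.reduce_sum(target_mask)
```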
In selective read, we take the weighted sum of the encoder states, $\zeta(y_{t-1}) = \sum_{\tau=1}^{T_S} \rho_{t\tau} h_\tau$, so we need to get the weight $\rho_{t\tau}$:

$$\rho_{t\tau} = \begin{cases} \frac{1}{K}\, p(x_\tau, c \mid s_{t-1}, M), & x_\tau = y_{t-1} \\ 0, & \text{otherwise.} \end{cases}$$

Actually, $p(x_\tau, c \mid s_{t-1}, M)$ is just the copy probability of position $\tau$ that we already computed at the previous step. Since only the positions where $x_\tau = y_{t-1}$ get a non-zero weight, we can mask the copy distribution with an equality test against $y_{t-1}$ and renormalize it. And $K$ equals the total copy probability of the input tokens which are the same as $y_{t-1}$, i.e. $K = \sum_{\tau':\, x_{\tau'} = y_{t-1}} p(x_{\tau'}, c \mid s_{t-1}, M)$, so that $\rho_{t\tau}$ sums to one over the matched positions.
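A sketch of how this can be computed with a mask and one einsum (names are illustrative; in the actual implementation this happens inside the wrapper's state update):

```python
import tensorflow as tf

def selective_read(encoder_outputs, source_extend_ids, copy_probs, last_ids):
    """zeta(y_{t-1}): weighted sum of encoder states whose source token
    matches the last emitted token, renormalized by K.

    encoder_outputs:   [batch, src_len, hidden]
    source_extend_ids: [batch, src_len]
    copy_probs:        [batch, src_len] copy probabilities from the last step
    last_ids:          [batch]          extended ids of y_{t-1}
    """
    match = tf.cast(
        tf.equal(source_extend_ids, tf.expand_dims(last_ids, 1)), tf.float32)
    rho = match * copy_probs                        # zero out non-matching positions
    K = tf.reduce_sum(rho, axis=1, keepdims=True)   # total matched copy probability
    rho = rho / tf.maximum(K, 1e-10)                # rho_{t,tau} = p / K
    return tf.einsum("bj,bjd->bd", rho, encoder_outputs)
```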
I found tf.einsum to be a very useful API for manipulating matrices; you can see how many lines of code it saves in CopyNet when computing the attention distribution, compared with PG. Every time you need to manipulate matrices, check whether einsum works for you.
You can refer to A basic introduction to NumPy's einsum for the basics.
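For example, the dot-product scores between a decoder state and every encoder state take three ops without einsum and one op with it (shapes here are illustrative):

```python
import tensorflow as tf

encoder_outputs = tf.placeholder(tf.float32, [None, None, 128])  # [batch, src_len, hidden]
state = tf.placeholder(tf.float32, [None, 128])                  # [batch, hidden]

# Without einsum: expand, multiply, reduce.
scores_verbose = tf.reduce_sum(encoder_outputs * tf.expand_dims(state, 1), axis=2)
# With einsum: the same scores in one call.
scores_einsum = tf.einsum("bjd,bd->bj", encoder_outputs, state)
```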
I tested the code on toy data; see test_copynet.py in the bin folder for details. After about 20 epochs, the results are almost identical to the ground truth.