diff --git a/README.md b/README.md
index c4ce89b..22dbe2f 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@
 </p>
 <p align="center">
   <a href="https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/LICENSE"><img alt="MIT License" src="https://img.shields.io/badge/License-MIT-yellow.svg"></a>
-  <a href="#"><img src="https://img.shields.io/badge/total%20notebooks-308--models-blue.svg"></a>
+  <a href="#"><img src="https://img.shields.io/badge/total%20notebooks-311--models-blue.svg"></a>
 </p>
 
 ---
@@ -17,6 +17,7 @@
   * [Chatbot](#chatbot)
   * [Dependency Parser](#dependency-parser)
   * [Entity Tagging](#entity-tagging)
+  * [Extractive Summarization](#extractive-summarization)
   * [Generator](#generator)
   * [Language Detection](#language-detection)
   * [Neural Machine Translation](neural-machine-translation)
@@ -174,6 +175,16 @@ Trained on [CONLL NER](https://cogcomp.org/page/resource_view/81).
 7. Char Ngrams + Attention is you all Need + CRF, test accuracy 90%
 8. BERT, test accuracy 99%
 
+### [Extractive Summarization](extractive-summarization)
+
+Trained on [CNN News dataset](https://cs.nyu.edu/~kcho/DMQA/).
+
+Accuracy based on ROUGE-2.
+
+1. LSTM RNN, test accuracy 16.13%
+2. Dilated-CNN, test accuracy 15.54%
+3. Multihead Attention, test accuracy 26.33%
+
 ### [Generator](generator)
 
 Trained on [Shakespeare dataset](generator/shakespeare.txt).
diff --git a/extractive-summarization/1.rnn-lstm.ipynb b/extractive-summarization/1.rnn-lstm.ipynb
new file mode 100644
index 0000000..3a35e44
--- /dev/null
+++ b/extractive-summarization/1.rnn-lstm.ipynb
@@ -0,0 +1,882 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+     ]
+    }
+   ],
+   "source": [
+    "import tensorflow as tf\n",
+    "import numpy as np\n",
+    "import pickle"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "dict_keys(['train_texts', 'test_texts', 'train_clss', 'test_clss', 'train_labels', 'test_labels'])"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "with open('dataset.pkl', 'rb') as fopen:\n",
+    "    dataset = pickle.load(fopen)\n",
+    "dataset.keys()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "73967"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "len(dataset['train_texts'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('dictionary.pkl', 'rb') as fopen:\n",
+    "    dictionary = pickle.load(fopen)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rev_dictionary = dictionary['rev_dictionary']\n",
+    "dictionary = dictionary['dictionary']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class Model:\n",
+    "    def __init__(self, size_layer, num_layers, embedded_size,\n",
+    "                 dict_size, learning_rate):\n",
+    "        \n",
+    "        def cells(reuse=False):\n",
+    "            return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n",
+    "        \n",
+    "        self.X = tf.placeholder(tf.int32, [None, None])\n",
+    "        self.Y = tf.placeholder(tf.float32, [None, None])\n",
+    "        self.mask = tf.placeholder(tf.int32, [None, None])\n",
+    "        self.clss = tf.placeholder(tf.int32, [None, None])\n",
+    "        mask = tf.cast(self.mask, tf.float32)\n",
+    "        encoder_embeddings = tf.Variable(tf.random_uniform([dict_size, embedded_size], -1, 1))\n",
+    "        encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n",
+    "        rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n",
+    "        outputs, _ = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded, dtype = tf.float32)\n",
+    "        outputs = tf.gather(outputs, self.clss, axis = 1, batch_dims = 1)\n",
+    "        self.logits = tf.layers.dense(outputs, 1)\n",
+    "        self.logits = tf.squeeze(self.logits, axis=-1)\n",
+    "        self.logits = self.logits * mask\n",
+    "        crossent = tf.nn.sigmoid_cross_entropy_with_logits(logits=self.logits, labels=self.Y)\n",
+    "        crossent = crossent * mask\n",
+    "        crossent = tf.reduce_sum(crossent)\n",
+    "        total_size = tf.reduce_sum(mask)\n",
+    "        self.cost = tf.div_no_nan(crossent, total_size)\n",
+    "        \n",
+    "        self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n",
+    "        \n",
+    "        l = tf.round(tf.sigmoid(self.logits))\n",
+    "        self.accuracy = tf.reduce_mean(tf.cast(tf.boolean_mask(l, tf.equal(self.Y, 1)), tf.float32))  # recall on sentences labelled 1, not overall accuracy"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "size_layer = 256\n",
+    "num_layers = 2\n",
+    "embedded_size = 256\n",
+    "learning_rate = 1e-3\n",
+    "batch_size = 128\n",
+    "epoch = 20"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "WARNING:tensorflow:From <ipython-input-6-8f447dd98b6a>:6: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n",
+      "WARNING:tensorflow:From <ipython-input-6-8f447dd98b6a>:15: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n",
+      "WARNING:tensorflow:From <ipython-input-6-8f447dd98b6a>:16: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n",
+      "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
+      "WARNING:tensorflow:From <ipython-input-6-8f447dd98b6a>:18: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Use keras.layers.dense instead.\n",
+      "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
+      "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Use tf.where in 2.0, which has the same broadcast rule as np.where\n"
+     ]
+    }
+   ],
+   "source": [
+    "tf.reset_default_graph()\n",
+    "sess = tf.InteractiveSession()\n",
+    "model = Model(size_layer,num_layers,embedded_size,len(dictionary),learning_rate)\n",
+    "sess.run(tf.global_variables_initializer())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "UNK = 3\n",
+    "\n",
+    "def str_idx(corpus, dic):\n",
+    "    X = []\n",
+    "    for i in corpus:\n",
+    "        ints = []\n",
+    "        for k in i.split():\n",
+    "            ints.append(dic.get(k,UNK))\n",
+    "        X.append(ints)\n",
+    "    return X\n",
+    "\n",
+    "def pad_sentence_batch(sentence_batch, pad_int):\n",
+    "    padded_seqs = []\n",
+    "    seq_lens = []\n",
+    "    max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n",
+    "    for sentence in sentence_batch:\n",
+    "        padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n",
+    "        seq_lens.append(len(sentence))\n",
+    "    return padded_seqs, seq_lens"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_X = str_idx(dataset['train_texts'], dictionary)\n",
+    "test_X = str_idx(dataset['test_texts'], dictionary)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_clss = dataset['train_clss']\n",
+    "test_clss = dataset['test_clss']\n",
+    "train_Y = dataset['train_labels']\n",
+    "test_Y = dataset['test_labels']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(0.27272728, 0.68941796)"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "batch_x, _ = pad_sentence_batch(train_X[:5], 0)\n",
+    "batch_y, _ = pad_sentence_batch(train_Y[:5], 0)\n",
+    "batch_clss, _ = pad_sentence_batch(train_clss[:5], -1)\n",
+    "batch_clss = np.array(batch_clss)\n",
+    "batch_mask = 1 - (batch_clss == -1)\n",
+    "batch_clss[batch_clss == -1] = 0\n",
+    "\n",
+    "feed = {model.X: batch_x,\n",
+    "        model.Y: batch_y,\n",
+    "        model.mask: batch_mask,\n",
+    "        model.clss: batch_clss}\n",
+    "acc, loss, _ = sess.run([model.accuracy, model.cost,model.optimizer], feed_dict = feed)\n",
+    "acc, loss"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:49<00:00,  2.47s/it, accuracy=0, cost=0.267]      \n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.19it/s, accuracy=0, cost=0.221]\n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 1, training avg loss 0.262667, training avg acc 0.000085\n",
+      "epoch 1, testing avg loss 0.252563, testing avg acc 0.000000\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:49<00:00,  2.47s/it, accuracy=0, cost=0.264]\n",
+      "minibatch loop: 100%|██████████| 145/145 [02:00<00:00,  1.20it/s, accuracy=0, cost=0.221]\n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 2, training avg loss 0.252687, training avg acc 0.000000\n",
+      "epoch 2, testing avg loss 0.250887, testing avg acc 0.000000\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:49<00:00,  2.47s/it, accuracy=0.0106, cost=0.261] \n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.19it/s, accuracy=0, cost=0.219]      \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 3, training avg loss 0.249986, training avg acc 0.000347\n",
+      "epoch 3, testing avg loss 0.250423, testing avg acc 0.000835\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:48<00:00,  2.47s/it, accuracy=0.0177, cost=0.254] \n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.19it/s, accuracy=0.0129, cost=0.221] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 4, training avg loss 0.245642, training avg acc 0.005859\n",
+      "epoch 4, testing avg loss 0.253216, testing avg acc 0.006405\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:48<00:00,  2.47s/it, accuracy=0.0532, cost=0.243] \n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.19it/s, accuracy=0.0452, cost=0.228] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 5, training avg loss 0.238232, training avg acc 0.026460\n",
+      "epoch 5, testing avg loss 0.260064, testing avg acc 0.025689\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:48<00:00,  2.47s/it, accuracy=0.0922, cost=0.231]\n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.20it/s, accuracy=0.0516, cost=0.235]\n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 6, training avg loss 0.228040, training avg acc 0.069235\n",
+      "epoch 6, testing avg loss 0.269286, testing avg acc 0.036230\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:47<00:00,  2.47s/it, accuracy=0.17, cost=0.216]  \n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.19it/s, accuracy=0.0516, cost=0.24] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 7, training avg loss 0.215618, training avg acc 0.125084\n",
+      "epoch 7, testing avg loss 0.272501, testing avg acc 0.045919\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:48<00:00,  2.47s/it, accuracy=0.238, cost=0.203] \n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.20it/s, accuracy=0.0839, cost=0.257]\n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 8, training avg loss 0.201337, training avg acc 0.187912\n",
+      "epoch 8, testing avg loss 0.287809, testing avg acc 0.070733\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:47<00:00,  2.47s/it, accuracy=0.312, cost=0.186]\n",
+      "minibatch loop: 100%|██████████| 145/145 [02:00<00:00,  1.20it/s, accuracy=0.103, cost=0.281] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 9, training avg loss 0.186864, training avg acc 0.249759\n",
+      "epoch 9, testing avg loss 0.315843, testing avg acc 0.084132\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:46<00:00,  2.47s/it, accuracy=0.358, cost=0.169]\n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.20it/s, accuracy=0.11, cost=0.314]  \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 10, training avg loss 0.172965, training avg acc 0.309004\n",
+      "epoch 10, testing avg loss 0.339780, testing avg acc 0.092134\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:51<00:00,  2.48s/it, accuracy=0.443, cost=0.153]\n",
+      "minibatch loop: 100%|██████████| 145/145 [02:00<00:00,  1.20it/s, accuracy=0.135, cost=0.348] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 11, training avg loss 0.159476, training avg acc 0.365974\n",
+      "epoch 11, testing avg loss 0.374555, testing avg acc 0.111304\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:49<00:00,  2.47s/it, accuracy=0.482, cost=0.132]\n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.20it/s, accuracy=0.129, cost=0.357] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 12, training avg loss 0.145508, training avg acc 0.424775\n",
+      "epoch 12, testing avg loss 0.380014, testing avg acc 0.103223\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:49<00:00,  2.47s/it, accuracy=0.479, cost=0.123]\n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.20it/s, accuracy=0.123, cost=0.357] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 13, training avg loss 0.132634, training avg acc 0.476701\n",
+      "epoch 13, testing avg loss 0.392311, testing avg acc 0.098482\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:49<00:00,  2.47s/it, accuracy=0.475, cost=0.12]  \n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.20it/s, accuracy=0.11, cost=0.357]  \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 14, training avg loss 0.121177, training avg acc 0.523573\n",
+      "epoch 14, testing avg loss 0.401181, testing avg acc 0.082353\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:43<00:00,  2.46s/it, accuracy=0.585, cost=0.105] \n",
+      "minibatch loop: 100%|██████████| 145/145 [02:00<00:00,  1.20it/s, accuracy=0.148, cost=0.416] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 15, training avg loss 0.110205, training avg acc 0.566239\n",
+      "epoch 15, testing avg loss 0.457439, testing avg acc 0.102568\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:41<00:00,  2.46s/it, accuracy=0.617, cost=0.0902]\n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.20it/s, accuracy=0.142, cost=0.442] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 16, training avg loss 0.098867, training avg acc 0.610249\n",
+      "epoch 16, testing avg loss 0.508520, testing avg acc 0.109937\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:44<00:00,  2.46s/it, accuracy=0.66, cost=0.0792] \n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.19it/s, accuracy=0.129, cost=0.441] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 17, training avg loss 0.089521, training avg acc 0.646151\n",
+      "epoch 17, testing avg loss 0.516304, testing avg acc 0.108609\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:42<00:00,  2.46s/it, accuracy=0.73, cost=0.0708] \n",
+      "minibatch loop: 100%|██████████| 145/145 [02:00<00:00,  1.20it/s, accuracy=0.168, cost=0.48]  \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 18, training avg loss 0.081986, training avg acc 0.674612\n",
+      "epoch 18, testing avg loss 0.546543, testing avg acc 0.129065\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:40<00:00,  2.46s/it, accuracy=0.745, cost=0.0597]\n",
+      "minibatch loop: 100%|██████████| 145/145 [02:00<00:00,  1.20it/s, accuracy=0.148, cost=0.478] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 19, training avg loss 0.074648, training avg acc 0.702221\n",
+      "epoch 19, testing avg loss 0.547588, testing avg acc 0.135218\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [23:43<00:00,  2.46s/it, accuracy=0.716, cost=0.0659]\n",
+      "minibatch loop: 100%|██████████| 145/145 [02:01<00:00,  1.19it/s, accuracy=0.174, cost=0.512] "
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 20, training avg loss 0.068429, training avg acc 0.724982\n",
+      "epoch 20, testing avg loss 0.577269, testing avg acc 0.130517\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "import tqdm\n",
+    "\n",
+    "for e in range(epoch):\n",
+    "    pbar = tqdm.tqdm(\n",
+    "        range(0, len(train_X), batch_size), desc = 'minibatch loop')\n",
+    "    train_loss, train_acc, test_loss, test_acc = [], [], [], []\n",
+    "    for i in pbar:\n",
+    "        index = min(i + batch_size, len(train_X))\n",
+    "        batch_x, _ = pad_sentence_batch(train_X[i : index], 0)\n",
+    "        batch_y, _ = pad_sentence_batch(train_Y[i : index], 0)\n",
+    "        batch_clss, _ = pad_sentence_batch(train_clss[i : index], -1)\n",
+    "        batch_clss = np.array(batch_clss)\n",
+    "        batch_mask = 1 - (batch_clss == -1)\n",
+    "        batch_clss[batch_clss == -1] = 0\n",
+    "        feed = {model.X: batch_x,\n",
+    "                model.Y: batch_y,\n",
+    "                model.mask: batch_mask,\n",
+    "                model.clss: batch_clss}\n",
+    "        accuracy, loss, _ = sess.run([model.accuracy,model.cost,model.optimizer],\n",
+    "                                    feed_dict = feed)\n",
+    "        train_loss.append(loss)\n",
+    "        train_acc.append(accuracy)\n",
+    "        pbar.set_postfix(cost = loss, accuracy = accuracy)\n",
+    "    \n",
+    "    pbar = tqdm.tqdm(\n",
+    "        range(0, len(test_X), batch_size), desc = 'minibatch loop')\n",
+    "    for i in pbar:\n",
+    "        index = min(i + batch_size, len(test_X))\n",
+    "        batch_x, _ = pad_sentence_batch(test_X[i : index], 0)\n",
+    "        batch_y, _ = pad_sentence_batch(test_Y[i : index], 0)\n",
+    "        batch_clss, _ = pad_sentence_batch(test_clss[i : index], -1)\n",
+    "        batch_clss = np.array(batch_clss)\n",
+    "        batch_mask = 1 - (batch_clss == -1)\n",
+    "        batch_clss[batch_clss == -1] = 0\n",
+    "        feed = {model.X: batch_x,\n",
+    "                model.Y: batch_y,\n",
+    "                model.mask: batch_mask,\n",
+    "                model.clss: batch_clss}\n",
+    "        accuracy, loss = sess.run([model.accuracy,model.cost],\n",
+    "                                    feed_dict = feed)\n",
+    "\n",
+    "        test_loss.append(loss)\n",
+    "        test_acc.append(accuracy)\n",
+    "        pbar.set_postfix(cost = loss, accuracy = accuracy)\n",
+    "    \n",
+    "    print('epoch %d, training avg loss %f, training avg acc %f'%(e+1,\n",
+    "                                                                 np.mean(train_loss),np.mean(train_acc)))\n",
+    "    print('epoch %d, testing avg loss %f, testing avg acc %f'%(e+1,\n",
+    "                                                              np.mean(test_loss),np.mean(test_acc)))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from tensor2tensor.utils import rouge\n",
+    "from tensorflow.keras.preprocessing import sequence\n",
+    "\n",
+    "def calculate_rouges(predicted, batch_y):\n",
+    "    non = np.count_nonzero(batch_y, axis = 1)\n",
+    "    o = []\n",
+    "    for n in non:\n",
+    "        o.append([True for _ in range(n)])\n",
+    "    b = sequence.pad_sequences(o, dtype = bool, padding = 'post', value = False)\n",
+    "    batch_y = np.array(batch_y)\n",
+    "    rouges = []\n",
+    "    for i in range(predicted.shape[0]):\n",
+    "        a = batch_y[i][b[i]]\n",
+    "        p = predicted[i][b[i]]\n",
+    "        rouges.append(rouge.rouge_n([p], [a]))\n",
+    "    return np.mean(rouges)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "batch_x, _ = pad_sentence_batch(test_X[: 5], 0)\n",
+    "batch_y, _ = pad_sentence_batch(test_Y[: 5], 0)\n",
+    "batch_clss, _ = pad_sentence_batch(test_clss[: 5], -1)\n",
+    "batch_clss = np.array(batch_clss)\n",
+    "batch_y = np.array(batch_y)\n",
+    "batch_x = np.array(batch_x)\n",
+    "cp_batch_clss = batch_clss.copy()\n",
+    "batch_mask = 1 - (batch_clss == -1)\n",
+    "batch_clss[batch_clss == -1] = 0\n",
+    "\n",
+    "feed = {model.X: batch_x,\n",
+    "        model.mask: batch_mask,\n",
+    "        model.clss: batch_clss}\n",
+    "predicted = sess.run(tf.round(tf.sigmoid(model.logits)), feed_dict = feed)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 59,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "0.20137092"
+      ]
+     },
+     "execution_count": 59,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from tensor2tensor.utils import rouge\n",
+    "\n",
+    "def calculate_rouge(predicted, batch_y, cp_batch_clss, batch_x):\n",
+    "    f, y_, predicted_ = [], [], []\n",
+    "    for i in range(len(cp_batch_clss)):\n",
+    "        f.append(cp_batch_clss[i][cp_batch_clss[i] != -1])\n",
+    "        y_.append(batch_y[i][cp_batch_clss[i] != -1])\n",
+    "        predicted_.append(predicted[i][cp_batch_clss[i] != -1])\n",
+    "    \n",
+    "    actual, predict = [], []\n",
+    "    for i in range(len(f)):\n",
+    "        actual_, predict_ = [], []\n",
+    "        for k in range(len(f[i])):\n",
+    "            if k == (len(f[i]) - 1):\n",
+    "                s = batch_x[i][f[i][k]:]\n",
+    "                s = s[s != 0]\n",
+    "            else:\n",
+    "                s = batch_x[i][f[i][k]: f[i][k + 1]]\n",
+    "            s = [w for w in s if w not in [0, 1, 2, 3, 5, 6, 7, 8]]\n",
+    "            if y_[i][k]:\n",
+    "                actual_.extend(s)\n",
+    "            if predicted_[i][k]:\n",
+    "                predict_.extend(s)\n",
+    "        actual.append(actual_)\n",
+    "        predict.append(predict_)\n",
+    "    return rouge.rouge_n(predict, actual)\n",
+    "\n",
+    "calculate_rouge(predicted, batch_y, cp_batch_clss, batch_x)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 62,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from tqdm import tqdm as tqdm_base\n",
+    "def tqdm(*args, **kwargs):\n",
+    "    if hasattr(tqdm_base, '_instances'):\n",
+    "        for instance in list(tqdm_base._instances):\n",
+    "            tqdm_base._decr_instances(instance)\n",
+    "    return tqdm_base(*args, **kwargs)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 63,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "minibatch loop:   0%|          | 1/578 [03:28<33:22:59, 208.28s/it, rouge=0.184]\n",
+      "minibatch loop:   2%|▏         | 13/578 [03:01<2:11:19, 13.95s/it, rouge=0.181]\n",
+      "minibatch loop: 100%|██████████| 578/578 [10:19<00:00,  1.07s/it, rouge=0.155]\n"
+     ]
+    }
+   ],
+   "source": [
+    "rouges = []\n",
+    "\n",
+    "pbar = tqdm(\n",
+    "    range(0, len(test_X), 32), desc = 'minibatch loop')\n",
+    "for i in pbar:\n",
+    "    index = min(i + 32, len(test_X))  # slice width must match the range() step of 32\n",
+    "    batch_x, _ = pad_sentence_batch(test_X[i: index], 0)\n",
+    "    batch_y, _ = pad_sentence_batch(test_Y[i: index], 0)\n",
+    "    batch_clss, _ = pad_sentence_batch(test_clss[i: index], -1)\n",
+    "    batch_clss = np.array(batch_clss)\n",
+    "    batch_y = np.array(batch_y)\n",
+    "    batch_x = np.array(batch_x)\n",
+    "    cp_batch_clss = batch_clss.copy()\n",
+    "    batch_mask = 1 - (batch_clss == -1)\n",
+    "    batch_clss[batch_clss == -1] = 0\n",
+    "\n",
+    "    feed = {model.X: batch_x,\n",
+    "            model.mask: batch_mask,\n",
+    "            model.clss: batch_clss}\n",
+    "    predicted = sess.run(tf.round(tf.sigmoid(model.logits)), feed_dict = feed)\n",
+    "    rouge_ = calculate_rouge(predicted, batch_y, cp_batch_clss, batch_x)\n",
+    "    rouges.append(rouge_)\n",
+    "    pbar.set_postfix(rouge = rouge_)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 64,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "0.1613218"
+      ]
+     },
+     "execution_count": 64,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "np.mean(rouges)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/extractive-summarization/2.dilated-cnn.ipynb b/extractive-summarization/2.dilated-cnn.ipynb
new file mode 100644
index 0000000..97d8659
--- /dev/null
+++ b/extractive-summarization/2.dilated-cnn.ipynb
@@ -0,0 +1,909 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+     ]
+    }
+   ],
+   "source": [
+    "import tensorflow as tf\n",
+    "import numpy as np\n",
+    "import pickle"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "dict_keys(['train_texts', 'test_texts', 'train_clss', 'test_clss', 'train_labels', 'test_labels'])"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "with open('dataset.pkl', 'rb') as fopen:\n",
+    "    dataset = pickle.load(fopen)\n",
+    "dataset.keys()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "73967"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "len(dataset['train_texts'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('dictionary.pkl', 'rb') as fopen:\n",
+    "    dictionary = pickle.load(fopen)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rev_dictionary = dictionary['rev_dictionary']\n",
+    "dictionary = dictionary['dictionary']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def position_encoding(inputs):\n",
+    "    T = tf.shape(inputs)[1]\n",
+    "    repr_dim = inputs.get_shape()[-1].value\n",
+    "    pos = tf.reshape(tf.range(0.0, tf.to_float(T), dtype=tf.float32), [-1, 1])\n",
+    "    i = np.arange(0, repr_dim, 2, np.float32)\n",
+    "    denom = np.reshape(np.power(10000.0, i / repr_dim), [1, -1])\n",
+    "    enc = tf.expand_dims(tf.concat([tf.sin(pos / denom), tf.cos(pos / denom)], 1), 0)\n",
+    "    return tf.tile(enc, [tf.shape(inputs)[0], 1, 1])\n",
+    "\n",
+    "def layer_norm(inputs, epsilon=1e-8):\n",
+    "    mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n",
+    "    normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n",
+    "    params_shape = inputs.get_shape()[-1:]\n",
+    "    gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n",
+    "    beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n",
+    "    return gamma * normalized + beta\n",
+    "\n",
+    "\n",
+    "def cnn_block(x, dilation_rate, pad_sz, hidden_dim, kernel_size):\n",
+    "    x = layer_norm(x)\n",
+    "    pad = tf.zeros([tf.shape(x)[0], pad_sz, hidden_dim])\n",
+    "    x =  tf.layers.conv1d(inputs = tf.concat([pad, x, pad], 1),\n",
+    "                          filters = hidden_dim,\n",
+    "                          kernel_size = kernel_size,\n",
+    "                          dilation_rate = dilation_rate)\n",
+    "    x = x[:, :-pad_sz, :]\n",
+    "    x = tf.nn.relu(x)\n",
+    "    return x\n",
+    "\n",
+    "class Model:\n",
+    "    def __init__(self, size_layer, num_layers, embedded_size,\n",
+    "                 dict_size, learning_rate, kernel_size = 3):\n",
+    "        \n",
+    "        def cells(reuse=False):\n",
+    "            return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n",
+    "        \n",
+    "        self.X = tf.placeholder(tf.int32, [None, None])\n",
+    "        self.Y = tf.placeholder(tf.float32, [None, None])\n",
+    "        self.mask = tf.placeholder(tf.int32, [None, None])\n",
+    "        self.clss = tf.placeholder(tf.int32, [None, None])\n",
+    "        mask = tf.cast(self.mask, tf.float32)\n",
+    "        encoder_embeddings = tf.Variable(tf.random_uniform([dict_size, embedded_size], -1, 1))\n",
+    "        encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n",
+    "        encoder_embedded += position_encoding(encoder_embedded)\n",
+    "        \n",
+    "        for i in range(num_layers): \n",
+    "            dilation_rate = 2 ** i\n",
+    "            pad_sz = (kernel_size - 1) * dilation_rate \n",
+    "            with tf.variable_scope('block_%d'%i,reuse=False):\n",
+    "                encoder_embedded += cnn_block(encoder_embedded, dilation_rate, \n",
+    "                                              pad_sz, size_layer, kernel_size)\n",
+    "                        \n",
+    "        outputs = tf.gather(encoder_embedded, self.clss, axis = 1, batch_dims = 1)\n",
+    "        self.logits = tf.layers.dense(outputs, 1)\n",
+    "        self.logits = tf.squeeze(self.logits, axis=-1)\n",
+    "        self.logits = self.logits * mask\n",
+    "        crossent = tf.nn.sigmoid_cross_entropy_with_logits(logits=self.logits, labels=self.Y)\n",
+    "        crossent = crossent * mask\n",
+    "        crossent = tf.reduce_sum(crossent)\n",
+    "        total_size = tf.reduce_sum(mask)\n",
+    "        self.cost = tf.div_no_nan(crossent, total_size)\n",
+    "        \n",
+    "        self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n",
+    "        \n",
+    "        l = tf.round(tf.sigmoid(self.logits))\n",
+    "        self.accuracy = tf.reduce_mean(tf.cast(tf.boolean_mask(l, tf.equal(self.Y, 1)), tf.float32))  # recall on positive labels, not overall accuracy"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "size_layer = 256\n",
+    "num_layers = 4\n",
+    "embedded_size = 256\n",
+    "learning_rate = 1e-3\n",
+    "batch_size = 128\n",
+    "epoch = 20"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "WARNING:tensorflow:From <ipython-input-6-2231f555135a>:4: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Use `tf.cast` instead.\n",
+      "WARNING:tensorflow:From <ipython-input-6-2231f555135a>:25: conv1d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Use `tf.keras.layers.Conv1D` instead.\n",
+      "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
+      "WARNING:tensorflow:From <ipython-input-6-2231f555135a>:54: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Use keras.layers.dense instead.\n",
+      "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Use tf.where in 2.0, which has the same broadcast rule as np.where\n"
+     ]
+    }
+   ],
+   "source": [
+    "tf.reset_default_graph()\n",
+    "sess = tf.InteractiveSession()\n",
+    "model = Model(size_layer,num_layers,embedded_size,len(dictionary),learning_rate)\n",
+    "sess.run(tf.global_variables_initializer())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "UNK = 3\n",
+    "\n",
+    "def str_idx(corpus, dic):\n",
+    "    X = []\n",
+    "    for i in corpus:\n",
+    "        ints = []\n",
+    "        for k in i.split():\n",
+    "            ints.append(dic.get(k,UNK))\n",
+    "        X.append(ints)\n",
+    "    return X\n",
+    "\n",
+    "def pad_sentence_batch(sentence_batch, pad_int):\n",
+    "    padded_seqs = []\n",
+    "    seq_lens = []\n",
+    "    max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n",
+    "    for sentence in sentence_batch:\n",
+    "        padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n",
+    "        seq_lens.append(len(sentence))\n",
+    "    return padded_seqs, seq_lens"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_X = str_idx(dataset['train_texts'], dictionary)\n",
+    "test_X = str_idx(dataset['test_texts'], dictionary)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_clss = dataset['train_clss']\n",
+    "test_clss = dataset['test_clss']\n",
+    "train_Y = dataset['train_labels']\n",
+    "test_Y = dataset['test_labels']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(0.36363637, 0.80718136)"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "batch_x, _ = pad_sentence_batch(train_X[:5], 0)\n",
+    "batch_y, _ = pad_sentence_batch(train_Y[:5], 0)\n",
+    "batch_clss, _ = pad_sentence_batch(train_clss[:5], -1)\n",
+    "batch_clss = np.array(batch_clss)\n",
+    "batch_mask = 1 - (batch_clss == -1)\n",
+    "batch_clss[batch_clss == -1] = 0\n",
+    "\n",
+    "feed = {model.X: batch_x,\n",
+    "        model.Y: batch_y,\n",
+    "        model.mask: batch_mask,\n",
+    "        model.clss: batch_clss}\n",
+    "acc, loss, _ = sess.run([model.accuracy, model.cost,model.optimizer], feed_dict = feed)\n",
+    "acc, loss"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [05:11<00:00,  1.86it/s, accuracy=0, cost=0.268]      \n",
+      "minibatch loop: 100%|██████████| 145/145 [00:21<00:00,  6.74it/s, accuracy=0, cost=0.221]     \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 1, training avg loss 0.267856, training avg acc 0.003014\n",
+      "epoch 1, testing avg loss 0.253723, testing avg acc 0.000193\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.42it/s, accuracy=0.0106, cost=0.265] \n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.66it/s, accuracy=0, cost=0.222]      \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 2, training avg loss 0.252464, training avg acc 0.001037\n",
+      "epoch 2, testing avg loss 0.253672, testing avg acc 0.001260\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.43it/s, accuracy=0.0142, cost=0.261] \n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.67it/s, accuracy=0, cost=0.224]      \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 3, training avg loss 0.248327, training avg acc 0.005151\n",
+      "epoch 3, testing avg loss 0.255450, testing avg acc 0.003196\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.44it/s, accuracy=0.0319, cost=0.253] \n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.65it/s, accuracy=0.0129, cost=0.232] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 4, training avg loss 0.240196, training avg acc 0.020250\n",
+      "epoch 4, testing avg loss 0.260763, testing avg acc 0.007725\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.44it/s, accuracy=0.078, cost=0.238] \n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.61it/s, accuracy=0.0129, cost=0.24]  \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 5, training avg loss 0.225537, training avg acc 0.067153\n",
+      "epoch 5, testing avg loss 0.272367, testing avg acc 0.018334\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.43it/s, accuracy=0.195, cost=0.211] \n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.61it/s, accuracy=0.0452, cost=0.258]\n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 6, training avg loss 0.203671, training avg acc 0.162199\n",
+      "epoch 6, testing avg loss 0.290515, testing avg acc 0.036583\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.43it/s, accuracy=0.365, cost=0.182]\n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.66it/s, accuracy=0.0516, cost=0.282]\n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 7, training avg loss 0.177396, training avg acc 0.287097\n",
+      "epoch 7, testing avg loss 0.317477, testing avg acc 0.072228\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.43it/s, accuracy=0.443, cost=0.148]\n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.65it/s, accuracy=0.103, cost=0.321] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 8, training avg loss 0.151854, training avg acc 0.402537\n",
+      "epoch 8, testing avg loss 0.356769, testing avg acc 0.102667\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.42it/s, accuracy=0.571, cost=0.122]\n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.58it/s, accuracy=0.11, cost=0.361]  \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 9, training avg loss 0.128955, training avg acc 0.500623\n",
+      "epoch 9, testing avg loss 0.398493, testing avg acc 0.111438\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.42it/s, accuracy=0.582, cost=0.102] \n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.65it/s, accuracy=0.135, cost=0.421] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 10, training avg loss 0.110343, training avg acc 0.574596\n",
+      "epoch 10, testing avg loss 0.457165, testing avg acc 0.138099\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.43it/s, accuracy=0.532, cost=0.101] \n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.65it/s, accuracy=0.0839, cost=0.452]\n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 11, training avg loss 0.094866, training avg acc 0.633989\n",
+      "epoch 11, testing avg loss 0.505708, testing avg acc 0.099099\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.43it/s, accuracy=0.645, cost=0.0773]\n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.64it/s, accuracy=0.0839, cost=0.495]\n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 12, training avg loss 0.083911, training avg acc 0.674366\n",
+      "epoch 12, testing avg loss 0.558003, testing avg acc 0.077336\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.43it/s, accuracy=0.798, cost=0.06]  \n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.66it/s, accuracy=0.116, cost=0.541] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 13, training avg loss 0.072917, training avg acc 0.712523\n",
+      "epoch 13, testing avg loss 0.596416, testing avg acc 0.117849\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:49<00:00,  3.42it/s, accuracy=0.794, cost=0.0595]\n",
+      "minibatch loop: 100%|██████████| 145/145 [00:17<00:00,  8.49it/s, accuracy=0.161, cost=0.595]\n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 14, training avg loss 0.065610, training avg acc 0.737632\n",
+      "epoch 14, testing avg loss 0.634937, testing avg acc 0.159761\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:56<00:00,  3.27it/s, accuracy=0.599, cost=0.0775]\n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.60it/s, accuracy=0.11, cost=0.584]  \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 15, training avg loss 0.060682, training avg acc 0.755263\n",
+      "epoch 15, testing avg loss 0.646741, testing avg acc 0.103789\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.43it/s, accuracy=0.741, cost=0.0466]\n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.59it/s, accuracy=0.071, cost=0.63]  \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 16, training avg loss 0.054130, training avg acc 0.777972\n",
+      "epoch 16, testing avg loss 0.707160, testing avg acc 0.068564\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:49<00:00,  3.42it/s, accuracy=0.858, cost=0.0818]\n",
+      "minibatch loop: 100%|██████████| 145/145 [00:17<00:00,  8.35it/s, accuracy=0.161, cost=0.701]\n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 17, training avg loss 0.048324, training avg acc 0.796035\n",
+      "epoch 17, testing avg loss 0.731096, testing avg acc 0.179190\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.42it/s, accuracy=0.833, cost=0.0343]\n",
+      "minibatch loop: 100%|██████████| 145/145 [00:19<00:00,  7.42it/s, accuracy=0.181, cost=0.732]\n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 18, training avg loss 0.046448, training avg acc 0.802126\n",
+      "epoch 18, testing avg loss 0.783001, testing avg acc 0.167646\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:55<00:00,  3.28it/s, accuracy=0.791, cost=0.0392]\n",
+      "minibatch loop: 100%|██████████| 145/145 [00:17<00:00,  8.52it/s, accuracy=0.135, cost=0.772] \n",
+      "minibatch loop:   0%|          | 0/578 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 19, training avg loss 0.043719, training avg acc 0.812243\n",
+      "epoch 19, testing avg loss 0.852649, testing avg acc 0.110365\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [02:48<00:00,  3.42it/s, accuracy=0.837, cost=0.0352]\n",
+      "minibatch loop: 100%|██████████| 145/145 [00:16<00:00,  8.68it/s, accuracy=0.135, cost=0.8]   "
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 20, training avg loss 0.042226, training avg acc 0.816797\n",
+      "epoch 20, testing avg loss 0.871126, testing avg acc 0.131413\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "import tqdm\n",
+    "\n",
+    "for e in range(epoch):\n",
+    "    pbar = tqdm.tqdm(\n",
+    "        range(0, len(train_X), batch_size), desc = 'minibatch loop')\n",
+    "    train_loss, train_acc, test_loss, test_acc = [], [], [], []\n",
+    "    for i in pbar:\n",
+    "        index = min(i + batch_size, len(train_X))\n",
+    "        batch_x, _ = pad_sentence_batch(train_X[i : index], 0)\n",
+    "        batch_y, _ = pad_sentence_batch(train_Y[i : index], 0)\n",
+    "        batch_clss, _ = pad_sentence_batch(train_clss[i : index], -1)\n",
+    "        batch_clss = np.array(batch_clss)\n",
+    "        batch_mask = 1 - (batch_clss == -1)\n",
+    "        batch_clss[batch_clss == -1] = 0\n",
+    "        feed = {model.X: batch_x,\n",
+    "                model.Y: batch_y,\n",
+    "                model.mask: batch_mask,\n",
+    "                model.clss: batch_clss}\n",
+    "        accuracy, loss, _ = sess.run([model.accuracy,model.cost,model.optimizer],\n",
+    "                                    feed_dict = feed)\n",
+    "        train_loss.append(loss)\n",
+    "        train_acc.append(accuracy)\n",
+    "        pbar.set_postfix(cost = loss, accuracy = accuracy)\n",
+    "    \n",
+    "    pbar = tqdm.tqdm(\n",
+    "        range(0, len(test_X), batch_size), desc = 'minibatch loop')\n",
+    "    for i in pbar:\n",
+    "        index = min(i + batch_size, len(test_X))\n",
+    "        batch_x, _ = pad_sentence_batch(test_X[i : index], 0)\n",
+    "        batch_y, _ = pad_sentence_batch(test_Y[i : index], 0)\n",
+    "        batch_clss, _ = pad_sentence_batch(test_clss[i : index], -1)\n",
+    "        batch_clss = np.array(batch_clss)\n",
+    "        batch_mask = 1 - (batch_clss == -1)\n",
+    "        batch_clss[batch_clss == -1] = 0\n",
+    "        feed = {model.X: batch_x,\n",
+    "                model.Y: batch_y,\n",
+    "                model.mask: batch_mask,\n",
+    "                model.clss: batch_clss}\n",
+    "        accuracy, loss = sess.run([model.accuracy,model.cost],\n",
+    "                                    feed_dict = feed)\n",
+    "\n",
+    "        test_loss.append(loss)\n",
+    "        test_acc.append(accuracy)\n",
+    "        pbar.set_postfix(cost = loss, accuracy = accuracy)\n",
+    "    \n",
+    "    print('epoch %d, training avg loss %f, training avg acc %f'%(e+1,\n",
+    "                                                                 np.mean(train_loss),np.mean(train_acc)))\n",
+    "    print('epoch %d, testing avg loss %f, testing avg acc %f'%(e+1,\n",
+    "                                                              np.mean(test_loss),np.mean(test_acc)))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from tensor2tensor.utils import rouge\n",
+    "from tensorflow.keras.preprocessing import sequence\n",
+    "\n",
+    "def calculate_rouges(predicted, batch_y):\n",
+    "    non = np.count_nonzero(batch_y, axis = 1)\n",
+    "    o = []\n",
+    "    for n in non:\n",
+    "        o.append([True for _ in range(n)])\n",
+    "    b = sequence.pad_sequences(o, dtype = bool, padding = 'post', value = False)\n",
+    "    batch_y = np.array(batch_y)\n",
+    "    rouges = []\n",
+    "    for i in range(predicted.shape[0]):\n",
+    "        a = batch_y[i][b[i]]\n",
+    "        p = predicted[i][b[i]]\n",
+    "        rouges.append(rouge.rouge_n([p], [a]))\n",
+    "    return np.mean(rouges)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "batch_x, _ = pad_sentence_batch(test_X[: 5], 0)\n",
+    "batch_y, _ = pad_sentence_batch(test_Y[: 5], 0)\n",
+    "batch_clss, _ = pad_sentence_batch(test_clss[: 5], -1)\n",
+    "batch_clss = np.array(batch_clss)\n",
+    "batch_y = np.array(batch_y)\n",
+    "batch_x = np.array(batch_x)\n",
+    "cp_batch_clss = batch_clss.copy()\n",
+    "batch_mask = 1 - (batch_clss == -1)\n",
+    "batch_clss[batch_clss == -1] = 0\n",
+    "\n",
+    "feed = {model.X: batch_x,\n",
+    "        model.mask: batch_mask,\n",
+    "        model.clss: batch_clss}\n",
+    "predicted = sess.run(tf.round(tf.sigmoid(model.logits)), feed_dict = feed)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "0.02314938"
+      ]
+     },
+     "execution_count": 16,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from tensor2tensor.utils import rouge\n",
+    "\n",
+    "def calculate_rouge(predicted, batch_y, cp_batch_clss, batch_x):\n",
+    "    f, y_, predicted_ = [], [], []\n",
+    "    for i in range(len(cp_batch_clss)):\n",
+    "        f.append(cp_batch_clss[i][cp_batch_clss[i] != -1])\n",
+    "        y_.append(batch_y[i][cp_batch_clss[i] != -1])\n",
+    "        predicted_.append(predicted[i][cp_batch_clss[i] != -1])\n",
+    "    \n",
+    "    actual, predict = [], []\n",
+    "    for i in range(len(f)):\n",
+    "        actual_, predict_ = [], []\n",
+    "        for k in range(len(f[i])):\n",
+    "            if k == (len(f[i]) - 1):\n",
+    "                s = batch_x[i][f[i][k]:]\n",
+    "                s = s[s != 0]\n",
+    "            else:\n",
+    "                s = batch_x[i][f[i][k]: f[i][k + 1]]\n",
+    "            s = [w for w in s if w not in [0, 1, 2, 3, 5, 6, 7, 8]]\n",
+    "            if y_[i][k]:\n",
+    "                actual_.extend(s)\n",
+    "            if predicted_[i][k]:\n",
+    "                predict_.extend(s)\n",
+    "        actual.append(actual_)\n",
+    "        predict.append(predict_)\n",
+    "    return rouge.rouge_n(predict, actual)\n",
+    "\n",
+    "calculate_rouge(predicted, batch_y, cp_batch_clss, batch_x)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from tqdm import tqdm as tqdm_base\n",
+    "def tqdm(*args, **kwargs):\n",
+    "    if hasattr(tqdm_base, '_instances'):\n",
+    "        for instance in list(tqdm_base._instances):\n",
+    "            tqdm_base._decr_instances(instance)\n",
+    "    return tqdm_base(*args, **kwargs)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 578/578 [03:33<00:00,  2.71it/s, rouge=0.186]\n"
+     ]
+    }
+   ],
+   "source": [
+    "rouges = []\n",
+    "\n",
+    "pbar = tqdm(\n",
+    "    range(0, len(test_X), 32), desc = 'minibatch loop')\n",
+    "for i in pbar:\n",
+    "    index = min(i + 32, len(test_X))\n",
+    "    batch_x, _ = pad_sentence_batch(test_X[i: index], 0)\n",
+    "    batch_y, _ = pad_sentence_batch(test_Y[i: index], 0)\n",
+    "    batch_clss, _ = pad_sentence_batch(test_clss[i: index], -1)\n",
+    "    batch_clss = np.array(batch_clss)\n",
+    "    batch_y = np.array(batch_y)\n",
+    "    batch_x = np.array(batch_x)\n",
+    "    cp_batch_clss = batch_clss.copy()\n",
+    "    batch_mask = 1 - (batch_clss == -1)\n",
+    "    batch_clss[batch_clss == -1] = 0\n",
+    "\n",
+    "    feed = {model.X: batch_x,\n",
+    "            model.mask: batch_mask,\n",
+    "            model.clss: batch_clss}\n",
+    "    predicted = sess.run(tf.round(tf.sigmoid(model.logits)), feed_dict = feed)\n",
+    "    rouge_ = calculate_rouge(predicted, batch_y, cp_batch_clss, batch_x)\n",
+    "    rouges.append(rouge_)\n",
+    "    pbar.set_postfix(rouge = rouge_)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "0.1554709"
+      ]
+     },
+     "execution_count": 19,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "np.mean(rouges)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/extractive-summarization/3.multihead-attention.ipynb b/extractive-summarization/3.multihead-attention.ipynb
new file mode 100644
index 0000000..b4974ec
--- /dev/null
+++ b/extractive-summarization/3.multihead-attention.ipynb
@@ -0,0 +1,988 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+     ]
+    }
+   ],
+   "source": [
+    "import tensorflow as tf\n",
+    "import numpy as np\n",
+    "import pickle"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "dict_keys(['train_texts', 'test_texts', 'train_clss', 'test_clss', 'train_labels', 'test_labels'])"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "with open('dataset.pkl', 'rb') as fopen:\n",
+    "    dataset = pickle.load(fopen)\n",
+    "dataset.keys()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "73967"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "len(dataset['train_texts'])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('dictionary.pkl', 'rb') as fopen:\n",
+    "    dictionary = pickle.load(fopen)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rev_dictionary = dictionary['rev_dictionary']\n",
+    "dictionary = dictionary['dictionary']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def ln(inputs, epsilon = 1e-8, scope=\"ln\"):\n",
+    "    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):\n",
+    "        inputs_shape = inputs.get_shape()\n",
+    "        params_shape = inputs_shape[-1:]\n",
+    "    \n",
+    "        mean, variance = tf.nn.moments(inputs, [-1], keepdims=True)\n",
+    "        beta= tf.get_variable(\"beta\", params_shape, initializer=tf.zeros_initializer())\n",
+    "        gamma = tf.get_variable(\"gamma\", params_shape, initializer=tf.ones_initializer())\n",
+    "        normalized = (inputs - mean) / ( (variance + epsilon) ** (.5) )\n",
+    "        outputs = gamma * normalized + beta\n",
+    "        \n",
+    "    return outputs\n",
+    "\n",
+    "def scaled_dot_product_attention(Q, K, V,\n",
+    "                                 causality=False, dropout_rate=0.,\n",
+    "                                 training=True,\n",
+    "                                 scope=\"scaled_dot_product_attention\"):\n",
+    "    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):\n",
+    "        d_k = Q.get_shape().as_list()[-1]\n",
+    "\n",
+    "        outputs = tf.matmul(Q, tf.transpose(K, [0, 2, 1]))  # (N, T_q, T_k)\n",
+    "        outputs /= d_k ** 0.5\n",
+    "        outputs = mask(outputs, Q, K, type=\"key\")\n",
+    "        if causality:\n",
+    "            outputs = mask(outputs, type=\"future\")\n",
+    "        outputs = tf.nn.softmax(outputs)\n",
+    "        attention = tf.transpose(outputs, [0, 2, 1])\n",
+    "        #tf.summary.image(\"attention\", tf.expand_dims(attention[:1], -1))\n",
+    "        outputs = mask(outputs, Q, K, type=\"query\")\n",
+    "        outputs = tf.layers.dropout(outputs, rate=dropout_rate, training=training)\n",
+    "        outputs = tf.matmul(outputs, V)\n",
+    "    return outputs\n",
+    "\n",
+    "def mask(inputs, queries=None, keys=None, type=None):\n",
+    "    padding_num = -2 ** 32 + 1\n",
+    "    if type in (\"k\", \"key\", \"keys\"):\n",
+    "        masks = tf.sign(tf.reduce_sum(tf.abs(keys), axis=-1))  # (N, T_k)\n",
+    "        masks = tf.expand_dims(masks, 1) # (N, 1, T_k)\n",
+    "        masks = tf.tile(masks, [1, tf.shape(queries)[1], 1])  # (N, T_q, T_k)\n",
+    "        paddings = tf.ones_like(inputs) * padding_num\n",
+    "        outputs = tf.where(tf.equal(masks, 0), paddings, inputs)  # (N, T_q, T_k)\n",
+    "    elif type in (\"q\", \"query\", \"queries\"):\n",
+    "        masks = tf.sign(tf.reduce_sum(tf.abs(queries), axis=-1))  # (N, T_q)\n",
+    "        masks = tf.expand_dims(masks, -1)  # (N, T_q, 1)\n",
+    "        masks = tf.tile(masks, [1, 1, tf.shape(keys)[1]])  # (N, T_q, T_k)\n",
+    "        outputs = inputs*masks\n",
+    "    elif type in (\"f\", \"future\", \"right\"):\n",
+    "        diag_vals = tf.ones_like(inputs[0, :, :])  # (T_q, T_k)\n",
+    "        tril = tf.linalg.LinearOperatorLowerTriangular(diag_vals).to_dense()  # (T_q, T_k)\n",
+    "        masks = tf.tile(tf.expand_dims(tril, 0), [tf.shape(inputs)[0], 1, 1])  # (N, T_q, T_k)\n",
+    "        paddings = tf.ones_like(masks) * padding_num\n",
+    "        outputs = tf.where(tf.equal(masks, 0), paddings, inputs)\n",
+    "    else:\n",
+    "        raise ValueError(\"Unknown mask type: %r\" % (type,))\n",
+    "\n",
+    "\n",
+    "    return outputs\n",
+    "\n",
+    "def multihead_attention(queries, keys, values,\n",
+    "                        num_heads=8, \n",
+    "                        dropout_rate=0,\n",
+    "                        training=True,\n",
+    "                        causality=False,\n",
+    "                        scope=\"multihead_attention\"):\n",
+    "    d_model = queries.get_shape().as_list()[-1]\n",
+    "    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):\n",
+    "        # Linear projections\n",
+    "        Q = tf.layers.dense(queries, d_model, use_bias=False) # (N, T_q, d_model)\n",
+    "        K = tf.layers.dense(keys, d_model, use_bias=False) # (N, T_k, d_model)\n",
+    "        V = tf.layers.dense(values, d_model, use_bias=False) # (N, T_k, d_model)\n",
+    "        \n",
+    "        Q_ = tf.concat(tf.split(Q, num_heads, axis=2), axis=0) # (h*N, T_q, d_model/h)\n",
+    "        K_ = tf.concat(tf.split(K, num_heads, axis=2), axis=0) # (h*N, T_k, d_model/h)\n",
+    "        V_ = tf.concat(tf.split(V, num_heads, axis=2), axis=0) # (h*N, T_k, d_model/h)\n",
+    "\n",
+    "        outputs = scaled_dot_product_attention(Q_, K_, V_, causality, dropout_rate, training)\n",
+    "        outputs = tf.concat(tf.split(outputs, num_heads, axis=0), axis=2 ) # (N, T_q, d_model)\n",
+    "        outputs += queries\n",
+    "        outputs = ln(outputs)\n",
+    " \n",
+    "    return outputs\n",
+    "\n",
+    "def ff(inputs, num_units, scope=\"positionwise_feedforward\"):\n",
+    "    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):\n",
+    "        outputs = tf.layers.dense(inputs, num_units[0], activation=tf.nn.relu)\n",
+    "        outputs = tf.layers.dense(outputs, num_units[1])\n",
+    "        outputs += inputs\n",
+    "        outputs = ln(outputs)\n",
+    "    \n",
+    "    return outputs\n",
+    "\n",
+    "def label_smoothing(inputs, epsilon=0.1):\n",
+    "    V = inputs.get_shape().as_list()[-1] # number of channels\n",
+    "    return ((1-epsilon) * inputs) + (epsilon / V)\n",
+    "\n",
+    "def sinusoidal_position_encoding(inputs, mask, repr_dim):\n",
+    "    T = tf.shape(inputs)[1]\n",
+    "    pos = tf.reshape(tf.range(0.0, tf.to_float(T), dtype=tf.float32), [-1, 1])\n",
+    "    i = np.arange(0, repr_dim, 2, np.float32)\n",
+    "    denom = np.reshape(np.power(10000.0, i / repr_dim), [1, -1])\n",
+    "    enc = tf.expand_dims(tf.concat([tf.sin(pos / denom), tf.cos(pos / denom)], 1), 0)\n",
+    "    return tf.tile(enc, [tf.shape(inputs)[0], 1, 1]) * tf.expand_dims(tf.to_float(mask), -1)\n",
+    "\n",
+    "class Model:\n",
+    "    def __init__(self, size_layer, embedded_size,\n",
+    "                 dict_size, learning_rate,\n",
+    "                 num_blocks = 4, num_heads = 8, ratio_hidden = 2):\n",
+    "        \n",
+    "        def cells(reuse=False):\n",
+    "            return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n",
+    "        \n",
+    "        self.X = tf.placeholder(tf.int32, [None, None])\n",
+    "        self.Y = tf.placeholder(tf.float32, [None, None])\n",
+    "        self.mask = tf.placeholder(tf.int32, [None, None])\n",
+    "        self.clss = tf.placeholder(tf.int32, [None, None])\n",
+    "        mask = tf.cast(self.mask, tf.float32)\n",
+    "        encoder_embeddings = tf.Variable(tf.random_uniform([dict_size, embedded_size], -1, 1))\n",
+    "        encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n",
+    "        en_masks = tf.sign(self.X)\n",
+    "        encoder_embedded += sinusoidal_position_encoding(self.X, en_masks, size_layer)\n",
+    "        enc = encoder_embedded\n",
+    "        \n",
+    "        for i in range(num_blocks):\n",
+    "            with tf.variable_scope('encoder_self_attn_%d'%i,reuse=False):\n",
+    "                enc = multihead_attention(queries=enc,\n",
+    "                                          keys=enc,\n",
+    "                                          values=enc,\n",
+    "                                          num_heads=num_heads,\n",
+    "                                          causality=False)\n",
+    "                enc = ff(enc, num_units=[size_layer * ratio_hidden, size_layer])\n",
+    "                        \n",
+    "        outputs = tf.gather(enc, self.clss, axis = 1, batch_dims = 1)\n",
+    "        self.logits = tf.layers.dense(outputs, 1)\n",
+    "        self.logits = tf.squeeze(self.logits, axis=-1)\n",
+    "        self.logits = self.logits * mask\n",
+    "        crossent = tf.nn.sigmoid_cross_entropy_with_logits(logits=self.logits, labels=self.Y)\n",
+    "        crossent = crossent * mask\n",
+    "        crossent = tf.reduce_sum(crossent)\n",
+    "        total_size = tf.reduce_sum(mask)\n",
+    "        self.cost = tf.div_no_nan(crossent, total_size)\n",
+    "        \n",
+    "        self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n",
+    "        \n",
+    "        l = tf.round(tf.sigmoid(self.logits))\n",
+    "        self.accuracy = tf.reduce_mean(tf.cast(tf.boolean_mask(l, tf.equal(self.Y, 1)), tf.float32))  # recall over positive (Y == 1) sentences"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "size_layer = 256\n",
+    "embedded_size = 256\n",
+    "learning_rate = 1e-3"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "WARNING:tensorflow:From <ipython-input-6-385ed069ad9c>:98: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Use `tf.cast` instead.\n",
+      "WARNING:tensorflow:From <ipython-input-6-385ed069ad9c>:68: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Use keras.layers.dense instead.\n",
+      "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
+      "WARNING:tensorflow:From <ipython-input-6-385ed069ad9c>:41: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Use tf.where in 2.0, which has the same broadcast rule as np.where\n",
+      "WARNING:tensorflow:From <ipython-input-6-385ed069ad9c>:30: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Use keras.layers.dropout instead.\n"
+     ]
+    }
+   ],
+   "source": [
+    "tf.reset_default_graph()\n",
+    "sess = tf.InteractiveSession()\n",
+    "model = Model(size_layer,embedded_size,len(dictionary),learning_rate)\n",
+    "sess.run(tf.global_variables_initializer())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "UNK = 3\n",
+    "\n",
+    "def str_idx(corpus, dic):\n",
+    "    X = []\n",
+    "    for i in corpus:\n",
+    "        ints = []\n",
+    "        for k in i.split():\n",
+    "            ints.append(dic.get(k,UNK))\n",
+    "        X.append(ints)\n",
+    "    return X\n",
+    "\n",
+    "def pad_sentence_batch(sentence_batch, pad_int):\n",
+    "    padded_seqs = []\n",
+    "    seq_lens = []\n",
+    "    max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n",
+    "    for sentence in sentence_batch:\n",
+    "        padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n",
+    "        seq_lens.append(len(sentence))\n",
+    "    return padded_seqs, seq_lens"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_X = str_idx(dataset['train_texts'], dictionary)\n",
+    "test_X = str_idx(dataset['test_texts'], dictionary)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_clss = dataset['train_clss']\n",
+    "test_clss = dataset['test_clss']\n",
+    "train_Y = dataset['train_labels']\n",
+    "test_Y = dataset['test_labels']"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(1.0, 1.4390177)"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "batch_x, _ = pad_sentence_batch(train_X[:64], 0)\n",
+    "batch_y, _ = pad_sentence_batch(train_Y[:64], 0)\n",
+    "batch_clss, _ = pad_sentence_batch(train_clss[:64], -1)\n",
+    "batch_clss = np.array(batch_clss)\n",
+    "batch_mask = 1 - (batch_clss == -1)\n",
+    "batch_clss[batch_clss == -1] = 0\n",
+    "\n",
+    "feed = {model.X: batch_x,\n",
+    "        model.Y: batch_y,\n",
+    "        model.mask: batch_mask,\n",
+    "        model.clss: batch_clss}\n",
+    "acc, loss, _ = sess.run([model.accuracy, model.cost,model.optimizer], feed_dict = feed)\n",
+    "acc, loss"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.0167, cost=0.376] \n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.65it/s, accuracy=0.00641, cost=0.397]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 1, training avg loss 0.387910, training avg acc 0.008459\n",
+      "epoch 1, testing avg loss 0.378677, testing avg acc 0.002847\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.0167, cost=0.37]  \n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.67it/s, accuracy=0.00641, cost=0.393]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 2, training avg loss 0.374503, training avg acc 0.029881\n",
+      "epoch 2, testing avg loss 0.375109, testing avg acc 0.003962\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.1, cost=0.349]   \n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.68it/s, accuracy=0.0321, cost=0.397]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 3, training avg loss 0.365837, training avg acc 0.061967\n",
+      "epoch 3, testing avg loss 0.378205, testing avg acc 0.052813\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.225, cost=0.324] \n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.67it/s, accuracy=0.0705, cost=0.406]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 4, training avg loss 0.351095, training avg acc 0.128159\n",
+      "epoch 4, testing avg loss 0.392094, testing avg acc 0.101840\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.358, cost=0.299]\n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.67it/s, accuracy=0.154, cost=0.426]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 5, training avg loss 0.329346, training avg acc 0.222128\n",
+      "epoch 5, testing avg loss 0.413041, testing avg acc 0.174466\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.433, cost=0.262]\n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.67it/s, accuracy=0.135, cost=0.455]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 6, training avg loss 0.305964, training avg acc 0.309037\n",
+      "epoch 6, testing avg loss 0.436760, testing avg acc 0.183248\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.533, cost=0.219]\n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.65it/s, accuracy=0.141, cost=0.492]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 7, training avg loss 0.281421, training avg acc 0.392515\n",
+      "epoch 7, testing avg loss 0.472197, testing avg acc 0.184607\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.7, cost=0.186]  \n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.65it/s, accuracy=0.173, cost=0.543]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 8, training avg loss 0.252382, training avg acc 0.477697\n",
+      "epoch 8, testing avg loss 0.522769, testing avg acc 0.221310\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:43<00:00,  2.87it/s, accuracy=0.742, cost=0.142]\n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.65it/s, accuracy=0.173, cost=0.641]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 9, training avg loss 0.225019, training avg acc 0.549972\n",
+      "epoch 9, testing avg loss 0.597408, testing avg acc 0.246061\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:42<00:00,  2.87it/s, accuracy=0.775, cost=0.121]\n",
+      "minibatch loop: 100%|██████████| 289/289 [00:38<00:00,  7.60it/s, accuracy=0.154, cost=0.644]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 10, training avg loss 0.203076, training avg acc 0.606897\n",
+      "epoch 10, testing avg loss 0.594545, testing avg acc 0.227027\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:37<00:00,  2.91it/s, accuracy=0.792, cost=0.102]\n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.61it/s, accuracy=0.179, cost=0.728]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 11, training avg loss 0.189718, training avg acc 0.639519\n",
+      "epoch 11, testing avg loss 0.670387, testing avg acc 0.249209\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:37<00:00,  2.91it/s, accuracy=0.883, cost=0.0795]\n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.61it/s, accuracy=0.167, cost=0.775]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 12, training avg loss 0.168331, training avg acc 0.688725\n",
+      "epoch 12, testing avg loss 0.721593, testing avg acc 0.229948\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.892, cost=0.0754]\n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.65it/s, accuracy=0.167, cost=0.728]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 13, training avg loss 0.147944, training avg acc 0.734141\n",
+      "epoch 13, testing avg loss 0.681439, testing avg acc 0.234368\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.9, cost=0.0825]  \n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.64it/s, accuracy=0.192, cost=0.824]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 14, training avg loss 0.135995, training avg acc 0.758445\n",
+      "epoch 14, testing avg loss 0.749728, testing avg acc 0.255525\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.958, cost=0.0592]\n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.64it/s, accuracy=0.205, cost=0.873]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 15, training avg loss 0.127149, training avg acc 0.778638\n",
+      "epoch 15, testing avg loss 0.811870, testing avg acc 0.269463\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.9, cost=0.0623]  \n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.66it/s, accuracy=0.186, cost=0.957]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 16, training avg loss 0.110549, training avg acc 0.812205\n",
+      "epoch 16, testing avg loss 0.862229, testing avg acc 0.248714\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.875, cost=0.0698]\n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.64it/s, accuracy=0.212, cost=0.879]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 17, training avg loss 0.099921, training avg acc 0.834709\n",
+      "epoch 17, testing avg loss 0.792213, testing avg acc 0.235911\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.933, cost=0.0742]\n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.65it/s, accuracy=0.205, cost=0.793]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 18, training avg loss 0.099067, training avg acc 0.837627\n",
+      "epoch 18, testing avg loss 0.752785, testing avg acc 0.243941\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.933, cost=0.0539]\n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.65it/s, accuracy=0.205, cost=0.932]\n",
+      "minibatch loop:   0%|          | 0/1156 [00:00<?, ?it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 19, training avg loss 0.087893, training avg acc 0.859114\n",
+      "epoch 19, testing avg loss 0.854014, testing avg acc 0.258605\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00,  2.92it/s, accuracy=0.933, cost=0.0515]\n",
+      "minibatch loop: 100%|██████████| 289/289 [00:37<00:00,  7.65it/s, accuracy=0.179, cost=0.983]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 20, training avg loss 0.085529, training avg acc 0.863731\n",
+      "epoch 20, testing avg loss 0.869859, testing avg acc 0.236150\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "import tqdm\n",
+    "\n",
+    "batch_size = 64\n",
+    "epoch = 20\n",
+    "\n",
+    "for e in range(epoch):\n",
+    "    pbar = tqdm.tqdm(\n",
+    "        range(0, len(train_X), batch_size), desc = 'minibatch loop')\n",
+    "    train_loss, train_acc, test_loss, test_acc = [], [], [], []\n",
+    "    for i in pbar:\n",
+    "        index = min(i + batch_size, len(train_X))\n",
+    "        batch_x, _ = pad_sentence_batch(train_X[i : index], 0)\n",
+    "        batch_y, _ = pad_sentence_batch(train_Y[i : index], 0)\n",
+    "        batch_clss, _ = pad_sentence_batch(train_clss[i : index], -1)\n",
+    "        batch_clss = np.array(batch_clss)\n",
+    "        batch_mask = 1 - (batch_clss == -1)\n",
+    "        batch_clss[batch_clss == -1] = 0\n",
+    "        feed = {model.X: batch_x,\n",
+    "                model.Y: batch_y,\n",
+    "                model.mask: batch_mask,\n",
+    "                model.clss: batch_clss}\n",
+    "        accuracy, loss, _ = sess.run([model.accuracy,model.cost,model.optimizer],\n",
+    "                                    feed_dict = feed)\n",
+    "        train_loss.append(loss)\n",
+    "        train_acc.append(accuracy)\n",
+    "        pbar.set_postfix(cost = loss, accuracy = accuracy)\n",
+    "    \n",
+    "    pbar = tqdm.tqdm(\n",
+    "        range(0, len(test_X), batch_size), desc = 'minibatch loop')\n",
+    "    for i in pbar:\n",
+    "        index = min(i + batch_size, len(test_X))\n",
+    "        batch_x, _ = pad_sentence_batch(test_X[i : index], 0)\n",
+    "        batch_y, _ = pad_sentence_batch(test_Y[i : index], 0)\n",
+    "        batch_clss, _ = pad_sentence_batch(test_clss[i : index], -1)\n",
+    "        batch_clss = np.array(batch_clss)\n",
+    "        batch_mask = 1 - (batch_clss == -1)\n",
+    "        batch_clss[batch_clss == -1] = 0\n",
+    "        feed = {model.X: batch_x,\n",
+    "                model.Y: batch_y,\n",
+    "                model.mask: batch_mask,\n",
+    "                model.clss: batch_clss}\n",
+    "        accuracy, loss = sess.run([model.accuracy,model.cost],\n",
+    "                                    feed_dict = feed)\n",
+    "\n",
+    "        test_loss.append(loss)\n",
+    "        test_acc.append(accuracy)\n",
+    "        pbar.set_postfix(cost = loss, accuracy = accuracy)\n",
+    "    \n",
+    "    print('epoch %d, training avg loss %f, training avg acc %f'%(e+1,\n",
+    "                                                                 np.mean(train_loss),np.mean(train_acc)))\n",
+    "    print('epoch %d, testing avg loss %f, testing avg acc %f'%(e+1,\n",
+    "                                                              np.mean(test_loss),np.mean(test_acc)))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from tensor2tensor.utils import rouge\n",
+    "from tensorflow.keras.preprocessing import sequence\n",
+    "\n",
+    "def calculate_rouges(predicted, batch_y):\n",
+    "    non = np.count_nonzero(batch_y, axis = 1)\n",
+    "    o = []\n",
+    "    for n in non:\n",
+    "        o.append([True for _ in range(n)])\n",
+    "    b = sequence.pad_sequences(o, dtype = bool, padding = 'post', value = False)\n",
+    "    batch_y = np.array(batch_y)\n",
+    "    rouges = []\n",
+    "    for i in range(predicted.shape[0]):\n",
+    "        a = batch_y[i][b[i]]\n",
+    "        p = predicted[i][b[i]]\n",
+    "        rouges.append(rouge.rouge_n([p], [a]))\n",
+    "    return np.mean(rouges)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "batch_x, _ = pad_sentence_batch(test_X[: 5], 0)\n",
+    "batch_y, _ = pad_sentence_batch(test_Y[: 5], 0)\n",
+    "batch_clss, _ = pad_sentence_batch(test_clss[: 5], -1)\n",
+    "batch_clss = np.array(batch_clss)\n",
+    "batch_y = np.array(batch_y)\n",
+    "batch_x = np.array(batch_x)\n",
+    "cp_batch_clss = batch_clss.copy()\n",
+    "batch_mask = 1 - (batch_clss == -1)\n",
+    "batch_clss[batch_clss == -1] = 0\n",
+    "\n",
+    "feed = {model.X: batch_x,\n",
+    "        model.mask: batch_mask,\n",
+    "        model.clss: batch_clss}\n",
+    "predicted = sess.run(tf.round(tf.sigmoid(model.logits)), feed_dict = feed)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "0.19125411"
+      ]
+     },
+     "execution_count": 16,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from tensor2tensor.utils import rouge\n",
+    "\n",
+    "def calculate_rouge(predicted, batch_y, cp_batch_clss, batch_x):\n",
+    "    f, y_, predicted_ = [], [], []\n",
+    "    for i in range(len(cp_batch_clss)):\n",
+    "        f.append(cp_batch_clss[i][cp_batch_clss[i] != -1])\n",
+    "        y_.append(batch_y[i][cp_batch_clss[i] != -1])\n",
+    "        predicted_.append(predicted[i][cp_batch_clss[i] != -1])\n",
+    "    \n",
+    "    actual, predict = [], []\n",
+    "    for i in range(len(f)):\n",
+    "        actual_, predict_ = [], []\n",
+    "        for k in range(len(f[i])):\n",
+    "            if k == (len(f[i]) - 1):\n",
+    "                s = batch_x[i][f[i][k]:]\n",
+    "                s = s[s != 0]\n",
+    "            else:\n",
+    "                s = batch_x[i][f[i][k]: f[i][k + 1]]\n",
+    "            s = [w for w in s if w not in [0, 1, 2, 3, 5, 6, 7, 8]]  # drop special-token and punctuation ids\n",
+    "            if y_[i][k]:\n",
+    "                actual_.extend(s)\n",
+    "            if predicted_[i][k]:\n",
+    "                predict_.extend(s)\n",
+    "        actual.append(actual_)\n",
+    "        predict.append(predict_)\n",
+    "    return rouge.rouge_n(predict, actual)\n",
+    "\n",
+    "calculate_rouge(predicted, batch_y, cp_batch_clss, batch_x)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from tqdm import tqdm as tqdm_base\n",
+    "def tqdm(*args, **kwargs):\n",
+    "    if hasattr(tqdm_base, '_instances'):\n",
+    "        for instance in list(tqdm_base._instances):\n",
+    "            tqdm_base._decr_instances(instance)\n",
+    "    return tqdm_base(*args, **kwargs)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 2312/2312 [14:40<00:00,  2.63it/s, rouge=0.211]\n"
+     ]
+    }
+   ],
+   "source": [
+    "rouges = []\n",
+    "\n",
+    "pbar = tqdm(\n",
+    "    range(0, len(test_X), 32), desc = 'minibatch loop')\n",
+    "for i in pbar:\n",
+    "    index = min(i + 32, len(test_X))  # same width as the range() step, so batches do not overlap\n",
+    "    batch_x, _ = pad_sentence_batch(test_X[i: index], 0)\n",
+    "    batch_y, _ = pad_sentence_batch(test_Y[i: index], 0)\n",
+    "    batch_clss, _ = pad_sentence_batch(test_clss[i: index], -1)\n",
+    "    batch_clss = np.array(batch_clss)\n",
+    "    batch_y = np.array(batch_y)\n",
+    "    batch_x = np.array(batch_x)\n",
+    "    cp_batch_clss = batch_clss.copy()\n",
+    "    batch_mask = 1 - (batch_clss == -1)\n",
+    "    batch_clss[batch_clss == -1] = 0\n",
+    "\n",
+    "    feed = {model.X: batch_x,\n",
+    "            model.mask: batch_mask,\n",
+    "            model.clss: batch_clss}\n",
+    "    predicted = sess.run(tf.round(tf.sigmoid(model.logits)), feed_dict = feed)\n",
+    "    rouge_ = calculate_rouge(predicted, batch_y, cp_batch_clss, batch_x)\n",
+    "    rouges.append(rouge_)\n",
+    "    pbar.set_postfix(rouge = rouge_)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "0.26330408"
+      ]
+     },
+     "execution_count": 19,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "np.mean(rouges)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/extractive-summarization/download-data.ipynb b/extractive-summarization/download-data.ipynb
new file mode 100644
index 0000000..1ed961b
--- /dev/null
+++ b/extractive-summarization/download-data.ipynb
@@ -0,0 +1,55 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# !pip3 install googledrivedownloader"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from google_drive_downloader import GoogleDriveDownloader as gdd\n",
+    "\n",
+    "file_id = '0BwmD_VLjROrfTHk4NFg2SndKcjQ'\n",
+    "gdd.download_file_from_google_drive(file_id=file_id, dest_path='./cnn.tgz')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!tar -zxf cnn.tgz"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/extractive-summarization/preprocessing-data.ipynb b/extractive-summarization/preprocessing-data.ipynb
new file mode 100644
index 0000000..6c4de05
--- /dev/null
+++ b/extractive-summarization/preprocessing-data.ipynb
@@ -0,0 +1,448 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n",
+      "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n",
+      "  np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n"
+     ]
+    }
+   ],
+   "source": [
+    "# !pip3 install malaya\n",
+    "\n",
+    "import malaya\n",
+    "import re\n",
+    "from malaya.texts._text_functions import split_into_sentences\n",
+    "from malaya.texts import _regex\n",
+    "\n",
+    "tokenizer = malaya.preprocessing._tokenizer\n",
+    "splitter = split_into_sentences"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "92579"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import glob\n",
+    "\n",
+    "stories = glob.glob('cnn/stories/*.story')\n",
+    "len(stories)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def is_number_regex(s):\n",
+    "    if re.match(r\"^\\d+?\\.\\d+?$\", s) is None:\n",
+    "        return s.isdigit()\n",
+    "    return True\n",
+    "\n",
+    "def preprocessing(string):\n",
+    "    string = re.sub('[^\\'\"A-Za-z\\-(),.$0-9 ]+', ' ', string.lower())\n",
+    "    tokenized = tokenizer(string)\n",
+    "    tokens = []\n",
+    "    for w in tokenized:\n",
+    "        if is_number_regex(w):\n",
+    "            tokens.append('<NUM>')\n",
+    "        elif re.match(_regex._money, w):\n",
+    "            tokens.append('<MONEY>')\n",
+    "        elif re.match(_regex._date, w):\n",
+    "            tokens.append('<DATE>')\n",
+    "        else:\n",
+    "            tokens.append(w)\n",
+    "    return tokens\n",
+    "\n",
+    "def split_story(doc):\n",
+    "    index = doc.find('@highlight')\n",
+    "    story, highlights = doc[:index], doc[index:].split('@highlight')\n",
+    "    highlights = [h.strip() for h in highlights if len(h) > 0]\n",
+    "    stories = []\n",
+    "    for s in splitter(story):\n",
+    "        stories.append(preprocessing(s))\n",
+    "    summaries = []\n",
+    "    for s in highlights:\n",
+    "        summaries.append(preprocessing(s))\n",
+    "    return stories, summaries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "min_src_nsents = 3\n",
+    "max_src_nsents = 20\n",
+    "min_src_ntokens_per_sent = 5\n",
+    "max_src_ntokens_per_sent = 30\n",
+    "min_tgt_ntokens = 5\n",
+    "max_tgt_ntokens = 500\n",
+    "sep_token = '[SEP]'\n",
+    "cls_token = '[CLS]'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open(stories[0]) as fopen:\n",
+    "    story = fopen.read()\n",
+    "story, highlights = split_story(story)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def _get_ngrams(n, text):\n",
+    "    ngram_set = set()\n",
+    "    text_length = len(text)\n",
+    "    max_index_ngram_start = text_length - n\n",
+    "    for i in range(max_index_ngram_start + 1):\n",
+    "        ngram_set.add(tuple(text[i:i + n]))\n",
+    "    return ngram_set\n",
+    "\n",
+    "\n",
+    "def _get_word_ngrams(n, sentences):\n",
+    "    assert len(sentences) > 0\n",
+    "    assert n > 0\n",
+    "\n",
+    "    words = sum(sentences, [])\n",
+    "    return _get_ngrams(n, words)\n",
+    "\n",
+    "def cal_rouge(evaluated_ngrams, reference_ngrams):\n",
+    "    reference_count = len(reference_ngrams)\n",
+    "    evaluated_count = len(evaluated_ngrams)\n",
+    "\n",
+    "    overlapping_ngrams = evaluated_ngrams.intersection(reference_ngrams)\n",
+    "    overlapping_count = len(overlapping_ngrams)\n",
+    "\n",
+    "    if evaluated_count == 0:\n",
+    "        precision = 0.0\n",
+    "    else:\n",
+    "        precision = overlapping_count / evaluated_count\n",
+    "\n",
+    "    if reference_count == 0:\n",
+    "        recall = 0.0\n",
+    "    else:\n",
+    "        recall = overlapping_count / reference_count\n",
+    "\n",
+    "    f1_score = 2.0 * ((precision * recall) / (precision + recall + 1e-8))\n",
+    "    return {\"f\": f1_score, \"p\": precision, \"r\": recall}\n",
+    "\n",
+    "\n",
+    "def greedy_selection(doc_sent_list, abstract_sent_list, summary_size):\n",
+    "    def _rouge_clean(s):\n",
+    "        return re.sub(r'[^a-zA-Z0-9 ]', '', s)\n",
+    "\n",
+    "    max_rouge = 0.0\n",
+    "    abstract = sum(abstract_sent_list, [])\n",
+    "    abstract = _rouge_clean(' '.join(abstract)).split()\n",
+    "    sents = [_rouge_clean(' '.join(s)).split() for s in doc_sent_list]\n",
+    "    evaluated_1grams = [_get_word_ngrams(1, [sent]) for sent in sents]\n",
+    "    reference_1grams = _get_word_ngrams(1, [abstract])\n",
+    "    evaluated_2grams = [_get_word_ngrams(2, [sent]) for sent in sents]\n",
+    "    reference_2grams = _get_word_ngrams(2, [abstract])\n",
+    "\n",
+    "    selected = []\n",
+    "    for s in range(summary_size):\n",
+    "        cur_max_rouge = max_rouge\n",
+    "        cur_id = -1\n",
+    "        for i in range(len(sents)):\n",
+    "            if (i in selected):\n",
+    "                continue\n",
+    "            c = selected + [i]\n",
+    "            candidates_1 = [evaluated_1grams[idx] for idx in c]\n",
+    "            candidates_1 = set.union(*map(set, candidates_1))\n",
+    "            candidates_2 = [evaluated_2grams[idx] for idx in c]\n",
+    "            candidates_2 = set.union(*map(set, candidates_2))\n",
+    "            rouge_1 = cal_rouge(candidates_1, reference_1grams)['f']\n",
+    "            rouge_2 = cal_rouge(candidates_2, reference_2grams)['f']\n",
+    "            rouge_score = rouge_1 + rouge_2\n",
+    "            if rouge_score > cur_max_rouge:\n",
+    "                cur_max_rouge = rouge_score\n",
+    "                cur_id = i\n",
+    "        if (cur_id == -1):\n",
+    "            return selected\n",
+    "        selected.append(cur_id)\n",
+    "        max_rouge = cur_max_rouge\n",
+    "\n",
+    "    return sorted(selected)\n",
+    "\n",
+    "def get_xy(story, highlights):\n",
+    "    # keep only sentences longer than the minimum token count\n",
+    "    idxs = [i for i, s in enumerate(story) if (len(s) > min_src_ntokens_per_sent)]\n",
+    "\n",
+    "\n",
+    "    src = [story[i][:max_src_ntokens_per_sent] for i in idxs]\n",
+    "    src = src[:max_src_nsents]\n",
+    "\n",
+    "    sent_labels = greedy_selection(src, highlights, 3)\n",
+    "\n",
+    "    # one binary label per source sentence: 1 if the greedy selection picked it\n",
+    "    _sent_labels = [0] * len(src)\n",
+    "    for l in sent_labels:\n",
+    "        _sent_labels[l] = 1\n",
+    "    \n",
+    "    src_txt = [' '.join(sent) for sent in src]\n",
+    "    text = ' {} {} '.format(sep_token, cls_token).join(src_txt)\n",
+    "    text = '{} {} {}'.format(cls_token, text, sep_token)\n",
+    "    cls_ids = [i for i, t in enumerate(text.split()) if t == cls_token]\n",
+    "    \n",
+    "    return text, cls_ids, _sent_labels"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import collections\n",
+    "import json\n",
+    "\n",
+    "def build_dataset(words, n_words, atleast=1):\n",
+    "    count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n",
+    "    counter = collections.Counter(words).most_common(n_words)\n",
+    "    counter = [i for i in counter if i[1] >= atleast]\n",
+    "    count.extend(counter)\n",
+    "    dictionary = dict()\n",
+    "    for word, _ in count:\n",
+    "        dictionary[word] = len(dictionary)\n",
+    "    data = list()\n",
+    "    unk_count = 0\n",
+    "    for word in words:\n",
+    "        index = dictionary.get(word, 3)  # 3 is the 'UNK' id; 0 is 'PAD'\n",
+    "        if index == 3:\n",
+    "            unk_count += 1\n",
+    "        data.append(index)\n",
+    "    count[3][1] = unk_count\n",
+    "    reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n",
+    "    return data, count, dictionary, reversed_dictionary"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# from dask import delayed\n",
+    "# import dask\n",
+    "\n",
+    "# def process(i):\n",
+    "#     with open(stories[i]) as fopen:\n",
+    "#         story = fopen.read()\n",
+    "#     story, highlights = split_story(story)\n",
+    "#     return get_xy(story, highlights)\n",
+    "\n",
+    "# train = []\n",
+    "# for i in range(len(stories)):\n",
+    "#     im = delayed(process)(i)\n",
+    "#     train.append(im)\n",
+    "    \n",
+    "# train = dask.compute(*train)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open(stories[1]) as fopen:\n",
+    "    story = fopen.read()\n",
+    "story, highlights = split_story(story)\n",
+    "text, cls_ids, sent_labels = get_xy(story, highlights)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(20, 20, 560)"
+      ]
+     },
+     "execution_count": 36,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "len(sent_labels), len(cls_ids), len(text.split())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "100%|██████████| 92579/92579 [16:41<00:00, 92.41it/s] \n"
+     ]
+    }
+   ],
+   "source": [
+    "from tqdm import tqdm\n",
+    "\n",
+    "texts, clss, labels = [], [], []\n",
+    "\n",
+    "for i in tqdm(range(len(stories))):\n",
+    "    with open(stories[i]) as fopen:\n",
+    "        story = fopen.read()\n",
+    "    story, highlights = split_story(story)\n",
+    "    text, cls_ids, sent_labels = get_xy(story, highlights)\n",
+    "    if len(cls_ids) != len(sent_labels):\n",
+    "        continue\n",
+    "    texts.append(text)\n",
+    "    clss.append(cls_ids)\n",
+    "    labels.append(sent_labels)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "vocab size: 118356\n",
+      "Most common words [('the', 1974502), (',', 1740960), ('[CLS]', 1668596), ('[SEP]', 1668596), ('.', 1284463), ('to', 844716)]\n"
+     ]
+    }
+   ],
+   "source": [
+    "concat = ' '.join(texts).split()\n",
+    "vocabulary_size = len(set(concat))\n",
+    "_, count, dictionary, rev_dictionary = build_dataset(concat, vocabulary_size, atleast = 2)\n",
+    "print('vocab size: %d'%(len(dictionary)))\n",
+    "print('Most common words', count[4:10])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.model_selection import train_test_split\n",
+    "\n",
+    "train_texts, test_texts, train_clss, test_clss, train_labels, test_labels = \\\n",
+    "train_test_split(texts, clss, labels, test_size = 0.2)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pickle\n",
+    "\n",
+    "with open('dataset.pkl', 'wb') as fopen:\n",
+    "    pickle.dump({'train_texts': train_texts,\n",
+    "                 'test_texts': test_texts,\n",
+    "                 'train_clss': train_clss,\n",
+    "                 'test_clss': test_clss,\n",
+    "                 'train_labels': train_labels,\n",
+    "                 'test_labels': test_labels}, fopen)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('dictionary.pkl', 'wb') as fopen:\n",
+    "    pickle.dump({'dictionary': dictionary, 'rev_dictionary': rev_dictionary}, fopen)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/neural-machine-translation/43.attention-is-all-you-need-beam.ipynb b/neural-machine-translation/43.attention-is-all-you-need-beam.ipynb
index 430f59b..340ca63 100644
--- a/neural-machine-translation/43.attention-is-all-you-need-beam.ipynb
+++ b/neural-machine-translation/43.attention-is-all-you-need-beam.ipynb
@@ -7,7 +7,7 @@
    "outputs": [],
    "source": [
     "import os\n",
-    "os.environ['CUDA_VISIBLE_DEVICES'] = '2'"
+    "os.environ['CUDA_VISIBLE_DEVICES'] = '0'"
    ]
   },
   {
@@ -303,12 +303,13 @@
     "            encoder_embedded = tf.nn.embedding_lookup(encoder_embedding, x)\n",
     "            en_masks = tf.sign(x)\n",
     "            encoder_embedded += sinusoidal_position_encoding(x, en_masks, size_layer)\n",
+    "            enc = encoder_embedded\n",
     "            \n",
     "            for i in range(num_blocks):\n",
     "                with tf.variable_scope('encoder_self_attn_%d'%i,reuse=reuse):\n",
-    "                    enc = multihead_attention(queries=encoder_embedded,\n",
-    "                                              keys=encoder_embedded,\n",
-    "                                              values=encoder_embedded,\n",
+    "                    enc = multihead_attention(queries=enc,\n",
+    "                                              keys=enc,\n",
+    "                                              values=enc,\n",
     "                                              num_heads=num_heads,\n",
     "                                              causality=False)\n",
     "                    enc = ff(enc, num_units=[size_layer * ratio_hidden, size_layer])\n",
@@ -399,19 +400,19 @@
       "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n",
       "Instructions for updating:\n",
       "reduction_indices is deprecated, use axis instead\n",
-      "WARNING:tensorflow:From <ipython-input-10-474210a7070d>:102: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n",
+      "WARNING:tensorflow:From <ipython-input-10-dc53d01461b8>:102: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n",
       "Instructions for updating:\n",
       "Use `tf.cast` instead.\n",
-      "WARNING:tensorflow:From <ipython-input-10-474210a7070d>:72: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n",
+      "WARNING:tensorflow:From <ipython-input-10-dc53d01461b8>:72: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n",
       "Instructions for updating:\n",
       "Use keras.layers.dense instead.\n",
       "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n",
       "Instructions for updating:\n",
       "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
-      "WARNING:tensorflow:From <ipython-input-10-474210a7070d>:45: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n",
+      "WARNING:tensorflow:From <ipython-input-10-dc53d01461b8>:45: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n",
       "Instructions for updating:\n",
       "Use tf.where in 2.0, which has the same broadcast rule as np.where\n",
-      "WARNING:tensorflow:From <ipython-input-10-474210a7070d>:34: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n",
+      "WARNING:tensorflow:From <ipython-input-10-dc53d01461b8>:34: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n",
       "Instructions for updating:\n",
       "Use keras.layers.dropout instead.\n",
       "WARNING:tensorflow:\n",
@@ -498,339 +499,339 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:30<00:00,  4.20it/s, accuracy=0.153, cost=6.59] \n",
-      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.85it/s, accuracy=0.158, cost=6.6] \n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:21,  5.31it/s, accuracy=0.195, cost=5.64]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:42<00:00,  3.45it/s, accuracy=0.142, cost=6.66] \n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  7.95it/s, accuracy=0.144, cost=6.6] \n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 1, training avg loss 8.955545, training avg acc 0.108270\n",
-      "epoch 1, testing avg loss 5.840093, testing avg acc 0.185351\n"
+      "epoch 1, training avg loss 9.061677, training avg acc 0.098744\n",
+      "epoch 1, testing avg loss 5.826741, testing avg acc 0.173797\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:28<00:00,  4.23it/s, accuracy=0.21, cost=5.53] \n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 11.12it/s, accuracy=0.22, cost=5.7]  \n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:26,  5.21it/s, accuracy=0.285, cost=4.75]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.48it/s, accuracy=0.194, cost=5.72]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.06it/s, accuracy=0.185, cost=5.76]\n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 2, training avg loss 5.173765, training avg acc 0.243477\n",
-      "epoch 2, testing avg loss 4.984414, testing avg acc 0.264366\n"
+      "epoch 2, training avg loss 5.312159, training avg acc 0.214993\n",
+      "epoch 2, testing avg loss 5.052103, testing avg acc 0.233469\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:29<00:00,  4.21it/s, accuracy=0.279, cost=4.91]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 10.81it/s, accuracy=0.255, cost=5.25]\n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:27,  5.19it/s, accuracy=0.325, cost=4.32]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.48it/s, accuracy=0.238, cost=5.25]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.08it/s, accuracy=0.197, cost=5.42]\n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 3, training avg loss 4.533984, training avg acc 0.306413\n",
-      "epoch 3, testing avg loss 4.555390, testing avg acc 0.311855\n"
+      "epoch 3, training avg loss 4.778862, training avg acc 0.258775\n",
+      "epoch 3, testing avg loss 4.727597, testing avg acc 0.258978\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:20<00:00,  4.34it/s, accuracy=0.317, cost=4.47]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 10.90it/s, accuracy=0.292, cost=4.98]\n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:26,  5.22it/s, accuracy=0.365, cost=3.98]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.48it/s, accuracy=0.288, cost=4.86]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.05it/s, accuracy=0.24, cost=5.07] \n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 4, training avg loss 4.126750, training avg acc 0.349728\n",
-      "epoch 4, testing avg loss 4.273607, testing avg acc 0.342725\n"
+      "epoch 4, training avg loss 4.421953, training avg acc 0.296155\n",
+      "epoch 4, testing avg loss 4.400771, testing avg acc 0.299211\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:19<00:00,  4.34it/s, accuracy=0.361, cost=4.04]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 11.08it/s, accuracy=0.306, cost=4.79]\n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:31,  5.11it/s, accuracy=0.398, cost=3.71]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.47it/s, accuracy=0.327, cost=4.43]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.07it/s, accuracy=0.277, cost=4.75]\n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 5, training avg loss 3.816826, training avg acc 0.383024\n",
-      "epoch 5, testing avg loss 4.091312, testing avg acc 0.360991\n"
+      "epoch 5, training avg loss 4.059185, training avg acc 0.342010\n",
+      "epoch 5, testing avg loss 4.104659, testing avg acc 0.337337\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:18<00:00,  4.36it/s, accuracy=0.402, cost=3.65]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 11.12it/s, accuracy=0.314, cost=4.68]\n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:22,  5.30it/s, accuracy=0.426, cost=3.47]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.48it/s, accuracy=0.36, cost=4.03] \n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.07it/s, accuracy=0.309, cost=4.54]\n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 6, training avg loss 3.562186, training avg acc 0.410961\n",
-      "epoch 6, testing avg loss 3.968070, testing avg acc 0.373758\n"
+      "epoch 6, training avg loss 3.753322, training avg acc 0.379169\n",
+      "epoch 6, testing avg loss 3.947848, testing avg acc 0.354884\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:18<00:00,  4.36it/s, accuracy=0.457, cost=3.24]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 10.96it/s, accuracy=0.315, cost=4.65]\n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:17,  5.40it/s, accuracy=0.458, cost=3.26]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.47it/s, accuracy=0.418, cost=3.59]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.03it/s, accuracy=0.313, cost=4.44]\n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 7, training avg loss 3.342103, training avg acc 0.435845\n",
-      "epoch 7, testing avg loss 3.900058, testing avg acc 0.381138\n"
+      "epoch 7, training avg loss 3.494035, training avg acc 0.410474\n",
+      "epoch 7, testing avg loss 3.832910, testing avg acc 0.370797\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:18<00:00,  4.36it/s, accuracy=0.498, cost=2.87]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 11.12it/s, accuracy=0.33, cost=4.67] \n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:21,  5.32it/s, accuracy=0.483, cost=3.08]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.47it/s, accuracy=0.476, cost=3.15]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.05it/s, accuracy=0.313, cost=4.41]\n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 8, training avg loss 3.146801, training avg acc 0.459015\n",
-      "epoch 8, testing avg loss 3.896288, testing avg acc 0.386253\n"
+      "epoch 8, training avg loss 3.266684, training avg acc 0.438682\n",
+      "epoch 8, testing avg loss 3.759165, testing avg acc 0.383419\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:18<00:00,  4.36it/s, accuracy=0.545, cost=2.49]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 11.08it/s, accuracy=0.326, cost=4.77]\n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:26,  5.21it/s, accuracy=0.498, cost=2.93]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:40<00:00,  3.47it/s, accuracy=0.532, cost=2.69]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.03it/s, accuracy=0.328, cost=4.38]\n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 9, training avg loss 2.972501, training avg acc 0.480463\n",
-      "epoch 9, testing avg loss 3.959981, testing avg acc 0.386002\n"
+      "epoch 9, training avg loss 3.068970, training avg acc 0.463571\n",
+      "epoch 9, testing avg loss 3.747680, testing avg acc 0.386675\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:18<00:00,  4.36it/s, accuracy=0.6, cost=2.12]  \n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 11.06it/s, accuracy=0.319, cost=4.91]\n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:29,  5.16it/s, accuracy=0.52, cost=2.77]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:41<00:00,  3.46it/s, accuracy=0.537, cost=2.41]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.08it/s, accuracy=0.32, cost=4.42] \n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 10, training avg loss 2.818905, training avg acc 0.499701\n",
-      "epoch 10, testing avg loss 4.049783, testing avg acc 0.387150\n"
+      "epoch 10, training avg loss 2.891522, training avg acc 0.486224\n",
+      "epoch 10, testing avg loss 3.775118, testing avg acc 0.391871\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:18<00:00,  4.36it/s, accuracy=0.661, cost=1.82]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 11.10it/s, accuracy=0.32, cost=5.02] \n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:20,  5.33it/s, accuracy=0.533, cost=2.61]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:40<00:00,  3.47it/s, accuracy=0.607, cost=2.03]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.01it/s, accuracy=0.321, cost=4.58]\n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 11, training avg loss 2.685967, training avg acc 0.515953\n",
-      "epoch 11, testing avg loss 4.098845, testing avg acc 0.388923\n"
+      "epoch 11, training avg loss 2.731236, training avg acc 0.506997\n",
+      "epoch 11, testing avg loss 3.821919, testing avg acc 0.392999\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:18<00:00,  4.36it/s, accuracy=0.69, cost=1.6]  \n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 11.15it/s, accuracy=0.316, cost=4.91]\n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:23,  5.27it/s, accuracy=0.556, cost=2.42]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.47it/s, accuracy=0.666, cost=1.77]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.06it/s, accuracy=0.317, cost=4.54]\n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 12, training avg loss 2.567834, training avg acc 0.530294\n",
-      "epoch 12, testing avg loss 4.018679, testing avg acc 0.396831\n"
+      "epoch 12, training avg loss 2.584878, training avg acc 0.526294\n",
+      "epoch 12, testing avg loss 3.767745, testing avg acc 0.401960\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:18<00:00,  4.36it/s, accuracy=0.737, cost=1.37]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 11.07it/s, accuracy=0.325, cost=4.85]\n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:20,  5.32it/s, accuracy=0.574, cost=2.26]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.48it/s, accuracy=0.714, cost=1.5] \n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.08it/s, accuracy=0.31, cost=4.56] \n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 13, training avg loss 2.453018, training avg acc 0.545141\n",
-      "epoch 13, testing avg loss 4.005711, testing avg acc 0.397211\n"
+      "epoch 13, training avg loss 2.453064, training avg acc 0.543468\n",
+      "epoch 13, testing avg loss 3.786662, testing avg acc 0.401954\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:19<00:00,  4.34it/s, accuracy=0.785, cost=1.16]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 11.07it/s, accuracy=0.313, cost=4.84]\n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:22,  5.29it/s, accuracy=0.6, cost=2.14]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.47it/s, accuracy=0.761, cost=1.26]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.08it/s, accuracy=0.31, cost=4.62] \n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 14, training avg loss 2.338308, training avg acc 0.560455\n",
-      "epoch 14, testing avg loss 4.023633, testing avg acc 0.392258\n"
+      "epoch 14, training avg loss 2.328101, training avg acc 0.560009\n",
+      "epoch 14, testing avg loss 3.845014, testing avg acc 0.397056\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:20<00:00,  4.33it/s, accuracy=0.802, cost=0.994]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 11.01it/s, accuracy=0.314, cost=4.88]\n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:22,  5.29it/s, accuracy=0.614, cost=2.01]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.48it/s, accuracy=0.784, cost=1.07]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.03it/s, accuracy=0.302, cost=4.78]\n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 15, training avg loss 2.229293, training avg acc 0.575232\n",
-      "epoch 15, testing avg loss 4.066958, testing avg acc 0.388447\n"
+      "epoch 15, training avg loss 2.204450, training avg acc 0.576944\n",
+      "epoch 15, testing avg loss 3.944659, testing avg acc 0.395583\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:21<00:00,  4.32it/s, accuracy=0.84, cost=0.838]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 10.98it/s, accuracy=0.299, cost=5.04]\n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:18,  5.36it/s, accuracy=0.617, cost=1.93]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:40<00:00,  3.47it/s, accuracy=0.819, cost=0.888]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.03it/s, accuracy=0.309, cost=4.96]\n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 16, training avg loss 2.127022, training avg acc 0.589416\n",
-      "epoch 16, testing avg loss 4.188903, testing avg acc 0.383200\n"
+      "epoch 16, training avg loss 2.087303, training avg acc 0.593277\n",
+      "epoch 16, testing avg loss 4.105100, testing avg acc 0.391073\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:21<00:00,  4.32it/s, accuracy=0.855, cost=0.728]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 10.96it/s, accuracy=0.311, cost=5.25]\n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:23,  5.26it/s, accuracy=0.631, cost=1.83]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.47it/s, accuracy=0.872, cost=0.669]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.01it/s, accuracy=0.323, cost=5.05]\n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 17, training avg loss 2.032683, training avg acc 0.602592\n",
-      "epoch 17, testing avg loss 4.360122, testing avg acc 0.378222\n"
+      "epoch 17, training avg loss 1.983363, training avg acc 0.607275\n",
+      "epoch 17, testing avg loss 4.217791, testing avg acc 0.387449\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:21<00:00,  4.33it/s, accuracy=0.882, cost=0.598]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 10.99it/s, accuracy=0.308, cost=5.5] \n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:27,  5.19it/s, accuracy=0.646, cost=1.71]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.48it/s, accuracy=0.89, cost=0.555]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.06it/s, accuracy=0.32, cost=5.06] \n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 18, training avg loss 1.948235, training avg acc 0.613778\n",
-      "epoch 18, testing avg loss 4.509265, testing avg acc 0.375841\n"
+      "epoch 18, training avg loss 1.882893, training avg acc 0.621618\n",
+      "epoch 18, testing avg loss 4.276752, testing avg acc 0.385177\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:21<00:00,  4.32it/s, accuracy=0.911, cost=0.493]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 10.99it/s, accuracy=0.305, cost=5.6] \n",
-      "minibatch loop:   0%|          | 1/1389 [00:00<04:24,  5.24it/s, accuracy=0.671, cost=1.57]"
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.48it/s, accuracy=0.907, cost=0.464]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.07it/s, accuracy=0.309, cost=5.14]\n",
+      "minibatch loop:   0%|          | 0/1389 [00:00<?, ?it/s]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 19, training avg loss 1.869872, training avg acc 0.624181\n",
-      "epoch 19, testing avg loss 4.605663, testing avg acc 0.377575\n"
+      "epoch 19, training avg loss 1.787632, training avg acc 0.634967\n",
+      "epoch 19, testing avg loss 4.345780, testing avg acc 0.382855\n"
      ]
     },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "minibatch loop: 100%|██████████| 1389/1389 [05:21<00:00,  4.33it/s, accuracy=0.939, cost=0.417]\n",
-      "minibatch loop: 100%|██████████| 30/30 [00:02<00:00, 11.06it/s, accuracy=0.301, cost=5.7] "
+      "minibatch loop: 100%|██████████| 1389/1389 [06:39<00:00,  3.47it/s, accuracy=0.897, cost=0.469]\n",
+      "minibatch loop: 100%|██████████| 30/30 [00:03<00:00,  9.07it/s, accuracy=0.302, cost=5.26]"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "epoch 20, training avg loss 1.785753, training avg acc 0.636835\n",
-      "epoch 20, testing avg loss 4.688456, testing avg acc 0.378041\n"
+      "epoch 20, training avg loss 1.694954, training avg acc 0.648714\n",
+      "epoch 20, testing avg loss 4.461932, testing avg acc 0.375447\n"
      ]
     },
     {
@@ -927,64 +928,64 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "0 predict: Làm cách nào tôi nói trong <NUM> phút ở đây , về ba thế hệ phụ nữ , về thế hệ sức mạnh của đời sống bằng cách hỗ trợ cuộc sống thực của bà bà ta hơn <NUM> năm sau khi bà bà bà bà bà bà còn nhỏ nhất hơn so với bà bà bà bà ấy hơn <NUM> năm trước hơn so với hơn so với hơn so với hơn so với hơn <NUM> năm so với hơn so với hơn so với hơn so với hơn <NUM> năm chúng ta ? Trong <NUM> câu trả\n",
+      "0 predict: Lúc này tôi có thể nói <NUM> phút còn khoảng <NUM> phút của phụ nữ , về việc làm thế nào một cô bé xuất ra một cô gái <NUM> tuổi , cô bé gái <NUM> tuổi của cao tuổi này với con gái <NUM> tuổi và cô gái đang ở tuổi <NUM> và <NUM> tuổi <NUM> tuổi <NUM> tuổi mà đã giảm số lượng lớn tuổi của cô ấy đã giảm số <NUM> tuổi của Trung Quốc . Một năm trước . . . . . . . . . . . . . . . . .\n",
       "0 actual: Làm sao tôi có thể trình bày trong <NUM> phút về sợi dây liên kết những người phụ nữ qua ba thế hệ , về việc làm thế nào những sợi dây mạnh mẽ đáng kinh ngạc ấy đã níu chặt lấy cuộc sống của một cô bé bốn tuổi co quắp với đứa em gái nhỏ của cô bé , với mẹ và bà trong suốt năm ngày đêm trên con thuyền nhỏ lênh đênh trên Biển Đông hơn <NUM> năm trước , những sợi dây liên kết đã níu lấy cuộc đời cô bé ấy và không bao giờ rời đi - - cô bé ấy giờ sống ở San Francisco và đang nói chuyện với các bạn hôm nay ?\n",
       "\n",
-      "1 predict: Đây không phải là câu chuyện hoàn thiện . Câu chuyện của chúng ta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . đó là một câu chuyện . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "1 predict: Đây không phải là một câu chuyện . . . . . . . . . . . . . . . . . . . . này không phải là một câu chuyện . . . . . . . . . . . . . . . . . . . . . . . . . . này không phải như thế này . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
       "1 actual: Câu chuyện này chưa kết thúc .\n",
       "\n",
-      "2 predict: Nó là một ứng dụng chúng vẫn đang được hợp lại với nhau . Hãy cùng làm việc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "2 predict: Nó vẫn tiếp tục tôn trọng nhau . Nó vẫn được gắn bó . . . . . . . . . vào nhau . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nó là cái nghĩa là là sự xã hội . . . . . . . . . . . .\n",
       "2 actual: Nó là một trò chơi ghép hình vẫn đang được xếp .\n",
       "\n",
-      "3 predict: Để tôi kể cho các bạn vài mảnh giấy phiêu lưu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "3 predict: Để tôi kể các bạn nghe về một số đoạn nhạc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . để kể tôi . . . . . . . . . . . . . . . . . . . . . . . .\n",
       "3 actual: Hãy để tôi kể cho các bạn về vài mảnh ghép nhé .\n",
       "\n",
-      "4 predict: Hãy hình dung : một bong bóng gà đầu tiên của anh ta . Trong công việc của anh ấy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "4 predict: Hãy tưởng tượng sinh vật đầu tiên : Hệ thống cuộc đời của anh ta . . . . . . . . . . . . . . . . . . . . hãy tưởng tượng xem như nhau . . . . . . . . . . . . . . . . . . . hãy tưởng tượng này như nhau tưởng tượng luôn luôn luôn luôn luôn luôn thú tượng tượng luôn luôn đếm tưởng tượng này . . . . . . . . . . . . . . . .\n",
       "4 actual: Hãy tưởng tượng mảnh đầu tiên : một người đàn ông đốt cháy sự nghiệp cả đời mình .\n",
       "\n",
-      "5 predict: Ông ấy là một nhà thơ , một người đàn ông , một người duy nhất đã bị bỏ rơi trên đất nước và sự tự do ngoại trừ sự tự do ý chí . Đó là sự tự do và sự tự do ngôn ngữ và sự tự do . Nó chính là sự tự do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "5 predict: Anh ta là một nhà thơ , một người đàn ông có cuộc sống của mình và tự do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . tự do và tự do . . . . . . . . . . . . . tự do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
       "5 actual: Ông là nhà thơ , nhà viết kịch , một người mà cả cuộc đời chênh vênh trên tia hi vọng duy nhất rằng đất nước ông sẽ độc lập tự do .\n",
       "\n",
-      "6 predict: Hãy tưởng tượng anh ấy là nơi trên thực tế , đã hoàn toàn xấu xa của anh ta . Thật sự lãng phí . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . thực sự là một sự lãng lãng cốt . . . . . . .\n",
+      "6 predict: Hãy tưởng tượng ông ta là sở hữu , thực tế rằng cuộc đời của ông đã hoàn thành . . . . . . . . . . . . . . . . . . . . . . . . . . . . . hãy tưởng tượng ra một hoàn thách . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
       "6 actual: Hãy tưởng tượng ông , một người cộng sản tiến vào , đối diện sự thật rằng cả cuộc đời ông đã phí hoài .\n",
       "\n",
-      "7 predict: Từ lúc này , cho hàng loạt những người bạn đâu đó vì ông ấy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "7 predict: Thêm vào đó vì bạn bè . . . . . . . . . . của anh ấy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . từ rất nhiều việc bị kỵ tự nhiên . . . . . . . . . . . . . . như vậy . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
       "7 actual: Ngôn từ , qua bao năm tháng là bạn đồng hành với ông , giờ quay ra chế giễu ông .\n",
       "\n",
-      "8 predict: Anh ấy đi vào sự tĩnh lặng . Thật ra ngoài . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "8 predict: Anh ta đi vào sự im lặng . Có khi im lặng . . . . . . . . . . . . . . . . . . . khi anh ấy đi vào đèn hư hỏng . . . . . . . . . . . . . . . . . . . . . . . . cởi cởi cởi cởi chán khoá khoá khoá khoá . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
       "8 actual: Ông rút lui vào yên lặng .\n",
       "\n",
-      "9 predict: Anh ta chết đuối nháy . Anh đã chết lịch sử . Anh ấy đã chết . Anh ấy đã chết . Anh ấy đã chết . Anh ấy đã chết lịch sử . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "9 predict: Anh ta mất đi bằng lịch sử . . . . . . . . . . . . . . . . . . . . . . . . . . đã qua đời . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . này kể đời đời . . . . . . . . . . . . . . . . . . . . .\n",
       "9 actual: Ông qua đời , bị lịch sử quật ngã .\n",
       "\n",
-      "10 predict: Anh ấy là ông nội tại ông nội . Anh ấy là ông ấy . Anh ấy đang thực hiện . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "10 predict: Ông ấy là ông của tôi . Ông ấy là kiến thức của ông . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ông là anh là người đời . . . . . . . . của ông . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
       "10 actual: Ông là ông của tôi .\n",
       "\n",
-      "11 predict: Tôi không bao giờ biết nó trong cuộc sống thực sự . . . . . . . . . . . . . . . . . . . . . tôi không bao giờ biết thật sự . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "11 predict: Tôi chưa bao giờ biết gì cả . . . . . . . . . . . . . . . . . . . . . . . . chưa bao giờ đi chăng chăng như tôi trước như tôi chưa bao giờ . . . . . . . như tôi chưa bao giờ hiểu được bao giờ như vậy . . . . . như tôi chưa bao giờ như tôi chưa bao giờ bao giờ bao giờ bao giờ bao giờ bao giờ nhớ bao giờ . như tôi chưa bao giờ bao giờ\n",
       "11 actual: Tôi chưa bao giờ gặp ông ngoài đời .\n",
       "\n",
-      "12 predict: Nhưng cuộc sống của chúng ta hơn nhiều so với ký ức của chúng ta . Thừa đến hơn là ký ức . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . cuộc cuộc sống của chúng ta hơn nhiều thời điểm . . . . . . . . . . . . . . . .\n",
+      "12 predict: Nhưng cuộc sống của chúng ta còn dành nhiều ký ức của chúng ta hơn mục đích của chúng ta . . . . . . . . . . . . . . từ đó . . . . . . . . . . . . từ cuộc sống này của chúng ta và đẻ đẻ tương tác của chúng ta . . . . . cuộc sống của chúng ta sống sống này ảnh hưởng từ cuộc sống này của chúng ta . . . . . . . . . . . . . .\n",
       "12 actual: Nhưng cuộc đời ta nhiều hơn những gì ta lưu trong kí ức nhiều .\n",
       "\n",
-      "13 predict: Người bà tôi không bao giờ quên đời ông ấy . Hãy sửa cuộc sống của ông ta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . thì hãy hãy để tôi đang quên . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "13 predict: Em không bao giờ để tôi quên đời . . . . . . . . . . . . . . . . . . . . . nhận thức của tôi về đời tôi . . . . . . . . . . đừng đừng đừng đừng đừng quên việc đời của tôi . . . . . . . . . . tôi không bao đừng đừng đừng đừng đừng đừng đừng đừng đừng quên quên đời đời đời tôi . . . . . . . . . . . . . . .\n",
       "13 actual: Bà tôi chưa bao giờ cho phép tôi quên cuộc đời của ông .\n",
       "\n",
-      "14 predict: Bảo tôi không cho phép điều đó , bài học của tôi được kiểm tra tấn , và vâng , vâng , vâng , vâng , vâng , chúng ta . \" chúng ta có thể tiếp tục . . \" nhưng chúng ta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "14 predict: Chân tôi không cho phép ta đánh giá , và cũng đang học hỏi chúng ta nhưng chúng ta đã cố gắn thời gian . . . . . . . . nhưng chúng ta đang hướng tới . . . . . . . . . . . . . . . . . . . . chúng ta càng đùa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
       "14 actual: Nhiệm vụ của tôi là không để cuộc đời ấy qua trong vô vọng , và bài học của tôi là nhận ra rằng , vâng , lịch sử đã cố quật ngã chúng tôi , nhưng chúng tôi đã chịu đựng được .\n",
       "\n",
-      "15 predict: Sau điểm màu tiếp theo là sự biến đổi trong một cái tổ chức nào đó buổi diễn chú . Hãy sửa nó . Hãy bận tâm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "15 predict: Tiếp theo là của một công việc có hành động đầu tiên ngay từ đầu tới mực nước biển hút mực nước biển . Tiếp đi xuyên đi nước biển . Quả là đi săn mồi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
       "15 actual: Mảnh ghép tiếp theo của tấm hình là một con thuyền trong sớm hoàng hôn lặng lẽ trườn ra biển .\n",
       "\n",
-      "16 predict: Mẹ tôi , thật hạn , lúc <NUM> tuổi khi cô ấy mất hai cuộc hôn nhân của mình trong <NUM> cuộc hôn nhân hai cô bé nhỏ , với hai em gái nhỏ rồi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "16 predict: Mẹ tôi lúc <NUM> tuổi đã qua đời . . . . . . . . . . . . với hai cô gái nhỏ . . . . . . . . . . . . . . . . . . . . . . . . hai cô gái . . . . . . . . . . . . . . . . . . . . của mẹ tôi . . . . . . . . . . . . . hai chữ lời lời lời lời <NUM> tiên này của cô\n",
       "16 actual: Mẹ tôi , Mai , mới <NUM> tuổi khi ba của mẹ mất - - đã lập gia đình , một cuộc hôn nhân sắp đặt trước , đã có hai cô con gái nhỏ .\n",
       "\n",
-      "17 predict: Đối với cô ấy , sự sống của chính nó có một nhiệm vụ : Một gia đình và gia đình mới trong cuộc sống gia đình bà ấy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "17 predict: Đối với cuộc đời mình , cuộc sống thành một công : cuộc đời và gia đình mới ở Úc . . . . . . . . . . . trong Úc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . lịch sử nước Úc . . . . . . . . . . . . . . . . . . . . .\n",
       "17 actual: Với mẹ , cuộc đời cô đọng vào nhiệm vụ duy nhất : để gia đình mẹ trốn thoát và bắt đầu cuộc sống mới ở Úc .\n",
       "\n",
-      "18 predict: Nó có thể khiến cô ấy không thành công . Nó sẽ không thành công . . . . . . . . . . . . . cô ấy không thể hiểu biết cô ấy sẽ đi theo . . . . . . . . . . . . . cô ấy có thể sẽ thành công . . . . . . . . . . . . . . . . . . . cô ấy có thể thành công việc . . . . . . . . . . . . . .\n",
+      "18 predict: Đó là việc cô ấy sẽ không thành công . thành công . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . từ nó đến việc việc hối bao nhiêu . . . . . . . . . . . . . . . . . . . . .\n",
       "18 actual: Mẹ không bao giờ chấp nhận được là mẹ sẽ không thành công .\n",
       "\n",
-      "19 predict: Một năm sau - bốt tái sinh ra <NUM> ma trận ngoại bào sắt ra đại dương như một con cá voi như một chiếc Syria . Những đánh cá voi đánh cá mồi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
+      "19 predict: Sau <NUM> năm tin sinh ra phân tử Việt Nam Cực như <NUM> con Hạ rủi ro với <NUM> rủi ro như <NUM> . . . . . . . . . . . . . . . . . <NUM> ngày <NUM> tháng . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .\n",
       "19 actual: Thế là sau bốn năm , một trường thiên đằng đẵng hơn cả trong truyện , một chiếc thuyền trườn ra biển nguỵ trang là thuyền đánh cá .\n",
       "\n"
      ]