Lesson 16 & 17
CinnamonXI committed May 12, 2022
1 parent a75ac8b commit 9b66995
Showing 11 changed files with 475 additions and 166 deletions.
4 changes: 3 additions & 1 deletion etc/quiz-app/src/assets/translations/en/index.js
@@ -10,6 +10,8 @@ import x10 from "./lesson-10.json";
import x12 from "./lesson-12.json";
import x13 from "./lesson-13.json";
import x14 from "./lesson-14.json";
import x16 from "./lesson-16.json";
import x17 from "./lesson-17.json";
import x23 from "./lesson-23.json";
const quiz = { 0 : x1[0], 1 : x2[0], 2 : x3[0], 3 : x4[0], 4 : x5[0], 5 : x7[0], 6 : x8[0], 7 : x9[0], 8 : x10[0], 9 : x12[0], 10 : x13[0], 11 : x14[0], 12 : x23[0] };
const quiz = { 0 : x1[0], 1 : x2[0], 2 : x3[0], 3 : x4[0], 4 : x5[0], 5 : x7[0], 6 : x8[0], 7 : x9[0], 8 : x10[0], 9 : x12[0], 10 : x13[0], 11 : x14[0], 12 : x16[0], 13 : x17[0], 14 : x23[0] };
export default quiz;
119 changes: 119 additions & 0 deletions etc/quiz-app/src/assets/translations/en/lesson-16.json
@@ -0,0 +1,119 @@
[
{
"title": "AI for Beginners: Quizzes",
"complete": "Congratulations, you completed the quiz!",
"error": "Sorry, try again",
"quizzes": [
{
"id": 116,
"title": "RNN: Pre Quiz",
"quiz": [
{
"questionText": "RNN is short for?",
"answerOptions": [
{
"answerText": "regression neural network",
"isCorrect": false
},
{
"answerText": "recurrent neural network",
"isCorrect": true
},
{
"answerText": "re-iterative neural network",
"isCorrect": false
}
]
},
{
"questionText": "Simple RNN cell has two weight _____",
"answerOptions": [
{
"answerText": "matrices",
"isCorrect": true
},
{
"answerText": "cell",
"isCorrect": false
},
{
"answerText": "neuron",
"isCorrect": false
}
]
},
{
"questionText": "vanishing gradients is a problem of _____",
"answerOptions": [
{
"answerText": "RNN",
"isCorrect": true
},
{
"answerText": "CNN",
"isCorrect": false
},
{
"answerText": "KNN",
"isCorrect": false
}
]
}
]
},
{
"id": 216,
"title": "RNN: Post Quiz",
"quiz": [
{
"questionText": "_____ takes some information from the input and hidden vector, and inserts it into state",
"answerOptions": [
{
"answerText": "forget gate",
"isCorrect": false
},
{
"answerText": "output gate",
"isCorrect": false
},
{
"answerText": "input gate",
"isCorrect": true
}
]
},
{
"questionText": "Bidirectional RNNs runs recurrent computation in _____",
"answerOptions": [
{
"answerText": "both directions",
"isCorrect": true
},
{
"answerText": "nort-west direction",
"isCorrect": false
},
{
"answerText": "left-right direction",
"isCorrect": false
}
]
},
{
"questionText": "All RNN Cells have the same shareable weights",
"answerOptions": [
{
"answerText": "True",
"isCorrect": true
},
{
"answerText": "False",
"isCorrect": false
}
]
}
]
}
]
}
]
115 changes: 115 additions & 0 deletions etc/quiz-app/src/assets/translations/en/lesson-17.json
@@ -0,0 +1,115 @@
[
{
"title": "AI for Beginners: Quizzes",
"complete": "Congratulations, you completed the quiz!",
"error": "Sorry, try again",
"quizzes": [
{
"id": 117,
"title": "Generative networks: Pre Quiz",
"quiz": [
{
"questionText": "RNNs can be for generative tasks",
"answerOptions": [
{
"answerText": "yes",
"isCorrect": true
},
{
"answerText": "no",
"isCorrect": false
}
]
},
{
"questionText": "_____ is a traditional neural network with one input and one output",
"answerOptions": [
{
"answerText": "one-to-one",
"isCorrect": true
},
{
"answerText": "sequence-to-sequence",
"isCorrect": false
},
{
"answerText": "one-to-many",
"isCorrect": false
}
]
},
{
"questionText": "RNN generate texts by generating next output character for each input character",
"answerOptions": [
{
"answerText": "true",
"isCorrect": true
},
{
"answerText": "false",
"isCorrect": false
}
]
}
]
},
{
"id": 217,
"title": "Generative networks: Post Quiz",
"quiz": [
{
"questionText": "Output encoder converts hidden state into _____ output",
"answerOptions": [
{
"answerText": "one-hot-encoded",
"isCorrect": true
},
{
"answerText": "sequence",
"isCorrect": false
},
{
"answerText": "number",
"isCorrect": false
}
]
},
{
"questionText": "Selecting the character with higher probabilities always gives a meaningful text.",
"answerOptions": [
{
"answerText": "true",
"isCorrect": false
},
{
"answerText": "false",
"isCorrect": true
},
{
"answerText": "maybe",
"isCorrect": false
}
]
},
{
"questionText": "Many-to-many can also be referred to as _____",
"answerOptions": [
{
"answerText": "one-to-one",
"isCorrect": false
},
{
"answerText": "sequence-to-sequence",
"isCorrect": true
},
{
"answerText": "one-to-many",
"isCorrect": false
}
]
}
]
}
]
}
]
53 changes: 53 additions & 0 deletions etc/quiz-src/questions-en.txt
@@ -317,6 +317,59 @@ Lesson 14E Embeddings: Post Quiz
- symbol
- number

Lesson 16B RNN: Pre Quiz
* What is RNN short for?
- regression neural network
+ recurrent neural network
- re-iterative neural network
* Simple RNN cell has two weight _____
+ matrices
- cell
- neuron
* Vanishing gradients is a problem of _____
+ RNN
- CNN
- KNN

Lesson 16E RNN: Post Quiz
* _____ takes some information from the input and hidden vector, and inserts it into state
- forget gate
- output gate
+ input gate
* Bidirectional RNNs run recurrent computation in _____
+ both directions
- north-west direction
- left-right direction
* All RNN Cells have the same shareable weights
+ True
- False

Lesson 17B Generative networks: Pre Quiz
* RNNs can be used for generative tasks
+ yes
- no
* _____ is a traditional neural network with one input and one output
+ one-to-one
- sequence-to-sequence
- one-to-many
* RNNs generate text by generating the next output character for each input character
+ true
- false

Lesson 17E Generative networks: Post Quiz
* Output encoder converts hidden state into _____ output
+ one-hot-encoded
- sequence
- number
* Selecting the character with the highest probability always gives a meaningful text.
- true
+ false
- maybe
* Many-to-many can also be referred to as _____
- one-to-one
+ sequence-to-sequence
- one-to-many

Lesson 23B Multi-Agent Modeling: Pre Quiz
* By modeling the behavior of simple agents, we can understand more complex behaviors of a system.
+ true
8 changes: 4 additions & 4 deletions lessons/5-NLP/15-LanguageModeling/README.md
@@ -3,20 +3,20 @@

Semantic embeddings, such as Word2Vec and GloVe, are in fact a first step towards **language modeling** - creating models that somehow *understand* (or *represent*) the nature of the language.

The main idea behind language modeling is training the models on unlabeled datasets in an unsupervised manner. This is important because we have huge amounts of unlabeled text available, while the amount of labeled text will always be limited by the amount of effort we can spend on labeling. Most often, we build language models that can **predict missing words** in the text, because it is easy to mask out a random word in text and use it as a training sample.
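As a rough illustration of how such training samples can be produced (a minimal sketch, not taken from this repository; the `make_masked_sample` helper and `[MASK]` token are purely illustrative):

```python
import random

def make_masked_sample(sentence, mask_token="[MASK]"):
    """Turn a sentence into a (masked text, target word) training pair."""
    tokens = sentence.split()
    i = random.randrange(len(tokens))                    # pick a random position to hide
    target = tokens[i]                                   # the word the model should predict
    masked = tokens[:i] + [mask_token] + tokens[i + 1:]
    return " ".join(masked), target

print(make_masked_sample("language models learn from large amounts of unlabeled text"))
# e.g. ('language models [MASK] from large amounts of unlabeled text', 'learn')
```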

## Training embeddings

In our previous examples, we have been using pre-trained semantic embeddings, but it is interesting to see how those embeddings can be trained using either CBoW or Skip-gram architectures.

![](../14-Embeddings/images/example-algorithms-for-converting-words-to-vectors.png)

> Image from [this paper](https://arxiv.org/pdf/1301.3781.pdf)
The idea of CBoW is exactly to predict a missing word: to do this, we take a small sliding window of text tokens (we can denote them from W<sub>-2</sub> to W<sub>2</sub>) and train a model to predict the central word W<sub>0</sub> from a few surrounding words.
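A minimal sketch of the CBoW idea, assuming PyTorch; the toy corpus, `WINDOW` size and `CBoW` class are illustrative and not the lesson's notebook code:

```python
import torch
import torch.nn as nn

# Toy corpus and vocabulary, for illustration only.
corpus = "we train a model to predict the central word from a few surrounding words".split()
vocab = {w: i for i, w in enumerate(sorted(set(corpus)))}
WINDOW = 2  # context words W-2..W2 around the center word W0

# Build (context, center) training pairs with a sliding window.
pairs = []
for i in range(WINDOW, len(corpus) - WINDOW):
    context = corpus[i - WINDOW:i] + corpus[i + 1:i + WINDOW + 1]
    pairs.append(([vocab[w] for w in context], vocab[corpus[i]]))

class CBoW(nn.Module):
    def __init__(self, vocab_size, emb_size=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_size)
        self.out = nn.Linear(emb_size, vocab_size)

    def forward(self, context_ids):
        # Average the context embeddings, then predict the center word.
        return self.out(self.emb(context_ids).mean(dim=1))

model = CBoW(len(vocab))
ctx = torch.tensor([pairs[0][0]])   # a batch with one context window
logits = model(ctx)                 # scores over the whole vocabulary
print(logits.shape)                 # torch.Size([1, len(vocab)])
```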

## More Info

* [Official PyTorch tutorial on Language Modeling](https://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html).
* [Official TensorFlow tutorial on training Word2Vec model](https://www.TensorFlow.org/tutorials/text/word2vec).
* Using the **gensim** framework to train the most commonly used embeddings in a few lines of code is described [in this documentation](https://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html).
15 changes: 11 additions & 4 deletions lessons/5-NLP/16-RNN/README.md
@@ -1,5 +1,7 @@
# Recurrent Neural Networks

## [Pre-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/116)

In the previous sections, we have been using rich semantic representations of text and a simple linear classifier on top of the embeddings. What this architecture does is capture the aggregated meaning of words in a sentence, but it does not take into account the **order** of words, because the aggregation operation on top of the embeddings removes this information from the original text. Because these models are unable to model word ordering, they cannot solve more complex or ambiguous tasks such as text generation or question answering.

To capture the meaning of a text sequence, we need to use another neural network architecture, which is called a **recurrent neural network**, or RNN. In an RNN, we pass our sentence through the network one symbol at a time, and the network produces some **state**, which we then pass to the network again with the next symbol.
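A minimal sketch of this symbol-by-symbol loop, assuming PyTorch (the `emb_size`/`hid_size` values are arbitrary and the token embeddings are random, purely for illustration):

```python
import torch
import torch.nn as nn

emb_size, hid_size = 8, 16
cell = nn.RNNCell(emb_size, hid_size)

sentence = torch.randn(5, emb_size)        # five embedded tokens, one per symbol
state = torch.zeros(1, hid_size)           # initial state S0

for x in sentence:                         # feed the sentence one symbol at a time
    state = cell(x.unsqueeze(0), state)    # the produced state is passed back in with the next symbol

print(state.shape)                         # torch.Size([1, 16]) -- the final state
```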
@@ -8,7 +10,7 @@ To capture the meaning of text sequence, we need to use another neural network a

> Image by author
Given the input sequence of tokens X<sub>0</sub>,...,X<sub>n</sub>, RNN creates a sequence of neural network blocks, and trains this sequence end-to-end using back propagation. Each network block takes a pair (X<sub>i</sub>,S<sub>i</sub>) as an input, and produces S<sub>i+1</sub> as a result. The final state S<sub>n</sub> (or output Y<sub>n</sub>) goes into a linear classifier to produce the result. All network blocks share the same weights, and are trained end-to-end using one backpropagation pass.
Given the input sequence of tokens X<sub>0</sub>,...,X<sub>n</sub>, RNN creates a sequence of neural network blocks, and trains this sequence end-to-end using back propagation. Each network block takes a pair (X<sub>i</sub>,S<sub>i</sub>) as an input, and produces S<sub>i+1</sub> as a result. The final state S<sub>n</sub> (or output Y<sub>n</sub>) goes into a linear classifier to produce the result. All network blocks share the same weights, and are trained end-to-end using one back propagation pass.

Because state vectors S<sub>0</sub>,...,S<sub>n</sub> are passed through the network, it is able to learn the sequential dependencies between words. For example, when the word *not* appears somewhere in the sequence, it can learn to negate certain elements within the state vector, resulting in negation.
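A small sketch of this setup, assuming PyTorch: `nn.RNN` applies one shared set of weights to every position, and the final state is fed into a linear classifier (the sizes and two-class head are illustrative):

```python
import torch
import torch.nn as nn

emb_size, hid_size, n_classes = 8, 16, 2
rnn = nn.RNN(emb_size, hid_size, batch_first=True)   # one weight set shared by all unrolled blocks
classifier = nn.Linear(hid_size, n_classes)

x = torch.randn(1, 5, emb_size)        # X_0..X_4 as a batch with one sequence
outputs, s_n = rnn(x)                  # s_n is the final state S_n
logits = classifier(s_n.squeeze(0))    # the final state goes into a linear classifier
print(logits.shape)                    # torch.Size([1, 2])
```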

@@ -24,7 +26,7 @@ Simple RNN cell has two weight matrices inside: one transforms input symbol (let

> Image by author
In many cases, input tokens are passed through the embedding layer before entering the RNN to lower the dimensionality. In this case, if the dimension of the input vectors is *emb_size* and the state vector is *hid_size*, the size of W is *emb_size*&times;*hid_size*, and the size of H is *hid_size*&times;*hid_size*.
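A hand-written sketch of one such cell step with the two weight matrices made explicit (plain PyTorch tensors with random values, purely illustrative):

```python
import torch

emb_size, hid_size = 8, 16

W = torch.randn(emb_size, hid_size)   # emb_size x hid_size: transforms the input symbol
H = torch.randn(hid_size, hid_size)   # hid_size x hid_size: transforms the previous state
b = torch.zeros(hid_size)

def rnn_step(x, s):
    # S_{i+1} = tanh(X_i * W + S_i * H + b)
    return torch.tanh(x @ W + s @ H + b)

s = torch.zeros(hid_size)             # initial state
x = torch.randn(emb_size)             # an embedded input token
s = rnn_step(x, s)
print(s.shape)                        # torch.Size([16])
```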

## Long Short Term Memory (LSTM)

@@ -33,7 +35,8 @@ One of the main problems of classical RNNs is so-called **vanishing gradients**
![Image showing an example long short term memory cell](./images/long-short-term-memory-cell.svg)

The LSTM network is organized in a manner similar to an RNN, but there are two states being passed from layer to layer: the actual state C, and the hidden vector H. At each unit, the hidden vector H<sub>i</sub> is concatenated with the input X<sub>i</sub>, and together they control what happens to the state C via **gates**. Each gate is a neural network with sigmoid activation (output in the range [0,1]), which can be thought of as a bitwise mask when multiplied by the state vector. There are the following gates (from left to right on the picture above):

* **forget gate** takes the hidden vector and determines which components of the vector C we need to forget, and which to pass through.
* **input gate** takes some information from the input and hidden vector, and inserts it into the state.
* **output gate** transforms the state via some linear layer with *tanh* activation, then selects some of its components using the hidden vector H<sub>i</sub> to produce the new state C<sub>i+1</sub>.
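A rough sketch of the gate mechanics described above, assuming PyTorch; the layer names and sizes are illustrative, and the update follows the textbook LSTM formulation rather than this lesson's notebook code:

```python
import torch
import torch.nn as nn

hid = 16
x_and_h = torch.randn(1, 2 * hid)   # input X_i concatenated with hidden vector H_i

# Each gate is a small network with sigmoid activation -> outputs in [0, 1], used as a mask.
forget_gate = nn.Sequential(nn.Linear(2 * hid, hid), nn.Sigmoid())
input_gate  = nn.Sequential(nn.Linear(2 * hid, hid), nn.Sigmoid())
output_gate = nn.Sequential(nn.Linear(2 * hid, hid), nn.Sigmoid())
candidate   = nn.Sequential(nn.Linear(2 * hid, hid), nn.Tanh())

C = torch.randn(1, hid)                                  # previous cell state
C = forget_gate(x_and_h) * C                             # forget: drop some components of C
C = C + input_gate(x_and_h) * candidate(x_and_h)         # input: insert new information into the state
H_next = output_gate(x_and_h) * torch.tanh(C)            # output: select components for the new hidden vector
print(C.shape, H_next.shape)                             # torch.Size([1, 16]) torch.Size([1, 16])
```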

@@ -43,7 +46,7 @@ Components of the state C can be thought of as some flags that can be switched o
## Bidirectional and multilayer RNNs

We have discussed recurrent networks that operate in one direction, from the beginning of a sequence to the end. It looks natural, because it resembles the way we read and listen to speech. However, since in many practical cases we have random access to the input sequence, it might make sense to run recurrent computation in both directions. Such networks are called **bidirectional** RNNs. When dealing with a bidirectional network, we would need two hidden state vectors, one for each direction.

A recurrent network, one-directional or bidirectional, captures certain patterns within a sequence and can store them in the state vector or pass them into the output. As with convolutional networks, we can build another recurrent layer on top of the first one to capture higher-level patterns, built from the low-level patterns extracted by the first layer. This leads us to the notion of a **multi-layer RNN**, which consists of two or more recurrent networks, where the output of the previous layer is passed to the next layer as input.
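A short sketch of both ideas at once, assuming PyTorch; `nn.LSTM` with `bidirectional=True` and `num_layers=2` is one way to get a two-layer bidirectional recurrent network (sizes are illustrative):

```python
import torch
import torch.nn as nn

emb_size, hid_size = 8, 16
rnn = nn.LSTM(emb_size, hid_size, num_layers=2, bidirectional=True, batch_first=True)

x = torch.randn(1, 5, emb_size)   # a batch with one 5-token sequence
out, (h, c) = rnn(x)

print(out.shape)   # torch.Size([1, 5, 32]) -- forward and backward outputs concatenated
print(h.shape)     # torch.Size([4, 1, 16]) -- 2 layers x 2 directions of hidden vectors
```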

@@ -59,3 +62,7 @@ Recurrent network, one-directional or bidirectional, captures certain patterns w
## RNNs for other tasks

In this unit, we have seen that RNNs can be used for sequence classification, but in fact, they can handle many more tasks, such as text generation, machine translation, and more. We will consider those tasks in the next unit.

## [Post-lecture quiz](https://black-ground-0cc93280f.1.azurestaticapps.net/quiz/216)

> ✅ Todo: conclusion, Assignment, challenge, reference.