
logs20: Make a list of how we implement RL for seq2seq

Higepon Taro Minowa edited this page May 24, 2018 · 7 revisions
1: What specific output am I working on right now?
A list of concrete steps for how we implement RL-based seq2seq.

2: Thinking out loud (hypotheses about the current problem, what to work on next, how I can verify)
1. Draw diagrams to explain the RL pattern.
1. Write the concrete steps as a list.

3: A record of currently ongoing runs, along with a short reminder of what question each run is supposed to answer
N/A
4: Results of runs and conclusions
1. done: Save and commit.
1. done: Find a way to implement the sample method, considering how we do backprop.
1. done: Implement sample using an empty note.
1. done: Port the implementation.
1. done: Think about how we can keep the reward == 1 case.
1. Make a list of how we test it.
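The "keep the reward == 1 case" item above has a useful property worth noting: when the reward is fixed at 1, the policy-gradient loss on a token sequence is identical to the ordinary seq2seq cross-entropy, so that case doubles as a sanity check. A minimal NumPy sketch (not the actual TensorFlow graph code; all sizes and numbers are made up):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy decoder logits: 2 time steps, vocab of 3 (hypothetical values).
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 1.2, 0.3]])
targets = np.array([0, 1])          # token ids for those steps
probs = softmax(logits)

# Standard seq2seq loss: token-level cross-entropy on the targets.
xent = -np.log(probs[np.arange(2), targets]).sum()

# Policy-gradient loss with the reward fixed at 1 on the same tokens:
# -reward * sum(log p(token)), which collapses to the cross-entropy.
reward = 1.0
pg_loss = -reward * np.log(probs[np.arange(2), targets]).sum()
```

With reward == 1 the two losses coincide, so RL training should reproduce normal training behavior in that case.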
5: Next steps

6: mega.nz
N/A
- Train Op needs trainable_variables in the training graph.
- Train Op requires the logits (from sampling) to do backprop.
- So we need to port `sample` into the train model, not the infer model.
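The points above can be sketched end to end: sampling has to happen where the logits live, because the policy-gradient update is a function of those logits. A hedged NumPy illustration (the real implementation is a TensorFlow train graph; shapes and the seed here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical decoder logits: 4 time steps, vocab of 5.
logits = rng.normal(size=(4, 5))
probs = softmax(logits)

# `sample` runs in the train model, so these logits stay in the graph:
# draw one token per step from the model's own distribution.
tokens = np.array([rng.choice(5, p=p) for p in probs])

# REINFORCE-style loss on the sampled tokens, scaled by the reward.
reward = 1.0
loss = -reward * np.log(probs[np.arange(4), tokens]).sum()

# Backprop needs the logits: d(loss)/d(logits)
#   = reward * (probs - one_hot(sampled tokens)).
grad_logits = reward * (probs - np.eye(5)[tokens])
```

If sampling were done in the infer model, the sampled ids would come back without their logits, and this gradient could not be formed; that is the reason for porting `sample`.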

How to test RL at small scale

- Connect everything and see that it doesn't fail (with reward == 1).
- Modify
- Test small
  - Initial training with the normal seq2seq
- Test medium
- Clean up all the old code?
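The first item, "connect everything with reward == 1", can be mocked up as a tiny smoke test: with the reward pinned at 1, a few policy-gradient steps should behave like plain cross-entropy training, with a finite, decreasing loss. A self-contained NumPy sketch (sizes and targets are invented for illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.zeros((3, 4))        # 3 steps, vocab of 4 (made-up sizes)
targets = np.array([1, 2, 0])    # pretend these are the sampled tokens
reward, lr = 1.0, 0.5

def loss_fn(lg):
    p = softmax(lg)
    return -reward * np.log(p[np.arange(3), targets]).sum()

before = loss_fn(logits)
for _ in range(20):
    p = softmax(logits)
    # Gradient step on d(loss)/d(logits) = reward * (p - one_hot(targets)).
    logits = logits - lr * reward * (p - np.eye(4)[targets])
after = loss_fn(logits)
```

A run where `after` is finite and smaller than `before` is the "not failing" signal; the same check against the real graph would catch wiring bugs before moving on to the medium test.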