
logs20: Make a list of how we implement RL for seq2seq

Higepon Taro Minowa edited this page May 24, 2018 · 7 revisions
1: What specific output am I working on right now?
A list of concrete steps for how we implement RL-based seq2seq.

2: Thinking out loud (hypotheses about the current problem, what to work on next, how I can verify)
1. Draw diagrams to explain the RL pattern.
1. Write the concrete steps as a list.

3: A record of currently ongoing runs, along with a short reminder of what question each run is supposed to answer
N/A
4: Results of runs and conclusions
1. done: Save and commit.
1. done: Find a way to implement the sample method, considering how we do backprop.
1. done: Implement sample using an empty note.
1. done: Port the implementation.
1. done: Think about how we can keep the reward == 1 case.
1. Make a list of how we test it.
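The "keep the reward == 1 case" item above has a useful property worth noting: when the reward is fixed at 1, the policy-gradient loss on a token sequence is identical to the ordinary seq2seq cross-entropy, so that case doubles as a sanity check. A minimal NumPy sketch (not the actual TensorFlow graph code; all sizes and numbers are made up):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy decoder logits: 2 time steps, vocab of 3 (hypothetical values).
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 1.2, 0.3]])
targets = np.array([0, 1])          # token ids for those steps
probs = softmax(logits)

# Standard seq2seq loss: token-level cross-entropy on the targets.
xent = -np.log(probs[np.arange(2), targets]).sum()

# Policy-gradient loss with the reward fixed at 1 on the same tokens:
# -reward * sum(log p(token)), which collapses to the cross-entropy.
reward = 1.0
pg_loss = -reward * np.log(probs[np.arange(2), targets]).sum()
```

With reward == 1 the two losses coincide, so RL training should reproduce normal training behavior in that case.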
5: Next steps

6: mega.nz
N/A
- Train Op needs trainable_variables in the training graph.
- Train Op requires the logits (from sampling) to do backprop.
- So we need to port `sample` into the train model, not the infer model.
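The points above can be sketched end to end: sampling has to happen where the logits live, because the policy-gradient update is a function of those logits. A hedged NumPy illustration (the real implementation is a TensorFlow train graph; shapes and the seed here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical decoder logits: 4 time steps, vocab of 5.
logits = rng.normal(size=(4, 5))
probs = softmax(logits)

# `sample` runs in the train model, so these logits stay in the graph:
# draw one token per step from the model's own distribution.
tokens = np.array([rng.choice(5, p=p) for p in probs])

# REINFORCE-style loss on the sampled tokens, scaled by the reward.
reward = 1.0
loss = -reward * np.log(probs[np.arange(4), tokens]).sum()

# Backprop needs the logits: d(loss)/d(logits)
#   = reward * (probs - one_hot(sampled tokens)).
grad_logits = reward * (probs - np.eye(5)[tokens])
```

If sampling were done in the infer model, the sampled ids would come back without their logits, and this gradient could not be formed; that is the reason for porting `sample`.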

How to test RL at small scale

- Connect everything and see that it doesn't fail (with reward == 1).
- Modify
- Test small
  - Initial training with the normal seq2seq
- Test medium
- Clean up all the old code?
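The first item, "connect everything with reward == 1", can be mocked up as a tiny smoke test: with the reward pinned at 1, a few policy-gradient steps should behave like plain cross-entropy training, with a finite, decreasing loss. A self-contained NumPy sketch (sizes and targets are invented for illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.zeros((3, 4))        # 3 steps, vocab of 4 (made-up sizes)
targets = np.array([1, 2, 0])    # pretend these are the sampled tokens
reward, lr = 1.0, 0.5

def loss_fn(lg):
    p = softmax(lg)
    return -reward * np.log(p[np.arange(3), targets]).sum()

before = loss_fn(logits)
for _ in range(20):
    p = softmax(logits)
    # Gradient step on d(loss)/d(logits) = reward * (p - one_hot(targets)).
    logits = logits - lr * reward * (p - np.eye(4)[targets])
after = loss_fn(logits)
```

A run where `after` is finite and smaller than `before` is the "not failing" signal; the same check against the real graph would catch wiring bugs before moving on to the medium test.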