The way the training data is split into batches, and how the hidden state is propagated between batches, does not seem to be correct.
The whole input text is split into consecutive sequences of sequence length. Batch generation then takes the next batch size sequences and returns them for training, and the hidden state is propagated from one batch to the next. This means each hidden state does not see the input sequentially: after its first sequence it jumps ahead by a full batch of sequences instead of continuing where it left off.
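To make that concrete, here is a minimal sketch of the current scheme as I understand it (function and variable names are placeholders, not the repo's actual identifiers):

```python
def current_batches(data, batch_size, seq_length):
    # Cut the text into consecutive sequences of seq_length tokens ...
    sequences = [data[i:i + seq_length]
                 for i in range(0, len(data) - seq_length + 1, seq_length)]
    # ... and hand out the next batch_size of them as one batch.
    for i in range(0, len(sequences) - batch_size + 1, batch_size):
        yield sequences[i:i + batch_size]

# Row 0 of batch k+1 starts batch_size * seq_length tokens after row 0 of
# batch k, so the hidden state carried over for row 0 skips
# (batch_size - 1) * seq_length tokens of input between batches.
```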
One way to solve this would be to split the whole input into batch size contiguous parts and then iterate through those parts in parallel during batch generation, as sketched below. It would be interesting to compare the performance of the existing approach with the one suggested in this issue.
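A rough sketch of that idea (again, names are my own, not from the repo): reshape the token stream into batch_size rows, so that each row is one long contiguous chunk of the text, then cut consecutive seq_length windows along the time axis. Row b of every batch then continues exactly where row b of the previous batch ended, so the propagated hidden state stays aligned with the text.

```python
import numpy as np

def make_batches(data, batch_size, seq_length):
    """Split a 1-D array of token ids into batch_size contiguous streams,
    then yield (input, target) batches of seq_length steps each.
    Row b of every batch continues row b of the previous batch."""
    # Trim so the data divides evenly into batch_size streams
    # (the -1 leaves room for the shifted targets).
    n_steps = (len(data) - 1) // batch_size
    inputs = np.asarray(data[:batch_size * n_steps]).reshape(batch_size, n_steps)
    targets = np.asarray(data[1:batch_size * n_steps + 1]).reshape(batch_size, n_steps)
    # Walk along the time axis in seq_length windows.
    for start in range(0, n_steps - seq_length + 1, seq_length):
        yield (inputs[:, start:start + seq_length],
               targets[:, start:start + seq_length])
```

For example, with batch_size=2 the first half of the text feeds row 0 and the second half feeds row 1, so carrying the hidden state from batch to batch is consistent with the order of the text. This is the same batching used in typical stateful/truncated-BPTT char-RNN setups.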