The way the training data is split into batches, and how the hidden state is propagated between batches, does not seem to be correct.
The whole input text is split into consecutive sequences of sequence length. Batch generation then takes the next batch size sequences and returns them for training, and the hidden state is propagated from one batch to the next. This means each hidden state does not see the input sequentially: after its first sequence it jumps ahead by a full batch of sequences instead of continuing where it left off.
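To make that concrete, here is a minimal sketch of the current scheme as I understand it (function and variable names are placeholders, not the repo's actual identifiers):

```python
def current_batches(data, batch_size, seq_length):
    # Cut the text into consecutive sequences of seq_length tokens ...
    sequences = [data[i:i + seq_length]
                 for i in range(0, len(data) - seq_length + 1, seq_length)]
    # ... and hand out the next batch_size of them as one batch.
    for i in range(0, len(sequences) - batch_size + 1, batch_size):
        yield sequences[i:i + batch_size]

# Row 0 of batch k+1 starts batch_size * seq_length tokens after row 0 of
# batch k, so the hidden state carried over for row 0 skips
# (batch_size - 1) * seq_length tokens of input between batches.
```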
One way to solve this would be to split the whole input into batch size contiguous parts and then iterate through those parts in parallel during batch generation, as sketched below. It would be interesting to compare the performance of the existing approach with the one suggested in this issue.
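A rough sketch of that idea (again, names are my own, not from the repo): reshape the token stream into batch_size rows, so that each row is one long contiguous chunk of the text, then cut consecutive seq_length windows along the time axis. Row b of every batch then continues exactly where row b of the previous batch ended, so the propagated hidden state stays aligned with the text.

```python
import numpy as np

def make_batches(data, batch_size, seq_length):
    """Split a 1-D array of token ids into batch_size contiguous streams,
    then yield (input, target) batches of seq_length steps each.
    Row b of every batch continues row b of the previous batch."""
    # Trim so the data divides evenly into batch_size streams
    # (the -1 leaves room for the shifted targets).
    n_steps = (len(data) - 1) // batch_size
    inputs = np.asarray(data[:batch_size * n_steps]).reshape(batch_size, n_steps)
    targets = np.asarray(data[1:batch_size * n_steps + 1]).reshape(batch_size, n_steps)
    # Walk along the time axis in seq_length windows.
    for start in range(0, n_steps - seq_length + 1, seq_length):
        yield (inputs[:, start:start + seq_length],
               targets[:, start:start + seq_length])
```

For example, with batch_size=2 the first half of the text feeds row 0 and the second half feeds row 1, so carrying the hidden state from batch to batch is consistent with the order of the text. This is the same batching used in typical stateful/truncated-BPTT char-RNN setups.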