Thank you for the great contribution. The program works fine with the tinyshakespeare dataset and with other datasets; however, part of the "train.py" code looks quite strange to me. Lines 87-91:
    for i in xrange(jump * n_epochs):
        x_batch = np.array([train_data[(jump * j + i) % whole_len]
                            for j in xrange(batchsize)])
        y_batch = np.array([train_data[(jump * j + i + 1) % whole_len]
                            for j in xrange(batchsize)])
While "train_data" is the source character sequence, x_batch seems to consist of characters from separate positions, namely positions spaced "jump" characters apart. To train an RNN, the internal state must be carried over to the next input, but this minibatch seems to violate that continuity of the input data. I would appreciate it if you could explain why the code works fine. Thanks.
As far as I understand, a minibatch should in general contain independent examples, so that its gradient is a good estimate of the global gradient. In an RNN, examples are not independent, but taking the minibatch from characters that are far apart gives a good approximation. So the minibatch acts like a rake whose teeth are separated by the jump value and which is moved along the text one character at a time. Each tooth therefore reads consecutive characters from its own region, so the state for that row of the batch is still carried over a continuous sequence.
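To make the rake picture concrete, here is a minimal sketch that reuses the indexing from the quoted loop on a toy string. It is not the repository's code: the helper name `make_batches`, the toy text, and the definition `jump = whole_len // batchsize` are assumptions made for illustration, and plain lists stand in for the NumPy arrays.

```python
def make_batches(train_data, batchsize, n_steps):
    """Yield (x_batch, y_batch) pairs using the indexing from the quoted loop.

    Hypothetical helper for illustration; not part of train.py.
    """
    whole_len = len(train_data)
    # Assumption: jump is the spacing between batch rows ("rake teeth").
    jump = whole_len // batchsize
    for i in range(n_steps):
        # Row j reads position jump * j + i, so it advances by one character per step.
        x_batch = [train_data[(jump * j + i) % whole_len] for j in range(batchsize)]
        # The target for each row is simply the next character in the text.
        y_batch = [train_data[(jump * j + i + 1) % whole_len] for j in range(batchsize)]
        yield x_batch, y_batch


text = "abcdefghijkl"  # toy stand-in for train_data
batches = list(make_batches(text, batchsize=3, n_steps=4))

# Collect what each batch row sees over successive steps.
rows = ["".join(x[j] for x, _ in batches) for j in range(3)]

for i, (x, y) in enumerate(batches):
    print(i, x, y)
print(rows)
```

Reading `rows` shows that each batch row walks through its own contiguous stretch of the text ("abcd", "efgh", "ijkl"), which is why the hidden state kept for that row sees continuous input even though a single `x_batch` mixes distant positions.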