Switched to a `fit_generator` implementation for generating training sequences, instead of loading all sequences into memory. This allows training on large text files (10MB+) without requiring ridiculous amounts of RAM.
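A minimal sketch of the pattern (not the package's actual code; `sequence_generator`, `char_to_idx`, and the batching details are hypothetical): batches are encoded on the fly, so the full set of training sequences never sits in memory at once.

```python
import numpy as np

def sequence_generator(text, char_to_idx, maxlen, batch_size):
    """Yield (X, y) batches encoded on the fly instead of
    precomputing every training sequence up front."""
    vocab_size = len(char_to_idx)
    while True:
        X = np.zeros((batch_size, maxlen), dtype=np.int32)
        y = np.zeros((batch_size, vocab_size), dtype=np.float32)
        for i in range(batch_size):
            # Pick a random window; the character after it is the target.
            start = np.random.randint(0, len(text) - maxlen)
            X[i] = [char_to_idx[c] for c in text[start:start + maxlen]]
            y[i, char_to_idx[text[start + maxlen]]] = 1.0
        yield X, y

# Usage with a compiled Keras model (illustrative only):
# model.fit_generator(sequence_generator(text, char_to_idx, 40, 128),
#                     steps_per_epoch=len(text) // 128, epochs=10)
```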
Better `word_level` support:
- The model keeps only the `max_words` most frequent words and discards the rest.
- The model will not train to predict words outside the vocabulary.
- Every punctuation mark (including smart quotes) is its own token (see the sketch after this list).
- When generating, whitespace surrounding newlines/tabs is stripped. (This is not done for other punctuation, since the spacing rules there are too varied.)
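A hypothetical sketch of these rules (the function names and regexes are illustrative, not the package's implementation): punctuation is split into standalone tokens, the vocabulary is capped at `max_words`, out-of-vocabulary tokens are excluded from training, and spaces around newlines/tabs are removed when rejoining.

```python
import re
from collections import Counter

def word_tokenize(text, max_words):
    """Tokenize so every punctuation mark (smart quotes included) is
    its own token, then keep only the max_words most frequent tokens."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    vocab = {tok for tok, _ in Counter(tokens).most_common(max_words)}
    # Out-of-vocabulary tokens are dropped, so the model never trains
    # to predict words outside the kept vocabulary.
    return [tok for tok in tokens if tok in vocab]

def detokenize(tokens):
    """Join tokens, stripping spaces around newlines/tabs only."""
    text = " ".join(tokens)
    return re.sub(r"[ ]*([\n\t])[ ]*", r"\1", text)
```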
Training on a single text no longer uses meta tokens to indicate the start/end of the text, and they are not used when generating, which results in slightly better output.
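Roughly, the distinction looks like the following (a sketch under assumed marker names; `META_START`/`META_END` and the helper are hypothetical):

```python
META_START, META_END = "<s>", "</s>"  # hypothetical marker tokens

def build_token_stream(texts, single_text=False):
    """Build the training token stream. For one continuous text the
    start/end meta tokens are skipped entirely, both in training and
    generation, which yields slightly cleaner output."""
    if single_text:
        return texts[0].split()
    stream = []
    for t in texts:
        # Multiple independent texts still get delimiting meta tokens.
        stream += [META_START] + t.split() + [META_END]
    return stream
```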