LSTM layer doesn't learn with TextVectorization output (padding?) #20898
So I had a look at the way the TextVectorization object does the padding ( https://github.com/keras-team/keras/blob/v3.8.0/keras/src/layers/preprocessing/text_vectorization.py#L568 ). It transforms the input into a `RaggedTensor` and then converts it to a dense tensor, which pads at the end. It relates to this tensorflow issue ( tensorflow/tensorflow#34793 (comment) ) that asked for a pre-padding option in that ragged-to-dense conversion. However, he proposed a function to pre-pad a 2D RaggedTensor that might be general enough for the TextVectorization case:
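The proposed function itself isn't quoted here, but the idea can be sketched as a reverse-pad-reverse trick; the name `pre_pad_ragged` is made up for illustration and this is not necessarily the exact code from the TensorFlow issue:

```python
import tensorflow as tf

def pre_pad_ragged(rt, pad_value=0):
    """Densify a 2D RaggedTensor so the padding ends up at the front of each row."""
    # Reverse each (ragged) row, let to_tensor() pad at the end as usual,
    # then reverse the dense result so the pad values sit at the beginning.
    reversed_rows = tf.reverse(rt, axis=[1])
    dense = reversed_rows.to_tensor(default_value=pad_value)
    return tf.reverse(dense, axis=[1])

rt = tf.ragged.constant([[1, 2, 3], [4, 5]])
print(pre_pad_ragged(rt).numpy())
# [[1 2 3]
#  [0 4 5]]
```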
That could be an inspiration to add something like this to support pre-padding in TextVectorization. Buuuuuut, maybe that's not the right way, and maybe it would be a better idea to look at the LSTM layer and make it handle post-padded input by default, I don't know.
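In the meantime, a possible user-side workaround (just a sketch, not a proposal for the TextVectorization API) would be to request ragged output from the layer (`ragged=True`, TensorFlow backend only) and pre-pad it in the `tf.data` pipeline with a helper like the one above; `raw_train_ds` stands for the raw (text, label) dataset from the tutorial:

```python
import keras

max_features = 20000

# Return a RaggedTensor of token ids instead of a post-padded dense tensor.
vectorize_layer = keras.layers.TextVectorization(
    max_tokens=max_features, output_mode="int", ragged=True
)
vectorize_layer.adapt(raw_train_ds.map(lambda text, label: text))

def vectorize_pre_padded(text, label):
    tokens = vectorize_layer(text)          # 2D RaggedTensor (batch, None)
    return pre_pad_ragged(tokens), label    # dense tensor, zeros at the front

train_ds = raw_train_ds.map(vectorize_pre_padded)
```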
This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.
Hi @mehtamansi29. Moreover, if I reverse the order of the input sequences by passing `go_backwards=True` to the LSTM layer, then it learns correctly. So to me this is still a regression compared to Keras 2 and should be labeled as a bug, which seems to be related to the padding produced by `TextVectorization`.
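For what it's worth, the post-padding is easy to observe directly on a toy example (the token indices below are only illustrative):

```python
import tensorflow as tf
import keras

vectorize_layer = keras.layers.TextVectorization(
    max_tokens=1000, output_mode="int", output_sequence_length=10
)
vectorize_layer.adapt(tf.constant(["the cat sat on the mat", "a short text"]))

print(vectorize_layer(tf.constant(["a short text"])))
# e.g. [[7 6 5 0 0 0 0 0 0 0]]  -> zeros at the end, i.e. "post"-padding
```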
I just realized I couldn't edit your gist, so here is a copy that only adds that change.
I have an issue on my own textual data when training an LSTM for sentiment analysis. I used to encode the textual data with the old `Tokenizer` + `pad_sequences` way (Keras 2), and switched to the new `TextVectorization` object, but it doesn't learn anymore (losses don't change, accuracies around .50). So I tried with an example from the documentation: I reran the example "Text classification from scratch" from the doc ( https://keras.io/examples/nlp/text_classification_from_scratch/ ), and it works fine as is (validation accuracy is going up).

Then I replaced the Conv1D and GlobalMaxPooling1D layers by an LSTM layer, and the model doesn't learn: the train and validation accuracies stay around .50. However, if I pass `go_backwards=True` to the LSTM layer, then it learns correctly (but consequently also reads each text backwards).

It might be due to the fact that the `TextVectorization` layer "post"-pads the input (the x input vectors are filled with zeros at the end), whereas the LSTM layer expects "pre"-padded inputs (the x input vectors are filled with zeros at the beginning), and thus doesn't iterate at all on the input tokens.

Indeed, the "Bidirectional LSTM on IMDB" example ( https://keras.io/examples/nlp/bidirectional_lstm_imdb/ ) works well, as it loads an already tokenized (but not padded) version of IMDB, and thus doesn't use the `TextVectorization` layer, but the `keras.utils.pad_sequences` function, which "pre"-pads by default. What is weird though is that it still does learn when setting `padding='post'`, but the validation accuracy goes up much more slowly at each epoch than with `padding='pre'`. So it might be more complicated than it seems, but it still looks like a padding issue.

Still, it might be easily solved by allowing the user to choose whether to pre- or post-pad in the `TextVectorization` layer, similarly to the `padding` parameter of the `keras.utils.pad_sequences` function.

I had the same results on two different machines, on CPU and GPU (GeForce GTX 1650, 6 GB), with the TensorFlow and JAX backends, on Keras 3.8.0, with Python 3.10 and 3.12.

Here is the modified example from "Text classification from scratch" with an LSTM layer instead of the Conv1D and GlobalMaxPooling1D layers:
Standalone code to reproduce the issue
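A minimal sketch of the modified setup, assuming `raw_train_ds` and `raw_val_ds` are built as in the linked tutorial; the LSTM size and other hyperparameters are illustrative, not necessarily the exact values used:

```python
import keras
from keras import layers

max_features = 20000
sequence_length = 500
embedding_dim = 128

# TextVectorization post-pads each sequence with zeros up to sequence_length.
vectorize_layer = layers.TextVectorization(
    max_tokens=max_features,
    output_mode="int",
    output_sequence_length=sequence_length,
)
vectorize_layer.adapt(raw_train_ds.map(lambda text, label: text))

def vectorize_text(text, label):
    return vectorize_layer(text), label

train_ds = raw_train_ds.map(vectorize_text)
val_ds = raw_val_ds.map(vectorize_text)

# Same model as the tutorial, but with the Conv1D + GlobalMaxPooling1D block
# replaced by a single LSTM layer.
inputs = keras.Input(shape=(None,), dtype="int64")
x = layers.Embedding(max_features, embedding_dim)(inputs)
x = layers.Dropout(0.5)(x)
x = layers.LSTM(64)(x)  # accuracy stays around .50; with go_backwards=True it learns
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs, outputs)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=3)
```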
Relevant log output