Replies: 1 comment
This was happening because there were NaNs in some of my text, which I found when I did a deep dive. However, I am still interested in knowing whether there is a DataGenerator setup for text data in KerasNLP.
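For reference, a minimal sketch of checking for and dropping such NaNs before building a text pipeline. This assumes the data lives in a pandas column named `text` (both the frame and the column name here are hypothetical stand-ins):

```python
import pandas as pd

# Hypothetical frame; "text" stands in for whatever column feeds the model.
df = pd.DataFrame({"text": ["good sample", None, "another sample", float("nan")]})

# Missing entries surface as NaN/None and break string ops downstream.
bad_rows = df[df["text"].isna()]
print(len(bad_rows))  # 2

# Drop them before building the tf.data pipeline.
df_clean = df.dropna(subset=["text"]).reset_index(drop=True)
```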
I have modified a Google Colab tutorial to fine-tune GPT-2. I have a large text dataset and only 22.3 GB of GPU VRAM (an L4 GPU on a high-RAM instance).
My package versions are:
I can load my data to a list on the instance:
However, if I try to read the whole dataset as below, my instance crashes.
```python
tf_train_ds = tf.data.Dataset.from_tensor_slices(training_list).batch(28)
```
I believe I need to create a DataGenerator that can load my data in batches of 28, which is the most I can fit into memory with 22.3 GB of VRAM. Am I correct? If not, how else can I manage training with this "large" dataset?
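One common way to avoid materializing the whole list as a single tensor is to stream it with `tf.data.Dataset.from_generator`. A minimal sketch, assuming the data is a Python list of strings called `training_list` (the small list here is just a stand-in):

```python
import tensorflow as tf

# Stand-in for the real (much larger) list of training strings.
training_list = [f"training sample {i}" for i in range(100)]

def sample_generator():
    # Yields one string at a time, so the full dataset is never
    # converted to a single in-memory tensor up front.
    for text in training_list:
        yield text

tf_train_ds = (
    tf.data.Dataset.from_generator(
        sample_generator,
        output_signature=tf.TensorSpec(shape=(), dtype=tf.string),
    )
    .batch(28)                   # same batch size as before
    .prefetch(tf.data.AUTOTUNE)  # overlap data loading with training
)

first_batch = next(iter(tf_train_ds))
print(int(first_batch.shape[0]))  # 28
```

If the raw text does not even fit in host RAM, writing it to text files and reading them with `tf.data.TextLineDataset` follows the same pattern. Note that the batch size mainly governs per-step memory; streaming specifically avoids the `from_tensor_slices` crash, since that call copies the entire list into one tensor.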
For more context, the below is how I do the fitting, which runs fine for datasets that fit in memory:
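(The actual fitting code did not survive the page scrape.) As a stand-in illustration only, a tiny Keras model, not GPT-2, showing that `model.fit` consumes a batched `tf.data` dataset directly; the KerasNLP GPT-2 tutorial's model accepts a dataset the same way:

```python
import tensorflow as tf

# Toy data and model purely for illustration; the real setup fine-tunes GPT-2.
xs = tf.random.uniform((100, 4))
ys = tf.random.uniform((100, 1))
ds = tf.data.Dataset.from_tensor_slices((xs, ys)).batch(28)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# fit() iterates the dataset batch by batch; one epoch here.
history = model.fit(ds, epochs=1, verbose=0)
print(len(history.history["loss"]))  # 1
```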