The code I'm using is in the file "one_file_ref".
I was trying to apply the Mistral Transformer to other, non-text tabular data. I initialised "positions" as torch.arange(1, num_of_most_instances), where "num_of_most_instances" is the number of tokens in the longest sequence.
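Roughly, the setup looks like this (a minimal sketch only; the value of num_of_most_instances and the commented-out forward call are placeholders for my own data and training loop, not the exact code):

```python
import torch

# Sketch of how "positions" is built (illustrative value, not the real data).
num_of_most_instances = 128  # number of tokens in the longest sequence

# The same positions tensor is reused for every batch.
# Note: torch.arange(1, n) yields n - 1 values (1, 2, ..., n - 1).
positions = torch.arange(1, num_of_most_instances)

# logits = model(batch_tokens, positions)  # forward call as in one_file_ref
```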
However, I observed that each time I call loss.backward() and move on to the next batch, about 30 MB of GPU memory is not released, so after 1000 steps it takes up roughly 30 GB of GPU memory.
I also found that, with my initialised "positions", execution always went through line 131 and never entered the "else" branch.
Is there any mistake in my usage of "positions"? The issue goes away after I comment out all the code related to self.cache, but I'm wondering whether that affects the attention mechanism.
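For reference, the workaround I'm considering (instead of commenting out the cache entirely) is to detach the cached tensors after each optimisation step so they stop holding references to the autograd graph. This is only a sketch under the assumption that the caches are tensor attributes named cache_k / cache_v on the attention modules; the actual attribute names in one_file_ref may differ:

```python
import torch

def detach_kv_caches(model: torch.nn.Module) -> None:
    """Detach cached key/value tensors so they no longer reference the autograd graph.

    Assumption: the caches are stored as tensor attributes named "cache_k" / "cache_v"
    on the attention modules; adjust the names if one_file_ref uses different ones.
    """
    for module in model.modules():
        for name in ("cache_k", "cache_v"):
            cache = getattr(module, name, None)
            if isinstance(cache, torch.Tensor):
                setattr(module, name, cache.detach())

# Per training step:
# loss.backward()
# optimizer.step()
# optimizer.zero_grad()
# detach_kv_caches(model)  # drop graph references held by the cache
```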