
Out of Memory after training a few epochs #20

Open
waylonli opened this issue Sep 30, 2023 · 0 comments
Comments

waylonli commented Sep 30, 2023

The code I'm using is the one in `one_file_ref.py`.
I'm trying to apply the Mistral transformer to non-text tabular data. I initialise `positions` as `torch.arange(1, num_of_most_instances)`, where `num_of_most_instances` is the number of tokens in the longest sequence.
However, I've observed that each time I call `loss.backward()` and move on to the next batch, about 30 MB of GPU memory is not released, so after roughly 1000 steps the training has accumulated about 30 GB of GPU memory.
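
For reference, this is how I build `positions` (a minimal sketch; `num_of_most_instances` comes from my own dataset, and 128 below is just a placeholder value):

```python
import torch

# num_of_most_instances = number of tokens in my longest sequence
# (128 is only a placeholder value for illustration)
num_of_most_instances = 128

# the tensor I pass as "positions" to the transformer's forward pass
positions = torch.arange(1, num_of_most_instances)
print(positions.shape)  # torch.Size([127]), values 1 .. 127
```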

I also found that, with my `positions` tensor, the forward pass always enters the branch at line 131 and never the `else` branch.
Is there a mistake in my usage of `positions`? The issue goes away once I comment out all the code related to `self.cache`, but I'm wondering whether removing the cache affects the attention mechanism.
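
If it helps, here is a small self-contained toy that I think reproduces the retention pattern I'm seeing. It is not the actual `one_file_ref` code, just my guess at the mechanism: writing activations that require grad into a persistent cache tensor keeps each step's autograd graph alive, unless the values are detached before the write.

```python
import torch
import torch.nn as nn

class ToyAttention(nn.Module):
    """Toy stand-in for an attention layer with a persistent k/v-style cache.
    This is my own simplification, not the code from one_file_ref."""

    def __init__(self, max_len: int = 128, dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        # persistent cache tensor, analogous in spirit to self.cache_k / self.cache_v
        self.cache = torch.zeros(max_len, dim)

    def forward(self, x: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
        k = self.proj(x)
        # Writing k (which requires grad) into the cache makes the cache hold a
        # reference to this step's autograd graph, so the step's activations
        # are not freed after backward().
        self.cache[positions] = k
        # Detaching before the write releases the graph and stops the growth:
        # self.cache[positions] = k.detach()
        return k.sum()

model = ToyAttention()
positions = torch.arange(1, 33)          # same style of "positions" I use
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

for step in range(5):
    x = torch.randn(positions.shape[0], 64)
    loss = model(x, positions)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # In the non-detached version, memory grows a little every step
    # (e.g. torch.cuda.memory_allocated() on GPU), which matches what I see
    # at a much larger scale with the real model.
```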
