Attention can be None in ModernBertForSequenceClassification #35917
Hi @ashmikuz, when no attention mask is passed, we can't really work out which positions are masked! Although we could add code to estimate this (like some other models do). Would you be interested in making a PR for that? |
Hi @Rocketknight1, @ashmikuz, I had the same issue. Is anyone working on this? Otherwise I will raise a PR and add this RuntimeError: |
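For reference, a guard along these lines would do it; this is only a sketch of the proposed check (the helper name and error message are illustrative, not the actual transformers code):

```python
from typing import Optional

import torch


def check_attention_mask(attention_mask: Optional[torch.Tensor]) -> None:
    # Fail loudly up front instead of hitting an opaque error when
    # .unsqueeze() is later called on a None attention_mask.
    if attention_mask is None:
        raise RuntimeError(
            "ModernBertForSequenceClassification needs an attention_mask; "
            "pass one explicitly, e.g. torch.ones_like(input_ids)."
        )
```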
Sorry I was quite busy in the last few days. Shouldn't it match how other models behave? As far as I understand, other models just print a warning and then create an attention mask from torch.ones, right? |
Yes, you are right, I see the torch.ones for e.g. deberta here. I don't see the warning, but I may have missed it ... |
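A minimal sketch of the warn-and-default behaviour the last two comments describe; the helper name and logger setup are illustrative, not the actual transformers implementation:

```python
import logging
from typing import Optional

import torch

logger = logging.getLogger(__name__)


def resolve_attention_mask(
    input_ids: torch.Tensor, attention_mask: Optional[torch.Tensor]
) -> torch.Tensor:
    # DeBERTa-style fallback: if no mask is given, treat every position as a
    # real (unmasked) token and warn the user about that assumption.
    if attention_mask is None:
        logger.warning(
            "No attention_mask was passed; defaulting to an all-ones mask."
        )
        attention_mask = torch.ones_like(input_ids)
    return attention_mask
```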
I'm working on a quick PR, just a moment and I'll send it. Hopefully it fixes the issue and is in line with other models. |
In the ModernBertForSequenceClassification class, the attention mask is never computed outside of self.model (which is a ModernBertModel). Therefore, when no attention mask is passed to the model, the .unsqueeze() here fails.
I worked around this by assigning torch.ones(batch_size, seq_len) to attention_mask, but I am not sure whether this is correct.
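For context, a minimal reproduction of the workaround described above; the checkpoint name is an assumption, any ModernBERT sequence-classification setup should behave the same way:

```python
import torch
from transformers import AutoTokenizer, ModernBertForSequenceClassification

model_name = "answerdotai/ModernBERT-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = ModernBertForSequenceClassification.from_pretrained(model_name, num_labels=2)

input_ids = tokenizer("Hello world", return_tensors="pt")["input_ids"]

# model(input_ids=input_ids)  # fails: attention_mask is None and .unsqueeze() is called on it

# Workaround: pass an explicit all-ones mask (every token treated as unmasked).
attention_mask = torch.ones_like(input_ids)
outputs = model(input_ids=input_ids, attention_mask=attention_mask)
print(outputs.logits.shape)  # (1, num_labels)
```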