I need contextual embeddings of individual words for analysis, but it turns out that ModernBERT performs much worse than BERT on my task. I use the transformers version mentioned on the Hugging Face model card and the following code (identical to what I use for BERT, which works well) to obtain the word-level embeddings. Could there be a problem?
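The poster's actual code isn't shown here, but a minimal sketch of the usual approach (mean-pooling each word's subword vectors via the fast tokenizer's `word_ids()`) might look like this; the checkpoint names and pooling choice are assumptions, not the poster's code:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "answerdotai/ModernBERT-base"  # swap in "bert-base-uncased" to compare
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

text = "The quick brown fox jumps over the lazy dog"
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_size)

# word_ids() maps each subword position back to the word it came from
# (None for special tokens); mean-pool the subwords of each word.
word_ids = enc.word_ids()
n_words = max(i for i in word_ids if i is not None) + 1
word_vectors = [
    hidden[[t for t, w in enumerate(word_ids) if w == i]].mean(dim=0)
    for i in range(n_words)
]
```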
Yeah, I'm not sure what you are evaluating exactly, but if you compute per-word metrics, it might be related to the tokenizer issue mentioned above.
We should have fixed the tokenizer in the latest version of transformers. Could you try installing it (from source, just to be sure), passing `is_split_into_words=True` when calling the tokenizer, and perhaps also `add_prefix_space=True` when creating the tokenizer?
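For reference, the suggestion above could be tried like this (the checkpoint name and example input are assumptions):

```python
# Assumes the latest transformers; install from source if needed:
#   pip install git+https://github.com/huggingface/transformers
from transformers import AutoTokenizer

# add_prefix_space=True when creating the tokenizer, so that words after
# the first get the leading-space BPE variant they would have mid-sentence.
tokenizer = AutoTokenizer.from_pretrained(
    "answerdotai/ModernBERT-base", add_prefix_space=True
)

# is_split_into_words=True when the input is already a list of words.
words = ["The", "quick", "brown", "fox"]
enc = tokenizer(words, is_split_into_words=True)
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
```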
Could you try checking the tokenization with BERT and ModernBERT to make sure they are similar?
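Something like this quick side-by-side check would do (checkpoint names are assumptions); the token splits and `word_ids()` alignment should look comparable for both models:

```python
from transformers import AutoTokenizer

bert = AutoTokenizer.from_pretrained("bert-base-uncased")
modern = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

words = ["The", "quick", "brown", "fox"]
for name, tok in [("BERT", bert), ("ModernBERT", modern)]:
    enc = tok(words, is_split_into_words=True)
    print(name, tok.convert_ids_to_tokens(enc["input_ids"]))
    print(name, "word_ids:", enc.word_ids())
```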