[QUESTION] Why is reset_attention_mask=False by default?
#1096 · Unanswered
dtamayo-nlp asked this question in Q&A
Replies: 0
Your question
When we want to train LLMs on large corpora, I understand that the usual approach is to pack documents into each training sequence in the following format, until the context length is full:
[doc 1] <sep> [doc 2] <sep> ...
However, the intuitive optimization I see is to use something you call reset_attention_mask, which you have implemented here. What I did not expect was to find this attribute set to False in most yaml configurations of open models. Examples:

While I understand that there might be some potential benefits to not using masking, I don't trivially see why it should be the default approach. I haven't found much on the internet about this topic; any information would be welcome!
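To make the question concrete, here is a minimal sketch of what I understand the flag to do when enabled (an illustration only, not the repository's actual implementation; `build_causal_mask`, the EOD-token convention, and the flag-as-argument shape are my own assumptions):

```python
import torch

def build_causal_mask(token_ids: torch.Tensor, eod_token: int,
                      reset_attention_mask: bool) -> torch.Tensor:
    """Return a [seq_len, seq_len] bool mask where True = may attend."""
    seq_len = token_ids.size(0)
    # Standard lower-triangular causal mask: each token attends to
    # itself and all earlier tokens in the packed sequence.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    if reset_attention_mask:
        # Give every position a document id; the EOD token itself is
        # counted with the document it terminates.
        is_eod = (token_ids == eod_token).long()
        doc_ids = torch.cumsum(is_eod, dim=0) - is_eod
        # Zero out attention between positions in different documents,
        # making the mask block-diagonal along document boundaries.
        same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
        mask &= same_doc
    return mask

# Two packed documents separated by an EOD token (id 0):
# with reset on, tokens of doc 2 can no longer see doc 1.
tokens = torch.tensor([5, 6, 7, 0, 8, 9])
print(build_causal_mask(tokens, eod_token=0, reset_attention_mask=True).int())
```

With `reset_attention_mask=False`, the same packed sequence would be trained with the plain causal mask, so later documents can attend to earlier, unrelated ones; my question is why that is the preferred default.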