[QUESTION] Why is reset_attention_mask=False by default?
#1096 · Unanswered
dtamayo-nlp asked this question in Q&A
Replies: 0
Your question
When we want to train LLMs on large corpora, I understand that the usual approach is to pack documents into each training sequence in the following format, until the context length is full:
[doc 1] <sep> [doc 2] <sep> ...
However, the intuitive optimization I see is to use something you call reset_attention_mask, which you have implemented here. What I did not expect was to find this attribute set to False in most yaml configurations of open models. Examples:

While I understand that there might be some potential benefits to not using masking, I don't trivially see why it should be the default approach. I haven't found much on the internet about this topic; any information would be welcome!
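To make the question concrete, here is a minimal sketch of what I understand the flag to do when enabled (an illustration only, not the repository's actual implementation; `build_causal_mask`, the EOD-token convention, and the flag-as-argument shape are my own assumptions):

```python
import torch

def build_causal_mask(token_ids: torch.Tensor, eod_token: int,
                      reset_attention_mask: bool) -> torch.Tensor:
    """Return a [seq_len, seq_len] bool mask where True = may attend."""
    seq_len = token_ids.size(0)
    # Standard lower-triangular causal mask: each token attends to
    # itself and all earlier tokens in the packed sequence.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    if reset_attention_mask:
        # Give every position a document id; the EOD token itself is
        # counted with the document it terminates.
        is_eod = (token_ids == eod_token).long()
        doc_ids = torch.cumsum(is_eod, dim=0) - is_eod
        # Zero out attention between positions in different documents,
        # making the mask block-diagonal along document boundaries.
        same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
        mask &= same_doc
    return mask

# Two packed documents separated by an EOD token (id 0):
# with reset on, tokens of doc 2 can no longer see doc 1.
tokens = torch.tensor([5, 6, 7, 0, 8, 9])
print(build_causal_mask(tokens, eod_token=0, reset_attention_mask=True).int())
```

With `reset_attention_mask=False`, the same packed sequence would be trained with the plain causal mask, so later documents can attend to earlier, unrelated ones; my question is why that is the preferred default.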