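Take the simplest case in MlmDataset, character-level tokenization, with the full-sentence switch turned off.
When a sample is longer than max_length, it is split into several samples.
However, only one pair of [CLS] and [SEP] tokens exists at that point. It comes from the original document that was passed in, and the split does not produce additional head and tail tokens for the resulting samples.
Is this behavior expected? In principle, every individual sample should have a [CLS] at its head and a [SEP] at its tail.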
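To make the question concrete, here is a minimal sketch of the two behaviors. It is not the project's code: the function names `split_after_wrapping` and `split_then_wrap` are hypothetical, and the first function only mimics the behavior described above (wrap the document once, then cut by max_length), which is an assumption about how the split happens.

```python
from typing import List

CLS, SEP = "[CLS]", "[SEP]"


def split_after_wrapping(tokens: List[str], max_length: int) -> List[List[str]]:
    """Illustrates the reported behavior: the document is wrapped once,
    then cut, so only the first chunk has [CLS] and only the last has [SEP]."""
    wrapped = [CLS] + tokens + [SEP]
    return [wrapped[i:i + max_length] for i in range(0, len(wrapped), max_length)]


def split_then_wrap(tokens: List[str], max_length: int) -> List[List[str]]:
    """Illustrates the expected behavior: cut first, then give every chunk
    its own [CLS] head and [SEP] tail, keeping each chunk within max_length."""
    if max_length < 3:
        raise ValueError("max_length must fit [CLS], [SEP] and at least one token")
    body = max_length - 2  # room left for real tokens in each chunk
    return [[CLS] + tokens[i:i + body] + [SEP] for i in range(0, len(tokens), body)]


if __name__ == "__main__":
    chars = list("abcdefgh")  # character-level tokens
    for chunk in split_after_wrapping(chars, max_length=5):
        print("reported:", " ".join(chunk))
    for chunk in split_then_wrap(chars, max_length=5):
        print("expected:", " ".join(chunk))
```

With the second approach each chunk carries two fewer real tokens, so a long document yields more samples than it does with the first approach.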
That makes sense. Let me confirm this issue.