Describe the feature
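For large-scale datasets, loading the JSON file directly runs out of memory, so the data has to be consumed in iterable form. Is webdataset supported? Or is using multiple jsonl files with streaming enough to support very large-scale SFT?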
Yes, that works.
Could you clarify whether it is webdataset that works, or using multiple jsonl files directly? Is there any documentation or an example I could refer to? 🙏
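For reference, the "multiple jsonl + streaming" idea from the question can be sketched in plain Python. This is a generic, standard-library-only illustration of lazy iteration over shards, not this project's loader: the function names `iter_jsonl_shards` and `batched` and the shard paths are hypothetical, and how the project itself consumes such a stream is not confirmed in this thread.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def iter_jsonl_shards(paths: Iterable[str]) -> Iterator[dict]:
    """Yield one record at a time from several JSONL shards.

    Only the current line is held in memory, so the total dataset
    size is not bounded by RAM. (Hypothetical helper, for illustration.)
    """
    for path in paths:
        with Path(path).open("r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)


def batched(records: Iterable[dict], batch_size: int) -> Iterator[list]:
    """Group a record stream into lists of at most batch_size items."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch
```

A training loop would then iterate over something like `batched(iter_jsonl_shards(["shard-000.jsonl", "shard-001.jsonl"]), 32)` (placeholder file names) without ever materializing the full dataset.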