Is webdataset supported as input for qwen2.5VL? #3214

Open
rover5056 opened this issue Feb 21, 2025 · 2 comments

Comments

@rover5056

Describe the feature

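For large-scale datasets, loading the JSON file directly runs out of memory, so the data has to be consumed as an iterable. Is webdataset supported? Or can very large-scale SFT be supported simply by using multiple jsonl files with streaming?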

@Jintao-Huang
Collaborator

Yes, that is supported.

@rover5056
Author

rover5056 commented Feb 21, 2025

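Could you clarify whether that means webdataset is supported, or that multiple jsonl files can be used directly?
Is there any documentation or an example I could refer to? 🙏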
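
For illustration only, the "multiple jsonl + streaming" idea raised in this thread can be sketched in plain Python. This is not the project's documented method and does not use its API. The function name `stream_jsonl` and the shard file names are hypothetical, and the sketch assumes each line of every shard is one JSON object. It shows only why an iterable keeps memory bounded: one record is parsed at a time instead of the whole file.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def stream_jsonl(paths: Iterable[str]) -> Iterator[dict]:
    """Lazily yield one record at a time from several JSONL shards.

    Hypothetical helper: only the current line is held in memory,
    so the total dataset size does not affect peak memory use.
    """
    for path in paths:
        with Path(path).open("r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines between records
                    yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical shard names; replace with real files.
    shards = ["shard-000.jsonl", "shard-001.jsonl"]
    for record in stream_jsonl(shards):
        pass  # hand each record to the training data pipeline
```

Because a generator like this has no known length and no random access, shuffling and epoch boundaries have to be handled by whatever consumes it, which is the usual trade-off of iterable-style datasets.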
