Thanks for sharing the data and code.
Could you also share some details about how the raw videos are converted to the patch files?
Although you listed the raw data sources on this page: https://huggingface.co/datasets/THUdyh/Oryx-SFT-Data
I suppose there are also some filtering steps used to select only the 600k samples from them?
Hi, the raw videos are clipped into frames and saved as bytes in patch files. We use patch files to increase I/O efficiency during training. We do not apply any downsampling or merging to our raw video sources; we simply combine all the videos from our source datasets. If you have any detailed questions, please feel free to ask.
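For anyone else reading this, here is a minimal sketch of the general idea (frames stored back-to-back as encoded bytes in one patch file, plus an offset index for random access). The actual on-disk format, file names, and the 1 FPS sampling used here are assumptions for illustration, not the Oryx implementation.

```python
import json
import cv2

def video_to_patch(video_path, patch_path, index_path, fps=1):
    """Clip a video into frames and append their JPEG bytes to a patch file.

    The sampling rate, JPEG encoding, and JSON offset index are assumptions
    made for this sketch; the real pipeline may differ.
    """
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30
    step = max(int(round(native_fps / fps)), 1)

    offsets = []  # (start_byte, length) for each stored frame
    with open(patch_path, "wb") as patch:
        frame_idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if frame_idx % step == 0:
                ok, buf = cv2.imencode(".jpg", frame)
                if ok:
                    data = buf.tobytes()
                    offsets.append((patch.tell(), len(data)))
                    patch.write(data)
            frame_idx += 1
    cap.release()

    with open(index_path, "w") as f:
        json.dump(offsets, f)

def read_frame(patch_path, offsets, i):
    # Random access to the i-th stored frame: one seek + one read,
    # which is the I/O-efficiency benefit of the patch-file layout.
    start, length = offsets[i]
    with open(patch_path, "rb") as patch:
        patch.seek(start)
        return patch.read(length)  # raw JPEG bytes; decode as needed
```

During training, a data loader would only need the small offset index in memory and could fetch individual frames from the large patch file without decoding whole videos.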
I see, thanks for the prompt reply.
So is it still possible to trace back from the current IDs (oryx_0000...) to the video / question IDs in the original datasets?
When people try to add more data for training, this information can be very useful for deduplication.
We may also want to try different frame rates. The current frame rate is 1 FPS, right?