Organization of data #1

Open
rese1f opened this issue Jul 18, 2024 · 5 comments

@rese1f

rese1f commented Jul 18, 2024

I was working with the data, and I have some questions about the structure of the dataset.

How does the formatting of the identifiers work? I can see a year and a YouTube id, but I'm not sure what the last two numbers mean. I've tried downloading the videos and splitting them into clips, but I'm also unclear what the two numbers in the bounds folder represent. In some of the files the start_bound is larger than the end_bound, and after extracting clips, some of them don't match the captions, so maybe the frames I extracted don't align with the ones that were captioned, but there is no way to verify this.

Additionally, is there a reason the data is split into a separate file for every attribute instead of one larger JSON? It seems to bloat the file I/O when, for example, there are 114k numpy files that each store just two integers.

@robincourant
Owner

Hi,

Thank you for your interest in our dataset.

The file identifier follows this format: {year}_{video_id}_{shot_index}_{chunk_index}. The year, video_id, and shot_index correspond to attributes from CondensedMovies, while the chunk_index is generated during my preprocessing.

You can find the frame bounds for each sample in the bounds folder of the dataset.
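For concreteness, parsing an identifier and reading a clip's bounds looks roughly like this (a minimal sketch; it assumes the bounds files are .npy pairs of [start_frame, end_frame] named after the identifier, so adapt the paths to your local copy):

```python
import numpy as np
from pathlib import Path


def parse_identifier(stem: str):
    """Split '{year}_{video_id}_{shot_index}_{chunk_index}' into its parts.

    YouTube ids can themselves contain underscores, so split from both
    ends rather than on every underscore.
    """
    parts = stem.split("_")
    year = parts[0]
    video_id = "_".join(parts[1:-2])
    shot_index, chunk_index = int(parts[-2]), int(parts[-1])
    return year, video_id, shot_index, chunk_index


def load_bounds(dataset_root, identifier):
    """Return the (start_frame, end_frame) pair stored for one clip."""
    start, end = np.load(Path(dataset_root) / "bounds" / f"{identifier}.npy")
    return int(start), int(end)


print(parse_identifier("2016_pQ68ImO9dBU_00000_00003"))
# -> ('2016', 'pQ68ImO9dBU', 0, 3)
```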

@patina27

Hi Robin, thanks for your hard work on the dataset, it's very much appreciated! However, I think I'm running into the same issue rese1f mentioned: the start and end indices in the bounds files don't match the corresponding caption_cam and traj npy files in terms of length (e.g. end - start does not equal the length of the camera trajectory file), and some of the clips have start frame > end frame.

Example:
2016_pQ68ImO9dBU_00000_00003
2019_aoc1wqaK8cc_00008_00001
2019_ML4CcaTFsZE_00010_00001

I've also attached a txt file listing the mismatches:
mismatch.txt
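
For reference, this is roughly the consistency check I ran (a quick sketch; the dataset root and the assumption that bounds/ stores [start, end] pairs while traj/ stores one pose per frame are mine):

```python
import numpy as np
from pathlib import Path

root = Path("dataset")  # wherever the dataset is extracted locally

mismatches = []
for bounds_file in sorted((root / "bounds").glob("*.npy")):
    bounds = np.load(bounds_file)
    start, end = int(bounds[0]), int(bounds[1])
    n_poses = len(np.load(root / "traj" / bounds_file.name))
    # Flag clips with inverted bounds or a length that disagrees with the trajectory.
    if start > end or (end - start) != n_poses:
        mismatches.append(f"{bounds_file.stem}: bounds=[{start}, {end}], traj length={n_poses}")

Path("mismatch.txt").write_text("\n".join(mismatches) + "\n")
```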

@robincourant
Owner

Hi,

Thank you for pointing out the issue. I apologize for the typo in my script. I have recomputed the bounds and reuploaded them there. Please let me know if you encounter any further problems.

@patina27

Hi Robin, thanks for the update! Looks like the bounds are making more sense now!

I still find the camera captions not very accurate, and they sometimes look misaligned. I also noticed that for some videos, the length of the array in cam_segments does not match that of traj/xxx.npy. Could this be an issue?

Best,

@robincourant
Owner

Hi,

Do you have examples of camera segments that do not match the trajectory file length?
Keep in mind that the number of camera segments will always be one less than the trajectory length, as each segment is defined between two poses.
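In code, the expected relationship is simply the following (a small sketch; the traj/ and cam_segments/ paths mirror the folders mentioned above, and the example identifier is one of the clips listed earlier in this thread):

```python
import numpy as np
from pathlib import Path

root = Path("dataset")  # wherever the dataset is extracted locally
identifier = "2016_pQ68ImO9dBU_00000_00003"  # example clip from this thread

# One camera pose per frame.
traj = np.load(root / "traj" / f"{identifier}.npy")
# One label per pose-to-pose transition (allow_pickle in case labels are stored as objects).
segments = np.load(root / "cam_segments" / f"{identifier}.npy", allow_pickle=True)

# Each segment is defined between two consecutive poses, so there is
# always exactly one fewer segment than there are poses.
assert len(segments) == len(traj) - 1, (len(segments), len(traj))
```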
