How is the DSG integrated in the generation process #3

Open
DiegoBiagini opened this issue Aug 8, 2024 · 0 comments

Thank you for releasing the code for this project.
I'm having trouble understanding how the DSG is integrated into the generation process when comparing the code against the paper, and I wanted to ask for some clarification.

The description of step 2 says: "This step uses gold DSG of video for the updating of recurrent graph Transformer in 3D-UNet."
However, the config file seems to specify that the DSG conditioning is not added, via use_temporal_transformer: False. Is this still a T2V-only pretraining step?
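
For reference, this is roughly how I located the flag, in case I'm misreading its nesting; the config path is a placeholder and the OmegaConf-style layout is only my assumption about how the configs are structured:

```python
# Minimal sketch for locating the flag in the training config; the path is a
# placeholder and the OmegaConf-style config is my assumption, not confirmed.
from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/<stage2_training_config>.yaml")  # placeholder path

def find_key(node, key, path=""):
    """Recursively search the config tree, since I'm unsure of the exact nesting."""
    if isinstance(node, dict):
        for k, v in node.items():
            p = f"{path}.{k}" if path else k
            if k == key:
                print(p, "=", v)
            find_key(v, key, p)
    elif isinstance(node, list):
        for i, v in enumerate(node):
            find_key(v, key, f"{path}[{i}]")

find_key(OmegaConf.to_container(cfg), "use_temporal_transformer")
# In the config I inspected, this reports the flag set to False.
```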

It would also be nice to have some more explanation of "how to parse the DSG annotations in advance with the tools in dysen/DSG", since that original code is written for images, not videos. Alternatively, if possible, could you provide pre-parsed representations for one of the video datasets used?
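
For concreteness, this is the kind of per-frame structure I assume the parsed annotations should end up in; all of the names below are my own guesses for illustration, not the repo's actual format:

```python
# Hypothetical sketch of a parsed dynamic scene graph (DSG) for one video;
# the structure and all names are my guesses, not the repo's actual format.
from typing import List, Tuple

Triple = Tuple[str, str, str]   # one scene-graph edge: (subject, relation, object)
FrameGraph = List[Triple]       # scene graph of a single frame/keyframe
DSG = List[FrameGraph]          # one graph per frame, ordered in time

example: DSG = [
    [("dog", "running on", "grass")],                              # frame 1
    [("dog", "running on", "grass"), ("dog", "chasing", "ball")],  # frame 2
    [("dog", "holding", "ball")],                                  # frame 3
]
```

Is this roughly what the recurrent graph Transformer expects as input?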

I might have misunderstood the process, but even in shellscripts/run_sample_vdm_text2video.sh I can't find the step that goes from the textual representation to the graph representation. Is that the script used to generate the data that is passed to shellscripts/run_eval_dysen_vdm.sh?

Thank you in advance!
