
varying depth values #30

Open
arlo-ml opened this issue Oct 19, 2024 · 10 comments


arlo-ml commented Oct 19, 2024

Hi, thanks for your amazing work!

I've been testing your code with long videos (from 300 to 800 frames), and I often get varying background values over time.
For example, with this video (383 frames) and the following parameters, I get different background values from frame 231 to frame 315:
Output frame rate: 24
Inference steps: 25
Guidance scale: 1.2
Dataset: kitti

Is this expected with longer videos?

[attached: two frames showing the background depth shift]

@arlo-ml arlo-ml closed this as completed Oct 19, 2024
@arlo-ml arlo-ml reopened this Oct 21, 2024

arlo-ml commented Oct 21, 2024

And here are other videos. As you can see, the background values change over time:

[attached: frame pairs from two more videos showing the background depth drift]


juntaosun commented Oct 21, 2024

The project mainly relies on Stable Video Diffusion. Each inference run produces a different, random result, so I don't think it is truly temporally consistent.


acgourley commented Oct 22, 2024

My understanding of the paper is that they use a sliding context window of around 1.5 s for inference, so it makes sense that the output would shift over periods longer than a couple of seconds. I doubt there is a simple fix, but I'd love to hear it if people have ideas.


wbhu commented Oct 23, 2024

Hi, thank you for your feedback. Due to memory restrictions, the maximum processing length at one time is 110 frames; longer videos are processed in overlapping segments.

Temporal consistency within the same segment is very good, I think. As for temporal consistency among segments, our inference strategy (including noise initialization and latent interpolation) works for most cases, but it is hard to always guarantee consistency across segments because of the limited temporal context.

Best,
Wenbo
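The overlapping-segment scheme described above can be sketched in a few lines. This is a simplified toy illustration (my own helper names, one scalar depth value per frame for brevity), not the repository's actual code:

```python
import numpy as np

def split_overlapping(n_frames, seg_len=110, overlap=25):
    """Cover n_frames with fixed-length windows that share `overlap` frames."""
    windows = []
    for s in range(0, max(n_frames - overlap, 1), seg_len - overlap):
        e = min(s + seg_len, n_frames)
        windows.append((s, e))
        if e == n_frames:
            break
    return windows

def stitch(segments, windows, n_frames):
    """Linearly interpolate between consecutive segments in the overlap region."""
    out = np.empty(n_frames)
    prev_end = 0
    for seg, (s, e) in zip(segments, windows):
        if s >= prev_end:                      # first segment: copy as-is
            out[s:e] = seg
        else:
            ov = prev_end - s                  # overlap length
            alpha = np.linspace(0.0, 1.0, ov)  # 0 = keep previous, 1 = use new
            out[s:prev_end] = (1 - alpha) * out[s:prev_end] + alpha * seg[:ov]
            out[prev_end:e] = seg[ov:]
        prev_end = e
    return out
```

Within one segment the model sees all frames jointly, so consistency there is strong; across segments only the overlap ties the predictions together, which is why drift can accumulate on long videos.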


arlo-ml commented Oct 23, 2024

Hi Wenbo, thank you for the explanation.
I've already tested around 40 videos (all sequences from black-and-white films), and I can confirm that your inference strategy works for most cases, even with videos longer than 110 frames. I was wondering whether using different values for the noise initialization and latent interpolation might help it adapt to different scenarios.
I did a brief search, but I could not find any terminal commands that would help me solve those isolated cases. Would it require modifying your original code?


wbhu commented Oct 23, 2024

Hi, the noise initialization for overlapped segments is already included in the code. For the failure cases, you may try setting a different random seed (the default is 42) by adding the argument "--seed xxx". I'm not sure whether this will help ...

What will influence the result for sure is where the video is segmented; you may tune the segment boundaries for the failure cases.
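On the second point, one heuristic for tuning where a long video is split is to snap each boundary to the largest frame-to-frame change near the default cut point, so segment seams coincide with scene cuts. A hypothetical sketch (`pick_boundary` is my own helper, not a flag or function of the project):

```python
import numpy as np

def pick_boundary(frames, target, search=8):
    """Snap a segment boundary near `target` to the biggest frame-to-frame
    change (a likely scene cut) within +/- `search` frames.

    `frames` is an array-like of per-frame pixel arrays."""
    lo = max(1, target - search)
    hi = min(len(frames) - 1, target + search)
    diffs = [np.abs(frames[i] - frames[i - 1]).mean() for i in range(lo, hi + 1)]
    return lo + int(np.argmax(diffs))
```

Combined with a seed sweep ("--seed 0", "--seed 1", ...), this gives two knobs to try on the isolated failure cases.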


arlo-ml commented Oct 23, 2024

Thank you, I'll run more tests following your suggestions.

@STUDYHARD2113


Hi Wenbo,
I found the code that normalizes depth across the whole video. If I split a very long sequence (>150 frames) into separate parts for inference and want to keep the segments consistent, do I need to remove this part? Although different parts may have scene overlap, judging from the depth ground truth, the depth range of different segments is surely not the same, right?

# normalize the depth map to [0, 1] across the whole video
res = (res - res.min()) / (res.max() - res.min())
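The concern can be illustrated with toy numbers (my own sketch, not the project's code): if two segments share the depth value 2.0 but are min-maxed independently, that shared value maps to different normalized outputs; a global min/max keeps them aligned.

```python
import numpy as np

def normalize_global(segments):
    """Min-max normalize all segments with one shared min/max."""
    gmin = min(s.min() for s in segments)
    gmax = max(s.max() for s in segments)
    return [(s - gmin) / (gmax - gmin) for s in segments]

def normalize_per_segment(segments):
    """Min-max normalize each segment on its own range."""
    return [(s - s.min()) / (s.max() - s.min()) for s in segments]

a = np.array([0.0, 1.0, 2.0])   # segment 1: depth range [0, 2]
b = np.array([2.0, 3.0, 4.0])   # segment 2: depth range [2, 4]

ga, gb = normalize_global([a, b])
pa, pb = normalize_per_segment([a, b])
# globally, the shared depth 2.0 maps to 0.5 in both segments;
# per-segment, it maps to 1.0 in one and 0.0 in the other.
```

So if segments are inferred separately, one option is to normalize with shared statistics, or to align overlapping frames (e.g. a scale-and-shift fit on the overlap) before concatenating.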


wbhu commented Nov 26, 2024

Hi, we have now released version v1.0.1 with improved quality and speed. The issue of "over-saturated" depth estimation is greatly alleviated. You may give it a try and check the latest results.


wbhu commented Nov 26, 2024


Hi, thanks for your valuable comments. I think it may produce slightly better results if the normalization is performed globally, but I'm not sure, since we found the predicted values are almost always within [0, 1] even without post-normalization. Glad to hear your comments.
