varying depth values #30
The project mainly relies on Stable Video Diffusion.
My understanding of the paper is that they use a sliding context window of around 1.5 s for inference, so it makes sense that values would drift over periods longer than a couple of seconds. I doubt there is a simple fix, but I'd love to hear it if people have ideas.
Hi, thank you for your feedback. Due to memory restrictions, the maximum processing length at one time is 110 frames. Videos longer than 110 frames are processed in overlapping segments. Temporal consistency within the same segment is very good, I think. As for temporal consistency across segments, our designed inference strategy (including noise initialization and latent interpolation) works for most cases, but it's hard to always guarantee consistency across segments due to the limited temporal context. Best,
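To make the segment strategy concrete, below is a minimal sketch of overlapped-segment inference, assuming a hypothetical `predict_depth_segment` helper and a 25-frame overlap; it blends the overlap region in depth space with a linear ramp, which is a simplification of the noise initialization and latent-space interpolation the repository actually uses.

```python
import numpy as np

MAX_LEN = 110  # max frames per segment, per the comment above
OVERLAP = 25   # hypothetical overlap between consecutive segments

def predict_depth_segment(frames: np.ndarray) -> np.ndarray:
    """Stand-in for the model's per-segment inference:
    maps (T, H, W, 3) frames to (T, H, W) depth."""
    raise NotImplementedError

def predict_long_video(frames: np.ndarray) -> np.ndarray:
    """Process a long video in overlapping segments, blending each
    overlap region with a linear ramp so consecutive segments agree."""
    n, h, w = frames.shape[:3]
    depth = np.zeros((n, h, w), dtype=np.float32)
    stride = MAX_LEN - OVERLAP
    for start in range(0, n, stride):
        end = min(start + MAX_LEN, n)
        seg = predict_depth_segment(frames[start:end])
        if start == 0:
            depth[:end] = seg
        else:
            ov = min(OVERLAP, end - start)
            # Ramp from 0 (keep previous segment) to 1 (use new segment).
            ramp = np.linspace(0.0, 1.0, ov, dtype=np.float32)[:, None, None]
            depth[start:start + ov] = ((1 - ramp) * depth[start:start + ov]
                                       + ramp * seg[:ov])
            depth[start + ov:end] = seg[ov:]
        if end == n:
            break
    return depth
```

Where the segment boundaries fall relative to the scene content is exactly the "where to segment the video" knob mentioned further down in the thread.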
Hi Wenbo, thank you for the explanation.
Hi, the noise initialization for overlapped segments has been included in the code. For the failure case, you may try a different random seed (the default is 42) by adding the argument "--seed xxx". I'm not sure whether this will help. What definitely has an influence is where the video is segmented; you may tune this for the failure case.
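For context, a hedged sketch of what a seed argument typically controls in a diffusion pipeline; the exact wiring in the repository may differ (e.g. a `torch.Generator` passed to the pipeline), but the default of 42 matches the `--seed` default mentioned above.

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Seed the RNGs that influence diffusion sampling.
    42 matches the --seed default mentioned above."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```

Changing the seed changes the initial noise of the diffusion process, which is why a failure case can sometimes look different after a re-run with another seed.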
Thank you, I'll run more tests following your suggestions.
Hi Wenbo,
Hi, we have now released version v1.0.1 with improved quality and speed. The issue of "over-saturated" depth estimation is greatly alleviated. You may give it a try and check the latest results.
Hi, thanks for your valuable comments. I think it may produce slightly better results if the normalization is performed globally. But I'm not sure, since we found the predicted values almost always fall within [0, 1], even without post-normalization. Glad to hear your comments.
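To illustrate the distinction being discussed, here is a minimal sketch of per-segment versus global min-max normalization over hypothetical per-segment depth arrays; per-segment normalization is one way a static background can end up with different values in different segments.

```python
import numpy as np

def normalize_per_segment(segments: list[np.ndarray]) -> list[np.ndarray]:
    """Min-max normalize each segment independently; each segment gets
    its own min/max, so background values can shift between segments."""
    return [(s - s.min()) / (s.max() - s.min() + 1e-8) for s in segments]

def normalize_globally(segments: list[np.ndarray]) -> list[np.ndarray]:
    """Min-max normalize with one min/max over all segments, so a static
    background keeps the same value across the whole video."""
    lo = min(float(s.min()) for s in segments)
    hi = max(float(s.max()) for s in segments)
    return [(s - lo) / (hi - lo + 1e-8) for s in segments]
```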
Hi, thanks for your amazing work!
I've been testing your code with long videos (300 to 800 frames), and I often get varying background values over time.
For example, with this video (383 frames), I get different background values from frame 231 to frame 315 using these parameters:
Output frame rate: 24
Inference steps: 25
Guidance scale: 1.2
Dataset: kitti
Is this expected with longer videos?
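A hedged diagnostic sketch for quantifying this kind of drift: track the mean predicted depth of a fixed patch assumed to be static background (the patch location below is a placeholder) and compare it across frames, e.g. frame 231 versus frame 315.

```python
import numpy as np

def background_series(depth: np.ndarray,
                      patch=(slice(0, 50), slice(0, 50))) -> np.ndarray:
    """Mean depth of a fixed (assumed static) background patch per frame.
    depth: (T, H, W) array of per-frame depth predictions."""
    return depth[:, patch[0], patch[1]].reshape(len(depth), -1).mean(axis=1)

# Usage (hypothetical): quantify the jump reported between frames 231 and 315.
# series = background_series(depth)
# print(series[231], series[315], abs(series[315] - series[231]))
```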