Why do we input a 16-frame video and reconstruct only 13 frames? #11

QChencq · 2025-03-15T07:05:59Z

Thanks for your exciting work!
We input a 16-frame video. When decoding, the first block of the video is reconstructed as 1 frame, and the second block is reconstructed as 12 frames. When concatenated, that gives only 13 frames. Why is this?

qqingzheng · 2025-03-15T07:43:04Z

Hi. Because we use causal convolution, the input video should have 1 + t_down * n frames, such as 9, 17, 25, or 33. Here, t_down = 4.
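To make the arithmetic concrete, here is a minimal sketch of the frame-count rule. It assumes that trailing frames which do not fill a complete group of t_down are dropped, which is inferred from the 16 → 13 result reported above rather than confirmed from the code. The function names are hypothetical helpers, not part of the repository.

```python
T_DOWN = 4  # temporal downsampling factor stated in the reply above


def is_valid_length(num_frames: int, t_down: int = T_DOWN) -> bool:
    """True if num_frames has the form 1 + t_down * n for some n >= 0."""
    return num_frames >= 1 and (num_frames - 1) % t_down == 0


def reconstructed_frames(num_frames: int, t_down: int = T_DOWN) -> int:
    """Frames expected after a round trip.

    Assumption: the first frame is handled on its own and the remaining
    frames are grouped in chunks of t_down, with an incomplete trailing
    chunk discarded.
    """
    if num_frames < 1:
        raise ValueError("need at least one frame")
    return 1 + ((num_frames - 1) // t_down) * t_down


def nearest_valid_lengths(num_frames: int, t_down: int = T_DOWN) -> tuple[int, int]:
    """Closest valid lengths at or below, and at or above, num_frames."""
    lower = reconstructed_frames(num_frames, t_down)
    upper = lower if lower == num_frames else lower + t_down
    return lower, upper


if __name__ == "__main__":
    print(is_valid_length(16))        # 16 is not of the form 1 + 4 * n
    print(reconstructed_frames(16))   # 1 + (15 // 4) * 4
    print(nearest_valid_lengths(16))  # trim to the lower or pad to the upper
```

Under this assumption, 16 = 1 + 4 * 3 + 3, so the last 3 frames do not fill a group and only 13 frames come back. Trimming the clip to 13 frames or extending it to 17 would give an input whose length matches its reconstruction.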
