Hello,
Thank you for the excellent work and publicly available code.
I am using SyncNet to check whether a video has a lip-sync error. With the trained weights available on the official website, I am getting seemingly random values for the AV offset and confidence.
I am confused about this paragraph from the paper, under "Determining the lip-sync error":
To find the time offset between the audio and the video, we take a sliding-window
approach. For each sample, the distance is computed between one 5-frame video
feature and all audio features in the ± 1 second range. The correct offset is when
this distance is at a minimum. However as Table 2 suggests, not all samples in
a clip are discriminative (for example, there may be samples in which nothing
is being said at that particular time), therefore multiple samples are taken for
each clip, and then averaged.
I am missing something in this paragraph. How do I collect multiple samples for each clip?
I would like to know how to obtain reliable values of the metrics (AV offset, confidence) that indicate how far out of sync the video and audio are for a given sample.
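To make the question concrete, here is one possible reading of the quoted paragraph as a minimal NumPy sketch. It is not the official SyncNet code. It assumes the video and audio features have already been extracted into two arrays with one row per time step (one 5-frame window per row), that 25 steps correspond to 1 second, and that confidence can be taken as the median of the averaged distances minus their minimum. The function name `estimate_av_offset` and the parameter `max_shift` are hypothetical.

```python
import numpy as np


def estimate_av_offset(video_feats, audio_feats, max_shift=25):
    """Sliding-window offset estimate, averaged over all samples in a clip.

    video_feats, audio_feats: arrays of shape (N, D), one feature per step.
    max_shift: search range in steps (assumed: 25 steps = 1 second).
    Returns (offset, confidence, mean_dist). A positive offset means the
    audio feature matching video step i is found at step i + offset.
    """
    video_feats = np.asarray(video_feats, dtype=float)
    audio_feats = np.asarray(audio_feats, dtype=float)
    n = min(len(video_feats), len(audio_feats))
    if n <= max_shift:
        raise ValueError("clip is too short for the requested search range")

    shifts = np.arange(-max_shift, max_shift + 1)
    # dists[i, k] is the distance between video sample i and the audio
    # sample at i + shifts[k]; NaN where that audio index is out of range.
    dists = np.full((n, len(shifts)), np.nan)
    for k, s in enumerate(shifts):
        lo, hi = max(0, -s), min(n, n - s)
        diff = video_feats[lo:hi] - audio_feats[lo + s:hi + s]
        dists[lo:hi, k] = np.linalg.norm(diff, axis=1)

    # "Multiple samples are taken for each clip, and then averaged":
    # average the distance curve over every sample before taking the minimum.
    mean_dist = np.nanmean(dists, axis=0)
    best = int(np.argmin(mean_dist))
    offset = int(shifts[best])
    # Assumed confidence measure: how far the minimum sits below the median.
    confidence = float(np.median(mean_dist) - mean_dist[best])
    return offset, confidence, mean_dist
```

In this reading, each row of `video_feats` is one "sample", and the averaging happens over the per-sample distance curves rather than over per-sample offsets, so samples where nothing is being said contribute a flat curve instead of a random vote.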
Thank you