Hi, sorry to interrupt! How did you get the ground truth to train the second part? Since the style code is extracted from other videos, there shouldn't be a ground-truth output video whose identity comes from the static photo while the style comes from another video, so I am a little confused about the ground truth Y in the second loss. Can you explain it? Thank you!
Hi, during the training phase, we only use the style code, the audio, and the video from the same footage, so the ground truth Y is simply that same video. During inference, we can feed in style codes from other footage.

This works because we feed the style code into the middle layers of the ResNet, which forces the ResNet to incorporate more style information.
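To make the idea concrete, here is a toy NumPy sketch of injecting a style code into the middle of a residual network. This is not the authors' implementation: the shapes, the additive conditioning, and all variable names (`styled_resnet`, `w_style`, etc.) are illustrative assumptions; the actual model conditions convolutional ResNet blocks rather than dense layers.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w):
    # Plain residual block: y = x + relu(x @ w).
    return x + relu(x @ w)

def styled_resnet(x, style, w1, w2, w_style):
    # Early block sees only the content features (identity + audio).
    h = residual_block(x, w1)
    # Middle layer: inject the projected style code, so every later
    # block operates on style-conditioned features (additive
    # conditioning is just one illustrative choice here).
    h = h + style @ w_style
    # Late block processes the style-conditioned features.
    return residual_block(h, w2)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))        # content features (identity + audio)
style = rng.standard_normal((1, 4))    # style code; from the SAME footage at train time
w1 = 0.1 * rng.standard_normal((8, 8))
w2 = 0.1 * rng.standard_normal((8, 8))
w_style = 0.1 * rng.standard_normal((4, 8))

y = styled_resnet(x, style, w1, w2, w_style)
print(y.shape)  # output features, shape (1, 8)
```

Because the style enters mid-network rather than at the input, the later blocks cannot ignore it, which matches the intuition in the reply; at inference time one simply passes a `style` vector extracted from a different video.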