I've noticed the same issue. It appears the author simply reused the encoder/decoder architecture from MoVQGAN and retrained it as a plain VAE. I'm not sure my understanding is correct, so I'd appreciate clarification from the author.
After training, each token of the encoder's output lies very close to some vector in the codebook. I tried adding VQ back to refine the encoder's output and found that the MSE between the features before and after VQ is small, so decoding with or without VQ produces nearly identical results.
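For anyone who wants to reproduce this check, here is a minimal sketch of measuring the gap between continuous encoder outputs and their nearest codebook entries. The function name and shapes are my own assumptions, not code from the repo:

```python
import numpy as np

def vq_refinement_gap(latents, codebook):
    """Hypothetical helper: MSE between encoder tokens and their
    nearest codebook vectors.

    latents:  (N, D) encoder output tokens, flattened over the spatial grid
    codebook: (K, D) learned codebook vectors
    """
    # Pairwise squared distances: (N, 1, D) - (1, K, D) -> (N, K)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    # Snap each token to its nearest code
    nearest = codebook[dists.argmin(axis=1)]
    # A small value here means decoding w/ or w/o VQ is nearly identical
    return float(np.mean((latents - nearest) ** 2))
```

If this gap is tiny, the VQ step is effectively a no-op at inference time, which would be consistent with the observation above.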
I have some questions regarding the implementation of MoVQ and would appreciate your clarification.
The original MoVQ paper mentions that a multi-channel VQ is adopted.
However, the kandinsky3 implementation does not involve any vector quantization operation:
May I ask whether this is a misunderstanding on my part regarding MoVQ, or whether Kandinsky has modified the MoVQ implementation?
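For reference, the multi-channel quantization described in the MoVQ paper can be sketched roughly as below: split the channel dimension into groups and quantize each group against a shared codebook. The names and shapes here are assumptions for illustration, not the actual kandinsky3 code (which, as noted, appears to skip this step entirely):

```python
import numpy as np

def multichannel_quantize(z, codebook, n_groups=4):
    """Sketch of multi-channel VQ: each channel group of every token
    is independently snapped to its nearest shared-codebook entry.

    z:        (N, D) encoder tokens, D divisible by n_groups
    codebook: (K, D // n_groups) codebook shared across channel groups
    """
    N, D = z.shape
    # Split each D-dim token into n_groups sub-vectors
    groups = z.reshape(N * n_groups, D // n_groups)
    # Nearest-neighbor lookup per sub-vector
    dists = ((groups[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    quantized = codebook[dists.argmin(axis=1)]
    # Reassemble the quantized sub-vectors into full tokens
    return quantized.reshape(N, D)
```

Since the codebook is shared across groups, the effective vocabulary per token is K^n_groups, which is the usual motivation for the multi-channel variant.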