@cromz22 Hi Shuichiro, I am also using this repo and facing the same problem. Have you managed to work out a way to obtain discrete units from the pretrained DinoSR model since posting this issue? I would be very grateful for any help : )
I read through the code carefully, and I believe you are right. The discrete units should be the argmax values of negative distances between layer outputs and codebooks.
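For anyone following along, the idea above can be sketched roughly as follows. This is only an illustration and not code from the DinoSR repo: `assign_units` is a hypothetical helper, NumPy arrays stand in for the model's tensors, and squared Euclidean distance is an assumption (the actual model may normalize features or measure distance differently, so please check against `dinosr/models/dinosr.py`).

```python
import numpy as np

def assign_units(features, codebook):
    """Hypothetical helper: map each frame to the index of its nearest code.

    features: (T, D) array of frame-level layer outputs
    codebook: (V, D) array of code vectors for that layer
    returns:  (T,) array of integer unit ids
    """
    # Pairwise differences between every frame and every code: (T, V, D)
    diff = features[:, None, :] - codebook[None, :, :]
    # Negative squared Euclidean distance (assumed metric): (T, V)
    neg_dist = -np.sum(diff ** 2, axis=-1)
    # The closest code has the largest negative distance
    return np.argmax(neg_dist, axis=-1)

# Toy example: 4 codes in 2 dimensions, 3 frames
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
features = np.array([[0.9, 0.1], [0.1, 0.8], [0.2, 0.1]])
units = assign_units(features, codebook)
print(units.tolist())  # [1, 2, 0]
```

With a real checkpoint, `features` would be the output of the chosen Transformer layer for one utterance and `codebook` the matching codebook stored in the model. Which layer to use is the open question raised in the issue.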
Thank you for open-sourcing this amazing work!
Do you have any plans for releasing the evaluation scripts?
I would like to reproduce the results reported in the tables in the paper, but it seems that not enough details are provided.
For example,
How can we obtain discrete units from a pretrained DinoSR model?
I believe the `argmax` call in this line would produce them, but because this `forward` function doesn't seem to be intended for evaluation, I'm not sure whether the arguments given to the function are OK as they are.
dinosr/models/dinosr.py, Line 630 in 5a38d5e
The definition of the 5th layer seems unclear. Is it the 5th of the 12 Transformer layers, or the 5th of the top 8 layers that were used for DinoSR?
What kind of forced alignment method was used? If Montreal Forced Aligner was used, which acoustic/dictionary models were used?