How to change speaker encoder to one-hot encoder #102
Hi @Jwaminju, not sure if you still need it, but this might be helpful for anyone looking to do the same. The `emb` variable is indeed the right one to change. The embeddings currently used are created using the GE2E loss. I haven't tested it (and wrote this quite quickly), but something like this should work. The original loop in `make_metadata.py` is:

```python
for speaker in sorted(subdirList):
    print('Processing speaker: %s' % speaker)
    utterances = []
    utterances.append(speaker)
    _, _, fileList = next(os.walk(os.path.join(dirName, speaker)))
    # make speaker embedding
    assert len(fileList) >= num_uttrs
    idx_uttrs = np.random.choice(len(fileList), size=num_uttrs, replace=False)
    embs = []
```

Replace it with:

```python
# use enumerate to get the speaker index
for i, speaker in enumerate(sorted(subdirList)):
    print('Processing speaker: %s' % speaker)
    utterances = []
    utterances.append(speaker)
    # -----
    # one-hot embedding
    # create a zero array of shape (256,); note that this shape is right,
    # since squeeze effectively changes the (1, 256) output into shape (256,)
    emb = np.zeros(256, dtype=np.float32)
    # set speaker id
    emb[i] = 1
    utterances.append(emb)
```

The whole second for loop can be removed here, since we no longer need the mel spectrogram to create the embeddings. If you have more than 256 speakers, or you want to change the embedding size to match the number of speakers you have, you'll have to pass the `--dim_emb` parameter on main.
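To make the scheme above concrete, here is a minimal standalone sketch of the same idea. The function name `one_hot_speaker_embeddings` and the speaker folder names are hypothetical, not part of the repo; it just shows that each speaker gets a distinct unit vector of the embedding dimension:

```python
import numpy as np

def one_hot_speaker_embeddings(speaker_names, dim_emb=256):
    """Build a one-hot embedding of size dim_emb for each speaker.

    speaker_names is assumed to be the list of speaker folder names;
    dim_emb must be at least the number of speakers.
    """
    assert len(speaker_names) <= dim_emb
    embeddings = {}
    for i, speaker in enumerate(sorted(speaker_names)):
        emb = np.zeros(dim_emb, dtype=np.float32)
        emb[i] = 1.0  # position i marks speaker i
        embeddings[speaker] = emb
    return embeddings

# usage with three hypothetical speaker folders
embs = one_hot_speaker_embeddings(['p225', 'p226', 'p227'])
print(embs['p226'].shape)  # (256,)
```

Because the speakers are sorted before enumeration, the mapping from speaker name to index is deterministic across runs, which matters if you regenerate the metadata later.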
Hi @yenebeb,
@yenebeb Thanks a lot for the comment!
@WGQ123-code Short answer: yes, it's important to train with the one-hot embedding. Somewhat longer answer: @WildFire212
@yenebeb Thank you for the clarification!
@yenebeb Thank you very much for your guidance! Wish you a happy life!
Hi, I'm interested in this project, and I'm looking forward to running it with my Korean audio files.
But I'm an undergraduate student with little knowledge of audio processing programming.
I've read a lot of issues in this repo, but I was confused, so I opened this issue.
The zero-shot model demo produced a result, but I want to run AutoVC-One-Hot to compare.
Maybe I have to change the make_metadata.py file to use a one-hot encoder.
I tried to change the speaker encoder to one-hot using tf.one_hot, but the printed shape of the variable `emb` (which was [1, 128, 80, 256]) did not match the result of C(melsp) (which was [1, 256]).
I used the same data as the demo wavs file.
Could you help me with how to code the one-hot encodings? Thank you.
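For anyone hitting the same shape mismatch: `tf.one_hot` appends a new axis of size `depth` to whatever tensor you pass it, so applying it to the mel spectrogram (shape [1, 128, 80]) produces [1, 128, 80, 256]. The encoder output C(melsp) has shape [1, 256] because it represents one speaker, so the one-hot replacement should encode the scalar speaker index, not the spectrogram. A minimal NumPy sketch of both cases (the index value 3 is hypothetical):

```python
import numpy as np

# one-hot of a scalar speaker index: matches C(melsp)'s [1, 256] shape
speaker_idx = 3  # hypothetical index of the current speaker
emb = np.zeros((1, 256), dtype=np.float32)
emb[0, speaker_idx] = 1.0
print(emb.shape)  # (1, 256)

# one-hot of the mel spectrogram itself appends a depth axis instead;
# this reproduces the [1, 128, 80, 256] shape described in the question
melsp = np.zeros((1, 128, 80), dtype=np.int64)
wrong = np.eye(256, dtype=np.float32)[melsp]
print(wrong.shape)  # (1, 128, 80, 256)
```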