Issues with Tokenizer in TTS Es Multispeaker FastPitch HiFiGAN Model - Problems with Date Tokenization #8075 #8076
-
Hello, I am currently working on integrating a Latin American Spanish text-to-speech (TTS) model into NVIDIA Riva, specifically the tts_es_multispeaker_fastpitchhifigan model from NVIDIA NGC. After converting the model to the .riva format with nemo2riva, I downloaded the tokenizer and verbalizer from Inverse Normalization ES-US. I deployed it in Riva with:

docker run --init -it --rm --gpus '"device=0"' -v $(pwd):/data -v riva-model-repo:/data-volumen -e "MODEL_DEPLOY_KEY=tlt_encode" --name riva-service-maker nvcr.io/nvidia/riva/riva-speech:2.13.0-servicemaker
riva-build speech_synthesis tts_es_hifigan_ft_fastpitch_multispeaker.rmir:tlt_encode tts_es_fastpitch_multispeaker.riva:tlt_encode tts_es_hifigan_ft_fastpitch_multispeaker.riva:tlt_encode --voice_name Latin-American-Spanish --wfst_tokenizer_model=tokenize_and_classify.far --wfst_verbalizer_model=verbalize.far --sample_rate 44100 --language_code es-US --num_speakers=174 --phone_set=ipa --subvoices 0:0,1:1,2:2,3:3,4:4,5:5,6:6,7:7,8:8,9:9,10:10,11:11,12:12,13:13,14:14,15:15,16:16,17:17,18:18,19:19,20:20,21:21,22:22,23:23,24:24,25:25,26:26,27:27,28:28,29:29,30:30,31:31,32:32,33:33,34:34,35:35,36:36,37:37,38:38,39:39,40:40,41:41,42:42,43:43,44:44,45:45,46:46,47:47,48:48,49:49,50:50,51:51,52:52,53:53,54:54,55:55,56:56,57:57,58:58,59:59,60:60,61:61,62:62,63:63,64:64,65:65,66:66,67:67,68:68,69:69,70:70,71:71,72:72,73:73,74:74,75:75,76:76,77:77,78:78,79:79,80:80,81:81,82:82,83:83,84:84,85:85,86:86,87:87,88:88,89:89,90:90,91:91,92:92,93:93,94:94,95:95,96:96,97:97,98:98,99:99,100:100,101:101,102:102,103:103,104:104,105:105,106:106,107:107,108:108,109:109,110:110,111:111,112:112,113:113,114:114,115:115,116:116,117:117,118:118,119:119,120:120,121:121,122:122,123:123,124:124,125:125,126:126,127:127,128:128,129:129,130:130,131:131,132:132,133:133,134:134,135:135,136:136,137:137,138:138,139:139,140:140,141:141,142:142,143:143,144:144,145:145,146:146,147:147,148:148,149:149,150:150,151:151,152:152,153:153,154:154,155:155,156:156,157:157,158:158,159:159,160:160,161:161,162:162,163:163,164:164,165:165,166:166,167:167,168:168,169:169,170:170,171:171,172:172,173:173
riva-deploy -f tts_es_hifigan_ft_fastpitch_multispeaker.rmir:tlt_encode /data/models
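For reference, the downloaded tokenize_and_classify.far can also be exercised offline with pynini before running riva-build. This is only a sketch; the FAR key name ("tokenize_and_classify") and the use of top_rewrite on a plain sentence are assumptions about how the grammar was exported:

```python
# Offline sanity check of the downloaded FAR grammar. The FAR key name
# "tokenize_and_classify" and the top_rewrite call are assumptions about
# how the grammar was exported.
import pynini
from pynini.lib import rewrite

far = pynini.Far("tokenize_and_classify.far", mode="r")
if not far.find("tokenize_and_classify"):
    far.reset()  # fall back to the first FST stored in the archive
fst = far.get_fst()

try:
    # A date-bearing sentence; if dates are covered, the output should be
    # a token string containing something like: tokens { date { ... } }
    print(rewrite.top_rewrite("la reunión es el 12 de mayo de 2023", fst))
except rewrite.Error:
    print("the grammar did not produce a rewrite for this input")
```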
However, I have encountered a significant issue: the tokenizer struggles with text that requires normalization, especially dates. Instead of processing and pronouncing these elements correctly, it either skips them or handles them incorrectly. This is in stark contrast to the default models in riva_quickstart_v2.13.0, where the es-ES model handles such cases correctly. According to the NVIDIA NGC model card, the text normalizers were built with NeMo, but they do not appear to work correctly for dates, unlike the default es-ES model. I would like to ask why date normalization fails with this model and how it can be fixed; a sketch of the request I use to reproduce the problem follows below.
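This is a minimal sketch using the Riva Python client; the server address (localhost:50051) and the WAV output handling are assumptions on my side, and the voice name matches the --voice_name passed to riva-build above:

```python
# Minimal sketch to reproduce the date issue, assuming the Riva server
# runs locally on the default port. Only the text below contains a date.
import wave

import riva.client

auth = riva.client.Auth(uri="localhost:50051")  # assumed server address
tts = riva.client.SpeechSynthesisService(auth)

resp = tts.synthesize(
    text="La reunión es el 12 de mayo de 2023.",
    voice_name="Latin-American-Spanish",  # --voice_name from riva-build
    language_code="es-US",
    sample_rate_hz=44100,
)

# Write the returned 16-bit LINEAR_PCM samples to a WAV file for listening.
with wave.open("date_test.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(44100)
    out.writeframes(resp.audio)
```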
Best regards.
Replies: 1 comment 1 reply
-
My guess is that this is related to a text normalization issue. Please comment in this repo: NVIDIA/NeMo-text-processing#135
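As a quick local check (a sketch, not an official workflow), the Spanish normalizer from nemo_text_processing can be run directly on date strings to see whether the underlying grammars cover them; the lang/input_case arguments follow the public Normalizer API:

```python
# Quick local check of the NeMo Spanish text normalization grammars,
# independent of Riva. Assumes nemo_text_processing is installed.
from nemo_text_processing.text_normalization.normalize import Normalizer

normalizer = Normalizer(input_case="cased", lang="es")

for text in [
    "La reunión es el 12/05/2023.",
    "Nació el 3 de marzo de 1990.",
]:
    print(normalizer.normalize(text, verbose=False))
```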