Export to ONNX Format #214
@DakeQQ Thank you very much! Thanks a lot! |
Thank you! It's working, and I can even use onnxruntime-directml (package) to run this on my AMD GPU! For that, the provider of ort_session_A and ort_session_C needs to be forced to ['CPUExecutionProvider'], but ort_session_B can use ['DmlExecutionProvider', 'CPUExecutionProvider'], and it's blazing fast compared to CPU. I'm facing a problem though: the outputs are always in Chinese... What do I need to change in 'Export_F5.py' to make this work for English? |
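For reference, a minimal sketch of the provider split described above, assuming three exported sessions as in the export script; the .onnx file names are placeholders, not the project's actual names:

```python
import onnxruntime

session_opts = onnxruntime.SessionOptions()

# The lightweight pre/post-processing graphs stay on CPU (they account for <1% of runtime).
ort_session_A = onnxruntime.InferenceSession(
    "model_A.onnx", sess_options=session_opts, providers=['CPUExecutionProvider'])
ort_session_C = onnxruntime.InferenceSession(
    "model_C.onnx", sess_options=session_opts, providers=['CPUExecutionProvider'])

# The heavy transformer graph goes to DirectML, with CPU as a fallback.
ort_session_B = onnxruntime.InferenceSession(
    "model_B.onnx", sess_options=session_opts,
    providers=['DmlExecutionProvider', 'CPUExecutionProvider'])
```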
Thank you for your testing. However, the setup for the English version may need to be answered by the original author of the F5-TTS project. The code for ONNX export and execution is based on the original work. According to my tests, ort_session_A and ort_session_C together take up less than 1% of the time cost, while ort_session_B occupies the majority of the time. |
Yes, and that is why inference speed is pretty much not affected by setting those to CPU. ort_session_B is what matters, and it runs fine on AMD GPUs using onnxruntime-directml! Anyway, I've tried messing around with the vocab and of course the reference audio and text, but the speaker always tries to speak Chinese, even when the ref text+audio and gen_text are in English. It may be worth noting that this has nothing to do with the fact I'm using DirectML, because it also happened before I even tried that. Looking forward to getting this working in English... @SWivid please check this out when you have time. Thanks once again! |
Hello~ The issue with the English voice should have been resolved. Please try again using the latest F5-TTS-ONNX version. @GreenLandisaLie |
It's working now in both Chinese and English! Thanks! @SWivid Maybe it's worth adding an 'ONNX' branch at https://huggingface.co/SWivid/F5-TTS/tree/main. |
@GreenLandisaLie Yes, the onnx version is great! Maybe better for @DakeQQ to do that? |
Can someone share the ONNX export? I would love to try it out! |
If anyone would be willing to run me through how to do this and get it working on my Win10 5700xt, I would be eternally grateful. (Well, at least until the next TTS upgrade comes out.) |
@KungFuFurniture see this repo |
Yes I saw that, cloned the repo, changed some path directories in the export.py... But now I'm lost. I am really new to all this (maybe a year or so) so I am not 100% on what I am getting wrong.
This is my error message. |
Regarding the error message: first, we use
(We may have accidentally deleted some code. Please fetch the latest code and try again.) |
So I did a complete start-over. Grabbed a fresh F5, a fresh venv, grabbed the link above, changed the file locations from user Dake... It seems my file structure and some names are a bit different, and I believe that is getting me into some trouble. For example:
load_checkpoints is in utils_infer, not models.utils, in my version of the F5 repo. But I believe I have found most of those things. Now I am stuck here:
I mean, I have the config and pytorch_model but I can't figure out where to put them. I have tried about 16 different folders, from a cached huggingface folder to the aforementioned infer folder. I dunno. I don't know anything about vocos, and its little brick road is far from Yellow. I fell out of Kansas quick. |
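A hedged sketch of the import-path difference mentioned above; the module and function names follow the comment and recent F5-TTS layouts, and may differ by version:

```python
# Newer F5-TTS versions ship the checkpoint loader in infer/utils_infer.py,
# while older export scripts import it from model/utils.py.
try:
    from f5_tts.infer.utils_infer import load_checkpoint  # newer layout
except ImportError:
    from f5_tts.model.utils import load_checkpoint        # older layout expected by the export script
```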
replace
|
Alright, making progress. Thank you for the help. After defining the local_path, I got the DiT uncond error again. Compared the two dit.py files; they are the same. So it did copy. I ran it again... and got a different error.
As you can see in the path, the model is there, the module is within it, and so are the functions we are after. So I added the following line to dit.py, as I used that once in a different project to resolve a similar "can't find the module" issue.
That did not help...
But hey, new errors are progress, right? |
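The exact line added to dit.py is not preserved above; a common workaround for this kind of "can't find the module" error is a sys.path shim like the following, which is purely a hypothetical reconstruction, not the project's fix:

```python
# Append the repository root to sys.path so sibling packages resolve when
# dit.py is imported from outside the package. Placed at the top of the file.
import os
import sys

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
```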
error due to literally |
Git pulled, got an update... Same thing
|
The point is: |
@KungFuFurniture Just want to add one important thing: if you want to run this on an AMD GPU you might need to do the following. PS: this is how I did it a week ago, but the Export_F5.py file has been changed many times since then, so this might no longer work. Additionally, at the time, the Export_F5.py file did not contain the necessary audio transformations that allow for invalid-format .wav reference audio files, so I had to copy-paste those from the original code. You might or might not need to do this as well. Good luck :D Hopefully someone will release the converted .onnx models with a pipeline for them, so it will be easy to use in the future. |
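The reference-audio transformations mentioned above are not reproduced here; a hedged sketch of the kind of preprocessing usually involved (downmix to mono and resample; the 24 kHz target rate and the file path are assumptions):

```python
import torchaudio

waveform, sr = torchaudio.load("reference.wav")  # placeholder path to the reference clip
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix stereo to mono
if sr != 24000:
    waveform = torchaudio.transforms.Resample(orig_freq=sr, new_freq=24000)(waveform)
```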
@KungFuFurniture Please note that we use a modified vocos loading method, via the following code at line 52:
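The snippet referenced at line 52 is not preserved above; a hedged sketch of loading Vocos from a local folder instead of the Hugging Face hub (local_path is a placeholder for wherever config.yaml and pytorch_model.bin were downloaded):

```python
import torch
from vocos import Vocos

local_path = "./vocos-mel-24khz"  # placeholder folder containing config.yaml and pytorch_model.bin
vocos = Vocos.from_hparams(f"{local_path}/config.yaml")
state_dict = torch.load(f"{local_path}/pytorch_model.bin", map_location="cpu")
vocos.load_state_dict(state_dict)
vocos.eval()
```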
First, let me say to everyone: thank you for the help. So here is where I am. The export seems to have worked, and I can still run the app, and it works. But it works exactly the same, i.e. not using the GPU (AMD 5700xt). That is, I am sure, a result of what Green mentioned about adjustments to app.py. I feel like such a kindergartner in college. I am so far in over my head, gang. I learned Python from YouTube, lol. I know nothing about onnx or torch except that they help make the magic work. So, any suggestions on what to do next...? Again, all help is super appreciated. And I get it if you don't have time to educate me. Cheers to all. |
@KungFuFurniture
Additionally, set the ONNX Runtime log level to 0 or 1 with |
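The exact setting trails off above; ONNX Runtime exposes its log verbosity through SessionOptions, so a hedged sketch would be:

```python
import onnxruntime

session_opts = onnxruntime.SessionOptions()
# 0 = VERBOSE, 1 = INFO, 2 = WARNING (default), 3 = ERROR, 4 = FATAL.
# Lower values show what the runtime does with each node and execution provider.
session_opts.log_severity_level = 1
```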
It looks like the repo has changed a lot since the last ONNX export attempt.
Any ideas? |
@amblamps This should be fixed by @DakeQQ, many thanks! See F5-TTS/src/f5_tts/model/modules.py, lines 30 to 143, at commit 4a69e6b.
|
@amblamps
|
Thanks! That worked. |
Has anyone shared a recent ONNX export and code for inference? |
@DakeQQ Do any other modifications need to be made to the script to export the E2 TTS model aside from pointing it to the correct checkpoint? |
We have not yet attempted to export the E2-TTS model. If its function call path is the same as that of F5-TTS, theoretically, only modifying the model file path would be necessary to make the corresponding adjustments. However, the actual situation may be more complex, so we currently do not have specific plans to export E2-TTS in ONNX format. |
There still seem to be issues with the mel params; has anyone been able to export recently? |
@smickovskid What mel parameter issues are you encountering? |
Getting the same issue that @amblamps encountered
I am using a custom fine-tuned model, I also ran This is my
I am running Edit: I've changed it to model.load_state_dict(checkpoint["model_state_dict"], strict=False) and it passes now, but it fails further down the line with
|
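A hedged sketch of the workaround described above, assuming `model` is the F5-TTS transformer instantiated by the export script and that the checkpoint key follows the commenter's example; with strict=False, keys that do not match the graph are skipped instead of raising:

```python
import torch

checkpoint = torch.load("model_custom.pt", map_location="cpu")  # placeholder checkpoint path
state_dict = checkpoint.get("model_state_dict", checkpoint)     # fall back to a raw state dict
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)  # worth inspecting: silently skipped weights can break output quality
```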
@smickovskid |
Hey @DakeQQ, sorry for the late response. Yeah that fixed it! Thanks for all the help. |
Can you provide a step-by-step guide for this? Do I clone that repository over this one? Do I run both in the same venv? Initially, I only copied "F5-TTS-ONNX-Inference.py" from that repository into f5-tts, activated its venv, downloaded the provided onnx models, changed the directory names, etc. When I just run the inference it automatically works on CPU, but of course no GPU support (Windows 10, RX 6600). If I just add 'DmlExecutionProvider', 'CPUExecutionProvider' to the ort_session_B providers I get this: AttributeError: module 'onnxruntime' has no attribute 'set_seed'. Of course I uninstalled onnxruntime and installed onnxruntime-gpu. Edit: reinstalled onnxruntime; it is working, but of course no GPU support. |
Perhaps try |
Finally! Uninstalled all onnxruntime packages and only installed onnxruntime-directml. Also changed ort_session_B = onnxruntime.InferenceSession(onnx_model_B, sess_options=session_opts, providers=ORT_Accelerate_Providers.append('CPUExecutionProvider')) to ort_session_B = onnxruntime.InferenceSession(onnx_model_B, sess_options=session_opts, providers=['DmlExecutionProvider']) (I was just adding DmlExecutionProvider inside the parentheses before). The speedup is almost 4.5x over CPU. Now the only thing left for me to solve is how to convert to ONNX so that longer sample sizes can be used; right now it is limited to around ten seconds. And it seems to only use around 2 GB of GPU memory, so it can clearly do better with the GPU. |
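One likely reason the earlier form fell back to CPU: list.append() returns None in Python, so providers=ORT_Accelerate_Providers.append('CPUExecutionProvider') passes None to the session. Building the list first avoids that; a minimal sketch (the model path is a placeholder):

```python
import onnxruntime

providers = ['DmlExecutionProvider', 'CPUExecutionProvider']
session_opts = onnxruntime.SessionOptions()

ort_session_B = onnxruntime.InferenceSession(
    "model_B.onnx",                 # placeholder for the exported transformer graph
    sess_options=session_opts,
    providers=providers,
)
print(ort_session_B.get_providers())  # confirm 'DmlExecutionProvider' is actually active
```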
Also changed the
That's what I needed!! Roaring on the GPU now!! SWEET!!! Happy Thanksgiving to those it applies to. |
Has ONNX improved the inference speed of the model? |
it "improved" it over cpu for sure :) Most probably still slower than cuda but since we amd users can't use it "easily" onnx is ok. |
Thank you very much for your reply |
Raised a PR on how to use this with an AMD GPU. It just works with standard Torch on Linux. |
Has anyone deployed F5-TTS on Qualcomm chips? |
Hey @patientx, can you share any timing benchmarks? How much time does it take to generate? |