
ONNX Model Fail to run #95

Open · rajeevbaalwan opened this issue Oct 1, 2023 · 14 comments

Comments
@rajeevbaalwan commented Oct 1, 2023

Hi, I have exported an ESPnet model trained on my custom dataset using espnet_onnx. The model fails to run on some audio clips. Below is the error I am getting:

Non-zero status code returned while running Add node. Name:'/encoders/encoders.0/self_attn/Add' Status Message: /encoders/encoders.0/self_attn/Add: right operand cannot broadcast on dim 3 LeftShape: {1,8,171,171}, RightShape: {1,1,1,127}

Any idea what the issue could be here? I have run inference on 1500 audio clips and I am getting exactly the same error on around 400 of them.

@Masao-Someki (Collaborator)

Hi @rajeevbaalwan, I would like to confirm some points:

  • Would you tell me which encoder you use in your model?
  • Did you observe any similarities between them?

@rajeevbaalwan (Author)

> Hi @rajeevbaalwan, I would like to confirm some points:
>
>   • Would you tell me which encoder you use in your model?
>   • Did you observe any similarities between them?

Thanks @Masao-Someki for your reply. I have used a simple transformer encoder.
I didn't get your question regarding similarity. Do you want to know the similarity between the error outputs, or something else?

@rajeevbaalwan (Author)

@Masao-Someki I have also tried with a conformer-encoder-based ASR model, but I am getting the same error.

2023-10-08 23:12:29.048358681 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running Add node. Name:'/encoders/encoders.0/self_attn/Add_5' Status Message: /encoders/encoders.0/self_attn/Add_5: right operand cannot broadcast on dim 3 LeftShape: {1,8,187,187}, RightShape: {1,1,1,127}

@Masao-Someki (Collaborator)

@rajeevbaalwan
The node /encoders/encoders.0/self_attn/Add performs the masking step. I think increasing max_seq_len will fix this issue!

from espnet_onnx.export import ASRModelExport

tag_name = 'your model'
m = ASRModelExport()

# Add the following export config
m.set_export_config(
    max_seq_len=5000,
)

m.export_from_pretrained(tag_name, quantize=False, optimize=False)
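
For completeness, the re-exported model can then be loaded and run as usual. A minimal sketch, assuming the standard espnet_onnx inference API and a 16 kHz mono clip at sample.wav:

import librosa
from espnet_onnx import Speech2Text

# load the re-exported model by its tag name
speech2text = Speech2Text(tag_name='your model')

# a previously failing clip, resampled to 16 kHz mono
y, sr = librosa.load('sample.wav', sr=16000)

nbest = speech2text(y)
print(nbest[0][0])  # best hypothesis text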

@Masao-Someki (Collaborator)

In the masking process, your input audio seems to have a frame length of 171, while the mask has a frame length of 127; this mismatch causes the issue. The frame length is estimated during ONNX inference, but the maximum frame length is capped at the max_seq_len value, so increasing this value should fix the problem.
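
To see why these shapes clash, here is a small standalone sketch (plain NumPy, not espnet_onnx code) that reproduces the failing broadcast from the error message:

import numpy as np

# Shapes taken from the error above: the additive attention mask was
# exported with max_seq_len frames (127), but the clip yields 171 frames.
scores = np.zeros((1, 8, 171, 171), dtype=np.float32)  # (batch, heads, time, time)
mask = np.zeros((1, 1, 1, 127), dtype=np.float32)      # (1, 1, 1, max_seq_len)

try:
    scores + mask  # the same broadcast the failing Add node attempts
except ValueError as e:
    print(e)  # operands could not be broadcast together ...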

@rajeevbaalwan (Author)

@Masao-Someki Thanks, that worked for me. But the exported ONNX models do not work with batched input, right? They only work for a single audio clip.

@Masao-Someki (Collaborator)

@rajeevbaalwan
Yes, it does not work with batched input.

If you want to run batched inference, then you need to:

  1. Add a dynamic axis for the batch dimension in the script below (a hypothetical batched variant is sketched after the code).
  2. Fix the inference function.

def get_dynamic_axes(self):
    return {"feats": {1: "feats_length"}, "encoder_out": {1: "enc_out_length"}}
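
For reference, a hypothetical variant that also marks the batch dimension (assumed to be dim 0) as dynamic might look like the following; this is a sketch, not the current espnet_onnx code:

def get_dynamic_axes(self):
    # Hypothetical: additionally expose dim 0 (batch) as a dynamic axis
    # so the exported graph accepts more than one clip at a time.
    return {
        "feats": {0: "batch", 1: "feats_length"},
        "encoder_out": {0: "batch", 1: "enc_out_length"},
    }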

@rajeevbaalwan (Author) commented Oct 10, 2023

@Masao-Someki Thanks for the reply. I have already made the changes to the dynamic axes, but that alone won't solve the problem: the forward function only takes feats, not the actual lengths of the inputs in the batch, so enc_out_length is always wrong for batched input because the feature lengths are calculated as below:

feats_length = torch.ones(feats[:, :, 0].shape).sum(dim=-1).type(torch.long)

Is there any plan to handle batch inference during ONNX export in espnet_onnx? The complete inference function would need to be changed. If espnet_onnx is meant to prepare models for production, then batch inference support is a must in the exported models; single-clip inference won't be enough in production.
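
For illustration, a batch-capable export would need the wrapper's forward to take the true lengths as a second ONNX input instead of deriving them from the padded tensor. A hypothetical sketch (the signature and encoder call are assumptions, not the current espnet_onnx API):

import torch

def forward(self, feats: torch.Tensor, feats_length: torch.Tensor):
    # Hypothetical: feats is (batch, time, feat), zero-padded per batch;
    # feats_length is (batch,) holding each clip's true frame count, so
    # masks and enc_out_length are computed per clip, not from padding.
    encoder_out, encoder_out_lens = self.encoder(feats, feats_length)
    return encoder_out, encoder_out_lens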

@Masao-Someki (Collaborator)

@rajeevbaalwan
Sorry for the inconvenience, but currently we have no plan to support batch inference.
We investigated the speedup from batched inference in our paper by trying to apply ONNX HuBERT for training, but ONNX seems to be less effective with large batch sizes.

@rajeevbaalwan (Author)

@Masao-Someki You are absolutely right that ONNX-exported models do not give a huge speedup for large batch sizes, but for small batch sizes like 4 or 8, batching is better than single-clip inference. So it would be better to have a GPU-based implementation, since a generic implementation would work for both single clips and multiple clips and give the user that flexibility. Even a batch implementation doesn't degrade performance for single-clip inference. So can you take this feature into consideration?

@rajeevbaalwan (Author)

@Masao-Someki Is ESPnetLanguageModel supported in ONNX?

@Masao-Someki (Collaborator)

@rajeevbaalwan
I assume that the user of this library is more like an individual who wants to run the ESPnet model on a low-resource device, such as a Raspberry Pi. If inference in the ONNX format does not provide enough speedup, then we don't need espnet_onnx; we can just use a GPU.
Of course, I know that having a batched inference option may be better, but I don't think it is worth implementing here.

> Is ESPnetLanguageModel supported in ONNX?

Yes, you can include an external language model.

@rajeevbaalwan (Author)

> @rajeevbaalwan I assume that the user of this library is more like an individual who wants to run the ESPnet model on a low-resource device, such as a Raspberry Pi. If inference in the ONNX format does not provide enough speedup, then we don't need espnet_onnx; we can just use a GPU. Of course, I know that having a batched inference option may be better, but I don't think it is worth implementing here.
>
> > Is ESPnetLanguageModel supported in ONNX?
>
> Yes, you can include an external language model.

@Masao-Someki I can't find the code that exports the language model to ONNX in the repo.

@Masao-Someki (Collaborator)

@rajeevbaalwan
In the following lines, espnet_onnx has the export logic for language models!

# export lm
lm_model = None
if not model.asr_model.use_transducer_decoder:
    if "lm" in model.beam_search.scorers.keys():
        lm_model = get_lm(model.beam_search.scorers["lm"], self.export_config)
else:
    if model.beam_search_transducer.use_lm:
        lm_model = get_lm(model.beam_search_transducer.lm, self.export_config)
if lm_model is not None:
    self._export_lm(lm_model, export_dir, verbose)
    model_config.update(lm=lm_model.get_model_config(export_dir))
else:
    model_config.update(lm=dict(use_lm=False))
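
In other words, if the pretrained model's beam search contains an "lm" scorer, the language model is exported automatically alongside the ASR model; there is no separate export call. A hedged sketch of the user-facing flow:

from espnet_onnx.export import ASRModelExport

m = ASRModelExport()
# If the source model's beam search includes an "lm" scorer, the branch
# above exports it and records it in the generated model config.
m.export_from_pretrained('your model', quantize=False)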
