Hello, thanks for this great work! I followed the INFERENCE instructions, but encountered some difficulties.
from parler_tts import ParlerTTSForConditionalGeneration
import torch
from transformers import AutoTokenizer
import soundfile as sf
torch_device = "cuda:0" # use "mps" for Mac
torch_dtype = torch.float32
model_name = "parler-tts/parler-tts-mini-v1"
attn_implementation = "sdpa" # "sdpa" or "flash_attention_2"
model = ParlerTTSForConditionalGeneration.from_pretrained(model_name, torch_dtype=torch_dtype, attn_implementation=attn_implementation).to(torch_device)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "Hey, how are you doing today?"
description = "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up."
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(torch_device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(torch_device)
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().to(torch.float32).numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
When I set attn_implementation="sdpa", I get this error:
ValueError: T5EncoderModel does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please request the support for this architecture: https://github.com/huggingface/transformers/issues/28005. If you believe this error is a bug, please open an issue in Transformers GitHub repository and load your model with the argument `attn_implementation="eager"` meanwhile. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="eager")`
And when I set attn_implementation="flash_attention_2", I get this error:
ValueError: T5EncoderModel does not support Flash Attention 2.0 yet. Please request to add support where the model is hosted, on its model hub page: https://huggingface.co/google/flan-t5-large/discussions/new or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new
I'm using an A100 GPU, and my environment is:
transformers 4.46.1
torch 2.3.0
flash-attn 2.5.8
Am I missing some important configuration information?
@aixingxy I am getting the same issue. It was working before on an old Conda installation I have, so it seems some recent update caused this: I set up a fresh environment this week and hit the same error. I advise you to use the default attention implementation, since when I tested all of them on a 3090 and an L40S I didn't see much difference in speed.
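For reference, a minimal sketch of that fallback, based on the suggestion in the error message itself: either omit attn_implementation entirely (the default) or pass "eager", so the T5 text encoder is not asked to use SDPA or Flash Attention 2. Model name and device are taken from the snippet above; I have not verified this against every transformers version, so treat it as an assumption.

from parler_tts import ParlerTTSForConditionalGeneration
import torch

torch_device = "cuda:0"  # use "mps" for Mac
torch_dtype = torch.float32
model_name = "parler-tts/parler-tts-mini-v1"

# Fall back to eager attention (or simply drop the attn_implementation
# argument), as the ValueError recommends, so the T5 text encoder does
# not try to use SDPA / Flash Attention 2.
model = ParlerTTSForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch_dtype,
    attn_implementation="eager",
).to(torch_device)

The rest of the generation code from the original snippet (tokenizer, generate, soundfile write) should work unchanged after this.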