Speech Recognition Streaming only transcribing "You" #187

Open
theman23290 opened this issue Nov 19, 2023 · 5 comments
Comments

@theman23290

When speech recognition is set to streaming, the transcribed output is always "you". It uses whisper for the transcription. When I select whisper specifically and click on the microphone, it works perfectly. But in streaming mode the terminal only shows the word "you", even if I don't say anything. I can confirm the microphone is activated while the audio is being recorded. I have used SillyTavern on Windows 11, Debian, and Modded Debian with the same result. Any recommendations on what I can do to resolve this? I am on the latest ffmpeg, running the latest Extras in conda, and have enough horsepower to run the Extras program as intended.

@theman23290
Author

theman23290 commented Nov 19, 2023

This issue seems to be related to this issue with Whisper: openai/whisper#679
TL;DR: Implement --condition_on_previous_text and voice activity detection (VAD), and the issues go away. Is there any way to implement that fix in this project?
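
For reference, the two mitigations mentioned above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code used by Extras: `rms_of_wav`, `is_probably_silent`, `transcribe_if_speech`, and the RMS threshold of 500 are hypothetical names and values chosen for illustration, and the energy check is only a crude stand-in for a real VAD. `condition_on_previous_text`, `no_speech_threshold`, and `fp16` are options accepted by `transcribe()` in the openai-whisper Python package.

```python
import array
import math
import sys
import wave


def rms_of_wav(path):
    """Return the RMS amplitude of a 16-bit PCM WAV file (0.0 if it is empty)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_probably_silent(path, threshold=500.0):
    """Crude energy gate. The threshold is an assumed value and needs tuning."""
    return rms_of_wav(path) < threshold


def transcribe_if_speech(path, model_name="base"):
    """Skip whisper entirely for near-silent recordings, otherwise transcribe."""
    if is_probably_silent(path):
        return ""
    import whisper  # imported lazily so the silence gate works without it

    model = whisper.load_model(model_name)
    result = model.transcribe(
        path,
        condition_on_previous_text=False,  # do not feed earlier output back in
        no_speech_threshold=0.6,           # whisper's default, shown explicitly
        fp16=False,                        # avoids the FP16 warning on CPU
    )
    return result["text"].strip()
```

With a gate like this, a recording that contains only silence returns an empty string instead of being passed to whisper at all.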

@Cohee1207
Member

That's for @Tony-sama to consider.

@Cohee1207
Member

Check the recent commit. Is that what you asked for?

@theman23290
Author

I believe so. The commit still didn't fix the original issue, though. I don't know if this is a whisper issue or an issue with how whisper is implemented in this code. Here is the terminal output while a client is connected through the API.

/home/senpai/miniconda/envs/extras/lib/python3.11/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Transcripted from audio file (whisper): you
172.18.0.2 - - [19/Nov/2023 21:31:21] "POST /api/speech-recognition/streaming/record-and-transcript HTTP/1.1" 200 -
172.18.0.2 - - [19/Nov/2023 21:31:21] "OPTIONS /api/speech-recognition/streaming/record-and-transcript HTTP/1.1" 200 -
Start recording from: default with samplerate 44100
Transcripted from microphone stream (vosk):
Recorded message saved to stt_test.wav
/home/senpai/miniconda/envs/extras/lib/python3.11/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Transcripted from audio file (whisper): you
172.18.0.2 - - [19/Nov/2023 21:31:27] "POST /api/speech-recognition/streaming/record-and-transcript HTTP/1.1" 200 -
172.18.0.2 - - [19/Nov/2023 21:31:27] "OPTIONS /api/speech-recognition/streaming/record-and-transcript HTTP/1.1" 200 -
Start recording from: default with samplerate 44100
Transcripted from microphone stream (vosk):
Recorded message saved to stt_test.wav

It repeats this output until the client disconnects. I don't know where the bug is. From the research I have done, it seems to be more of an issue with the way whisper is implemented.
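
One way to narrow this down is to inspect the recorded file itself. The log shows an empty vosk transcript and a recording saved to stt_test.wav, so it may be worth checking whether that file contains any signal at all. The following is a minimal sketch, assuming the file is 16-bit PCM; `describe_wav` is a hypothetical helper, not part of Extras.

```python
import array
import sys
import wave


def describe_wav(path):
    """Summarize a 16-bit PCM WAV file: sample rate, channels, duration, peak."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        n_frames = wf.getnframes()
        frames = wf.readframes(n_frames)
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": n_frames / rate,
        "peak": peak,                    # largest absolute sample value
        "peak_fraction": peak / 32768,   # 0.0 means digital silence
    }
```

Calling `describe_wav("stt_test.wav")` from the directory where Extras saves the recording would show whether the microphone stream is actually reaching the file. A peak at or near zero would point to the recording step rather than to whisper.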

@Statford

Hi, I had the same problem and all I did was leave it for a week, reboot it, and it (for whatever reason) worked perfectly after that. I wish I could be more helpful than that, but I had the same problem with my installation of whisper.
#217
