WIS Server CPU Mode - Model returns "you." only #114
With WebRTC, chances are it's not negotiating properly and you're not actually getting audio to Whisper ("you" is a pretty common hallucination when there is no speech/audio). I looked over your PR and will comment on it separately, but this likely isn't related to disabling triton (triton and auto-gptq are only used for LLM support). Those infer times are painful but not exactly unexpected given the configuration you're running. WebRTC can be tough to debug - what client are you using? Can you give some network details? I fear that running on Mac with Docker Desktop (and the VM it uses) will present network challenges that make WebRTC support somewhere between difficult and impossible.
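For context, the "you"-on-silence hallucination can be caught with a simple energy check on the decoded buffer before it ever reaches the model. A minimal sketch, not actual WIS code - the function name, the RMS threshold, and the 16 kHz mono float32 format are all assumptions:

```python
import numpy as np

def looks_like_silence(audio: np.ndarray, rms_threshold: float = 1e-3) -> bool:
    """Return True if the buffer is effectively silent.

    Whisper fed silence tends to hallucinate filler like "you" or
    "Thank you.", so skipping empty buffers avoids bogus transcripts.
    """
    if audio.size == 0:
        return True
    rms = float(np.sqrt(np.mean(np.square(audio, dtype=np.float64))))
    return rms < rms_threshold
```

If every captured buffer trips this check, the problem is upstream in WebRTC negotiation rather than in the model.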
Sure thing - I am running macOS 13.5 with Firefox 116.0.1 (64-bit), connected over loopback to the Docker-exposed web service.
That's what I suspected... Can you try Chrome? It's what we do most of our testing with, and it has significantly better WebRTC support than Firefox. I suspect you'll still have negotiation issues, but it's a good first debugging step.
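One way to see whether negotiation ever completes is to log the ICE and peer-connection state transitions server-side. A hedged sketch assuming an aiortc-based WebRTC stack (the handler names are illustrative, not WIS internals):

```python
import logging

from aiortc import RTCPeerConnection

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("webrtc-debug")

pc = RTCPeerConnection()

@pc.on("iceconnectionstatechange")
async def on_ice_state():
    # A terminal "failed" state means no candidate pair worked, i.e.
    # no audio will ever reach Whisper despite signaling succeeding.
    log.info("ICE connection state: %s", pc.iceConnectionState)

@pc.on("connectionstatechange")
async def on_conn_state():
    log.info("Peer connection state: %s", pc.connectionState)
```

Pairing this with chrome://webrtc-internals on the client side usually shows quickly where negotiation stalls.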
Ok, we are in business. I'm no longer sure of the exact issue - perhaps it was transient/user error. As I try to get native libraries going, here is a performance reference on an M2 Mac with CPU support on Docker 24. I think the inference numbers look great for the base/tiny model...

Original text from newspaper: Russian groups fudge freight costs to mitigate impact of G7 oil price cap. $1 billion benefit in single quarter. Customs data exposes adjustment. Baltic-India trade in spotlight.

Standard model (10x faster than realtime), transcribed text: Russian Group's flood freight cost to mitigate impact of G7 oil press cap in the financial times. $1 billion benefit in single-quarter custom data exposes adjustment, Baltic India trade and spotlight.

Tiny:

Large: Russian groups fudge freight costs to mitigate impact of G7 oil price cap. One billion benefit in single quarter. Customs data expose adjustment. Baltic India trade in spotlight.
I've made some progress on native Mac support. However, I'm currently experiencing a crash with signal 11 when attempting to run Whisper models. I think it could be related to my MacBook M1 only having 16 GB of RAM, but I'm going to continue to work through this. Thank you for providing these performance details - all in all not that bad considering, but native performance plus Apple Accelerate support should be substantially better. Roughly 10.5x realtime for 13 s of audio with base is still pretty slow considering a GTX 1070 does 10 seconds of audio with base at 115x - and the realtime multiple increases dramatically with longer speech segments (same params on a GTX 1070 for ~30 s of audio reach 149x realtime). Per usual not really a fair comparison, but based on what I've seen with native Mac performance in other projects we can likely do substantially better natively, while still not approaching CUDA GPU speeds (for the time being).
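For reference, the realtime multiple quoted here is just audio duration divided by inference time. A minimal sketch of the arithmetic:

```python
def realtime_factor(audio_seconds: float, infer_seconds: float) -> float:
    """Realtime multiple: seconds of audio transcribed per second of compute."""
    return audio_seconds / infer_seconds

# 13 s of audio at ~10.5x realtime implies roughly 13 / 10.5 ~= 1.24 s of inference
# on the M2/Docker setup; 10 s at 115x (GTX 1070, base) implies ~0.087 s.
print(13.0 / 10.5)   # ~1.24 s inference implied by 10.5x
print(10.0 / 115.0)  # ~0.087 s inference implied by 115x
```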
I created a feature/mac branch. You can read the commit message to get started. I'd be really interested to see how your testing goes with this. As I only have one Mac device (16 GB MacBook Pro M1) I'm not able to do much more testing, and as I've mentioned, I think the issues I'm experiencing could be RAM related (or not).
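A quick way to rule the model and RAM in or out, independent of the server, is to run Whisper natively via faster-whisper/CTranslate2 (the stack WIS builds on - whether WIS calls faster-whisper directly is an assumption here, and "sample.wav" is a placeholder). int8 with tiny/base keeps memory well under 16 GB:

```python
from faster_whisper import WhisperModel

# device="cpu" with int8 quantization is the lowest-memory configuration;
# if this also crashes with signal 11, the problem is below WIS itself.
model = WhisperModel("base", device="cpu", compute_type="int8")

segments, info = model.transcribe("sample.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```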
@NickJLange - Have you had a chance to try the feature/mac branch? |
Please also reference PR #113 for the run-as environment producing the below on CPU-only in Docker. All recorded output returns "You". I'm not in a position to confirm that the recorded audio passed to the model is the sample spoken text (it could be anything), but I wanted to park this as an issue that is perhaps known, or a result of leaving off the triton optimizer on macOS.
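To confirm what the model actually receives, one option is to dump the decoded buffer to disk right before inference and play it back. A hedged sketch - the hook point and the 16 kHz mono int16 PCM format are assumptions, not WIS internals:

```python
import wave

import numpy as np

def dump_audio(pcm: np.ndarray, path: str = "debug_capture.wav", rate: int = 16000) -> None:
    """Write mono int16 PCM to a WAV file so it can be inspected by ear."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)          # 2 bytes per sample = int16
        wav.setframerate(rate)
        wav.writeframes(pcm.astype(np.int16).tobytes())
```

If the dumped file is silent or empty, the "You"-only output is the no-audio hallucination described above rather than a model problem.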