Using an ESP32-S3-BOX-3 with Willow installed.
"Hi ESP, lock Front door"
Text displayed on ESP32: "Front door has been locked"
Audio: none
Expected audio: Front door has been locked
[2024-08-15 22:53:06 +0000] [93] [DEBUG] FASTAPI: Got WILLOW request for model medium beam size 1 language detection False
[2024-08-15 22:53:06 +0000] [93] [DEBUG] WILLOW: Audio information: sample rate: 16000, bits: 16, channel(s): 1, codec: pcm
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WILLOW: Source audio is raw PCM, creating WAV container
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Loading audio took 1.5610000000000002 ms
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Feature extraction took 34.336 ms
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Using system default language en
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Using model medium with beam size 1
[2024-08-15 22:53:07 +0000] [93] [DEBUG] Processing GPU batch 1 of expected 1
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Model took 322.387 ms
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Decode took 0.339 ms
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: ASR transcript: Lock front door.
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Inference took 359.313 ms
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Inference speedup: 3x
[2024-08-15 22:53:09 +0000] [93] [DEBUG] FASTAPI: Got TTS request for speaker CLB with format FLAC and text: Front door has been locked.
[2024-08-15 22:53:09 +0000] [93] [DEBUG] TTS: Got request for speaker CLB with text: Front door has been locked.
[2024-08-15 22:53:09 +0000] [93] [DEBUG] TTS: Loaded included speaker CLB
[2024-08-15 22:53:09 +0000] [93] [DEBUG] TTS: Loading speaker embedding took 1.484 ms
[2024-08-15 22:53:09 +0000] [93] [DEBUG] TTS: Getting inputs took 1.0970000000000002 ms
[2024-08-15 22:53:10 +0000] [93] [DEBUG] TTS: Generating audio took 493.322 ms
[2024-08-15 22:53:10 +0000] [93] [DEBUG] TTS: Generating file took 3.4099999999999997 ms
[2024-08-15 22:53:10 +0000] [93] [DEBUG] TTS: Total time took 499.855 ms
Using WIS in Docker on Ubuntu 24.04 (running as a Proxmox VM with GPU passthrough for a Tesla P40).
Side note: WebRTC recording also doesn't work, but I can generate TTS speech through the API docs.
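In case it helps with triage, here is a minimal Python sketch for pulling the per-step timings out of a WIS debug log like the one above. The regular expression is only inferred from the line format shown in this log, and `parse_timings` is a hypothetical helper name, not part of WIS or Willow.

```python
import re

# Matches lines such as "... [DEBUG] TTS: Generating audio took 493.322 ms".
# The pattern is an assumption based on the log excerpt in this issue.
TIMING = re.compile(r"\[DEBUG\] (?P<stage>[A-Z]+): (?P<step>.+?) took (?P<ms>[\d.]+) ms")


def parse_timings(log_text):
    """Return a list of (stage, step, milliseconds) tuples found in the log."""
    timings = []
    for line in log_text.splitlines():
        match = TIMING.search(line)
        if match:
            timings.append((match["stage"], match["step"], float(match["ms"])))
    return timings


# Three lines copied from the log in this issue.
sample = """\
[2024-08-15 22:53:07 +0000] [93] [DEBUG] WHISPER: Inference took 359.313 ms
[2024-08-15 22:53:10 +0000] [93] [DEBUG] TTS: Generating audio took 493.322 ms
[2024-08-15 22:53:10 +0000] [93] [DEBUG] TTS: Total time took 499.855 ms
"""

for stage, step, ms in parse_timings(sample):
    print(f"{stage:8s}{step:20s}{ms:10.3f} ms")
```

Run against the full log, this lists every Whisper and TTS step, including the `TTS: Total time` line, which shows the TTS request completing on the WIS side even though no audio is heard on the device.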