Replies: 4 comments 1 reply

Could be combined with the existing audio detection label of `speech` to efficiently determine which clips should have speech-to-text applied.

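A rough sketch of how that could look from the outside: pull only events labelled `speech` from Frigate's HTTP API and hand their clips to a transcriber. The endpoint and query parameter names here are assumptions (they can vary between Frigate versions), and `transcribe()` is just a placeholder.

```python
# Sketch: only transcribe clips that Frigate's audio detection labelled "speech".
# NOTE: endpoint and parameter names are assumptions; check your Frigate version.
import requests

FRIGATE = "http://frigate.local:5000"  # assumption: your Frigate host


def speech_events(limit=20):
    # Assumed query parameters ("labels", "has_clip"); adjust as needed.
    resp = requests.get(
        f"{FRIGATE}/api/events",
        params={"labels": "speech", "has_clip": 1, "limit": limit},
    )
    resp.raise_for_status()
    return resp.json()


def download_clip(event_id, dest):
    # Assumed clip endpoint serving the event recording as MP4.
    with requests.get(f"{FRIGATE}/api/events/{event_id}/clip.mp4", stream=True) as r:
        r.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 16):
                f.write(chunk)


for event in speech_events():
    path = f"/tmp/{event['id']}.mp4"
    download_clip(event["id"], path)
    # transcribe(path)  # placeholder: hand off to whisper/faster-whisper/etc.
```
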
This could also be helpful: https://github.com/Carleslc/AudioToText. I would really like this feature!

Moonshine looks small, fast, and available as ONNX models. I wonder if real-time word triggers could be a thing as part of audio detection? https://www.reddit.com/r/LocalLLaMA/comments/1hh5y87/moonshine_web_realtime_inbrowser_speech/

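Not tied to Moonshine specifically, but a minimal sketch of what a word trigger on top of any speech-to-text output could look like; the trigger words and the text stream below are made up for illustration, and the STT model is assumed to be producing the chunks elsewhere.

```python
# Sketch: scan each chunk of transcribed text for trigger words and fire a callback.
from typing import Callable, Iterable

TRIGGERS = {"help", "fire", "intruder"}  # hypothetical trigger words


def watch_for_triggers(text_chunks: Iterable[str],
                       on_trigger: Callable[[str, str], None]) -> None:
    """Check each transcribed chunk against the trigger set."""
    for chunk in text_chunks:
        words = {w.strip(".,!?").lower() for w in chunk.split()}
        for hit in words & TRIGGERS:
            on_trigger(hit, chunk)


# Example usage with a fake transcript stream:
if __name__ == "__main__":
    fake_stream = ["someone is at the door", "Help, the gate is open!"]
    watch_for_triggers(fake_stream,
                       lambda word, ctx: print(f"trigger '{word}': {ctx}"))
```
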
The yt-wsp.sh script over at whisper.cpp could be pretty helpful for anyone who doesn't want to wait for an implementation in Frigate. Right now it downloads a YouTube video and works on the MP4 from there, but with minimal modifications it could certainly do the same for recordings produced by Frigate: it generates an SRT file and then bakes it into the original video.

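For anyone wanting to try that flow on a local recording without adapting the script, here is a rough Python sketch of the same steps, assuming ffmpeg and a whisper.cpp build are available; the binary and model paths are placeholders for your own setup.

```python
# Sketch of the yt-wsp.sh flow applied to a local recording: extract 16 kHz mono
# WAV with ffmpeg, run whisper.cpp to get an SRT, then mux the subtitles into the MP4.
import subprocess
from pathlib import Path

WHISPER_BIN = "./whisper-cli"              # assumption: path to the whisper.cpp CLI
WHISPER_MODEL = "models/ggml-base.en.bin"  # assumption: a downloaded ggml model


def subtitle_recording(clip: str, out: str) -> None:
    clip_path = Path(clip)
    wav = clip_path.with_suffix(".wav")
    srt_base = clip_path.with_suffix("")   # whisper.cpp appends ".srt" itself

    # 1. Extract audio in the format whisper.cpp expects (16 kHz mono PCM).
    subprocess.run(["ffmpeg", "-y", "-i", clip, "-ar", "16000", "-ac", "1",
                    "-c:a", "pcm_s16le", str(wav)], check=True)

    # 2. Transcribe and write an SRT file next to the clip.
    subprocess.run([WHISPER_BIN, "-m", WHISPER_MODEL, "-f", str(wav),
                    "-osrt", "-of", str(srt_base)], check=True)

    # 3. Bake the subtitles into a new MP4 as a mov_text track.
    subprocess.run(["ffmpeg", "-y", "-i", clip, "-i", f"{srt_base}.srt",
                    "-c", "copy", "-c:s", "mov_text", out], check=True)


subtitle_recording("front_door-1700000000.mp4", "front_door-subtitled.mp4")
```
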
Using faster-whisper or something similar, process complete video segments and either add a caption track (probably easier to play back later) or a caption file that can then be added to Frigate search.
It might be possible to leverage existing projects to add this capability:
https://github.com/McCloudS/subgen

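A minimal sketch of the caption-file half of that idea with faster-whisper, writing an SRT file for a recorded segment; the model size and compute type are just example choices, and faster-whisper decodes audio via PyAV, so common video containers can usually be passed in directly.

```python
# Sketch: transcribe a recorded segment with faster-whisper and write an SRT file.
from faster_whisper import WhisperModel


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def write_captions(recording: str, srt_path: str) -> None:
    model = WhisperModel("small", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(recording, vad_filter=True)
    with open(srt_path, "w", encoding="utf-8") as f:
        for i, seg in enumerate(segments, start=1):
            f.write(f"{i}\n"
                    f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
                    f"{seg.text.strip()}\n\n")


write_captions("recording.mp4", "recording.srt")
```
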