Suggestions for prompts/prefix to include repetitions and false starts #2003
Replies: 1 comment
Hi Joe, Many thanks in advance! |
Hi everyone!
I'm using Whisper on a research project where we're hoping to use it as a first step in transcribing data verbatim according to a strict transcription protocol. The data is real, spontaneous speech rather than subtitles for a TV show or similar, so there are a lot of hesitations, filler words, repeated words, etc., and we want all of these transcribed. I've managed to write a prompt that gets Whisper to include 'um's and 'uh's, but has anyone managed to get it to transcribe false starts and word repetitions, using prompts or any other features/settings?
Our current prompt is something like this:
This has helped a lot, but it doesn't catch all the repetitions, and it doesn't do anything for false starts. Ideally, we want to end up with transcripts that include repetitions, as well as false starts in brackets followed by an underscore, so that the transcript looks a bit like this:
Has anyone had any success getting Whisper to do anything like this, or have any suggestions for what I could try? I've also tried a prompt modelled on the above examples, which includes things like '(th_) there's a cat'. Unfortunately this doesn't work at all, and it has the completely unintended side effect of swear words being censored with underscores.
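For reference, a minimal sketch of the setup being described, assuming the `openai-whisper` Python package: the prompt is passed through the `initial_prompt` argument of `transcribe`, and two small regex helpers count how many false-start markers and immediate word repetitions actually make it into the output. The helper names, the `"base"` model choice, and the regexes are illustrative assumptions, not part of our protocol or of Whisper itself.

```python
import re

# Assumed notation from the post: a false start is written as "(th_)".
FALSE_START = re.compile(r"\(([^\s()_]+)_\)")
# An immediately repeated word, e.g. "to to" (case-insensitive).
REPETITION = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def transcribe_verbatim(audio_path, prompt, model_name="base"):
    """Transcribe one file, passing the prompt via Whisper's initial_prompt."""
    import whisper  # imported lazily so the helpers below work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, initial_prompt=prompt)
    return result["text"]


def find_false_starts(text):
    """Return the fragments marked as false starts, e.g. ['th'] for '(th_)'."""
    return FALSE_START.findall(text)


def find_repetitions(text):
    """Return each word that is immediately repeated at least once."""
    return [match.group(1) for match in REPETITION.finditer(text)]
```

Running the two helpers on both the Whisper output and a hand-corrected transcript of the same clip gives a rough way to compare prompts, since it shows how many repetitions and false starts each prompt recovers.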
Any help or thoughts would be much appreciated - thanks in advance!