Add more explanation in README
shun-liang committed Oct 15, 2024
1 parent 3c8f9dd commit 32780ff
Showing 1 changed file with 14 additions and 2 deletions.
README.md (16 changes: 14 additions & 2 deletions)
@@ -10,7 +10,7 @@ yt2doc is meant to work fully locally, without invoking any external API. The Op

## Why

-There have been many existing projects that transcribe YouTube videos with Whisper and its variants, but most of them aimed to generate subtitles, while I had not found one that priortises readability. Whisper does not generate line break in its transcription, so transcribing a 20 mins long video without any post processing would give you a huge piece of text, without any line break and topic segmentation. This project aims to transcribe videos with that post processing.
+There have been many existing projects that transcribe YouTube videos with Whisper and its variants, but most of them aim to generate subtitles, and I had not found one that prioritises readability. Whisper does not generate line breaks in its transcription, so transcribing a 20-minute video without any post-processing gives you one huge block of text with no line breaks or topic segmentation. This project aims to transcribe videos with that post-processing.

## Installation

@@ -59,12 +59,24 @@ yt2doc --video <video-url> --segment-unchaptered --llm-model <model-name>

Among smaller models, `gemma2:9b`, `llama3.1:8b`, and `qwen2.5:7b` work reasonably well.
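For example, a run that segments an unchaptered video by topic with `gemma2:9b` (assuming that model is already available locally) looks like:

```
yt2doc --video <video-url> --segment-unchaptered --llm-model gemma2:9b
```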

-For MacOS devices running Apple Silicon, (a hacky) support for [whisper.cpp](https://github.com/ggerganov/whisper.cpp) is supported:
+By default, yt2doc uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as the transcription backend. You can run yt2doc with different faster-whisper configs (model size, device, compute type, etc.):

```
yt2doc --video <video-url> --whisper-model <model-name> --whisper-device <cpu|cuda|auto> --whisper-compute-type <compute_type>
```

For the meaning and available choices of `--whisper-model`, `--whisper-device`, and `--whisper-compute-type`, please refer to this [comment](https://github.com/SYSTRAN/faster-whisper/blob/v1.0.3/faster_whisper/transcribe.py#L101-L127) in faster-whisper.
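For example, a minimal CPU-only sketch using the `small` model with 8-bit quantization (the values here are illustrative; any combination documented by faster-whisper should work):

```
yt2doc --video <video-url> --whisper-model small --whisper-device cpu --whisper-compute-type int8
```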


If you are running yt2doc on Apple Silicon, [whisper.cpp](https://github.com/ggerganov/whisper.cpp) gives much faster performance, as it supports the Apple GPU. (A hacky) support for whisper.cpp has been implemented:

```
yt2doc --video <video-url> --whisper-backend whisper_cpp --whisper-cpp-executable <path-to-whisper-cpp-executable> --whisper-cpp-model <path-to-whisper-cpp-model>
```

See https://github.com/shun-liang/yt2doc/issues/15 for more info on the whisper.cpp integration.
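For example, assuming whisper.cpp has been built under `~/whisper.cpp` and a `ggml` model downloaded into its `models` directory (both paths below are illustrative):

```
yt2doc --video <video-url> --whisper-backend whisper_cpp --whisper-cpp-executable ~/whisper.cpp/main --whisper-cpp-model ~/whisper.cpp/models/ggml-base.en.bin
```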


yt2doc uses [Segment Any Text (SaT)](https://github.com/segment-any-text/wtpsplit) to segment the transcript into sentences and paragraphs. You can change the SaT model:
```
yt2doc --video <video-url> --sat-model <sat-model>
```
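For example, to use the `sat-12l-sm` model (model name illustrative; see the wtpsplit repository for the full list of available SaT models):

```
yt2doc --video <video-url> --sat-model sat-12l-sm
```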
