From 32780ff991f990b73bd4537e99beb059879bc1a9 Mon Sep 17 00:00:00 2001
From: Shun Liang
Date: Tue, 15 Oct 2024 16:25:30 +0100
Subject: [PATCH] Add more explanation in README

---
 README.md | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 8fea776..1b4f33f 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ yt2doc is meant to work fully locally, without invoking any external API. The Op
 
 ## Why
 
-There have been many existing projects that transcribe YouTube videos with Whisper and its variants, but most of them aimed to generate subtitles, while I had not found one that priortises readability. Whisper does not generate line break in its transcription, so transcribing a 20 mins long video without any post processing would give you a huge piece of text, without any line break and topic segmentation. This project aims to transcribe videos with that post processing.
+There have been many projects that transcribe YouTube videos with Whisper and its variants, but most of them aim to generate subtitles, and I have not found one that prioritises readability. Whisper does not generate line breaks in its transcription, so transcribing a 20-minute video without any post-processing gives you one huge block of text, without any line breaks or topic segmentation. This project aims to transcribe videos with that post-processing.
 
 ## Installation
 
@@ -59,12 +59,24 @@ yt2doc --video --segment-unchaptered --llm-model
 
 Among smaller size models, `gemma2:9b`, `llama3.1:8b`, and `qwen 2.5:7b` work reasonably well.
 
-For MacOS devices running Apple Silicon, (a hacky) support for [whisper.cpp](https://github.com/ggerganov/whisper.cpp) is supported:
+By default, yt2doc uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as the transcription backend. You can run yt2doc with different faster-whisper configurations (model size, device, compute type, etc.):
+
+```
+yt2doc --video --whisper-model --whisper-device --whisper-compute-type
+```
+
+For the meanings and choices of `--whisper-model`, `--whisper-device` and `--whisper-compute-type`, please refer to this [comment](https://github.com/SYSTRAN/faster-whisper/blob/v1.0.3/faster_whisper/transcribe.py#L101-L127) in faster-whisper.
+
+
+If you are running yt2doc on Apple Silicon, [whisper.cpp](https://github.com/ggerganov/whisper.cpp) gives much faster performance, as it supports the Apple GPU. (Hacky) support for whisper.cpp has been implemented:
 
 ```
 yt2doc --video --whisper-backend whisper_cpp --whisper-cpp-executable --whisper-cpp-model
 ```
 
+See https://github.com/shun-liang/yt2doc/issues/15 for more info on the whisper.cpp integration.
+
+
 yt2doc uses [Segment Any Text (SaT)](https://github.com/segment-any-text/wtpsplit) to segment the transcript into sentences and paragraphs. You can change the SaT model:
 
 ```
 yt2doc --video --sat-model