diff --git a/README.md b/README.md
index 1d21530..bca019c 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,37 @@
-# Automatic subtitles in your videos
+# Automatic subtitles for your videos
+
+This repository uses `ffmpeg` and [OpenAI's Whisper](https://openai.com/blog/whisper) to automatically generate and overlay subtitles on any video.
+
+Big up to m1guelpf for releasing this tool, and kudos to RapDoodle for all the improvements.
+
+## Why fork it
+
+Because it needed some fixes, an installer, and updated dependencies... and I also wanted to make it more flexible.
+
+The first iteration of this tool consistently failed to generate Spanish subtitles for movies that open with English songs,
+even though the rest of the movie is in Spanish. This fork fixes that by letting you force the language explicitly through a parameter.
+
+## Advantages of this version (so far)
+
+- Can force subtitles to be generated in Spanish (or any other supported language)
+- Updated dependencies
+- Fixed the audio out-of-sync issue
+- Wildcard support for filenames
+- Convert audio to subtitles (output `.srt` files)
+- Option to pick a language instead of relying on automatic language detection
+- Extract audio from videos in parallel
+- Disable `condition_on_previous_text` by default to avoid getting stuck in a failure loop (especially for videos with long intervals between speech), with the option `--enhance-consistency` to re-enable it
+- Many more new command options
-This repository uses `ffmpeg` and [OpenAI's Whisper](https://openai.com/blog/whisper) to automatically generate and overlay subtitles on any video.
 
 ## Installation
 
-To get started, you'll need Python 3.7 or newer. Install the binary by running the following command:
+To get started, you'll need Python >= 3.7 and <= 3.11.9. Install the package by running the following command:
 
-    pip install git+https://github.com/m1guelpf/auto-subtitle.git
+```bash
+pip install git+https://github.com/Sectumsempra82/auto-subtitle-plus.git
+```
 
 You'll also need to install [`ffmpeg`](https://ffmpeg.org/), which is available from most package managers:
 
@@ -19,25 +44,205 @@
 brew install ffmpeg
 
 # on Windows using Chocolatey (https://chocolatey.org/)
 choco install ffmpeg
+
+# you might also need ffmpeg-python (the Python bindings this tool imports)
+pip3 install ffmpeg-python
 ```
 
-## Usage
+
+## How to make it use your GPU for 3x faster generation
+
+Follow these instructions only if your GPU is powerful enough to be worth switching to the CUDA build of PyTorch:
+
+ - pip uninstall torch
+ - pip cache purge
+ - pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
+
+ or, for Python 3.11:
+
+ - pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+
+Having a decent GPU can dramatically increase performance.
+
+![image](https://user-images.githubusercontent.com/19196549/221421292-fc09b38e-c3aa-46e3-8684-e46c1e4cc691.png)
+
+## Options
+
+  -h, --help            show this help message and exit
+  --model {tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large}
+                        name of the Whisper model to use (default: small)
+  --output-dir OUTPUT_DIR, -o OUTPUT_DIR
+                        directory to save the outputs (default: .)
+  --output-srt, -s      output the .srt file in the output directory (default: False)
+  --output-audio, -a    output the extracted audio (default: False)
+  --output-video, -v    generate video with embedded subtitles (default: False)
+  --enhance-consistency
+                        use the previous output as input to the next window to improve consistency (may get stuck in a
+                        failure loop) (default: False)
+  --extract-workers EXTRACT_WORKERS
+                        number of workers to extract audio (only useful when there are multiple videos) (default: 3)
+  --verbose             print out the progress and debug messages (default: False)
+  --task {transcribe,translate}
+                        whether to perform X->X speech recognition ('transcribe') or X->English translation
+                        ('translate') (default: transcribe)
+  --language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh,Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,Bosnian,Breton,Bulgarian,Burmese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,Macedonian,Malagasy,Malay,Malayalam,Maltese,Maori,Marathi,Moldavian,Moldovan,Mongolian,Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba}
+                        language spoken in the audio, specify None to perform language detection (default: None)
+  --device DEVICE       device to use for PyTorch inference (default: cuda)
 
-The following command will generate a `subtitled/video.mp4` file contained the input video with overlayed subtitles.
 
-    auto_subtitle /path/to/video.mp4 -o subtitled/
+
+## Usage
 
-The default setting (which selects the `small` model) works well for transcribing English. You can optionally use a bigger model for better results (especially with other languages). The available models are `tiny`, `tiny.en`, `base`, `base.en`, `small`, `small.en`, `medium`, `medium.en`, `large`.
+The following command will generate a `subtitled/video.mp4` file containing the input video with overlaid subtitles:
 
-    auto_subtitle /path/to/video.mp4 --model medium
+    auto_subtitle_plus /path/to/video.mp4 --output-video -o subtitled/
+
+Convert all `mp4` videos in the current directory to `.srt` subtitles and store them in the current directory:
+
+    auto_subtitle_plus *.mp4 --output-srt
+
+---------------------- Recommended ----------------------
+
+The following command will only generate an `.srt` file next to your video:
+
+    auto_subtitle_plus.exe 'video.avi' --model medium --output-srt
+
+---------------------------------------------------------
+
+The default setting (which selects the `small` model) works well for transcribing English, and Spanish to a certain extent.
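+
+For the curious, this is roughly what the CLI does under the hood. A minimal sketch, assuming `openai-whisper` and the `ffmpeg` binary are installed; the file names are placeholders:
+
+```python
+import subprocess
+import whisper
+
+# Placeholder file names for illustration.
+video, audio = "movie.avi", "movie.mp3"
+
+# 1. Extract a mono audio track with ffmpeg (the step --extract-workers parallelizes).
+subprocess.run(["ffmpeg", "-y", "-i", video, "-ac", "1", audio], check=True)
+
+# 2. Transcribe, forcing Spanish instead of relying on auto-detection,
+#    with condition_on_previous_text disabled (this fork's default).
+model = whisper.load_model("small")
+result = model.transcribe(audio, language="es", condition_on_previous_text=False)
+
+# 3. Each segment carries the timestamps needed to build the .srt file.
+for segment in result["segments"][:3]:
+    print(segment["start"], segment["end"], segment["text"])
+```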
+
+--------------- NEW ------------------------------------------------------
+
+You can use the `--language` parameter to force the output language to any of the following:
+
+ - Afrikaans
+ - Albanian
+ - Amharic
+ - Arabic
+ - Armenian
+ - Assamese
+ - Azerbaijani
+ - Bashkir
+ - Basque
+ - Belarusian
+ - Bengali
+ - Bosnian
+ - Breton
+ - Bulgarian
+ - Burmese
+ - Castilian
+ - Catalan
+ - Chinese
+ - Croatian
+ - Czech
+ - Danish
+ - Dutch
+ - English
+ - Estonian
+ - Faroese
+ - Finnish
+ - Flemish
+ - French
+ - Galician
+ - Georgian
+ - German
+ - Greek
+ - Gujarati
+ - Haitian
+ - Haitian Creole
+ - Hausa
+ - Hawaiian
+ - Hebrew
+ - Hindi
+ - Hungarian
+ - Icelandic
+ - Indonesian
+ - Italian
+ - Japanese
+ - Javanese
+ - Kannada
+ - Kazakh
+ - Khmer
+ - Korean
+ - Lao
+ - Latin
+ - Latvian
+ - Letzeburgesch
+ - Lingala
+ - Lithuanian
+ - Luxembourgish
+ - Macedonian
+ - Malagasy
+ - Malay
+ - Malayalam
+ - Maltese
+ - Maori
+ - Marathi
+ - Moldavian
+ - Moldovan
+ - Mongolian
+ - Myanmar
+ - Nepali
+ - Norwegian
+ - Nynorsk
+ - Occitan
+ - Panjabi
+ - Pashto
+ - Persian
+ - Polish
+ - Portuguese
+ - Punjabi
+ - Pushto
+ - Romanian
+ - Russian
+ - Sanskrit
+ - Serbian
+ - Shona
+ - Sindhi
+ - Sinhala
+ - Sinhalese
+ - Slovak
+ - Slovenian
+ - Somali
+ - Spanish
+ - Sundanese
+ - Swahili
+ - Swedish
+ - Tagalog
+ - Tajik
+ - Tamil
+ - Tatar
+ - Telugu
+ - Thai
+ - Tibetan
+ - Turkish
+ - Turkmen
+ - Ukrainian
+ - Urdu
+ - Uzbek
+ - Valencian
+ - Vietnamese
+ - Welsh
+ - Yiddish
+ - Yoruba
+
+Further details on accuracy and models can be found here: https://github.com/openai/whisper#available-models-and-languages
+--------------------------------------------------------------------------
+
+You can optionally use a bigger model for better results (especially with other languages). The available models are `tiny`, `tiny.en`, `base`, `base.en`, `small`, `small.en`, `medium`, `medium.en`, `large`.
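+
+Programmatically, picking a bigger checkpoint is a one-line change. A minimal sketch, assuming `openai-whisper` is installed; `movie.mp3` is a placeholder:
+
+```python
+import whisper
+
+# Larger checkpoints are slower and need more VRAM, but handle
+# non-English speech noticeably better. The ".en" variants are
+# English-only; the CLI forces language="en" when one is selected.
+model = whisper.load_model("medium")
+result = model.transcribe("movie.mp3", language="es")
+```
+
+The equivalent from the command line: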
+
+    auto_subtitle_plus.exe /path/to/video.mp4 --model medium
 
 Adding `--task translate` will translate the subtitles into English:
 
-    auto_subtitle /path/to/video.mp4 --task translate
+    auto_subtitle_plus.exe /path/to/video.mp4 --task translate
 
 Run the following to view all available options:
 
-    auto_subtitle --help
+    auto_subtitle_plus.exe --help
 
 ## License
diff --git a/auto_subtitle/cli.py b/auto_subtitle/cli.py
deleted file mode 100644
index a58d14f..0000000
--- a/auto_subtitle/cli.py
+++ /dev/null
@@ -1,108 +0,0 @@
-import os
-import ffmpeg
-import whisper
-import argparse
-import warnings
-import tempfile
-from .utils import filename, str2bool, write_srt
-
-
-def main():
-    parser = argparse.ArgumentParser(
-        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
-    parser.add_argument("video", nargs="+", type=str,
-                        help="paths to video files to transcribe")
-    parser.add_argument("--model", default="small",
-                        choices=whisper.available_models(), help="name of the Whisper model to use")
-    parser.add_argument("--output_dir", "-o", type=str,
-                        default=".", help="directory to save the outputs")
-    parser.add_argument("--output_srt", type=str2bool, default=False,
-                        help="whether to output the .srt file along with the video files")
-    parser.add_argument("--srt_only", type=str2bool, default=False,
-                        help="only generate the .srt file and not create overlayed video")
-    parser.add_argument("--verbose", type=str2bool, default=False,
-                        help="whether to print out the progress and debug messages")
-
-    parser.add_argument("--task", type=str, default="transcribe", choices=[
-        "transcribe", "translate"], help="whether to perform X->X speech recognition ('transcribe') or X->English translation ('translate')")
-
-    args = parser.parse_args().__dict__
-    model_name: str = args.pop("model")
-    output_dir: str = args.pop("output_dir")
-    output_srt: bool = args.pop("output_srt")
-    srt_only: bool = args.pop("srt_only")
-    os.makedirs(output_dir, exist_ok=True)
-
-    if model_name.endswith(".en"):
-        warnings.warn(
-            f"{model_name} is an English-only model, forcing English detection.")
-        args["language"] = "en"
-
-    model = whisper.load_model(model_name)
-    audios = get_audio(args.pop("video"))
-    subtitles = get_subtitles(
-        audios, output_srt or srt_only, output_dir, lambda audio_path: model.transcribe(audio_path, **args)
-    )
-
-    if srt_only:
-        return
-
-    for path, srt_path in subtitles.items():
-        out_path = os.path.join(output_dir, f"{filename(path)}.mp4")
-
-        print(f"Adding subtitles to {filename(path)}...")
-
-        video = ffmpeg.input(path)
-        audio = video.audio
-
-        ffmpeg.concat(
-            video.filter('subtitles', srt_path, force_style="OutlineColour=&H40000000,BorderStyle=3"), audio, v=1, a=1
-        ).output(out_path).run(quiet=True, overwrite_output=True)
-
-        print(f"Saved subtitled video to {os.path.abspath(out_path)}.")
-
-
-def get_audio(paths):
-    temp_dir = tempfile.gettempdir()
-
-    audio_paths = {}
-
-    for path in paths:
-        print(f"Extracting audio from {filename(path)}...")
-        output_path = os.path.join(temp_dir, f"{filename(path)}.wav")
-
-        ffmpeg.input(path).output(
-            output_path,
-            acodec="pcm_s16le", ac=1, ar="16k"
-        ).run(quiet=True, overwrite_output=True)
-
-        audio_paths[path] = output_path
-
-    return audio_paths
-
-
-def get_subtitles(audio_paths: list, output_srt: bool, output_dir: str, transcribe: callable):
-    subtitles_path = {}
-
-    for path, audio_path in audio_paths.items():
-        srt_path = output_dir if output_srt else tempfile.gettempdir()
-        srt_path = os.path.join(srt_path, f"{filename(path)}.srt")
-
-        print(
f"Generating subtitles for {filename(path)}... This might take a while." - ) - - warnings.filterwarnings("ignore") - result = transcribe(audio_path) - warnings.filterwarnings("default") - - with open(srt_path, "w", encoding="utf-8") as srt: - write_srt(result["segments"], file=srt) - - subtitles_path[path] = srt_path - - return subtitles_path - - -if __name__ == '__main__': - main() diff --git a/auto_subtitle/__init__.py b/auto_subtitle_plus/__init__.py similarity index 100% rename from auto_subtitle/__init__.py rename to auto_subtitle_plus/__init__.py diff --git a/auto_subtitle_plus/cli.py b/auto_subtitle_plus/cli.py new file mode 100644 index 0000000..6f3c5a9 --- /dev/null +++ b/auto_subtitle_plus/cli.py @@ -0,0 +1,162 @@ +import os +import glob +import psutil +import ffmpeg +import whisper +import argparse +import warnings +import tempfile +import subprocess +import multiprocessing +from torch.cuda import is_available +from .utils import get_filename, write_srt, is_audio, ffmpeg_extract_audio + + +def main(): + parser = argparse.ArgumentParser( + formatter_class=argparse.ArgumentDefaultsHelpFormatter) + parser.add_argument("paths", nargs="+", type=str, + help="paths/wildcards to video files to transcribe") + parser.add_argument("--model", default="small", + choices=whisper.available_models(), help="name of the Whisper model to use") + parser.add_argument("--output-dir", "-o", type=str, + default=".", help="directory to save the outputs") + parser.add_argument("--output-srt", "-s", action='store_true', default=False, + help="output the .srt file in the output directory") + parser.add_argument("--output-audio", "-a", action='store_true', default=False, + help="output the audio extracted") + parser.add_argument("--output-video", "-v", action='store_true', default=False, + help="generate video with embedded subtitles") + parser.add_argument("--enhance-consistency", action='store_true', default=False, + help="use the previous output as input to the next window to improve consistency (may stuck in a failure loop)") + parser.add_argument("--extract-workers", type=int, default=max(1, psutil.cpu_count(logical=False) // 2), + help="number of workers to extract audio (only useful when there are multiple videos)") + parser.add_argument("--verbose", action='store_true', default=False, + help="print out the progress and debug messages") + + parser.add_argument("--task", type=str, default="transcribe", choices=[ + "transcribe", "translate"], help="whether to perform X->X speech recognition ('transcribe') or X->English translation ('translate')") + parser.add_argument("--language", type=str, default=None, + choices=sorted(whisper.tokenizer.LANGUAGES.keys()) + sorted([k.title() for k in whisper.tokenizer.TO_LANGUAGE_CODE.keys()]), + help="language spoken in the audio, specify None to perform language detection") + parser.add_argument("--device", default="cuda" if is_available() else "cpu", help="device to use for PyTorch inference") + + args = parser.parse_args().__dict__ + model_name: str = args.pop("model") + output_dir: str = args.pop("output_dir") + output_srt: bool = args.pop("output_srt") + output_video: bool = args.pop("output_video") + output_audio: bool = args.pop("output_audio") + device: str = args.pop("device") + extract_wokers: str = args.pop("extract_workers") + enhace_consistency: bool = args.pop("enhance_consistency") + os.makedirs(output_dir, exist_ok=True) + + # Default output_srt to True if output_video is False + if not output_video and not output_srt: + output_srt = True + + # 
+    paths = []
+    for path in args['paths']:
+        paths += list(glob.glob(path))
+    n = len(paths)
+    if n == 0:
+        print('No video files found.')
+        return
+    elif n > 1:
+        print('List of videos:')
+        for i, path in enumerate(paths):
+            print(f'  {i+1}. {path}')
+    args.pop('paths')
+
+    # Load the Whisper model
+    if model_name.endswith(".en"):
+        warnings.warn(
+            f"{model_name} is an English-only model, forcing English detection.")
+        args["language"] = "en"
+
+    model = whisper.load_model(model_name, device=device)
+
+    # Extract audio from the videos. Files that are already audio are used as-is
+    audios = get_audio(paths, output_audio, output_dir, extract_workers)
+
+    # Generate subtitles with Whisper
+    subtitles = get_subtitles(
+        audios, output_srt, output_dir,
+        lambda audio_path: model.transcribe(audio_path, condition_on_previous_text=enhance_consistency, **args)
+    )
+
+    if not output_video:
+        return
+
+    for path, srt_path in subtitles.items():
+        # Skip audio files
+        if is_audio(path):
+            continue
+
+        print(f"Adding subtitles to {path}...")
+
+        out_path = os.path.join(output_dir, f"{get_filename(path)}.mp4")
+        if os.path.exists(out_path) and os.path.samefile(path, out_path):
+            out_path = os.path.join(output_dir, f"{get_filename(path)}-subtitled.mp4")
+            warnings.warn(f"{path} would overwrite the original file. Renaming the output file to {out_path}")
+
+        video = ffmpeg.input(path)
+        audio = video.audio
+
+        ffmpeg.concat(
+            video.filter('subtitles', srt_path, force_style="OutlineColour=&H40000000,BorderStyle=3"), audio, v=1, a=1
+        ).output(out_path).run(quiet=False, overwrite_output=True)
+
+        print(f"Saved subtitled video to {os.path.abspath(out_path)}.")
+
+
+def get_audio(paths, output_audio, output_dir, num_workers=1):
+    audio_paths = {}
+    func_args = []
+
+    for path in paths:
+        if is_audio(path):
+            # Already an audio file: use it as-is
+            output_path = path
+        else:
+            output_path = output_dir if output_audio else tempfile.gettempdir()
+            output_path = os.path.join(output_path, f"{get_filename(path)}.mp3")
+            func_args.append((path, output_path))
+
+        audio_paths[path] = output_path
+
+    # Extract audio on multiple processes
+    pool = multiprocessing.Pool(num_workers)
+    pool.starmap(ffmpeg_extract_audio, func_args)
+    pool.close()
+    pool.join()
+
+    return audio_paths
+
+
+def get_subtitles(audio_paths: dict, output_srt: bool, output_dir: str, transcribe: callable):
+    subtitles_path = {}
+
+    for path, audio_path in audio_paths.items():
+        srt_path = output_dir if output_srt else tempfile.gettempdir()
+        srt_path = os.path.join(srt_path, f"{get_filename(path)}.srt")
+
+        print(
+            f"Generating subtitles for {path}... This might take a while."
+        )
+
+        warnings.filterwarnings("ignore")
+        result = transcribe(audio_path)
+        warnings.filterwarnings("default")
+
+        with open(srt_path, "w", encoding="utf-8") as srt:
+            write_srt(result["segments"], file=srt)
+
+        subtitles_path[path] = srt_path
+
+    return subtitles_path
+
+
+if __name__ == '__main__':
+    main()
diff --git a/auto_subtitle/utils.py b/auto_subtitle_plus/utils.py
similarity index 72%
rename from auto_subtitle/utils.py
rename to auto_subtitle_plus/utils.py
index ee5515b..a354ff2 100644
--- a/auto_subtitle/utils.py
+++ b/auto_subtitle_plus/utils.py
@@ -1,4 +1,5 @@
 import os
+import subprocess
 from typing import Iterator, TextIO
 
 
@@ -42,5 +43,15 @@ def write_srt(transcript: Iterator[dict], file: TextIO):
         )
 
 
-def filename(path):
+def get_filename(path):
     return os.path.splitext(os.path.basename(path))[0]
+
+
+def is_audio(path):
+    # Treat common audio extensions as already-extracted audio
+    return path.endswith(('.mp3', '.wav', '.flac', '.m4a', '.wma', '.aac'))
+
+
+def ffmpeg_extract_audio(input_path, output_path):
+    # Extract a mono track; -async 1 keeps the audio in sync with the source
+    print(f"Extracting audio from {input_path}...")
+    if subprocess.run(('ffmpeg', '-y', '-i', input_path, '-ac', '1', '-async', '1', output_path),
+                      stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL).returncode != 0:
+        raise Exception(f'Error occurred while extracting audio from {input_path}')
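
For reference, the helpers added to `utils.py` above are easy to exercise on their own. A minimal sketch; the segment data is made up but mirrors the shape Whisper returns:

```python
import sys
from auto_subtitle_plus.utils import write_srt, get_filename, is_audio

# Hypothetical segments mimicking Whisper's output shape.
segments = [
    {"start": 0.0, "end": 2.5, "text": " Hola, ¿qué tal?"},
    {"start": 2.5, "end": 4.0, "text": " Bien, gracias."},
]

write_srt(segments, file=sys.stdout)      # numbered blocks with start --> end timestamps
print(get_filename("/videos/movie.avi"))  # "movie"
print(is_audio("movie.mp3"))              # True
```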
diff --git a/requirements.txt b/requirements.txt
index 73bca28..6fb0566 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1 +1,3 @@
 openai-whisper
+psutil
+ffmpeg-python
diff --git a/setup.py b/setup.py
index ca2ed5b..15797fa 100644
--- a/setup.py
+++ b/setup.py
@@ -1,17 +1,20 @@
 from setuptools import setup, find_packages
 
 setup(
-    version="1.0",
-    name="auto_subtitle",
+    version="0.1",
+    name="auto_subtitle_plus",
     packages=find_packages(),
-    py_modules=["auto_subtitle"],
-    author="Miguel Piedrafita",
+    py_modules=["auto_subtitle_plus"],
+    author="Sectux - based on the work of Miguel Piedrafita and RapDoodle",
     install_requires=[
-        'openai-whisper',
+        'youtube-dl',
+        'psutil',
+        'ffmpeg-python',
+        'openai-whisper'
     ],
-    description="Automatically generate and embed subtitles into your videos",
+    description="Automatically generate and/or embed subtitles into your videos",
     entry_points={
-        'console_scripts': ['auto_subtitle=auto_subtitle.cli:main'],
+        'console_scripts': ['auto_subtitle_plus=auto_subtitle_plus.cli:main'],
     },
     include_package_data=True,
 )
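
As a closing note on the GPU section of the README: before re-running the tool, it is worth confirming that the CUDA build of PyTorch actually sees the card. This check uses the standard PyTorch API and mirrors how `cli.py` picks the `--device` default:

```python
import torch

# True only when a CUDA-enabled torch build sees a GPU;
# cli.py defaults --device to "cuda" exactly when this holds.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. the GPU model name
```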