shuku

Shrink media to keep only the dialogue.

shuku — from しゅくしょう: "minification"

Designed for language learners, shuku creates dialogue-only versions of media (show episodes, films, or YouTube videos) using subtitles. Useful to revisit content efficiently and improve comprehension.

Read this blog post to learn more about the motivation and technical details behind shuku.

Running shuku:

shuku_demo.mp4

What the output looks/sounds like (audio is from condensed version):

blade_runner_2049_original_vs_condensed.mp4

Features

  • Create condensed audio, video, and subtitle files based on dialogue
  • Multiplatform support: GNU+Linux, macOS, and Windows
  • Extensive configuration options, including codec, quality, and custom FFmpeg arguments
  • Smart audio/subtitle track selection
  • Fuzzy matching for external subtitles
  • Pad subtitle timings for context and smoother transitions
  • Shift subtitle timing without extra tools
  • Skip unwanted subtitle lines (e.g., lyrics or sound effects)
  • Skip chapters (e.g., openings, previews, credits)
  • Smart metadata extraction (season, episode, title)
  • Logging and progress tracking
  • Generate LRC files from subtitles to use with music players
  • Batch processing of multiple files and directories

Comparison with similar tools

Feature shuku impd Condenser
Platform support
GNU+Linux
macOS
Windows
Output
Condensed audio
Condensed subtitles
Condensed video
Subtitle handling
Support internal and external subs
Fuzzy subtitle search
Language preference settings
Subtitle timing padding
Shift subtitle timing
Skip chapters (credits, opening…)
Skip subtitle lines based on patterns Unlimited patterns One pattern Filters content in parentheses & list of characters
File management
Clean filenames
Custom filename suffix
YouTube video processing
Playlist management
Automatic archiving of old files
User interface
Graphical user interface For user prompts
Command line interface
Configuration
Custom codec
Custom quality
Custom ffmpeg arguments
Can extract without reencoding
Configuration format TOML Key-value JSON
Metadata
Automatic metadata generation
Episode number
Season number
Clean media title

Important

If you feel the comparisons are incomplete or unfair, please file an issue so this page can be improved. Even better, submit a pull request!

Installation

  1. Make sure FFmpeg is installed on your system. You can download it from ffmpeg.org or install it using your package manager:

    # Debian, Linux Mint, Ubuntu…
    sudo apt install ffmpeg
    
    # Arch Linux, Manjaro…
    sudo pacman -S ffmpeg
    
    # macOS with brew (https://brew.sh/)
    brew install ffmpeg

For Windows, here's a good guide.

  2. Install shuku with pipx (recommended) or pip:

    pipx install shuku
    # or
    pip install shuku

Alternatively, download a pre-built binary.

Quick start

If you run:

shuku video.mkv

shuku will create video (condensed).ogg, containing only the dialogue from the video, next to the original file.

For this to work, video.mkv must either have internal subtitles or a matching subtitle file in the same directory.

Configuration

To dump the default configuration, run:

shuku --init

This will create a shuku.toml file in your system's default configuration directory:

  • On Windows: %APPDATA%\shuku\shuku.toml
  • On Unix-like systems (GNU+Linux, macOS):
    • By default: ~/.config/shuku/shuku.toml
    • If XDG_CONFIG_HOME is set: $XDG_CONFIG_HOME/shuku/shuku.toml
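The lookup rules above can be sketched in a few lines of Python. This is only an illustration of the documented locations, not shuku's actual code: the function name `default_config_path` and its parameters are hypothetical, and pure path classes are used so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath, PureWindowsPath


def default_config_path(env, platform):
    """Return where shuku.toml is expected, per the rules listed above.

    `env` is a mapping of environment variables and `platform` a
    sys.platform-style string; both names are hypothetical.
    """
    if platform == "win32":
        # Windows: %APPDATA%\shuku\shuku.toml
        return PureWindowsPath(env["APPDATA"]) / "shuku" / "shuku.toml"
    # Unix-like: XDG_CONFIG_HOME wins when set, otherwise ~/.config
    base = env.get("XDG_CONFIG_HOME") or str(
        PurePosixPath(env.get("HOME", "~")) / ".config"
    )
    return PurePosixPath(base) / "shuku" / "shuku.toml"
```

Calling it with `dict(os.environ)` and `sys.platform` gives the path for the current machine.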

The configuration file is self-documented and should be easy to understand.

The options are split into four categories: general options, condensed audio options, condensed video options, and condensed subtitles options.

General options

loglevel

Sets the verbosity of log messages. Only messages of the selected level or higher will be displayed.

Default: 'info'

Choices: debug, info, success, warning, error, critical

clean_output_filename

If true, removes quality indicators, release group tags, and other technical information from output filenames. For example, [GROUP] Show.S01E01.1080p.x264-GROUP.mkv becomes Show S01E01.ext (where .ext is .ogg, .srt, etc.).

Default: true.

output_directory

Specifies the directory where output files will be saved. If not set, output files are written to the same directory as the input file.

output_suffix

The suffix added to output files.

Default: ' (condensed)'

if_file_exists

What to do when the output file already exists. One of:

  • 'ask': Prompt user to overwrite, rename, or skip
  • 'overwrite': Overwrite without prompting
  • 'rename': Automatically rename with timestamp
  • 'skip': Skip without prompting

Default: 'ask'

padding

The number of seconds to add before and after each subtitle timing.

Default: 0.5
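To make the effect of padding concrete, here is a minimal Python sketch. It assumes that padded segments which end up overlapping are merged into one; that merging step and the function name `pad_and_merge` are assumptions for illustration, not a description of shuku's internals.

```python
def pad_and_merge(intervals, padding=0.5):
    """Pad each (start, end) pair in seconds, then merge overlapping pairs."""
    # Extend every interval on both sides, never going below zero.
    padded = sorted((max(0.0, s - padding), e + padding) for s, e in intervals)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous segment: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Two lines 0.6 s apart become one segment once each gets 0.5 s of padding.
segments = pad_and_merge([(1.0, 2.0), (2.6, 4.0), (10.0, 11.5)])
print(segments)
```

With the default of 0.5, short gaps between consecutive lines disappear, which is what makes transitions sound smoother.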

subtitle_directory

Specifies a directory to search for external subtitle files. Overridden by the --subtitles command-line argument.

audio_languages

Specifies preferred languages for audio tracks, in order of preference.

Example: ['jpn', 'jp', 'ja', 'eng']

subtitle_languages

Specifies preferred languages for subtitle tracks, in order of preference.

Example: ['jpn', 'jp', 'ja', 'eng']

external_subtitle_search

Determines how external subtitles are matched to video files. disabled turns off external subtitle search, exact requires perfect filename matches, and fuzzy allows for inexact matches.

Default: 'fuzzy'

subtitle_match_threshold

When using fuzzy matching, this sets the minimum similarity score for subtitle files. Lower values allow more lenient matching but risk false positives.

Default: 0.6
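The role of the threshold can be illustrated with Python's standard `difflib`. This README does not say which similarity measure shuku uses, so `SequenceMatcher` is a stand-in here, and `best_subtitle_match` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from pathlib import PurePath


def best_subtitle_match(video_name, subtitle_names, threshold=0.6):
    """Return the most similar subtitle filename, or None if below threshold."""
    video_stem = PurePath(video_name).stem.lower()
    best_name, best_score = None, 0.0
    for name in subtitle_names:
        # Compare filenames without their extensions, ignoring case.
        score = SequenceMatcher(None, video_stem, PurePath(name).stem.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


match = best_subtitle_match(
    "Show.S01E01.1080p.mkv", ["Show S01E01.srt", "Other Film.srt"]
)
print(match)
```

Lowering the threshold in this sketch lets more distant names through, which mirrors the trade-off described above: more lenient matching, higher risk of picking the wrong file.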

skip_chapters

A list of chapter titles to skip when processing. Case-insensitive. Useful for skipping openings, previews, credits…

Default:

skip_chapters = ['avant', '1. opening credits', 'logos/opening credits', 'opening titles', 'opening', 'op', 'ending', 'ed', 'start credit', 'credits', 'end credits', 'end credit', 'closing credits', 'next episode', 'preview', 'avante', 'trailer']

line_skip_patterns

Regular expression patterns for subtitle lines to skip. Useful for removing song lyrics, sound effects, etc. Use single quotes to enclose patterns.

Default:

line_skip_patterns = [
  # Skip music.
  '^(~|〜)?♪.*',
  '^♬(~|〜)$',
  '^♪?(~|〜)♪?$',
  # Skip lines containing only '・~'
  '^・(~|〜)$',
  # Skip lines entirely enclosed in various types of brackets.
  '^\\([^)]*\\)$',  # Parentheses ()
  '^（[^）]*）$',  # Full-width parentheses （）
  '^\\[.*\\]$',  # Square brackets []
  '^\\{[^\\}]*\\}$',  # Curly braces {}
  '^<[^>]*>$',  # Angle brackets <>
]
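As a rough illustration of how such patterns filter lines, the sketch below applies three patterns in the spirit of the defaults using Python's `re` module. The patterns are rewritten as Python raw strings, and both the helper `should_skip` and the use of `re.search` on the stripped line are assumptions; shuku's own matching may differ.

```python
import re

# A few patterns modelled on the defaults above (Python raw strings).
PATTERNS = [
    r"^(~|〜)?♪.*",  # music lines
    r"^\[.*\]$",     # whole line enclosed in square brackets
    r"^<[^>]*>$",    # whole line enclosed in angle brackets
]
COMPILED = [re.compile(p) for p in PATTERNS]


def should_skip(line):
    """Return True if any pattern matches the stripped subtitle line."""
    text = line.strip()
    return any(p.search(text) for p in COMPILED)


lines = ["♪ La la la", "[door slams]", "Where were you?", "<sighs>", "I said [no]."]
kept = [line for line in lines if not should_skip(line)]
print(kept)
```

Because the bracket patterns are anchored with `^` and `$`, a line that merely contains brackets is kept; only lines made up entirely of a bracketed remark are dropped.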

Condensed audio options

enabled

Whether to create condensed audio files.

Default: true

audio_codec

Audio codec for the condensed audio.

Default: 'libopus'

Choices: libmp3lame, aac, libopus, flac, pcm_s16le, copy, mp3, wav, opus, ogg

audio_quality

See Audio quality settings for details.

Default: '48k'

custom_ffmpeg_args

Extra FFmpeg arguments to use when creating the final audio.

Example:

custom_ffmpeg_args = {
  "af" = 'loudnorm=I=-16:LRA=6:TP=-1,acompressor=threshold=-12dB:ratio=3:attack=200:release=1000'
}

Condensed video options

enabled

Whether to create condensed video files.

Default: false

audio_codec

Audio codec for the condensed video.

Default: 'copy'

Choices: libmp3lame, aac, libopus, flac, pcm_s16le, copy, mp3, wav, opus, ogg

audio_quality

See Audio quality settings for details.

Example: '128k'

video_codec

Video codec to use. 'copy' copies the video stream without re-encoding.

Default: 'copy'

video_quality

Video quality. The accepted values depend on the chosen video codec; see Video quality settings for details.

Video quality settings

The video_quality setting depends on the chosen video codec:

libx264 / libx265 (H.264 / H.265)

Modern video codecs that offer excellent compression. H.265 generally achieves better compression than H.264 but may take longer to encode.

  • How to set quality: Uses Constant Rate Factor (CRF). Specify a number between 0-51. Lower values = better quality & larger files. A change of ±6 roughly doubles/halves the file size.

  • Examples:

    • '23': Default, good quality
    • '18': Very high quality, visually lossless
    • '28': Acceptable quality, smaller files
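The "±6 doubles or halves the size" rule of thumb can be written as a small formula. The sketch below is only that approximation in Python; real file sizes depend heavily on the content, and `relative_size` is a hypothetical helper.

```python
def relative_size(crf, reference_crf=23):
    """Approximate output size relative to encoding at reference_crf.

    Rule of thumb only: every 6 CRF steps roughly doubles or halves the size.
    """
    return 2 ** ((reference_crf - crf) / 6)


print(relative_size(17))  # about twice the size of CRF 23
print(relative_size(29))  # about half the size of CRF 23
```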

libvpx-vp9 (VP9)

An open video codec developed by Google, offering quality comparable to H.265.

  • How to set quality: Uses CRF like H.264/H.265. Specify a number between 0-63. Lower values = better quality & larger files.

  • Examples:

    • '31': Default balanced quality
    • '24': High quality
    • '35': More compression, smaller files

Other codecs

For other video codecs, quality is set using bitrate.

  • How to set quality: Specify the bitrate in bits per second with optional 'k' or 'M' suffix

    • '1000k' or '1M' = 1 Mbps
    • Lower values = lower quality & smaller files
  • Examples:

    • '1M' or '1000k': Medium quality
    • '2M' or '2000k': High quality
    • '500k': Lower quality, smaller file
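To show how the suffixes relate to plain numbers, here is a small Python sketch that converts these strings to bits per second. It assumes 'k' means 1,000 and 'M' means 1,000,000, consistent with '1000k' and '1M' both being 1 Mbps above; `parse_bitrate` is a hypothetical helper, not part of shuku.

```python
def parse_bitrate(value):
    """Convert '500k', '1M', or a plain number into bits per second."""
    text = str(value).strip()
    multipliers = {"k": 1_000, "M": 1_000_000}
    if text and text[-1] in multipliers:
        # Scale the numeric part by the suffix.
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


print(parse_bitrate("1000k"), parse_bitrate("1M"), parse_bitrate("500k"))
```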

copy

Copies the video stream without re-encoding. Use this to maintain original quality and for fastest processing.

  • Quality Setting: video_quality is ignored when using copy

custom_ffmpeg_args

Custom FFmpeg arguments for processing the final video.

Example:

custom_ffmpeg_args = { "preset" = 'faster', "crf" = '23', "threads" = '0', "tune" = 'film' }

Condensed subtitles options

enabled

Whether to create condensed subtitle files.

Default: false

format

Output format for condensed subtitles. 'auto' matches the input format.

Default: 'auto'

Choices: auto, srt, ass, lrc

Audio quality settings

The audio_quality setting depends on the chosen audio codec:

libopus (ogg, opus) (Default)

A modern audio codec known for high quality and efficiency at low bitrates. It offers the best size/quality ratio of the available choices. If your media player supports it, use it without hesitation.

  • How to set quality: Specify the desired bitrate either as a number representing bits per second (e.g., 48000 for 48 kbps) or using the 'k' suffix (e.g., '48k', '128k').

  • Examples:

    • '48k' or 48000: Good quality with small file size (default).
    • '128k' or 128000: Higher quality, larger file size.
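For a sense of scale, the file size implied by a bitrate is roughly bitrate × duration ÷ 8. The Python sketch below applies that arithmetic to 20 minutes of condensed audio; it ignores container overhead and the fact that encoders may vary the actual bitrate, and `estimated_size_mb` is a hypothetical helper.

```python
def estimated_size_mb(bitrate_bps, duration_seconds):
    """Rough size in megabytes: bits per second * seconds / 8 bits per byte."""
    return bitrate_bps * duration_seconds / 8 / 1_000_000


twenty_minutes = 20 * 60
print(estimated_size_mb(48_000, twenty_minutes))   # roughly 7.2 MB
print(estimated_size_mb(128_000, twenty_minutes))  # roughly 19.2 MB
```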

libmp3lame (mp3)

A widely supported format compatible with most devices.

  • How to set quality:

    • Variable Bitrate (VBR): Use 'V' followed by a number from 0 to 9 (e.g., 'V0', 'V5'). Lower numbers mean better quality.
    • Constant Bitrate (CBR): Specify the bitrate in kilobits per second with 'k' (e.g., '128k', '320k').
  • Examples:

    • 'V3': Good balance between quality and file size.
    • '192k': High-quality CBR MP3.

aac

Offers better sound quality than MP3 at similar bitrates.

  • How to set quality:

    • Variable Bitrate (VBR): Use a single digit from 1 to 5 (e.g., '2', '5'). Higher numbers mean better quality.
    • Bitrate: Specify the bitrate in kilobits per second with 'k' (e.g., '128k').
  • Examples:

    • '2': Good quality with reasonable file size.
    • '128k': Standard quality.

flac

A lossless codec that preserves original audio quality but results in larger files.

  • How to set quality: Specify the compression level from 0 to 12. Higher numbers provide better compression but take longer to encode.

  • Examples:

    • 5: Default compression level, good balance.
    • 12: Maximum compression, slower encoding.

pcm_s16le (wav)

An uncompressed audio format resulting in very large files. The only reason you might want to use it over FLAC is software compatibility.

  • Quality Setting: audio_quality is ignored.

copy

Copies the audio stream without any re-encoding. Use this if you want to keep the original audio as-is.

  • Quality Setting: audio_quality is ignored.

Command line options

--init

Create a default configuration file in one of the following locations, depending on your operating system:

  • Windows: C:\Users\<YourUsername>\AppData\Roaming\shuku\shuku.toml
  • Unix-like systems (GNU+Linux, macOS): ~/.config/shuku/shuku.toml

-c <path>, --config <path>

Path to a configuration file, or "none" to use the default configuration.

Default: shuku.toml in the user config directory (e.g., ~/.config/shuku/shuku.toml).

-s <path>, --subtitles <path>

Path to subtitle file or directory containing subtitle files to match to input videos.

-o <path>, --output <path>

Path to the output directory. If not specified, the input file's directory will be used.

--audio-track-id <id>

ID of the audio track to use. You can use ffprobe {file} to list available tracks.

--sub-track-id <id>

ID of the subtitle track to use.

--sub-delay <ms>

Delay subtitles by <ms> milliseconds. Can be negative.
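Conceptually, the delay is a constant offset added to every subtitle timing. A minimal Python sketch, assuming timings are (start, end) pairs in seconds and that shifted times are clamped at zero (the clamping and the name `shift_timings` are assumptions for illustration):

```python
def shift_timings(intervals, delay_ms):
    """Shift every (start, end) pair by delay_ms milliseconds."""
    offset = delay_ms / 1000  # milliseconds to seconds
    return [(max(0.0, s + offset), max(0.0, e + offset)) for s, e in intervals]


print(shift_timings([(1.0, 2.5), (4.0, 6.0)], 500))    # subtitles appear later
print(shift_timings([(1.0, 2.5), (4.0, 6.0)], -1500))  # subtitles appear earlier
```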

-v {level}, --loglevel {level}

Set the logging level. Choices: debug, info, success, warning, error, critical.

--log-file <path>

Logs will be written to this file in addition to the terminal.

-h, --help

Show the help message and exit.

-V, --version

Print the version number and exit.

Examples

  1. Process a single video file:

    shuku video.mkv
  2. Process multiple files:

    shuku video1.mp4 path/to/directory_with_videos/
  3. Use a specific subtitle file:

    shuku video.mp4 -s subtitles.srt
  4. Set output directory and logging level:

    shuku video.mkv -o ~/condensed -v debug
  5. Use a specific audio track and apply subtitle delay:

    shuku video.mp4 --audio-track-id 2 --sub-delay 500
  6. Use the default configuration, show all logs and save them to a file:

    shuku file.mkv -c none -v debug --log-file shuku.log

Support

Something not working? Have an idea? Let us know!

Contributing

Please do! We appreciate bug reports, feature requests, and improvements to features or documentation (however minor).

Take a look at the Contributing Guidelines to learn more.

License

shuku is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.