impd should automatically choose right internal subs #7

asakura42 · 2023-10-14T23:07:28Z

Long story short. I have a video file with a bunch of internal subs:

impd probe output:

Index  Language  Title                                        Type
0      unknown   Www.SeiresHD.Com                             video
1      spa       unknown                                      audio
2      eng       unknown                                      audio
3      spa       Spanish - (Caption/Normal Size Char)         subtitle
4      eng       English - (Closed Caption/Normal Size Char)  subtitle
5      unknown   unknown                                      subtitle

When I add a video to my collection, impd condenses it using subtitle track 5, which is the track for songs and other sounds. I think that impd should choose internal subs based on:

Target language
Size of subtitles

So the largest subtitle track in the target language should be chosen for condensing. What do you think?


tatsumoto-ren · 2023-10-15T07:18:20Z

Edit your config file and add the following lines:

langs=spa
prefer_internal_subs=yes

asakura42 · 2023-10-15T09:16:44Z

My config:

langs=spanish,spa,esp,lat,cas
prefer_internal_subs=yes
video_dir=/dev/null
bitrate=32k
recent_threshold=10
padding=0.5
line_skip_pattern="^♪〜$|^〜♪$"
filename_skip_pattern="NCOP|NCED"
extract_audio_add_args=()

asakura42 · 2023-10-15T09:23:34Z

Try it yourself with any file from this folder: https://mega.nz/folder/oW8ihKCZ#sHuu63kset-BAn-XqFa7Nw

Condensing doesn't work, though. I guess that's because of bitmap fonts, but that's not critical.

tatsumoto-ren · 2023-10-15T09:49:12Z

I guess you need to manually set what tracks you want because the tracks are incorrectly named.

asakura42 · 2023-10-15T10:24:06Z

language: spa

Where are they incorrectly named?

asakura42 · 2023-10-15T22:56:43Z

For example, here it chooses the Forzados subtitle track (3) while it should choose 4:

Index  Language  Title     Type
0      unknown   unknown   video
1      spa       unknown   audio
2      eng       unknown   audio
3      spa       Forzados  subtitle
4      spa       unknown   subtitle
5      eng       unknown   subtitle

Can you add something to detect the largest target-language sub track?

tatsumoto-ren · 2023-10-16T13:50:42Z

impd chooses the first track that is:

not a song, caption, commentary, etc.
matches the preferred language

See impd/impd, line 111 in 48535fb:

guess_track_priority() {
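
The rule described above can be illustrated with a minimal bash sketch. This is not the actual body of guess_track_priority(); the function name first_matching_track, the "index,language,title" input format, and the title pattern are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT impd's real guess_track_priority().
# Reads "index,language,title" lines on stdin and prints the index of the
# first track whose language is in the comma-separated list passed as $1
# and whose title does not look like a song, sign, caption or commentary.
first_matching_track() {
    local langs=",${1,,}," idx lang title
    while IFS=, read -r idx lang title; do
        # Skip tracks whose language is not in the preferred list.
        [[ $langs == *",${lang,,},"* ]] || continue
        # Skip tracks whose title marks them as "garbage" (assumed pattern).
        [[ ${title,,} =~ song|sign|caption|comment ]] && continue
        printf '%s\n' "$idx"
        return 0
    done
    return 1
}

# Subtitle tracks from the second probe output in this thread.
printf '3,spa,Forzados\n4,spa,unknown\n5,eng,unknown\n' | first_matching_track spa
# prints 3
```

With a first-match rule like this, the Forzados track wins simply because it comes first and its title is not on the skip list, which matches the behaviour reported above.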

tatsumoto-ren · 2023-10-16T13:51:57Z

Can you add smth to detect the largest target-language sub track?

Based on the number of symbols used? If so, that is a good idea but I'm not sure if it's easy to do.

asakura42 · 2023-10-16T16:25:40Z

@tatsumoto-ren
You can use something like:

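function subs() {
    local movie="${1}" filename idx lang
    # Strip the directory and extension so the output path stays inside /tmp/impd_subs.
    filename="$(basename -- "${movie%.*}")"
    mkdir -p /tmp/impd_subs
    # ffprobe lists the subtitle streams as "index,language" lines.
    while IFS=, read -r idx lang
    do
        # Progress messages go to stderr so that stdout only carries the result.
        echo "Extracting ${lang} subtitle #${idx} from ${movie}" >&2
        ffmpeg -nostdin -y -hide_banner -loglevel quiet -i "${movie}" -map 0:"$idx" "/tmp/impd_subs/${filename}_${lang}_${idx}.srt"
    done < <(ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "${movie}")
    # Sort the Spanish tracks by line count (numerically) and print the index of the largest one.
    wc -l /tmp/impd_subs/"${filename}"_spa_*.srt | grep -v ' total$' | sort -nr | head -n1 | awk -F_ '{print $NF}' | awk -F. '{print $1}'
}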

This outputs the number of the largest track.

(Main snippet found here: https://gist.github.com/kowalcj0/ae0bdc43018e2718fb75290079b8839a)

asakura42 · 2023-10-16T19:23:04Z

Or much simpler:

while IFS=',' read -r idx lang; do printf "$idx " && ffmpeg -nostdin -hide_banner -loglevel quiet -i "la_directora_S01E02.mkv" -map 0:"$idx" -f srt - | wc -l; done < <(ffprobe -loglevel error -select_streams s -show_entries stream=index:stream_tags=language -of csv=p=0 "la_directora_S01E02.mkv" | grep ",spa") | sort -nrk2,2 | head -n1 | awk '{print $1}'

tatsumoto-ren · 2023-10-16T21:10:48Z

This outputs the number of the largest track.

How fast does it work for a typical episode?

asakura42 · 2023-10-16T22:43:52Z

For a 379 MB mkv file, the output of time for this snippet on my old laptop is 0.32s user 0.34s system 108% cpu 0.607 total

tatsumoto-ren · 2023-10-16T23:18:58Z

Alright, if it's not too slow (need to test on anime specifically), you can submit the PR. But you also need to think about the following:

only apply this method to subtitle tracks; audio tracks can be autoselected using the current method only.
filter out (or give lower priority to) commentary tracks, since they contain garbage yet can be longer than normal subtitle tracks.
filter out all other garbage tracks (songs, signs, comments), though it's likely that they will be shorter than the normal subtitle tracks.
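
The selection step that these requirements describe could look roughly like the sketch below. It assumes the line counts of the subtitle tracks have already been measured (for example with the ffmpeg and wc pipeline shown earlier) and are fed in as "index,language,line_count,title" lines; that format, the function name pick_largest_sub, and the title pattern are hypothetical, and the line counts in the example are invented.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the selection step only; not code from impd.
# Input on stdin: one line per SUBTITLE track (audio tracks are not passed
# in), formatted as "index,language,line_count,title".
# $1 is a comma-separated list of preferred languages.
pick_largest_sub() {
    local langs=",${1,,}," idx lang count title best_idx="" best_count=-1
    while IFS=, read -r idx lang count title; do
        # Keep only tracks in a preferred language.
        [[ $langs == *",${lang,,},"* ]] || continue
        # Drop commentary, song and sign tracks regardless of their size
        # (assumed title pattern).
        [[ ${title,,} =~ song|sign|comment ]] && continue
        # Remember the track with the most lines seen so far.
        if (( count > best_count )); then
            best_idx=$idx
            best_count=$count
        fi
    done
    [[ -n $best_idx ]] && printf '%s\n' "$best_idx"
}

# Invented line counts, for illustration only.
printf '3,spa,12,Forzados\n4,spa,800,unknown\n5,eng,790,unknown\n6,spa,1500,Commentary\n' \
    | pick_largest_sub spa
# prints 4
```

Filtering by title before comparing sizes keeps a long commentary track from winning; giving such tracks a lower priority instead of dropping them would be an equally valid reading of the requirements.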
