Download subtitles from videos that have embedded subtitles and option to render it in video #948
Comments
The subtitles are not in the TS video; I have no idea where you got that from. It is a different matter if the subs are encoded into the video stream itself, but that is up to the streamer using OBS or other software. Baking subs into a video requires more knowledge than it looks like, and that is usually a whole different matter. If you mean extracting the chat from the video, you need some OCR, but that is still alpha software. |
Someone already requested including subtitles in #750, and I made some very minor progress on it.
The subtitles are stored in the TS video chunks. My initial implementation used FFmpeg to extract the subtitles, which was extremely slow (~1.6s per 10 seconds of video). I felt this was unacceptable performance, so I wanted to write a custom solution that is significantly faster; however, other bug reports took priority at the time. |
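To put the quoted figure in perspective, here is a rough back-of-the-envelope estimate. It assumes the cost scales linearly with video length, which the comment does not state; the function name and the 4-hour example VOD are made up for illustration.

```python
def extraction_seconds(video_seconds: float,
                       cost_per_chunk: float = 1.6,
                       chunk_seconds: float = 10.0) -> float:
    """Estimate total extraction time, ASSUMING the cost grows linearly
    with duration (~1.6 s of work per 10 s of video, as quoted above)."""
    return video_seconds / chunk_seconds * cost_per_chunk


# Hypothetical example: a 4-hour VOD.
four_hours = 4 * 60 * 60
estimate = extraction_seconds(four_hours)
print(f"{estimate / 60:.1f} minutes")  # 38.4 minutes
```

Under that assumption, extraction alone would add well over half an hour to a long VOD, which is consistent with the wish for a faster custom solution.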
This comment was marked as off-topic.
There is a free plugin for OBS (as far as I know it is the ONLY plugin of its kind) that sends text data inside the video stream to Twitch. The Twitch player supports CC embedded in the video and will show it. There are other CC solutions, but they do not put the captions inside the video; they send them over a second text stream. Those captions are produced by a browser that the streamer keeps open. They do not show up in the VOD (the recording you watch back later), because Twitch doesn't save the separate text stream, so they won't work here. But as @ScrubN said, the captions are inside the TS video, so it can be scanned for them to first generate an SRT file based on the timestamps. Then, as an option, the subtitles can either be encoded into the MP4 video with FFmpeg OR kept as a separate SRT file. It's not impossible; some video players even support it. We just need to write the correct code to extract the text data from the TS video. Edit: the plugin is this one: https://github.com/ratwithacompiler/OBS-captions-plugin |
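The "generate an SRT file based on timestamps" step can be sketched as follows. This is only a minimal illustration of the SRT side: it assumes the captions have already been extracted from the TS video into timed cues, and the `Cue` type and function names are hypothetical, not part of TwitchDownloader.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption with start/end times in milliseconds (hypothetical type)."""
    start_ms: int
    end_ms: int
    text: str


def format_timestamp(ms: int) -> str:
    """Format milliseconds as the SRT timestamp HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as SRT: a 1-based index, a time range, then the text,
    with a blank line between consecutive entries."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        time_range = f"{format_timestamp(cue.start_ms)} --> {format_timestamp(cue.end_ms)}"
        blocks.append(f"{index}\n{time_range}\n{cue.text}\n")
    return "\n".join(blocks)


print(to_srt([Cue(0, 1500, "Hello"), Cue(2000, 4250, "Welcome to the stream")]))
```

The hard part, actually pulling the caption data out of the TS packets, is not covered here.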
So the ID https://www.twitch.tv/videos/1980035805 has embedded subs in the TS? I'll check it; I didn't know that was possible. |
This comment was marked as outdated.
See this earlier comment #948 (comment). I still want to implement custom caption extraction (if possible) because FFmpeg is so slow. |
Even if it's slow, it's always welcome. I am happy to be able to extract something. I will search around for other solutions and let you know soon; otherwise we have @ScrubN's option. All my latest Twitch streams (on the MrDummy_NL channel, 2024 videos) have CC embedded in the video, so you can test on mine. My friend MarukaKou has also been using CC in video lately, so you can use his VODs to test it too. Update: there is a tool on GitHub, https://github.com/kanongil/telxcc, which is also used in other tools like CCExtractor. It seems you could use it. It would be nice to either call a 3rd party program and run it when the TS file is completed, or use the GitHub code and of course credit the creator. |
@MrDummyNL did you check whether any of these actually work with the Twitch sample? Some of these programs are 10 years old. |
None of them work, I've reported the issue to all of them. |
telxcc is written in C and provides no prebuilt binaries, and I really don't want to deal with makefiles or linking on linux. Also @superbonaci, you cannot extract the subtitles from the concatenated TS file because it is produced by concatenating the raw bytes from all of the parts together. It's honestly a miracle to me that FFmpeg can read the concatenated file because of how f-ed the metadata keys probably are. |
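To illustrate what "concatenating the raw bytes" can mean for a parser, here is a small sketch using synthetic MPEG-TS packets. It relies only on well-known TS framing (188-byte packets, sync byte `0x47`, a 13-bit PID, a 4-bit continuity counter). The packet contents, the helper names, and the premise that each part restarts its counters are assumptions made for the demo, not a claim about what Twitch's segments actually contain.

```python
TS_PACKET_SIZE = 188  # fixed MPEG-TS packet length in bytes
SYNC_BYTE = 0x47      # first byte of every TS packet
NULL_PID = 0x1FFF     # padding packets; their counter is not meaningful


def make_packet(pid: int, counter: int) -> bytes:
    """Build a synthetic payload-only TS packet (demo data, not real video)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | (counter & 0x0F)])
    return header + bytes(TS_PACKET_SIZE - len(header))


def concat_parts(parts: list[bytes]) -> bytes:
    """Join parts by appending their raw bytes, with no remuxing."""
    return b"".join(parts)


def continuity_breaks(stream: bytes) -> list[tuple[int, int]]:
    """Return (packet_index, pid) wherever a payload packet's continuity
    counter does not follow the previous one for the same PID.
    Simplified: ignores duplicate packets and the discontinuity indicator."""
    last_counter: dict[int, int] = {}
    breaks = []
    for index in range(len(stream) // TS_PACKET_SIZE):
        packet = stream[index * TS_PACKET_SIZE:(index + 1) * TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"packet {index} does not start with the sync byte")
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        has_payload = bool(packet[3] & 0x10)
        if pid == NULL_PID or not has_payload:
            continue  # the counter only advances on packets carrying payload
        counter = packet[3] & 0x0F
        if pid in last_counter and counter != (last_counter[pid] + 1) & 0x0F:
            breaks.append((index, pid))
        last_counter[pid] = counter
    return breaks


# Two synthetic parts on the same PID, each starting its counter at 0.
part_a = b"".join(make_packet(0x100, c) for c in (0, 1, 2))
part_b = b"".join(make_packet(0x100, c) for c in (0, 1))
merged = concat_parts([part_a, part_b])
print(continuity_breaks(merged))  # [(3, 256)]
```

Each part is consistent on its own, but the merged stream shows a counter jump at the boundary. A forgiving parser can ignore such a jump, while a stricter one may not.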
@ScrubN HandBrake is able to detect the Twitch subtitles as Here's the video: https://www.twitch.tv/videos/1923916260 If you save it as mkv and choose not to Burn into video, you can choose or not as subtitle track from VLC: What I don't know is if HandBrake used some external command to do it or is all built in to it, but it works great. As I said the other commands don't work: telxcc, subtitleedit, ts-cc-extractor. |
Oh strange. I was only able to detect the subtitles from the individual parts with both FFmpeg and MPV. It seems that VLC also detects the subtitles from the merged video though. Again, this is probably a result of their parsers being overly forgiving to non-standard/corrupted data. |
I'm not sure there is actually any corrupt data, because Video DownloadHelper downloads the same m2ts file as the one merged by TwitchDownloader, so it must be correct. Yes, VLC shows several subtitle tracks, but only one works (or there's only one). I'll have to report the issue to VLC and see what they say. |
Any progress on extracting the CC part from TS videos? |
Sorry, I have been taking a break from TD to work on a private project with another developer. I have cleaned up a subtitle scanner I had written some time ago and committed it to a draft PR for transparency. Hopefully it should not take too long to finish and get into a working state. |
That is great news! Thank you for making it possible soon! |
I'm sorry for the wait. I was having a bit of a hard time when I learned that by changing how we concatenate the downloaded parts, the subtitles can be naturally preserved without any extra work. The only issue is that I need to rewrite how trimming is handled, so it may take a little while. More good news though, this alternative approach will make video finalization MUCH faster and possibly also fix some other issues. |
Good news:
Bad news:
|
I'm really annoyed right now. The ffmpeg command I was using that was extracting the subtitles is no longer working. Handbrake does recognize the subtitles, but annoyingly it only lets me burn in the subtitles, not export them. I might actually need to write a custom subtitles parser and I'm not very happy about it. |
@MrDummyNL you said you currently extract the subtitles from the download cache. What tool(s) do you use to do that? |
Checklist
Write your feature request here
More streamers are using subtitles that are sent with the video. This is already possible and I use it too, as do some of my friends. That is why it would be a shame if the subtitle data cannot be stored in an SRT file.
That is why I ask you to add a function to read subtitle data from the TS video (which is the logical step, because you can only read it from the full TS video) so that you can then render it. Optionally, you can render the subtitles together with the video. So you have 2 options:
I know tools exist to bake subtitles into the video. It should not be too hard.
I might save some interesting VODs, but the subtitle data will be lost. That is kind of sad.
Can you make it possible? Thanks in advance.