[BUG] [EIA-608] Can't extract subtitles in TS file #1626

Open
superbonaci opened this issue Jul 12, 2024 · 1 comment

@superbonaci
Contributor

Build info (macOS on Apple Silicon):

ccextractor --version
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
CCExtractor detailed version info
	Version: 0.94
	Git commit: f12f12b9165c83d59e6ec43876ae8701dd354cf6
	Compilation date: 2024-07-12
	CEA-708 decoder: C
	File SHA256: Could not open file
Libraries used by CCExtractor
	Tesseract Version: 5.4.1
	Leptonica Version: leptonica-1.84.1
	libGPAC Version: 2.4
	zlib: 1.2.11
	utf8proc Version: 2.4.0
	protobuf-c Version: 1.3.1
	libpng Version: 1.6.37
	FreeType
	libhash
	nuklear
	libzvbi

Sample file with a single English subtitle track:

ccextractor 600_1080p60.ts -o output.srt
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: 600_1080p60.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: 600_1080p60.ts
Detected MP4 box with name: meta
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Changed fps using NAL to: 60.000000

Found large gap(734085) in PTS! Trying to recover ...
Error: Broken AVC stream - Leading bytes are non-zero...
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues

Six samples of the same video at different resolutions and frame rates: 600_all.zip

@superbonaci changed the title from "[BUG] [EIA-608] Can't extract subtitles" to "[BUG] [EIA-608] Can't extract subtitles in TS file" on Jul 12, 2024
@Z-xus

Z-xus commented Nov 30, 2024

Re-encoding the video reduced the PTS gaps. Can someone give me directions on how to proceed from here?

ffmpeg -i ~/Downloads/600_/600_1080p60.ts -c:v libx264 -c:a aac -strict -2 ~/Downloads/600_/fixed_output_reencoded.ts

ccextractor ~/Downloads/600_/fixed_output_reencoded.ts -o output.srt --fixpadding --fixptsjumps

(This works just as well without the --fixpadding and --fixptsjumps options.)


CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: /home/neon/Downloads/600_/fixed_output_reencoded.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]
[Tesseract PSM: 3]

-----------------------------------------------------------------
Opening file: /home/neon/Downloads/600_/fixed_output_reencoded.ts
Detected MP4 box with name: skip
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode

Found large gap(86) in PTS! Trying to recover ...
  3%  |  00:00
Found large gap(90) in PTS! Trying to recover ...

Found large gap(88) in PTS! Trying to recover ...

Found large gap(87) in PTS! Trying to recover ...

Found large gap(89) in PTS! Trying to recover ...
100%  |  00:09
Number of NAL_type_7: 3
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 2
Number of num_unexpected_sei_length: 0

Total frames time:        00:00:20:020  (600 frames at 29.97fps)

Min PTS:                                00:00:01:433
Max PTS:                                00:00:11:450
Length:                          00:00:10:017
Done, processing time = 0 seconds
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
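
A quick arithmetic check on the summary above, as a rough sketch: 600 frames at 29.97 fps comes to about 20.02 s, roughly twice the 10.017 s between Min PTS and Max PTS, and the log for the original file reported 60 fps. The numbers below are copied from the log; the helper name `to_seconds` is made up for illustration, and the idea that the frame rate may be misreported for the re-encoded file is only a guess.

```python
def to_seconds(stamp: str) -> float:
    """Convert a ccextractor-style HH:MM:SS:mmm stamp to seconds."""
    hours, minutes, seconds, millis = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


# Values copied from the summary printed for the re-encoded file.
frames = 600
reported_fps = 29.97
total_frames_time = to_seconds("00:00:20:020")
pts_length = to_seconds("00:00:11:450") - to_seconds("00:00:01:433")

# Duration implied by the reported frame rate, and the frame rate
# implied by the PTS span instead.
duration_at_reported_fps = frames / reported_fps
fps_implied_by_pts = frames / pts_length

print(f"frames / reported fps : {duration_at_reported_fps:.3f} s")
print(f"total frames time     : {total_frames_time:.3f} s")
print(f"PTS length            : {pts_length:.3f} s")
print(f"fps implied by PTS    : {fps_implied_by_pts:.2f}")
```

If the implied rate comes out near 60, the "Total frames time" line is probably off by a factor of two rather than the PTS values.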

output.srt:

1
00:00:00,333 --> 00:00:06,332
congratulations to papa plot and
their community for winning ■k71
million in a TV contest through 

2
00:00:06,334 --> 00:00:09,998
their community for winning ■k71
million in a TV contest through 
Twitch. I mean, just wild. Um   

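Two things stand out in that output: the second cue repeats the last two lines of the first one, and there is a "■" where a character was probably not decoded. A minimal sketch for flagging both in a larger file, assuming plain SRT with blank lines between cues (the function names are hypothetical and this is not part of ccextractor):

```python
import re

# The two cues from output.srt above, with trailing spaces dropped.
SRT_SAMPLE = """\
1
00:00:00,333 --> 00:00:06,332
congratulations to papa plot and
their community for winning ■k71
million in a TV contest through

2
00:00:06,334 --> 00:00:09,998
their community for winning ■k71
million in a TV contest through
Twitch. I mean, just wild. Um
"""


def parse_srt(text):
    """Return a list of (index, timing, lines) tuples, one per cue."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = [row.rstrip() for row in block.splitlines()]
        if len(rows) >= 2 and "-->" in rows[1]:
            cues.append((int(rows[0]), rows[1], rows[2:]))
    return cues


def repeated_lines(cues):
    """Map each cue index to the lines it shares with the previous cue."""
    report = {}
    for (_, _, previous), (index, _, current) in zip(cues, cues[1:]):
        shared = [line for line in current if line in previous]
        if shared:
            report[index] = shared
    return report


def non_ascii_chars(cues):
    """Collect every non-ASCII character that appears in the cue text."""
    return sorted({ch for _, _, lines in cues for line in lines for ch in line if ord(ch) > 127})


cues = parse_srt(SRT_SAMPLE)
print(repeated_lines(cues))
print(non_ascii_chars(cues))
```

The repeated lines may simply be how roll-up captions get written to SRT, so this only points at places to look, not at confirmed bugs.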
