Convert bitmap subtitles into SubRip format using the macOS OCR engine

macSubtitleOCR

Overview

macSubtitleOCR is a tool written entirely in Swift that uses Optical Character Recognition (OCR) to convert bitmap subtitles into the SubRip subtitle format (SRT). It currently supports PGS and VobSub bitmap subtitles. Text recognition is handled by the built-in macOS OCR engine, which provides highly accurate results.

For more details on performance, refer to the Accuracy section below.
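
For readers unfamiliar with the target format, an SRT file is a sequence of numbered blocks, each with a time range written as HH:MM:SS,mmm and one or more lines of text. The following is a minimal sketch of that layout, not the project's actual implementation; the `SubtitleCue` type and the function names are hypothetical and chosen only for illustration.

```swift
/// One recognized subtitle with times in milliseconds.
/// Hypothetical type for illustration; not taken from macSubtitleOCR.
struct SubtitleCue {
    let index: Int
    let startMs: Int
    let endMs: Int
    let text: String
}

/// Left-pads a non-negative integer with zeros to the given width.
func pad(_ value: Int, to width: Int) -> String {
    let digits = String(value)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

/// Formats milliseconds as an SRT timestamp: HH:MM:SS,mmm.
func srtTimestamp(_ ms: Int) -> String {
    let hours = ms / 3_600_000
    let minutes = (ms / 60_000) % 60
    let seconds = (ms / 1_000) % 60
    let millis = ms % 1_000
    return "\(pad(hours, to: 2)):\(pad(minutes, to: 2)):\(pad(seconds, to: 2)),\(pad(millis, to: 3))"
}

/// Renders a cue as an SRT block: index, time range, text, then a blank line.
func srtBlock(_ cue: SubtitleCue) -> String {
    "\(cue.index)\n\(srtTimestamp(cue.startMs)) --> \(srtTimestamp(cue.endMs))\n\(cue.text)\n\n"
}

let cue = SubtitleCue(index: 1, startMs: 1_500, endMs: 4_250, text: "Hello, world!")
print(srtBlock(cue), terminator: "")
```

Concatenating one such block per recognized subtitle, in order, yields a complete .srt file.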

Features

  • Export .png images of subtitles for manual correction of OCR output.
  • Use the macOS OCR engine's language recognition feature to enhance accuracy by validating character sequences as real words.
  • Export raw JSON output from the OCR engine for further analysis.

Supported Formats

  • PGS (.mkv, .sup)
  • VobSub (.sub, .idx)

Building the Project

Important

This project requires Swift 6 to compile and run correctly.

To build macSubtitleOCR, follow these steps:

git clone https://github.com/ecdye/macSubtitleOCR
cd macSubtitleOCR
swift build

The compiled binary is placed in the .build/debug directory.

Running Tests

The tests compare OCR output against known correct results. Because slight differences may occur between machines, we aim for at least 95% accuracy rather than an exact match.

swift test
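
To illustrate what a threshold-based comparison might look like, here is a minimal sketch that scores OCR output against a reference string using Levenshtein edit distance. This README does not say which metric the test suite uses, so the edit-distance metric and the names `editDistance` and `similarity` are assumptions for illustration only.

```swift
/// Levenshtein edit distance between two strings, computed over Characters.
/// Assumed metric for illustration; the project's tests may measure accuracy differently.
func editDistance(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }
    // previous[j] holds the distance between the first i-1 characters of s and the first j of t.
    var previous = Array(0...t.count)
    for i in 1...s.count {
        var current = [i] + Array(repeating: 0, count: t.count)
        for j in 1...t.count {
            let substitution = previous[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1)
            current[j] = min(previous[j] + 1, current[j - 1] + 1, substitution)
        }
        previous = current
    }
    return previous[t.count]
}

/// Similarity in 0...1, where 1.0 means the two strings are identical.
func similarity(_ actual: String, _ expected: String) -> Double {
    let longest = max(actual.count, expected.count)
    guard longest > 0 else { return 1.0 }
    return 1.0 - Double(editDistance(actual, expected)) / Double(longest)
}

let expected = "I will be there at nine."
let actual = "l will be there at nine."  // OCR misread "I" as a lowercase "l"
let score = similarity(actual, expected)
print(score >= 0.95 ? "pass" : "fail")
```

A ratio like this tolerates a single misread character in a line of typical length while still flagging output that diverges substantially from the reference.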

Accuracy

In tests comparing macSubtitleOCR with the Tesseract OCR engine, the macOS OCR engine often outperforms Tesseract, particularly with challenging cases like the letter 'I'. While methods like binary image comparison, used by tools such as SubtitleEdit, may offer slightly better accuracy in some cases, the macOS OCR engine provides excellent results for most use cases.

Contribution and TODO

For information on how to contribute to the project, please refer to CONTRIBUTING.md.

If you're interested in working on specific features or improvements, check out issues tagged as enhancements.
