Make PDFs screen reader friendly with OCR
Cristos L-C edited this page Jun 25, 2020
- Purpose
- Pre-Requisites
- Initial Setup
- Converting PDFs Manually
- Converting and Watermarking PDFs Automatically
- Further Improvements
Purpose
When you're lucky enough to find a town or city that has published its sample ballot, the PDF may still be missing the OCR text layer that makes it accessible to tools like screen readers. You can easily convert such PDFs (or raw JPG, PNG, or TIFF images) to an accessible format with a few free command-line tools.
Pre-Requisites
This guide assumes you are working on a computer running macOS Mojave or later. The process is similar on Linux, but the initial setup will differ. Tool availability on Windows has not been confirmed.
Initial Setup
- Launch the Terminal app to access the command line.
- Install Homebrew by following the instructions at https://brew.sh/
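- Run the following command (it uses the Homebrew syntax current in 2020; newer Homebrew releases replaced "brew cask install" with "brew install --cask"):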
brew update && brew cask install homebrew/cask-versions/adoptopenjdk8 && brew install pdftk-java ocrmypdf
- If you want to add a "Sample" watermark overlay on your ballots, download the sample.pdf file.
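Before converting anything, it can help to confirm that the tools from the setup steps are actually on your PATH. The sketch below is only a convenience check; check_tools is a hypothetical helper name, and it assumes the Homebrew install above put ocrmypdf, tesseract (the OCR engine ocrmypdf relies on), and pdftk on the PATH.

```shell
#!/usr/bin/env bash
# Convenience check: are the command-line tools from the setup steps installed?
# check_tools is a hypothetical helper used only in this sketch.
check_tools() {
  local tool status=0
  for tool in "$@"; do
    # "command -v" succeeds only if the shell can find the command.
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found: $tool" >&2
      status=1
    fi
  done
  return "$status"
}

# ocrmypdf uses the Tesseract OCR engine, so check for it as well.
if check_tools ocrmypdf tesseract pdftk; then
  ocrmypdf --version
  echo "All tools found."
fi
```

If any "not found" lines appear, re-run the Homebrew command for the missing tool before moving on.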
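Converting PDFs Manually
- For each PDF you want to convert, run the following command in Terminal, replacing ORIGINALFILE.pdf and DESTINATIONFILE.pdf with your own filenames: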
ocrmypdf --output-type pdf --sidecar --remove-background --deskew --remove-vectors "ORIGINALFILE.pdf" "DESTINATIONFILE.pdf"
- To overlay a "Sample" watermark on the converted ballot, run the following command:
pdftk "DESTINATIONFILE.pdf" stamp sample.pdf output "DESTINATIONFILE_WATERMARKED.pdf"
- NOTE: You cannot use the same filename as both the input and output for the watermark command. Make sure that the filename after "output" is different from the filename after "pdftk" in the command above.
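One way to avoid accidentally reusing the input filename is to build the output name from it. This is a minimal sketch, assuming bash and that sample.pdf is in the current folder; watermarked_name and stamp_sample are hypothetical helper names, not part of pdftk.

```shell
#!/usr/bin/env bash
# Derive the watermarked filename from the input name, so pdftk never
# writes to the file it is reading. Both function names are hypothetical.

watermarked_name() {
  # "${1%.pdf}" strips a trailing ".pdf"; "_WATERMARKED.pdf" is then appended.
  printf '%s_WATERMARKED.pdf\n' "${1%.pdf}"
}

stamp_sample() {
  local in="$1" out
  out="$(watermarked_name "$in")"
  # Same pdftk command as above; assumes sample.pdf is in the current folder.
  pdftk "$in" stamp sample.pdf output "$out"
}
```

After pasting these definitions into Terminal, running stamp_sample "DESTINATIONFILE.pdf" would write DESTINATIONFILE_WATERMARKED.pdf. Because a suffix is always appended, the output name can never match the input name.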
Converting and Watermarking PDFs Automatically
To simplify the process, you can use a bash script that performs both OCR conversion and watermarking for you.
- Download ocr-and-watermark.sh
- In Terminal, go to the folder containing the downloaded file and make it executable by typing:
chmod +x ocr-and-watermark.sh
- To use it, type in Terminal:
./ocr-and-watermark.sh "SOURCEFILE.pdf" "DESTINATIONFILE.pdf"
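The contents of ocr-and-watermark.sh are not reproduced on this page. As a rough illustration only, a script that chains the two manual commands might look like the sketch below. It is hypothetical, not the downloadable script, and it assumes ocrmypdf and pdftk are installed and sample.pdf is in the current folder. Writing the sidecar text to DESTINATIONFILE.txt is a naming choice of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only: NOT the contents of ocr-and-watermark.sh.
# Runs the manual OCR command, then stamps sample.pdf onto the result.
# Assumes ocrmypdf and pdftk are installed and sample.pdf is in this folder.

ocr_and_watermark() {
  local src="$1" dest="$2" tmpdir status
  if [ "$#" -ne 2 ]; then
    echo "usage: $0 SOURCEFILE.pdf DESTINATIONFILE.pdf" >&2
    return 2
  fi
  # Keep the OCR result in a temporary folder, so pdftk never reads and
  # writes the same file.
  tmpdir="$(mktemp -d)" || return 1
  if ocrmypdf --output-type pdf --sidecar "${dest%.pdf}.txt" \
       --remove-background --deskew --remove-vectors \
       "$src" "$tmpdir/ocr.pdf" &&
     pdftk "$tmpdir/ocr.pdf" stamp sample.pdf output "$dest"; then
    status=0
  else
    status=1
  fi
  rm -rf "$tmpdir"   # remove the intermediate file whether or not it worked
  return "$status"
}

# Run only when given arguments, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
  ocr_and_watermark "$@"
else
  echo "usage: $0 SOURCEFILE.pdf DESTINATIONFILE.pdf" >&2
fi
```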
Further Improvements
- If you have a large number of files to convert, you can use a bash script to loop through all the files and perform the OCR and/or watermarking operations.
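A loop like the one described above could be sketched as follows. This is a hypothetical example, assuming bash and ocrmypdf; batch_ocr is an illustrative name, and it reuses the options from the manual command. Results go to a separate output folder so the originals stay untouched and already-converted files are not picked up again.

```shell
#!/usr/bin/env bash
# Hypothetical batch sketch: OCR every PDF in one folder and write the
# converted copies, with the same filenames, into a second folder.

batch_ocr() {
  local in_dir="$1" out_dir="$2" src failed=0
  mkdir -p "$out_dir" || return 1
  for src in "$in_dir"/*.pdf; do
    [ -e "$src" ] || continue   # the glob matched no files
    if ! ocrmypdf --output-type pdf --sidecar --remove-background --deskew \
         --remove-vectors "$src" "$out_dir/$(basename "$src")"; then
      echo "OCR failed: $src" >&2
      failed=1
    fi
  done
  return "$failed"
}

if [ "$#" -eq 2 ]; then
  batch_ocr "$1" "$2"
else
  echo "usage: $0 INPUT_FOLDER OUTPUT_FOLDER" >&2
fi
```

A watermarking step could be added inside the loop with the same pdftk stamp command. Note that ocrmypdf stops with an error on files that already contain text; its --skip-text option tells it to leave such pages as they are.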