feat/OCR: extract english texts from images #11

cindyli · 2023-06-07T18:54:58Z

Description

Add a utility script that extracts English text from images. This script will be part of the process for extracting Bliss data from the archive website.

Steps to test

Open a publication on the archive website. In the "Download options" section, download the "SINGLE PAGE PROCESSED JP2 ZIP" format. Unzip the file on your local computer and place its contents (a set of JP2 images) in a directory.
Follow the instructions in the README to run the utility script.

Expected behavior:

The English text in each image should be extracted and saved as a .txt file in the same directory as the image.
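
For reviewers who want a picture of the expected behavior, here is a minimal sketch. It is not the script in this PR. It assumes Pillow (with JPEG 2000 support) and pytesseract are installed and that the Tesseract English language data is available. The names `extract_texts` and `default_ocr` are hypothetical and used only for illustration.

```python
from pathlib import Path
from typing import Callable, List


def default_ocr(image_path: Path) -> str:
    """OCR one image with Tesseract (assumed engine, not confirmed by this PR)."""
    # Imported lazily so the rest of the module works without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        # lang="eng" restricts recognition to the English language model.
        return pytesseract.image_to_string(img, lang="eng")


def extract_texts(
    image_dir: str,
    ocr: Callable[[Path], str] = default_ocr,
    pattern: str = "*.jp2",
) -> List[Path]:
    """Write one .txt file next to each matching image and return the paths."""
    written = []
    for image_path in sorted(Path(image_dir).glob(pattern)):
        text = ocr(image_path)
        # Same directory and base name as the image, with a .txt suffix.
        out_path = image_path.with_suffix(".txt")
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written
```

Calling `extract_texts("path/to/unzipped/images")` would then produce one text file per JP2 page. The `ocr` parameter is there only so the directory handling can be exercised without Tesseract installed.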

cindyli added 2 commits June 7, 2023 14:33

feat/OCR: extract english texts from images

47c0e66

chore: update documentation

a2c9806

cindyli requested review from agamba and klown June 7, 2023 18:54
