Quality improvement by searching for common OCR errors (transferred from OL) #97

RayBB · 2023-10-07T10:13:55Z

Original text from internetarchive/openlibrary#810:

Sorry if this is out of place, but I just stumbled across an oddity. It appears that the Google-digitized non-English editions have a habitual OCR problem, which shows up in the boilerplate Google inserted.

For instance, Googling: "carcfully scannod" site:archive.org
turns up 46,900 results, most of which are scanned from texts in languages that use diacritics. That can't be a coincidence. I'm wondering if it can be put to use for quality improvement. Might they just need a fresh run through OCR with more modern software?
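
To make the idea concrete, here is a minimal sketch of how garbled boilerplate could be flagged in an item's OCR text. It assumes the intended phrase behind the search term above is "carefully scanned"; the function name and the 0.8 similarity threshold are illustrative choices, not anything archive.org or Open Library is known to use. It relies only on Python's standard `difflib`.

```python
import re
from difflib import SequenceMatcher

# Assumption: the correctly spelled phrase behind the garbled search term.
REFERENCE = "carefully scanned"


def find_garbled_boilerplate(text, reference=REFERENCE, threshold=0.8):
    """Return word windows that nearly match `reference` but are not exact.

    A near match that is not an exact match suggests the boilerplate
    was damaged by OCR, so the item may benefit from a fresh OCR run.
    """
    words = re.findall(r"\w+", text.lower())
    n = len(reference.split())
    hits = []
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        if window == reference:
            continue  # clean OCR, nothing to flag
        ratio = SequenceMatcher(None, window, reference).ratio()
        if ratio >= threshold:
            hits.append(window)
    return hits


print(find_garbled_boilerplate("This book has been carcfully scannod by Google"))
# ['carcfully scannod']
```

Text in which the phrase is spelled correctly yields an empty list, so only items with damaged boilerplate would be queued for review.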

More discussion in the thread.

RayBB mentioned this issue Oct 7, 2023

Quality improvement by searching for common OCR errors internetarchive/openlibrary#810

Closed
