As an NLP expert, I want to assess the OCR quality of the pages in the test set so that I can offer a data-based recommendation on whether to re-OCR certain volumes. #21

Open

2 of 3 tasks

mnaydan opened this issue Apr 23, 2024 · 0 comments

mnaydan (Collaborator) commented Apr 23, 2024 · edited by laurejt

The outcome is a rough estimate (as a percentage) of OCR quality for Gale texts and HathiTrust texts.

Create PPA page-level OCR quality evaluation #23
Augment PPA Character Stats with document frequency #24
Similarity measurements for example poems in the pages test set

mnaydan added this to the compile clean, accurate, complete full-text corpus milestone

mnaydan assigned laurejt

mnaydan added this to Iteration Planning Board

mnaydan moved this to Todo in Iteration Planning Board

laurejt moved this from Todo to In Progress in Iteration Planning Board

laurejt moved this from In Progress to Done in Iteration Planning Board

mnaydan removed this from the compile clean, accurate, complete full-text corpus milestone
