Hi there!
I found out that a GT section had been added just as I was tempted to create my own awesome list.
One thing I think would be great is categorizing this section a little more (Manuscript / Early print / Modern / Contemporary?). That would probably make the data easier to browse.
We're working on a project to make open-source OCR readily deployable in libraries, archives, etc. (https://github.com/OCR-D / http://ocr-d.de). An important part of highly accurate OCR, especially for historical texts, is training. Training requires the right ground truth, and for everything to work together, the ground truth itself, the corpora, the tools, etc. need to be described in a structured way.
Therefore, we want to define a JSON schema for describing ground truth and restructure the list/table into a JSON file, cf. cneud/ocr-gt#11. This should be aligned with OCR-D/spec#86, where we want to define schemas for both training data and trained models. Ideally, all inputs and outputs of the individual steps in an OCR workflow would be data defined by such a schema.
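To make the idea a bit more concrete, here is a rough sketch of what a structured ground-truth entry and a simple check on it could look like. Everything in it is an assumption for illustration: the field names (`name`, `url`, `period`, `languages`), the period labels (loosely taken from the categories suggested above), and the sample entries are made up and are not the schema from cneud/ocr-gt#11 or OCR-D/spec#86.

```python
import json

# Hypothetical period labels, loosely based on the categories suggested in this issue.
PERIODS = {"manuscript", "early print", "modern", "contemporary"}

# Hypothetical minimal "schema": field name -> expected Python type.
REQUIRED_FIELDS = {"name": str, "url": str, "period": str, "languages": list}


def validate_entry(entry):
    """Return a list of problems found in one entry (empty list if none)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    period = entry.get("period")
    if isinstance(period, str) and period not in PERIODS:
        problems.append(f"unknown period: {period}")
    return problems


def group_by_period(entries):
    """Group entry names by period so the list can be browsed by category."""
    grouped = {}
    for entry in entries:
        grouped.setdefault(entry["period"], []).append(entry["name"])
    return grouped


# Made-up sample data; these are not real datasets.
SAMPLE = json.loads("""
[
  {"name": "example-manuscripts", "url": "https://example.org/a",
   "period": "manuscript", "languages": ["la"]},
  {"name": "example-incunabula", "url": "https://example.org/b",
   "period": "early print", "languages": ["de", "la"]},
  {"name": "example-charters", "url": "https://example.org/c",
   "period": "manuscript", "languages": ["fr"]}
]
""")

if __name__ == "__main__":
    for entry in SAMPLE:
        print(entry["name"], validate_entry(entry))
    print(group_by_period(SAMPLE))
```

Once the list lives in a JSON file like this, the categorized view asked for above could be generated from the data instead of being maintained by hand. A real JSON Schema document would replace the hand-written type check here.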