Order the ground truth section by type ? #103

PonteIneptique · 2019-01-08T07:26:46Z

Hi there!
I noticed that a ground truth (GT) section was added while I was tempted to create my own awesome list.
One thing I think would be great is categorizing this section a little more (Manuscript / Early print / Modern / Contemporary?). That would probably make the data easier to browse.

kba · 2019-01-08T08:27:50Z

The list is @cneud's work and it's maintained at https://github.com/cneud/ocr-gt.

We're working on a project to make open-source OCR readily deployable in libraries, archives, etc. (https://github.com/OCR-D / http://ocr-d.de). Training is an important part of highly accurate OCR, especially for historical texts. Training requires the right ground truth, and for everything to work together, the ground truth itself, the corpora, the tools, etc. need to be described in a structured way.

Therefore, we want to define a JSON schema for describing ground truth and restructure the list/table into a JSON file, cf. cneud/ocr-gt#11. This should be aligned with OCR-D/spec#86, where we want to define schemas for both training data and trained models. Ideally, all inputs and outputs of the individual steps in an OCR workflow would be data defined by such a schema.

@mittagessen has been describing the models for his kraken OCR engine in such a way for a while.
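
As a rough illustration of the structured description proposed above, here is a minimal sketch of what machine-readable ground truth entries could look like and how they could be grouped by type. Everything in it is an assumption made for illustration: the field names (`name`, `url`, `period`, `languages`), the period categories (borrowed from the suggestion at the top of this thread) and the placeholder entries are hypothetical, and none of it reflects the schema discussed in cneud/ocr-gt#11 or OCR-D/spec#86.

```python
from collections import defaultdict

# Hypothetical period categories, borrowed from the suggestion in this thread.
PERIODS = {"manuscript", "early-print", "modern", "contemporary"}

# Hypothetical required fields and their types; not an agreed schema.
REQUIRED_FIELDS = {"name": str, "url": str, "period": str, "languages": list}


def validate_entry(entry):
    """Return a list of problems found in one entry (empty if it is valid)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    if isinstance(entry.get("period"), str) and entry["period"] not in PERIODS:
        problems.append(f"unknown period: {entry['period']}")
    return problems


def group_by_period(entries):
    """Group the names of valid entries by period; invalid entries are skipped."""
    groups = defaultdict(list)
    for entry in entries:
        if not validate_entry(entry):
            groups[entry["period"]].append(entry["name"])
    return dict(groups)


# Placeholder entries for illustration only; they do not describe real datasets.
entries = [
    {"name": "example-manuscripts", "url": "https://example.org/a",
     "period": "manuscript", "languages": ["lat"]},
    {"name": "example-incunabula", "url": "https://example.org/b",
     "period": "early-print", "languages": ["deu"]},
    {"name": "broken-entry", "url": "https://example.org/c", "period": "unknown"},
]

if __name__ == "__main__":
    for period, names in group_by_period(entries).items():
        print(period, names)
```

Once the entries live in a structured file rather than a hand-written table, ordering the section by type becomes a matter of grouping on one field, and a real JSON Schema could replace the hand-rolled checks.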

@wrznr @tboenig @cneud

kba mentioned this issue Jan 8, 2019

Add few Ground Truth Repositories #102

Merged

cneud mentioned this issue Jan 18, 2019

Make this a JSON file? cneud/ocr-gt#11

Closed
