How to find corresponding image from an hocr result? #1

Open
zuphilip opened this issue Jul 29, 2017 · 2 comments

Comments

@zuphilip
Member

I have found something interesting here https://digi.bib.uni-mannheim.de/periodika/reichsanzeiger/ocr/film/tesseract-4.0.0-alpha.20170703/012-9419/0580.hocr and would like to see the corresponding image. How can I find it?

@stweil
Member

stweil commented Jul 29, 2017

Take the microfilm number 012-9419 and the image number 0580 from the URL and use them in the viewer URL:

The correct image link should be offered by the search interface in the future.
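
Extracting the two numbers from the hOCR URL can be sketched in a few lines of Python. This is only an illustration: the function name `parse_hocr_url` is hypothetical, and it assumes every hOCR URL ends in `<microfilm number>/<image number>.hocr`, as the one in this issue does. It does not build the viewer URL, because that URL pattern is not given in this thread.

```python
import re
from urllib.parse import urlparse


def parse_hocr_url(url):
    """Return (microfilm_number, image_number) taken from an hOCR URL.

    Assumes the path ends in "<microfilm number>/<image number>.hocr".
    """
    path = urlparse(url).path
    match = re.search(r"/([^/]+)/(\d+)\.hocr$", path)
    if match is None:
        raise ValueError(f"unexpected hOCR URL: {url}")
    # Keep both values as strings so leading zeros are preserved.
    return match.group(1), match.group(2)


url = (
    "https://digi.bib.uni-mannheim.de/periodika/reichsanzeiger/ocr/film/"
    "tesseract-4.0.0-alpha.20170703/012-9419/0580.hocr"
)
film, image = parse_hocr_url(url)
print(film, image)  # prints "012-9419 0580"
```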

@stweil
Member

stweil commented Jul 29, 2017

Maybe the hOCR can be modified on the server side on the fly when it is requested by a web client:

A program could look up metadata in the database (date, issue, page number) and add it to the HTML response (title tag, time information). It could then add an image link, and perhaps links to other visualisations (such as hocrjs). The same program could also run OCR post-correction and fix known OCR errors. This approach would preserve the original OCR results, deliver the best post-correction available, and save disk space.
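
A rough sketch of such an on-the-fly rewrite, under several assumptions: the function `enrich_hocr` and its parameters are hypothetical, the database lookup is left out (the title text and image URL are simply passed in), and known OCR errors are modelled as a plain dictionary of string replacements. The stored hOCR is never written back, so the original OCR result stays intact.

```python
import re
from html import escape


def enrich_hocr(hocr, title, image_url, corrections):
    """Return a modified copy of an hOCR document for delivery to a client."""

    # Apply known OCR fixes to text between tags only, not to the markup.
    def fix_text(match):
        text = match.group(1)
        for wrong, right in corrections.items():
            text = text.replace(wrong, right)
        return ">" + text

    hocr = re.sub(r">([^<]+)(?=<)", fix_text, hocr)

    # Put metadata-derived text into the first <title> element.
    hocr = re.sub(
        r"<title>.*?</title>",
        lambda m: f"<title>{escape(title)}</title>",
        hocr,
        count=1,
        flags=re.DOTALL,
    )

    # Add a link to the page image right after the opening <body> tag.
    link = f'<p><a href="{escape(image_url, quote=True)}">Page image</a></p>'
    return re.sub(r"(<body[^>]*>)", lambda m: m.group(1) + link, hocr, count=1)
```

A web handler would call this with the stored hOCR text and whatever the metadata lookup returns, then send the result as the response. A real implementation would more likely use an HTML parser than regular expressions; the regex version only keeps the sketch short.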
