
Allow User to improve OCR text #887

Open
stefanCCS opened this issue Jan 18, 2023 · 4 comments
Labels
⚙ feature A new feature or enhancement.

Comments

@stefanCCS

stefanCCS commented Jan 18, 2023

Description

The basic idea is that a user of KITODO.PRESENTATION can improve the OCR text.
This requires an editing dialog in which the user can edit the text.
To simplify further processing, it might be a good idea to allow editing only line by line,
so that it is always known which line (ALTO TextLine) an edit belongs to.
As a minimum requirement, each edit is stored somehow (e.g. in a file that records the document, the ALTO TextLine, the old text and the new text).
This file can then be used to merge the corrections back into the original archive, repository or similar (updating the corresponding ALTO file there), and also to update the KITODO.PRESENTATION data from time to time.
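The minimum requirement above (a file recording document, TextLine, old text and new text) could look roughly like the following Python sketch. It is not part of Kitodo.Presentation: the record fields, the JSON Lines file format and the names `LineCorrection`, `to_json_line` and `apply_correction` are hypothetical, chosen only for illustration. It parses ALTO with the standard library's `xml.etree.ElementTree` and only rewrites the `CONTENT` attributes of the line's `String` elements.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class LineCorrection:
    """One user edit of a single ALTO TextLine (hypothetical record format)."""
    document_id: str  # identifier of the digitized document (assumed field)
    page_file: str    # ALTO file of the page that contains the line (assumed field)
    line_id: str      # value of the TextLine's ID attribute
    old_text: str     # line text as it was shown to the user
    new_text: str     # line text after the user's edit


def to_json_line(corr: LineCorrection) -> str:
    """Serialize one correction as a JSON Lines record for the correction file."""
    return json.dumps(asdict(corr), ensure_ascii=False)


def local_name(tag: str) -> str:
    """Drop the XML namespace, e.g. '{http://www.loc.gov/standards/alto/ns-v4#}'."""
    return tag.rsplit("}", 1)[-1]


def apply_correction(root: ET.Element, corr: LineCorrection) -> bool:
    """Apply one correction to a parsed ALTO tree in place.

    Returns False when the edit should go to manual review instead.
    """
    for line in root.iter():
        if local_name(line.tag) != "TextLine" or line.get("ID") != corr.line_id:
            continue
        # Words are the String children of the TextLine, in reading order.
        words = [el for el in line if local_name(el.tag) == "String"]
        current = " ".join(w.get("CONTENT", "") for w in words)
        new_words = corr.new_text.split()
        # Reject stale edits (the ALTO text changed meanwhile) and edits that
        # add or remove words, since the per-word coordinates could not be kept.
        if current != corr.old_text or len(new_words) != len(words):
            return False
        for word, text in zip(words, new_words):
            word.set("CONTENT", text)
        return True
    return False  # no TextLine with this ID on the page


if __name__ == "__main__":
    alto = (
        '<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#"><Layout><Page>'
        '<PrintSpace><TextBlock><TextLine ID="line_1">'
        '<String CONTENT="Tbe"/><SP/><String CONTENT="quick"/>'
        "</TextLine></TextBlock></PrintSpace></Page></Layout></alto>"
    )
    root = ET.fromstring(alto)
    corr = LineCorrection("doc-1", "00000001.xml", "line_1", "Tbe quick", "The quick")
    print(to_json_line(corr))
    print(apply_correction(root, corr))  # prints True
```

Refusing edits that change the number of words is a deliberately conservative choice: ALTO stores coordinates per word, so such edits would need re-segmentation or a human reviewer, which fits the review workflow requested in this issue.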
As additional requirements:

  • it would be good if this editing were behind a kind of "login barrier"
  • it would be good if the change were immediately visible in KITODO.PRESENTATION for the editing user, or even for all users
  • it would be good if a kind of workflow could be established, in the sense that the change is reviewed before it is released to update the repository.

Expected Benefits of this Development

This would bring KITODO.PRESENTATION to a status where it can compete with commercial presentation tools.

Estimated Costs and Complexity

I cannot estimate the effort/cost.

@stefanCCS stefanCCS added the ⭐ development fund 2022 A candidate for the Kitodo e.V. development fund. label Jan 18, 2023
@sebastian-meyer sebastian-meyer added ⭐ development fund 2023 A candidate for the Kitodo e.V. development fund. and removed ⭐ development fund 2022 A candidate for the Kitodo e.V. development fund. labels Jan 18, 2023
@sebastian-meyer sebastian-meyer changed the title [FUND] Allow User to improve OCR text. [FUND] Allow User to improve OCR text Feb 7, 2023
@sebastian-meyer
Member

Currently there is a funded project at UB Mannheim and SLUB Dresden with the goal of integrating Kitodo with OCR-D. Part of the project is a tool for the DFG-Viewer/Kitodo.Presentation by which users can give feedback on OCR-processed text. I am not sure whether this includes a basic workflow for corrections.
Maybe @stweil can give some insights into the project's goals regarding this feature request?

@stweil
Member

stweil commented Feb 7, 2023

I already had the same idea. For the frontend it might be sufficient to allow editing in the existing text view (that's a simple change) and add a submit button which sends the updated text back to the provider where it can be processed further.

At least for smaller changes line matching would still be possible, and the provider could decide how to review and integrate the corrections. If the changes are stored in a local Git repository on the provider side, the presentation could be modified to select among the different revisions. That would also allow updates with new OCR results by the provider.
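The line matching mentioned above could be sketched with Python's standard `difflib`. This assumes, hypothetically, that the text view shows one ALTO TextLine per line and that the submitted text arrives as plain text; `match_lines` and its input/output shapes are illustrative names, not an existing Kitodo API. Lines replaced one-for-one are paired with their original TextLine IDs, while inserted or deleted lines, or runs of lines replaced by a run of a different length, are returned as unmatched for the provider to review.

```python
from difflib import SequenceMatcher


def match_lines(original, edited_text):
    """Map an edited full text back to the original lines (hypothetical helper).

    original: list of (line_id, text) tuples in reading order.
    edited_text: the submitted text, one line per row.
    Returns (changes, unmatched): changes maps line_id -> (old, new);
    unmatched lists (opcode, old_lines, new_lines) that need manual review.
    """
    old_lines = [text for _, text in original]
    new_lines = edited_text.splitlines()
    changes = {}
    unmatched = []
    # autojunk=False: do not treat frequently repeated lines as "junk".
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same number of lines on both sides: pair them one by one.
            for k in range(i2 - i1):
                line_id, old = original[i1 + k]
                changes[line_id] = (old, new_lines[j1 + k])
        else:
            unmatched.append((tag, old_lines[i1:i2], new_lines[j1:j2]))
    return changes, unmatched


if __name__ == "__main__":
    page = [("l1", "Tbe quick"), ("l2", "brown fox"), ("l3", "jumps")]
    print(match_lines(page, "The quick\nbrown fox\njumps"))
```

The `changes` dictionary could then be turned into per-line correction records and stored, for example, in the provider-side Git repository suggested in this comment.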

@sebastian-meyer sebastian-meyer added the ⚙ feature A new feature or enhancement. label Mar 20, 2023
@sebastian-meyer
Member

Votes: 2

@stweil
Member

stweil commented Mar 24, 2023

@sebastian-meyer sebastian-meyer changed the title [FUND] Allow User to improve OCR text Allow User to improve OCR text Jul 21, 2023
@sebastian-meyer sebastian-meyer removed the ⭐ development fund 2023 A candidate for the Kitodo e.V. development fund. label Jul 21, 2023