Added transcription helpers for extracting text from a canvas #15
At the moment, we lose track of the Annotation target when parsing. It will very likely be the Canvas, but it could be:
Clients that provide navigation using the selector might need to check that it points at the right target.
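A minimal sketch of the check described above, assuming a target is either a plain IRI (possibly carrying a media fragment such as `#xywh=` or `#t=`) or a W3C `SpecificResource` with a `source`. The helper names `targetSourceId` and `targetsCanvas` and the example URLs are hypothetical, not part of this PR.

```typescript
// Assumed target shapes: a plain IRI (possibly with a media fragment such as
// "#xywh=..." or "#t=...") or a W3C SpecificResource wrapping the source.
interface SpecificResourceTarget {
  type: "SpecificResource";
  source: string | { id: string; type?: string };
  selector?: unknown;
}
type AnnotationTarget = string | SpecificResourceTarget;

// Strip any fragment so "canvas#xywh=0,0,10,10" compares equal to "canvas".
function stripFragment(iri: string): string {
  const hash = iri.indexOf("#");
  return hash === -1 ? iri : iri.slice(0, hash);
}

// Hypothetical helper: the IRI of the resource an annotation actually targets.
function targetSourceId(target: AnnotationTarget): string {
  if (typeof target === "string") return stripFragment(target);
  const source = typeof target.source === "string" ? target.source : target.source.id;
  return stripFragment(source);
}

// Hypothetical helper: a viewer could call this before navigating with a selector.
function targetsCanvas(target: AnnotationTarget, canvasId: string): boolean {
  return targetSourceId(target) === stripFragment(canvasId);
}
```

Keeping the original target alongside the parsed selector means a client can run this comparison cheaply instead of assuming every segment belongs to the current Canvas.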
We also need to pass in a language, so that the transcription code can check for choices structured like this:
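A sketch of how a language parameter might resolve a `Choice` body, assuming the choice items are `TextualBody` resources with an optional BCP 47 `language`. The fallback order (exact tag, then primary subtag, then the first item, which the W3C Web Annotation model treats as the default) is an assumption; `pickBodyForLanguage` is a hypothetical name.

```typescript
// Assumed shape of a textual body; `language` is an optional BCP 47 tag.
interface TextualBody {
  type: "TextualBody";
  value: string;
  language?: string;
  format?: string;
}
interface ChoiceBody {
  type: "Choice";
  items: TextualBody[];
}
type Body = TextualBody | ChoiceBody;

// Hypothetical helper: resolve a body (possibly a Choice) to a single textual body.
function pickBodyForLanguage(body: Body, language?: string): TextualBody | undefined {
  if (body.type !== "Choice") return body; // not a choice: nothing to pick
  const items = body.items;
  if (items.length === 0) return undefined;
  if (language) {
    const wanted = language.toLowerCase();
    // 1. Exact tag match, e.g. "en-GB" === "en-GB".
    const exact = items.find((i) => i.language?.toLowerCase() === wanted);
    if (exact) return exact;
    // 2. Primary subtag match, e.g. "en-GB" falls back to "en".
    const primary = wanted.split("-")[0];
    const partial = items.find((i) => i.language?.toLowerCase().split("-")[0] === primary);
    if (partial) return partial;
  }
  // 3. Fall back to the first item, treated as the default.
  return items[0];
}
```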
This still needs more testing, so I'll leave it open.
Transcription helper.
It will find the following transcriptions, based on these Cookbook recipes:
- Plaintext rendering on a canvas
- VTT annotation body on AV canvases
- OCR annotations, or linking directly to an ALTO file (FUTURE, NOT IMPLEMENTED)
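One way to tell these sources apart is to classify each annotation body. This is a sketch under stated assumptions: plaintext is taken to be `format: "text/plain"`, VTT to be `format: "text/vtt"`, inline OCR to be `TextualBody` resources, and ALTO to be an XML format whose `profile` mentions "alto". The names `classifyTranscriptionBody` and `TranscriptionSource` are hypothetical.

```typescript
// Hypothetical labels for the transcription sources listed above.
type TranscriptionSource = "plaintext" | "vtt" | "ocr-textual-body" | "alto" | "unknown";

// Minimal, assumed shape of an annotation body or linked resource.
interface BodyLike {
  id?: string;
  type?: string;
  format?: string;
  profile?: string;
}

function classifyTranscriptionBody(body: BodyLike): TranscriptionSource {
  // Inline OCR lines or words are assumed to be TextualBody resources.
  if (body.type === "TextualBody") return "ocr-textual-body";
  // Ignore media type parameters such as "; charset=utf-8".
  const format = (body.format ?? "").split(";")[0].trim().toLowerCase();
  if (format === "text/plain") return "plaintext";
  if (format === "text/vtt") return "vtt";
  // ALTO is XML; assume a profile URI mentioning "alto" identifies it (future work).
  if (format.endsWith("xml") && (body.profile ?? "").toLowerCase().includes("alto")) return "alto";
  return "unknown";
}
```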
It produces a standard format for both temporal transcriptions and plain or positional plaintext, including selectors.
Segments use a ParsedSelector, which can include spatial and temporal information taken either from an annotation or from VTT. VTT parsing is very simple at the moment, since external libraries for it are heavy. If there is only plaintext, there are no segments. A viewer could start by showing just the plaintext and implement optional segments later.
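The shape of that standard format and a "very simple" VTT parse might look roughly like this. The interfaces below are hypothetical stand-ins, not the PR's actual ParsedSelector or transcription types; the parser only handles blank-line separated cues with `HH:MM:SS.mmm` or `MM:SS.mmm` timestamps, skips header and NOTE blocks, and ignores cue settings.

```typescript
// Hypothetical stand-ins; the real ParsedSelector may differ.
interface TemporalSelector { type: "TemporalSelector"; temporal: { startTime: number; endTime: number } }
interface BoxSelector { type: "BoxSelector"; spatial: { x: number; y: number; width: number; height: number } }
type ParsedSelectorSketch = TemporalSelector | BoxSelector;

interface TranscriptionSegment { text: string; selector: ParsedSelectorSketch }
interface TranscriptionSketch { plaintext: string; segments: TranscriptionSegment[] }

// "HH:MM:SS.mmm" or "MM:SS.mmm" -> seconds.
function parseVttTimestamp(ts: string): number {
  return ts.trim().split(":").map(Number).reduce((acc, part) => acc * 60 + part, 0);
}

// Very simple VTT parsing: one segment per cue, with a temporal selector.
function parseVttSketch(vtt: string): TranscriptionSketch {
  const segments: TranscriptionSegment[] = [];
  // Cues are separated by blank lines; normalise line endings first.
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // "WEBVTT" header, NOTE or STYLE block
    const [start, rest] = lines[timingIndex].split("-->");
    const end = rest.trim().split(/\s+/)[0]; // drop cue settings like "align:start"
    const text = lines.slice(timingIndex + 1).join("\n").trim();
    if (!text) continue;
    segments.push({
      text,
      selector: {
        type: "TemporalSelector",
        temporal: { startTime: parseVttTimestamp(start), endTime: parseVttTimestamp(end) },
      },
    });
  }
  return { plaintext: segments.map((s) => s.text).join("\n"), segments };
}

// Plaintext by itself: no segments, so a viewer can simply show the text.
function plaintextOnly(text: string): TranscriptionSketch {
  return { plaintext: text, segments: [] };
}
```

Because both paths return the same shape, a viewer can render `plaintext` first and only look at `segments` once it supports highlighting or seeking.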
Some new helpers too:
- canvasHasTranscriptionSync(): checks whether there is a transcription on a canvas without making any network requests.
- canvasLoadExternalAnnotationPages(): loads and waits for external Annotation Pages.
- annotationPageToTranscription(): the actual code for fetching the transcription; it will also fetch all annotation pages. Recommended for use with Vault (to avoid multiple requests).