does not accept METS with OCR-D-style mets:agent provenance #38

Open
bertsky opened this issue Apr 14, 2023 · 5 comments

bertsky commented Apr 14, 2023

with our typical mets:metsHdr/mets:agent section, e.g. …

    <mets:agent TYPE="OTHER" OTHERTYPE="SOFTWARE" ROLE="OTHER" OTHERROLE="preprocessing/optimization/binarization">
      <mets:name>ocrd-sbb-binarize v0.0.11</mets:name>
      <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="input-file-grp">ORIGINAL</mets:note>
      <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="output-file-grp">BIN</mets:note>
      <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="parameter">{"model": "default-2021-03-09", "operation_level": "page"}</mets:note>
      <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="page-id">PHYS_0075,PHYS_0477,PHYS_0009,PHYS_0007,PHYS_0141,PHYS_0133,PHYS_0324,PHYS_0533,PHYS_0364,PHYS_0068,PHYS_0266,PHYS_0416,PHYS_0204,PHYS_0470,PHYS_0073,PHYS_0475,PHYS_0005,PHYS_0139,PHYS_0421,</mets:note>
    </mets:agent>

…(which is valid by XSD validation in xmllint/xmlstarlet) I get the following exception:

 org.xml.sax.SAXParseException: cvc-type.3.1.1: Element 'mets:note' is a simple type, so it cannot have attributes, excepting those whose namespace name is identical to 'http://www.w3.org/2001/XMLSchema-instance' and whose [local name] is one of 'type', 'nil', 'schemaLocation' or 'noNamespaceSchemaLocation'. However, the attribute, 'ocrd:option' was found.

bertsky commented Apr 14, 2023

> org.xml.sax.SAXParseException: cvc-type.3.1.1: Element 'mets:note' is a simple type, so it cannot have attributes, excepting those whose namespace name is identical to 'http://www.w3.org/2001/XMLSchema-instance' and whose [local name] is one of 'type', 'nil', 'schemaLocation' or 'noNamespaceSchemaLocation'. However, the attribute, 'ocrd:option' was found.

Perhaps MyCore's METS model is based on an older version of the METS schema? They added xsd:anyAttribute to mets:note only recently (May 2018 / v 1.12).

M3ssman commented Apr 14, 2023

It is not related to the actual METS version.
Under the hood, mets-model uses rather old XML libraries: jdom:2.0.6 (2015) and jaxen:1.2.0 (2019), which don't like inline namespace declarations, hence the parser error.

For now, it probably helps to register the ocrd namespace once in the regular prologue at the root and then just refer to it internally, as it used to be done. I guess Python's lxml library provides means like lxml.etree.cleanup_namespaces for this case.

bertsky commented Apr 14, 2023

Oh, that sounds very plausible.

> For now, it probably helps to register the ocrd namespace once in the regular prologue at the root and then just refer to it internally, as it used to be done.

Yes, perhaps we should generate that correctly in core to begin with.

> I guess Python's lxml library provides means like lxml.etree.cleanup_namespaces for this case.

Interesting. This does work, you just need to pass it a kwarg top_nsmap={"ocrd": "https://ocr-d.de"}.
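
For illustration, here is a minimal sketch of that workaround. It assumes lxml is installed; the wrapping mets:mets/mets:metsHdr elements, the shortened sample document, and the helper name hoist_ocrd_namespace are made up for this example and are not taken from core.

```python
from lxml import etree

OCRD_NS = "https://ocr-d.de"
METS_NS = "http://www.loc.gov/METS/"

# Shortened, hypothetical sample: the ocrd prefix is declared inline on each note.
SAMPLE = b"""<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:metsHdr>
    <mets:agent TYPE="OTHER" OTHERTYPE="SOFTWARE" ROLE="OTHER" OTHERROLE="preprocessing/optimization/binarization">
      <mets:name>ocrd-sbb-binarize v0.0.11</mets:name>
      <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="input-file-grp">ORIGINAL</mets:note>
      <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="output-file-grp">BIN</mets:note>
    </mets:agent>
  </mets:metsHdr>
</mets:mets>"""


def hoist_ocrd_namespace(xml_bytes):
    """Parse the METS bytes and move the ocrd namespace declaration to the root."""
    root = etree.fromstring(xml_bytes)
    # top_nsmap declares the prefix on the root element first; the cleanup
    # then drops the inline declarations that have become redundant.
    etree.cleanup_namespaces(root, top_nsmap={"ocrd": OCRD_NS})
    return root


root = hoist_ocrd_namespace(SAMPLE)
serialized = etree.tostring(root, encoding="unicode")
print(serialized)
```

The attribute values are untouched; only the place where the prefix is declared changes, so the notes can still be read via the qualified name {https://ocr-d.de}option.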

M3ssman pinned this issue Jul 27, 2023

M3ssman commented Aug 3, 2023

@bertsky Would you mind raising this again as a brand-new issue in the mets-model project, which does the actual METS handling?

bertsky commented Aug 16, 2023

> @bertsky Would you mind raising this again as a brand-new issue in the mets-model project, which does the actual METS handling?

I'd rather solve it in core.

M3ssman added this to the 2.x.x milestone Nov 1, 2024
M3ssman added the enhancement (New feature or request) label Nov 1, 2024