Make lobid-alma data valid against JSON schema #1340

Closed
acka47 opened this issue May 24, 2022 · 16 comments

Comments

@acka47
Contributor

acka47 commented May 24, 2022

https://gist.githubusercontent.com/TobiasNx/007a32d61457dc57e353c5f1cd97a5e0/raw/4e9f525f114c0ab06279d28b2c70854cc5c6cee8/validationError.txt

This is a list of the validation errors in the test data.

@TobiasNx
Contributor

TobiasNx commented May 27, 2022

I spotted two errors when validating the test data after running the updated script (#1344):

alma/(DE-605)TT050421649.json failed test
[
  {
    instancePath: '/describedBy/resultOf/endTime',
    schemaPath: 'describedBy.json/properties/resultOf/properties/endTime/pattern',
    keyword: 'pattern',
    params: {
      pattern: '(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2})'
    },
    message: 'must match pattern "(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2})"'
  }
]
alma/(OCoLC)945571548.json failed test
[
  {
    instancePath: '/contribution/0',
    schemaPath: 'contribution.json/items/required',
    keyword: 'required',
    params: { missingProperty: 'role' },
    message: "must have required property 'role'"
  }
]

The first error occurs with every record; the second has occurred with only this one record so far.

@TobiasNx
Contributor

@dr0i /describedBy/resultOf/endTime is only created correctly later in the process (when indexing?). In the transformation process itself, only "dummi" is added as a value, so validation breaks. Is there any way we could still validate these?

@dr0i
Member

dr0i commented May 27, 2022

That's a "feature": it keeps the test files comparable, whatever date they were created. It might be worth thinking about using a dummy value that matches the pattern, e.g. 0000-00-00T00:00:00. Or you could expand the validator to allow "dummy".
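
To illustrate why a placeholder such as 0000-00-00T00:00:00 would pass where a plain word fails, here is a minimal sketch in Python. The pattern is copied from the error report earlier in this thread; the function name `matches_end_time` is only an illustration and not part of the validation script. It assumes the usual JSON Schema behaviour that `pattern` is not anchored, hence `re.search`.

```python
import re

# Pattern copied from the validation error above.
# JSON Schema "pattern" is unanchored, so a search (not a full match) is used.
END_TIME_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})")


def matches_end_time(value: str) -> bool:
    """Return True if the value contains a match for the endTime pattern."""
    return END_TIME_PATTERN.search(value) is not None


print(matches_end_time("dummi"))                # False
print(matches_end_time("0000-00-00T00:00:00"))  # True
```

Note that the pattern only checks the shape of the string, so an impossible date like month 00 is accepted.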

@TobiasNx
Contributor

0000-00-00T00:00:00

+1 for that

dr0i added a commit that referenced this issue May 27, 2022
dr0i added a commit that referenced this issue May 27, 2022
@dr0i
Member

dr0i commented May 27, 2022

Should be fine now. Closing.

@dr0i dr0i closed this as completed May 27, 2022
@TobiasNx
Contributor

I ran this again with the updates from #1344:

$ bash ./validateJsonTestFiles.sh  
Testing version: draft
strict mode: "items" is 1-tuple, but minItems or maxItems/additionalItems are not specified or different at path "type.json"
alma/(CKB)5280000000199164.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT000161712.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT000312236.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT003176544.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT004285445.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT005207972.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT006855611.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT012734833.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT012734884.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT015011399.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT015671602.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT016433929.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT016709661.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT017015300.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT017398609.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT017411546.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT017664407.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT019075404.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT019246898.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT019631849.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT020202475.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT020391499.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT020936481.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)TT003907920.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)TT050421649.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(OCoLC)945571548.json failed test
[
  {
    instancePath: '/contribution/0',
    schemaPath: 'contribution.json/items/required',
    keyword: 'required',
    params: { missingProperty: 'role' },
    message: "must have required property 'role'"
  }
]
Test FAILED

There are still errors.
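
Since most of the reports above are the same error repeated per file, a small summary can make the output easier to read. The following is a rough sketch, not part of validateJsonTestFiles.sh: the helper names `wildcard_path` and `summarize` are made up, and the sample dictionary is three of the reports above transcribed by hand (the ajv output is printed as JavaScript object literals, not JSON, so it cannot be parsed directly).

```python
import re
from collections import Counter


def wildcard_path(instance_path: str) -> str:
    """Replace numeric array indexes in a JSON pointer with '*'."""
    return re.sub(r"/\d+(?=/|$)", "/*", instance_path)


def summarize(errors_by_file: dict) -> Counter:
    """Count failing files per (wildcard path, keyword) pair."""
    counts = Counter()
    for errors in errors_by_file.values():
        # A set avoids counting one file twice for the same kind of error.
        kinds = {(wildcard_path(e["instancePath"]), e["keyword"]) for e in errors}
        counts.update(kinds)
    return counts


# Three of the reports shown above, transcribed by hand (assumption: this
# structure is only for illustration, the script does not emit it).
sample = {
    "alma/(CKB)5280000000199164.json": [
        {"instancePath": "/hasItem/0/type/0", "keyword": "const"}
    ],
    "alma/(DE-605)HT000161712.json": [
        {"instancePath": "/hasItem/0/type/0", "keyword": "const"}
    ],
    "alma/(OCoLC)945571548.json": [
        {"instancePath": "/contribution/0", "keyword": "required"}
    ],
}

for (path, keyword), n in summarize(sample).most_common():
    print(f"{n} file(s): {path} ({keyword})")
```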

@TobiasNx TobiasNx reopened this May 27, 2022
@TobiasNx
Contributor

The item error is due to #1177

@TobiasNx
Contributor

TobiasNx commented Jun 8, 2022

List of things that do not validate:

  • /hasItem/*/type/* => expects Item, but because of POR and HOL the data also provides other types
  • oclcNumber => needs to be an array
  • /publication/*/location => needs to be an array
  • /publication/*/publishedBy => needs to be an array
  • /subject/*/componentList/1 => needs a type element
  • /subject/* => needs a source element
  • /subject/*/type/* => value must be Concept
  • /isPartOf/*/type => needs to be an array
  • /contribution/* => requires role

These are the errors that appear while transforming via morph.
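
As a rough illustration of what a few of the constraints in the list mean for a single record, here is a hand-written check in Python. It is only a sketch: the real checks are done by the JSON schema, the function name `check_record` is hypothetical, and the sample record is invented and does not come from the test data.

```python
def check_record(record: dict) -> list:
    """Return messages for a few of the constraints listed above."""
    problems = []
    # oclcNumber => needs to be an array
    if "oclcNumber" in record and not isinstance(record["oclcNumber"], list):
        problems.append("oclcNumber: needs to be an array")
    # /publication/*/location and /publication/*/publishedBy => need to be arrays
    for i, publication in enumerate(record.get("publication", [])):
        for key in ("location", "publishedBy"):
            if key in publication and not isinstance(publication[key], list):
                problems.append(f"/publication/{i}/{key}: needs to be an array")
    # /contribution/* => requires role
    for i, contribution in enumerate(record.get("contribution", [])):
        if "role" not in contribution:
            problems.append(f"/contribution/{i}: requires role")
    return problems


# Hypothetical record, invented for illustration only.
record = {
    "oclcNumber": "123",
    "publication": [{"location": "Example City", "publishedBy": ["Example Press"]}],
    "contribution": [{"agent": {"label": "Example, Person"}}],
}

for problem in check_record(record):
    print(problem)
```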

TobiasNx added a commit that referenced this issue Aug 10, 2022
This is to conform to the schema. But it is a workaround, since all special items need to be remodeled #1373
TobiasNx added a commit that referenced this issue Aug 10, 2022
This conforms to the lobid schema.
TobiasNx added a commit that referenced this issue Aug 10, 2022
This resulted in invalid json files with regard to the schema
@TobiasNx
Contributor

TobiasNx commented Aug 12, 2022

At the moment there are three schema problems left:

  1. HT017664407 has no type besides BibliographicResource, but the schema requires at least two types. It was Periodical in the old transformation but cannot be unambiguously identified as "Periodical". We should have another look.

  2. Subjects that are neither "Concept" nor "ComplexSubject" are typed as "Keyword", which is invalid. How can we proceed with that?

  3. hasItem is currently created from the specific publishing profile elements in a record (MNG, HOL, ITM, POR, etc.). The object itself is typed as Item plus the MARC element name. We need to remodel this: Distinguish Portfolio Items from other Items #1177 and Properly model MBD, POR, H52, ITM, etc. #1373

TobiasNx added a commit that referenced this issue Aug 12, 2022
Was in the subelement for agent.
@TobiasNx
Contributor

TobiasNx commented Jun 5, 2024

Paths that still have invalid data after fixing describedBy (#2025):

/hasItem/*/type/*
/publication/*/publishedBy
/spatial/*/focus/geo/lat
/spatial/*/focus/geo/lon
/spatial/*/source/id
/subject/*
/subject/*/componentList/*
/subject/*/source
/subject/*/source/id
/subject/*/type/*

The spatial source also needs to allow RPB spatial.
lat and lon cannot be numbers since MF only produces strings; I am not sure how to handle this.

publishedBy seems to be due to a faulty mapping: #2011 (comment)
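
On the lat/lon question, one conceivable option (not necessarily what should be done here) would be a post-processing step between transformation and validation that turns numeric strings into numbers. The sketch below assumes such a step exists and that the geo object has `lat` and `lon` keys as in the paths above; the function name `coerce_geo` and the coordinates are made up.

```python
def coerce_geo(geo: dict) -> dict:
    """Return a copy of a geo object with lat/lon converted to floats where possible."""
    result = dict(geo)
    for key in ("lat", "lon"):
        value = result.get(key)
        if isinstance(value, str):
            try:
                result[key] = float(value)
            except ValueError:
                # Leave non-numeric strings untouched so the validator reports them.
                pass
    return result


# Hypothetical coordinates, for illustration only.
print(coerce_geo({"lat": "50.5", "lon": "7.25"}))
```

The alternative would be to relax the schema so that it accepts numeric strings, which keeps the transformation untouched but weakens the type check.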

@TobiasNx
Contributor

TobiasNx commented Jun 7, 2024

Concerning the missing subject labels for notations, we should ask ourselves whether we drop label as mandatory for skosConcepts, introduce a third type of subject (notations, which only need the notation), or use the notation as a fallback label if no label is provided.

TobiasNx added a commit that referenced this issue Aug 2, 2024
This is to allow institutions such as EBSCO or ProQuest that are not identified by an ISIL but have OCLC or ALMA institution codes. See #1438
TobiasNx added a commit that referenced this issue Aug 5, 2024
To validate RPB spatial info.
@TobiasNx
Contributor

TobiasNx commented Aug 5, 2024

I had another look at it and added PR #2040 for spatial.
We now have three PRs that need a review by @acka47:

#2027 (hasItem)
#2025 (describedBy)
#2040 (spatial)

I also saw that ajv is warning about the type schema:
strict mode: "items" is 1-tuple, but minItems or maxItems/additionalItems are not specified or different at path "type.json"
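
For context, this warning comes from ajv's strict tuple check. As far as I understand it, ajv expects a schema whose `items` is an array of n schemas to also set `minItems` to n and either `maxItems` to n or `additionalItems` to false. The Python sketch below only mirrors that condition for illustration; the function name `is_full_tuple` and the example schemas are assumptions and do not show the actual content of type.json.

```python
def is_full_tuple(schema: dict) -> bool:
    """Mirror the condition under which the strict tuple warning is NOT logged."""
    items = schema.get("items")
    if not isinstance(items, list):
        # "items" is a single schema, so the tuple check does not apply.
        return True
    n = len(items)
    return schema.get("minItems") == n and (
        schema.get("maxItems") == n or schema.get("additionalItems") is False
    )


# Hypothetical schemas, invented for illustration.
one_tuple = {"type": "array", "items": [{"type": "string"}]}
constrained = {
    "type": "array",
    "items": [{"type": "string"}],
    "minItems": 1,
    "additionalItems": False,
}
list_schema = {"type": "array", "items": {"type": "string"}}

print(is_full_tuple(one_tuple))    # False: would trigger the warning
print(is_full_tuple(constrained))  # True
print(is_full_tuple(list_schema))  # True
```

If every element of the array should follow the same schema, writing `items` as a single schema rather than a one-element array is probably the simpler way to avoid the warning.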

Additionally the decision for: (#1340 (comment))

Concerning the missing subject labels for notations, we should ask ourselves whether we drop label as mandatory for skosConcepts, introduce a third type of subject (notations, which only need the notation), or use the notation as a fallback label if no label is provided.

is still open. Once that is decided, the data should validate against the schema.

maipet added a commit that referenced this issue Aug 6, 2024
Schema: Adjust hasItem: loose constant value for type property #1340
@TobiasNx
Contributor

TobiasNx commented Aug 8, 2024

Additionally the decision for: (#1340 (comment))

Concerning the missing subject labels for notations, we should ask ourselves whether we drop label as mandatory for skosConcepts, introduce a third type of subject (notations, which only need the notation), or use the notation as a fallback label if no label is provided.

is still open. Once that is decided, the data should validate against the schema.

Talked to @acka47 offline. We decided to introduce a mechanism that requires either a label and id, with the notation optional, or a notation, with label and id optional, so that the pairing of label and id is always enforced. (Typing this, I am not sure whether this is sufficient for DDC.)
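
To make the agreed rule concrete, here is one possible reading of it as a small Python predicate. This is a sketch of my interpretation, not the schema change itself: the property names `label`, `id` and `notation` follow the wording above, the function name `subject_is_valid` is made up, and the rule that label and id must always appear together is an assumption drawn from "the pairing of label and id is always enforced".

```python
def subject_is_valid(subject: dict) -> bool:
    """Check the label/id/notation rule described above (one possible reading)."""
    has_label = bool(subject.get("label"))
    has_id = bool(subject.get("id"))
    has_notation = bool(subject.get("notation"))
    # Label and id must appear together or not at all.
    if has_label != has_id:
        return False
    # Either the label/id pair or a notation must be present.
    return has_label or has_notation


# Hypothetical subjects, invented for illustration.
print(subject_is_valid({"label": "Example", "id": "https://example.org/1"}))  # True
print(subject_is_valid({"notation": "123"}))                                  # True
print(subject_is_valid({"notation": "123", "label": "Example"}))              # False
```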

TobiasNx added a commit that referenced this issue Aug 8, 2024
@TobiasNx TobiasNx moved this from Done to Review in lobid-resources Aug 12, 2024
@acka47 acka47 moved this from Review to Working in lobid-resources Aug 12, 2024
@acka47 acka47 moved this from Working to Review in lobid-resources Aug 12, 2024
@TobiasNx
Contributor

Finally done, closing. Next step: #1339

Projects
Status: Done

4 participants