Add a validation step to the workflow ensuring proper XML records and HTML views #12

Open
SvenLieber opened this issue Jul 31, 2021 · 0 comments


Currently, Walder is used to provide XML-based exports per collection and HTML views on collections (pending commit, internal mockup). Both means of access are based on queries against our Knowledge Graph and thus rely on the data meeting certain quality requirements.

These requirements can be expressed as constraints in SHACL or ShEx, and a validation step can then verify that the Knowledge Graph meets them. Such a step could be included in our workflow so that, on receiving a warc_created message, the workflow looks (more or less) like this:

  • map item-level data of the newly created WARC file
  • execute SPARQL CONSTRUCT/INSERT queries to create collection-level-related aggregations based on item-level data
  • execute a validation process with use case specific constraints (e.g. based on HTML views or XML record exports)
  • in case of invalid data, query the validation report to find out what needs to be fixed
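
To make the proposed order of steps concrete, here is a minimal sketch in plain Python. It is not our actual workflow code and it does not use a SHACL or ShEx engine: the graph is just a set of string triples, and `MinCountConstraint` is a hand-rolled stand-in for something like a SHACL `sh:minCount` shape. All names (`on_warc_created`, `aggregate_collections`, `ex:partOfCollection`, `ex:itemCount`, the example data) are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass

# A triple is (subject, predicate, object); the graph is a plain set of triples.
Triple = tuple[str, str, str]
RDF_TYPE = "rdf:type"


@dataclass(frozen=True)
class MinCountConstraint:
    """Stand-in for a SHACL-like shape: every node of `target_class`
    needs at least `min_count` values for `path`."""
    target_class: str
    path: str
    min_count: int = 1


@dataclass(frozen=True)
class Violation:
    """One entry of the validation report."""
    focus_node: str
    path: str
    message: str


def aggregate_collections(graph: set[Triple]) -> set[Triple]:
    """Stand-in for the SPARQL CONSTRUCT/INSERT step:
    derive a collection-level item count from item-level data."""
    counts: dict[str, int] = {}
    for _, predicate, obj in graph:
        if predicate == "ex:partOfCollection":
            counts[obj] = counts.get(obj, 0) + 1
    return {(coll, "ex:itemCount", str(n)) for coll, n in counts.items()}


def validate(graph: set[Triple], constraints: list[MinCountConstraint]) -> list[Violation]:
    """Check each constraint against the graph and collect violations."""
    report: list[Violation] = []
    for c in constraints:
        focus_nodes = sorted(
            s for s, p, o in graph if p == RDF_TYPE and o == c.target_class
        )
        for node in focus_nodes:
            found = sum(1 for s, p, _ in graph if s == node and p == c.path)
            if found < c.min_count:
                report.append(Violation(
                    node, c.path,
                    f"expected at least {c.min_count} value(s), found {found}",
                ))
    return report


def on_warc_created(graph: set[Triple], item_triples: set[Triple],
                    constraints: list[MinCountConstraint]) -> list[Violation]:
    """Run the steps in order for one warc_created message."""
    graph |= item_triples                  # 1. add the mapped item-level data
    graph |= aggregate_collections(graph)  # 2. collection-level aggregations
    return validate(graph, constraints)    # 3. validation; 4. report to inspect


# Hypothetical example data: collection2 lacks the title an HTML view would need.
graph: set[Triple] = {
    ("ex:collection1", RDF_TYPE, "ex:Collection"),
    ("ex:collection1", "dct:title", "Example collection"),
    ("ex:collection2", RDF_TYPE, "ex:Collection"),
}
new_items: set[Triple] = {("ex:item1", "ex:partOfCollection", "ex:collection1")}
constraints = [MinCountConstraint("ex:Collection", "dct:title")]

report = on_warc_created(graph, new_items, constraints)
for violation in report:
    print(f"{violation.focus_node} {violation.path}: {violation.message}")
```

In a real setup the constraints would live in SHACL or ShEx files derived from what the HTML views and XML record exports query for, and the last step would query the engine's validation report instead of iterating over a Python list.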