
Inputhons

Piotr Banski edited this page Jul 10, 2023 · 26 revisions

"Inputhon" is our super-fancy name for a type of hackathon in which the people responsible for a centre's recommendations on data deposition formats meet for an hour to prepare or update their centre's content for the SIS.

Please note: as of May 2023, the content of this page is rather sketchy. We are still preparing to hold a pilot inputhon at the IDS before we recommend the format to other centres. Any feedback, at any point, is very welcome; please open a new GitHub issue.

TL;DR

The goal is (ideally) to end the event by submitting a pull request against one of the files in https://github.com/clarin-eric/standards/tree/formats/SIS/clarin/data/recommendations (note that this is the "formats" branch, not the master branch).

After the event, the centre can either

  • point its users to the SIS (recommended, because of the data aggregation that happens there), or
  • re-use the same data by pulling it out of the SIS via its API, rather than maintaining a second copy of the recommendations (an example is supplied; essentially, you only need to style the data to match your site's layout).

Motivation

For CLARIN B-centres that need to undergo (re-)certification:

  • storing format recommendations in the SIS satisfies the relevant CoreTrustSeal requirement (see section 8 (R08, "Deposit & Appraisal") of the Extended Guidance), which checks, among other things, whether the repository offers a list of preferred formats;
  • incidentally, two bullets further down, R08 asks about "the approach towards digital objects that are deposited in non-preferred formats". The SIS can provide that information too, either in the general section describing the centre or in comments on individual formats, especially those labelled "discouraged" (= "non-preferred" in CTS lingo).

For other centres and repositories, storing the information in the SIS is a way to:

  • record it in a uniform format, starting either from a clean template or from the examples provided in other centres' recommendations.

Preparation

Give us a heads-up

These steps are optional but advisable. If they seem like too much of a time investment, skip them. However, we would appreciate it if you submitted your changes via pull requests, also for the sake of keeping track of the project's history.

  • tell us that you intend to hold an inputhon, so that we can make sure that the centre is represented in the system and that at least a skeletal recommendations file exists for it;
    • we can then also try to make ourselves available for consultation over Zoom, etc.

Get the SIS

  • fork the SIS repository, clone your fork, and install eXist and the SIS locally;
  • you might want to connect that new database instance to oXygen (yes, there are a lot of assumptions here): you will then be able to visualise your changes just by dragging the recommendations file from oXygen's project panel onto the database connection panel and refreshing the SIS in the browser.
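If you do not use oXygen, a file can also be stored in a local eXist instance through eXist's REST interface, which accepts HTTP PUT requests under /exist/rest/. The sketch below assumes eXist is running on its default port 8080; the collection path RECOMMENDATIONS_COLLECTION is hypothetical and must be adjusted to wherever your SIS installation keeps the recommendation files. The helper names are illustrative only.

```python
import base64
import urllib.request

# Assumption: eXist running locally on its default port.
EXIST_REST_BASE = "http://localhost:8080/exist/rest"

# Hypothetical collection path; check your own eXist instance for the real one.
RECOMMENDATIONS_COLLECTION = "/db/apps/sis/data/recommendations"


def build_upload_request(filename: str, xml_bytes: bytes,
                         user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that stores xml_bytes in eXist."""
    url = f"{EXIST_REST_BASE}{RECOMMENDATIONS_COLLECTION}/{filename}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=xml_bytes,
        method="PUT",
        headers={
            "Content-Type": "application/xml",
            "Authorization": f"Basic {token}",  # HTTP basic auth for eXist
        },
    )


def upload(filename: str, path_on_disk: str, user: str, password: str) -> int:
    """Read a local recommendations file and store it in the local eXist DB."""
    with open(path_on_disk, "rb") as fh:
        req = build_upload_request(filename, fh.read(), user, password)
    with urllib.request.urlopen(req) as resp:
        return resp.status  # a 2xx status indicates the document was stored
```

After uploading, refresh the SIS in the browser as you would after dragging the file in oXygen.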

Get the XML document describing the recommendations for your centre

The native GitHub way, if you've forked/cloned
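In a clone with the "formats" branch checked out, the recommendation files live in SIS/clarin/data/recommendations (the directory named in the URL at the top of this page). The sketch below lists candidate files for your centre; it assumes, hypothetically, that the file name contains some recognisable part of the centre's name, and find_recommendation_files is an illustrative helper, not part of the SIS.

```python
from pathlib import Path

# Directory inside the clone, as given on this page ("formats" branch).
RECOMMENDATIONS_DIR = Path("SIS/clarin/data/recommendations")


def find_recommendation_files(clone_root: str, centre_hint: str) -> list[Path]:
    """List XML files in the recommendations directory whose name
    contains centre_hint (case-insensitive), sorted by path."""
    directory = Path(clone_root) / RECOMMENDATIONS_DIR
    hint = centre_hint.lower()
    return sorted(p for p in directory.glob("*.xml") if hint in p.name.lower())
```

If nothing is found, check which branch is checked out before concluding that your centre has no file yet.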

The workaround by exporting centre data

Execution

  • Look at the data domains and see which of them correspond to the functions of the data that your centre is ready to receive.
  • For each of the selected domains, decide which formats to list and at what level:
    • if the centre wishes to receive data in that format because it is going to be easy to curate, archive, etc., choose "recommended";
    • if it's an "if you really must" format, choose "acceptable";
    • you might also want to discourage submissions in some format: choose "discouraged" in such cases, and do consider adding a short explanation of what the preferred alternative is, if there is one, or of why the centre discourages submissions in that format.
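Before typing decisions into the XML, it can help to draft them as a simple list and review it. The sketch below is a hypothetical helper, not part of the SIS: it uses the three levels named above and flags unknown levels, duplicate entries, and "discouraged" formats without an explanatory comment. Domain and format names in the usage example are placeholders.

```python
from dataclasses import dataclass

# The three levels described on this page.
LEVELS = {"recommended", "acceptable", "discouraged"}


@dataclass
class Entry:
    domain: str        # data domain the decision applies to
    format_id: str     # identifier of the format
    level: str         # one of LEVELS
    comment: str = ""  # explanation; advisable for "discouraged"


def review(entries: list[Entry]) -> list[str]:
    """Return human-readable problems found in a draft list of decisions."""
    problems = []
    seen = set()
    for e in entries:
        if e.level not in LEVELS:
            problems.append(f"{e.domain}/{e.format_id}: unknown level {e.level!r}")
        key = (e.domain, e.format_id)
        if key in seen:
            problems.append(f"{e.domain}/{e.format_id}: listed twice")
        seen.add(key)
        if e.level == "discouraged" and not e.comment.strip():
            problems.append(f"{e.domain}/{e.format_id}: discouraged without a comment")
    return problems


# Usage with placeholder names:
draft = [Entry("Annotation", "TEI", "recommended"),
         Entry("Annotation", "DOC", "discouraged")]
print(review(draft))  # ['Annotation/DOC: discouraged without a comment']
```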

We suggest that you go domain by domain, and that you work either in your fork of the SIS or in a branch created from the local "formats" branch, and then make your pull requests against the "formats" branch, please. (There are alternatives, to be described later.)

If you edit the source with an XML editor, you will benefit from XML Schema and Schematron: both are used to constrain the XML you are going to produce, and the editor will often suggest valid values and structures. You will also be able to use the template provided in each empty recommendations document.
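If you edit outside an XML-aware editor, a quick well-formedness check before committing catches the most basic mistakes. The sketch below uses only Python's standard library and is deliberately limited: it does not apply the SIS's XML Schema or Schematron rules (for that, use your XML editor, or a library such as lxml, which offers etree.XMLSchema and lxml.isoschematron.Schematron). The function name is illustrative.

```python
import xml.etree.ElementTree as ET


def well_formedness_errors(path: str) -> list[str]:
    """Return parse errors for an XML file (empty list if well-formed).

    Checks well-formedness only; schema and Schematron constraints
    are not applied here.
    """
    try:
        ET.parse(path)
    except ET.ParseError as err:
        line, column = err.position  # location of the first error
        return [f"{path}:{line}:{column}: {err}"]
    return []
```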

Using the data

Pointing to the data in the SIS

Using the data input into SIS to populate the centre's local pages

This approach is implemented in https://github.com/IDS-Mannheim/IDS-Mannheim.github.io, and the resulting web page is available at https://ids-mannheim.github.io/standards/
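Once the data has been pulled out of the SIS, the remaining work is mostly presentation. The sketch below shows only that styling step and assumes, hypothetically, that the records have already been reduced to (domain, format, level, comment) tuples; retrieving them from the SIS API is not shown here, and the IDS implementation linked above is the worked example to follow for that part. The HTML structure and class names are illustrative.

```python
from collections import defaultdict
from html import escape

# Hypothetical record shape: (domain, format name, level, comment).
Record = tuple[str, str, str, str]


def render_html(records: list[Record]) -> str:
    """Render recommendations as one HTML table per data domain."""
    by_domain: dict[str, list[Record]] = defaultdict(list)
    for rec in records:
        by_domain[rec[0]].append(rec)
    parts = []
    for domain in sorted(by_domain):
        parts.append(f"<h3>{escape(domain)}</h3>")
        parts.append("<table><tr><th>Format</th><th>Level</th><th>Comment</th></tr>")
        # Sort formats alphabetically within each domain.
        for _, fmt, level, comment in sorted(by_domain[domain], key=lambda r: r[1]):
            parts.append(
                f'<tr class="{escape(level)}"><td>{escape(fmt)}</td>'
                f"<td>{escape(level)}</td><td>{escape(comment)}</td></tr>"
            )
        parts.append("</table>")
    return "\n".join(parts)
```

Putting the level into a CSS class lets your site's stylesheet colour "recommended", "acceptable" and "discouraged" rows differently without touching the data.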
