Updating format recommendations

Piotr Banski edited this page Jun 13, 2024 · 20 revisions

(*** See an error, omission, obsolete information below? Let us know by opening a new issue report with one click. Thanks! ***)

Format recommendations were introduced into the Standards Information System (SIS) in 2021. They address, on the one hand, users' need to see which data formats are recommended by centres that offer data deposition services, and, on the other hand, a formal requirement on CLARIN B-centres to publish this information as a precondition for getting (re)certified. Information used to populate the early version of SIS-based format recommendations came most often from pages that several centres had already published (see https://www.clarin.eu/content/standards-and-formats). This information has not been transferred untouched into the SIS, because (1) not every centre uses the three-way distinction in levels of recommendation (the SIS has: {recommended, acceptable, discouraged}), and (2) not every centre explicitly uses the set of functional domains which the SIS recognizes. These guidelines are divided into two parts, depending on whether the update is suggested/performed by a centre representative or by a "concerned user".

Centre representatives

By 'centre representative', we don't mean any kind of special status within a CLARIN centre -- all that we ask is that you discuss your suggestions with whoever is in charge of the deposition services or of the centre itself. In short: please make sure that the changes you suggest reflect your centre's profile.

To check what the recommendations are, go to https://clarin.ids-mannheim.de/standards/views/list-centres.xq and select your centre from there. This view shows both the general information about the centre and the list of its recommendations. (As an example of what is possible, view the IDS info and switch between the "CLARIN" and "Text+" views to see the effect of the language switch on the general information.)

Either way you get there, you can now inspect the recommendations and see if they are correct and up-to-date. You can also export them as XML, although this is not the recommended way to proceed if you want to edit the recommendations rather than just dump the information quickly. Note also that the export from the centre page is complete (and the file is named exactly the same as it is in the SIS sources), while the export from the list of recommendations is tied to the RI selection (so, for CLARIN, it will choose all information fields tagged as "en" for English, or untagged, while for Text+ it will prioritise the information fields tagged with "de" for German).
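The selection rule for the RI-dependent export can be pictured with a small sketch. This is only an illustration of the behaviour described above, not the code that the SIS actually runs: the function name, the (language tag, text) data layout, and the fallback to English or untagged fields when no German field exists are all assumptions made for the example.

```python
def select_fields(fields, research_infrastructure):
    """Pick the information fields to export for a given RI view.

    `fields` is a list of (language_tag, text) pairs; the tag is None
    for untagged fields. This layout is hypothetical, chosen for the sketch.
    """
    # CLARIN view: everything tagged "en" plus everything untagged.
    default = [text for lang, text in fields if lang in ("en", None)]
    if research_infrastructure == "Text+":
        # Text+ view: prefer fields tagged "de".
        german = [text for lang, text in fields if lang == "de"]
        # Assumption: fall back to the default set if nothing is tagged "de".
        return german or default
    return default


fields = [("en", "English info"), ("de", "Deutsche Info"), (None, "Untagged info")]
print(select_fields(fields, "CLARIN"))
print(select_fields(fields, "Text+"))
```

The point is simply that the export from the list of recommendations may contain fewer fields than the complete export from the centre page.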

"What happens if I want to make a correction to my centre's recommendation list?", you ask. There are several possibilities:

  • (the best way; minimal GitHub skills needed): fork the SIS repository, create your own local working copy, go to the directory that matches https://github.com/clarin-eric/standards/tree/formats/SIS/clarin/data/recommendations , and find the file for your centre. Edit it and then create a pull request against the formats branch. (The master branch is reserved for export into the live instance.) If you use an XML-aware editor, this way makes it possible to use the SIS schemas that will help you edit the recommendations document.
  • (not the optimal way) You may use the file exported from the centre view, edit it and submit the edited file back to us, either by e-mail or as a pull request against the formats branch. If you choose e-mail, the credits / edit history will not reference you (sure, we will do our best to put your name in the comment if we end up committing your work, but things happen and people forget stuff in a hurry, so why not consider using the option above?).
  • the slowest way that doesn't require any GitHub skills: post a new issue at https://github.com/clarin-eric/standards/issues -- but that way it may naturally take longer before the recommendations are updated. On the other hand, for single changes, this might be a less error-prone way, so as long as you are willing to share, we'll be happy. Again, we'll do our best to reference you somehow in the commit, or -- with your permission -- put your name into the header of the recommendations file as the curator (if you submit the whole bundle on behalf of your centre, rather than e.g. fix a typo).

While preparing and submitting your suggestions, please keep the following in mind:

  • You may want to keep the list of formats and the list of domains open in your browser, for quick consultation; note the 'copy' symbol after their names --- it will copy the domain name and the format ID, which you can then use to add to, or modify, the recommendations;
  • however, if you are using an XML schema-aware editor, there will be no need to copy and paste values -- the schema offers them in the editor as drop-down lists with glosses. (This assumes that you have copied the schema separately or that you have forked the repository; do feel welcome to contact us if this is not clear).
  • Whichever way you edit the file, pleeease consider submitting it as a Pull Request; please note that we reserve the right not to work with severely malformed files -- it is nowadays easy to verify if the XML is well-formed, and, well, let's be nice to one another :-)
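If you do not have an XML-aware editor at hand, a quick well-formedness check can be done with a few lines of Python, using only the standard library. This is a minimal sketch: the function name is our own invention for illustration, and the check only tells you whether the XML parses; it does not validate the file against the SIS schemas.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column of the first problem found.
        return False, str(err)
    return True, None


# A closed element parses; an unclosed one does not.
print(is_well_formed("<recommendation><format/></recommendation>"))
print(is_well_formed("<recommendation><format></recommendation>"))
```

To check a real file, read it in binary mode and pass the contents to the function, for example `is_well_formed(open("your-centre-file.xml", "rb").read())`, where the file name is a placeholder for your centre's recommendations file.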

You will also notice some fields in the recommendations header that beg to contain your name, if you are a representative of your centre. This shows that the recommendations have the centre's "blessing" (and thus, among other things, fulfil the B-centre certification requirements). If the maintainer's details are not provided, a red warning is displayed on the centre information page, to let users know that the recommendations are not actively maintained.

Concerned users

To let us and the relevant centre know of your suggested corrections, please open an issue at https://github.com/clarin-eric/standards/issues. Note also that in many places in the SIS (such as format IDs or format descriptions), you can click to open a new GitHub issue -- depending on the context, some information may be pre-filled for you, to simplify your task.

See also: