Detailed syntax of information elements in the SIS
(*** See an error, omission, obsolete information below? Let us know by opening a new issue report with one click. Thanks! ***)
This page provides more details on editing information elements in the SIS. Initially, as of July 2023, it is restricted to information on creating data deposition format recommendations. If there is interest, the page may eventually also cover the preparation of documentation on standards.
First of all, if you choose to clone or fork the entire standards repository, editing XML information will be made easier thanks to the associated document grammars that provide some content completion or warn you about errors. That should work out of the box for any reasonably modern XML editor that recognises XML Schema and Schematron associations.
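For orientation, one widespread way of associating an XML file with its grammars is the xml-model processing instruction. The sketch below only illustrates that general mechanism: whether the SIS files use this mechanism or a different one is an assumption, and both href values are placeholders rather than real file names from the repository.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Both href values are placeholders, not actual paths from the repository -->
<?xml-model href="path/to/recommendation.xsd" type="application/xml" schematypens="http://www.w3.org/2001/XMLSchema"?>
<?xml-model href="path/to/recommendation.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
```

An editor that honours such instructions validates the file against both grammars while you type and can offer content completion from the XML Schema.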
Data deposition format recommendations are located in the directory /SIS/clarin/data/recommendations/
In the process of preparing format recommendations, some information is completely predefined: the data domain names and the recommendation levels. XML Schema supplies them in the form of drop-down selections; otherwise you're down to copy and paste, and in the crucial places the SIS makes that easier by providing buttons that copy names to the clipboard. That is true of domain names and also of the data deposition formats that have been described in the SIS.
Sometimes, the format that a centre recommends (or discourages, etc.) will not (yet) be described by the SIS. A list of such formats, not having their own information pages but nevertheless mentioned by recommendations, is to be found in our Sanity Checker, at the top.
If that still doesn't help, please make up a sensible ID, use it in your recommendations, and kindly let us know, e.g. when submitting a pull request.
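A minimal sketch of a recommendation for a format that the SIS does not describe yet. The ID "fMyNewFormat" and the comment text are invented for illustration; the domain name and the level are taken from the examples elsewhere on this page.

```xml
<!-- "fMyNewFormat" is a made-up ID, used here for illustration only -->
<format id="fMyNewFormat">
  <domain>Textual Source Language Data</domain>
  <level>acceptable</level>
  <comment>Format not yet described in the SIS.</comment>
</format>
```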
Use the element <info> for that. Note that this element may bear the @xml:lang attribute to indicate the language of the content. It is expected that, for example, Text+ centres are going to present at least some of their information in German (xml:lang="de"). Where the attribute is not present, its value defaults to "en" (English).
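A hedged sketch of language-tagged <info> elements. Only the element name and the @xml:lang attribute come from the description above; the text content is invented, and where exactly <info> sits within a recommendation file is not shown here.

```xml
<!-- Illustrative content only; the first element defaults to xml:lang="en" -->
<info>General remarks on the data deposition policy of the centre.</info>
<info xml:lang="de">Allgemeine Hinweise zur Datenübernahme des Zentrums.</info>
```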
The role of the comments is either to provide more information or to provide finer granularity. The latter role is illustrated below:
<format id="fWave">
  <domain>Audiovisual Source Language Data</domain>
  <level>recommended</level>
  <comment>PCM-WAV, 48 kHz, 16 bit</comment>
</format>
<format id="fWave">
  <domain>Audiovisual Source Language Data</domain>
  <level>acceptable</level>
  <comment>PCM-WAV with non-recommended parameters (not 48 kHz, 16 bit)</comment>
</format>
(Same format ID, same domain, but different recommendation levels depending on the subcategorisation provided in the comments.)
Comments can also be language-tagged:
<format id="fTextPlain">
  <domain>Textual Source Language Data</domain>
  <level>recommended</level>
  <comment>without markup</comment>
  <comment xml:lang="de">ohne Mark-up</comment>
</format>
They can also reference other formats:
<format id="fCHAT-XML">
  <domain>Audiovisual Annotation</domain>
  <level>discouraged</level>
  <comment>Consider using <formatRef ref="fTEISpoken"/> instead.</comment>
</format>