Use of ontology fields (e.g. for Organism) #17
Comments
I went back to the thread, to the configuration and to the BRAPI messages. This is how things are rendered in the ISA user interface.

This leaves us with the only workable option (Case 2) for using ISAcreator while retaining the ability to accept strings containing a colon (:) as input. The one thing I would change is the default value in the configuration, from: to: This is to avoid confusing users (in ISAcreator at least), by indicating that the ISA Characteristics[Organism] field should hold a class label (string) rather than a compact URI (curi).

This in fact raises another issue worth discussing: the use of something akin to a curi, such as NCBITAXON:4577, in an ISA field that is meant to hold the ontology class label. This is something of a problem in the mapping we do between BRAPI TaxonID and ISA.Characteristics[organism].

I was checking the BRAPI documentation on apiary, and it seems that from v1.3 "species" is deprecated in favor of 'germplasmSpecies', as per the BRAPI documentation: ISA.Organism = concatenation of BRAPI Genus + BRAPI Species. This means we'd need to alter the 'create_isa_characteristic' function in brapi_to_isa_converter.py to accept additional arguments to populate term_source and term_accession.

Final note: best
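To make the proposed mapping a bit more concrete, here is a minimal sketch of what it could look like. It assumes a BrAPI-style germplasm record as a plain dict; the `OrganismCharacteristic` class, the `build_organism_characteristic` function and the `ncbi_taxon_id` key are hypothetical names for illustration only, not the actual `create_isa_characteristic` code in brapi_to_isa_converter.py. The base URI is the purl quoted elsewhere in this issue.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed base URI, taken from the purl quoted in this issue.
NCBITAXON_PURL = "http://purl.bioontology.org/ontology/NCBITAXON/"


@dataclass
class OrganismCharacteristic:
    """Hypothetical stand-in for an ISA Characteristics[Organism] value."""
    label: str                     # ontology class label, e.g. "Zea mays"
    term_source: Optional[str]     # Term Source REF, e.g. "NCBITAXON"
    term_accession: Optional[str]  # Term Accession Number (a URI)


def build_organism_characteristic(germplasm: dict) -> OrganismCharacteristic:
    """Concatenate genus and species; attach the NCBI taxon ID if present."""
    label = " ".join(
        part for part in (germplasm.get("genus"), germplasm.get("species")) if part
    )
    taxon_id = germplasm.get("ncbi_taxon_id")  # hypothetical key
    if taxon_id:
        return OrganismCharacteristic(
            label, "NCBITAXON", NCBITAXON_PURL + str(taxon_id)
        )
    # No taxon ID available: only the label can be filled in.
    return OrganismCharacteristic(label, None, None)
```

With this shape, the label goes into the Characteristics[Organism] column and the taxon ID only appears in the accession URI, which is the behaviour the ontology-field configuration expects.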
The change you suggested isn't showing on GitHub for me for some reason, though I did receive it via mail. But to double-check that I understand everything properly:
<field data-type="string" header="Characteristics[Organism]" is-file-field="false" is-forced-ontology="false" is-hidden="false" is-multiple-value="false" is-required="true">
<description><![CDATA[(MIAPPE: Organism) An identifier for the organism at the species level. Use of the NCBI taxon ID is recommended.]]></description>
<default-value><![CDATA[NCBITAXON:Zea Mays]]></default-value>
</field>
Do I understand all of points 1, 2 and 3 correctly? Furthermore, do I understand correctly that (4) the above will cause no issues with validation, and that the tools we have around this (like brapi2isa) can use the above logic for all cases?
@PapoutsoglouE I have tested the above by validating an exemplar dataset, and it did not throw errors.
But I have a question for you: it seems that the 'official' shorthand for the NCBI taxonomy is 'NCBITAXON' (used to build resolvable URIs under obofoundry.org in the purl format). Does this mean that MIAPPE maintains a list of shorthands as well? If not, it may be beneficial to align with what is used by OLS and Bioportal. Thx!
@PapoutsoglouE brilliant! The option works fine when batch processing/converting from the BRAPI2ISA service we have just fixed with @bedroesb, but when doing it manually there is no simple way.
One option would be to let users enter free text for that field and then run the NCBO Annotator service with an ISA configuration set to be an ontology annotation field (but this is probably too complex already and impractical). I would have to try it out myself first.
The other option is to follow what is found in BRAPI, i.e. markup.
The last option (desperate?) would be to use an ISA.Comment[uri] field. It would be ignored by the validation service and would require a post-processing script over the ISA to sanitize it. Again, not ideal.
I think we are losing sight of our goal and straying too far from MIAPPE. In MIAPPE, after much debate, we made the conscious decision not to enforce NCBITaxon identifiers for the Organism field, because in practical cases you can have plant crosses or varieties that are not listed in the NCBITaxon. What is desired for Organism is an identifier that is both unique and univocally corresponding to the actual biological material. We explicitly discussed the option of having the NCBITaxon anyway (in addition to another, more specific identifier when needed) but deemed it unnecessary.

In order to comply with MIAPPE, we have to respect this modeling decision and stick with a single field for Organism in ISA-Tab, which takes a unique identifier of the form String_prefix-colon-numeric_id. If, on the side of ISA, this means that we must configure the field as a String due to the issue with colons, and cannot use the ontology ref field because the use of NCBITaxon is not mandatory, then so be it. It means that we cannot use ISA functionalities to select an adequate taxon id or to validate that the field is one, but that is the price MIAPPE pays for flexibility.

I reiterate: staying faithful to MIAPPE should be our first and foremost concern in the ISA implementation. People using MIAPPE ISA-Tab are, of course, free to add fields as they need and see fit (MIAPPE is a minimum specification, not a comprehensive one), but the default configuration should not deviate from the MIAPPE checklist.
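Since a plain String field means ISA will not check the format for us, a check of the "String_prefix-colon-numeric_id" form could live in our own tooling. The sketch below is only an illustration: the function name `is_valid_organism_id` and the exact regular expression are assumptions, and the " x " separator for crosses follows the example given in the issue description.

```python
import re

# One identifier: a prefix of letters, digits or underscores, a colon, a numeric ID.
_ID = r"[A-Za-z][A-Za-z0-9_]*:\d+"
# Optionally several identifiers joined by " x " to describe a cross.
_ORGANISM = re.compile(rf"{_ID}(?: x {_ID})*")


def is_valid_organism_id(value: str) -> bool:
    """Return True if value is prefix:number, or a cross of such identifiers."""
    return _ORGANISM.fullmatch(value.strip()) is not None
```

For example, `is_valid_organism_id("WUR:1234 x WUR:5678")` is accepted, while a label such as `"Zea mays"` is rejected.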
@proccaserra I noticed you went ahead and pushed a commit to v1.1 changing the example from NCBI:4577 to NCBITAXON:4577. There is no question that the correct prefix to use is indeed NCBITaxon, and it should have been the one used in the example in the MIAPPE checklist. However, by the same rationale as in my reply above, it is critical that the MIAPPE ISA-Tab implementation remain faithful to the MIAPPE checklist, which should be the reference point for documentation on any field. In the future, if you find any incongruence with the MIAPPE examples, please raise an issue about them on the MIAPPE checklist repository, so that they can be fixed there. Only after they are should they be fixed in the ISA-Tab implementation, in order to ensure that the latter remains faithful to the former.

Additionally, I noticed you removed the "e.g." from the examples, a change which I am not fond of. While the fields are intended as examples, as far as ISA-Tab is concerned they are default values. In my view, it is important that such examples not appear to be valid values (which they do without the e.g.), as otherwise you risk people not filling in a mandatory field but still obtaining a valid MIAPPE ISA-Tab, because the field was automatically filled with the default value (which is likely erroneous for that dataset).
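To illustrate the point about default values, a check along the following lines would catch fields that were never filled in, provided the examples keep their "e.g." prefix. This is a hypothetical helper, not part of any existing validator; the name `is_unfilled_default` and its arguments are assumptions.

```python
def is_unfilled_default(value: str, default: str) -> bool:
    """True if a field still holds the configured default or an 'e.g.' example."""
    cleaned = value.strip()
    # Either the default was left untouched, or the value is still marked as an example.
    return cleaned == default.strip() or cleaned.lower().startswith("e.g.")
```

Without the prefix, a default such as NCBITAXON:4577 is indistinguishable from a real entry unless the default itself is passed in for comparison.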
Issue to fix NCBI prefix created: MIAPPE/MIAPPE#57
There have been discussions around the best way to use this field.
In MIAPPE, it is expected to be either NCBI:4577 or WUR:1234 (it is also possible to list crosses, e.g. WUR:1234 x WUR:5678).
CASE 1
For identifiers that actually exist in ontologies, it would really be a great idea to use ISA-Tab's ontology references. It would be convenient in the interface of the ISA Creator too. When searching for a suitable term, it would look like this:
And the corresponding part of the study table would look like this (in the ISA Creator):
In this case, it is pointing to http://purl.bioontology.org/ontology/NCBITAXON/4577.
This translates to the following raw text in ISA. In the Investigation file:
And, more crucially, in the Study file:
The configuration for this field looks like this:
CASE 2
If there is no ontology that can be used, and the user has to resort to an institutional identifier, the ISA Creator gives the following error when asked to validate the file:
(You can ignore the errors about the example dates - of course e.g. is not a valid date.)
As @proccaserra has pointed out, this happens because of the colon (:) in the field. Indeed, if it is removed, there is no error related to this field:
In the study file, this translates to:
Which is probably not very useful. If validation was not an issue, we could however use something like the following:
But opening and validating the files with the ISA Creator results in a familiar error:
Indeed, if the file above (with WUR:1234 in one field) is saved with the ISA Creator, the contents of the field get split into the Characteristics[Organism] and Term Source REF fields (1234 and WUR respectively) - the same result that the earlier attempt to input WUR:1234 into the ISA Creator had.
Even if it worked, we would be left with the following mapping issue, as MIAPPE's Organism field contents would end up in different places in each case, with a different format.
[...] Characteristics[Organism] field must hold the label of the taxon (e.g. Zea mays) to pass validation, and the taxon ID is only visible in the Term Accession Number field, as part of the URI.
[...] WUR:1234 in the ISA Creator.
[...] Characteristics[Organism] field, assuming that validation raised no errors.
Only the top case in the screenshot gives no errors, but it is also unusable for institutional identifiers (since in many cases there are no ontologies, and other cases require crosses).
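To show what the mapping issue means for anything reading these files, here is a rough sketch of a helper that rebuilds a prefix:id identifier from the three places it can end up: the accession URI (ontology case), the value plus Term Source REF (the split that the ISA Creator produces), or the value alone (plain string). The function name `miappe_organism_id` and its arguments are hypothetical, and it assumes the ID is the last path segment of the accession URI, as in the purl quoted above.

```python
from typing import Optional


def miappe_organism_id(value: str,
                       term_source: Optional[str] = None,
                       term_accession: Optional[str] = None) -> str:
    """Rebuild a prefix:id string from the three ISA columns."""
    if term_accession:
        # Ontology case: the ID is the last path segment of the accession URI.
        local_id = term_accession.rstrip("/").rsplit("/", 1)[-1]
        return f"{term_source}:{local_id}" if term_source else local_id
    if term_source:
        # Split case: the prefix was moved into Term Source REF.
        return f"{term_source}:{value}"
    # Plain string case: the field already holds the identifier.
    return value
```

Having to branch like this in every consumer is the cost of letting the identifier land in different columns depending on the case.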
Current solution
Currently, the Characteristics[Organism] field is set to be a plain string field. This means that it does not use ISA's proper way of referring to ontology classes, but it also means that:
[...] Characteristics[Organism] field is always where these identifiers end up, and finally
Therefore, currently, the field's configuration is:
Other options
Is there some other possible solution to the issues discussed above?
Otherwise, are there any strong opinions for choosing one option (field is an ontology term) over the other (field is plain string)?