A question about the "With" field (affects InterMIne) #394

Open
ValWood opened this issue May 23, 2022 · 4 comments

ValWood commented May 23, 2022

Describe the issue/bug

I was not sure which tracker to put this on, so I am beginning with helpdesk.

InterMine is loading pombe data for a PombeMine.
However, InterMine treats the entries in the GO GAF "with" field as 'genes'.

Some of the PomBase "with" field entries are not genes, so we get extra genes added.

See
intermine/pombemine#51

I wonder if we should define a specific syntax for referring to individual isoforms in the "with" field?

I.e.
DB:gene_symbol[isoform_symbol]
so that it follows the same format as allele?

[Screenshot 2022-05-23 at 13 24 05]

This is a slight edge case, but we have a family of selfish genes (meiotic drivers) where the long isoform is the poison and the short isoform is the antidote, and so we can annotate the different known isoforms of the family members from the closely related fission yeast, or from other family members.

(This is probably affecting other mines. We only spotted it because I looked for genes without a feature type.)

@cmungall
@vanaukenk

ValWood changed the title from 'A question about the "With" filed (affects InterMIne)' to 'A question about the "With" field (affects InterMIne)' on May 23, 2022

@vanaukenk

@ValWood Are SPCC548.03c.1 and SPCC548.03c.2 transcripts? If so, I believe the DB:sequence_id was meant to cover transcripts.

It seems this is an issue with InterMine assuming that all With/From values represent genes which, for GO, certainly is not the case.
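
To illustrate the point above, here is a minimal sketch of how a loader might split a With/From value into its individual identifiers without deciding that any of them is a gene. It assumes the GAF 2.1+ convention that pipes separate alternatives (OR) and commas join identifiers (AND). The function name `split_with_from` and the `PomBase:` prefix on the example value are illustrative assumptions, not InterMine's or GO's actual code.

```python
def split_with_from(value):
    """Split a With/From string into OR-groups of (prefix, local_id) pairs.

    Assumes pipes separate alternatives and commas separate conjoined
    identifiers. Nothing here decides whether an ID is a gene.
    """
    groups = []
    for alternative in value.split("|"):
        ids = []
        for curie in alternative.split(","):
            curie = curie.strip()
            if not curie:
                continue
            prefix, sep, local_id = curie.partition(":")
            if not sep:
                # No prefix present; keep the raw text as the local ID.
                prefix, local_id = "", curie
            ids.append((prefix, local_id))
        if ids:
            groups.append(ids)
    return groups


# Illustrative value only; the transcript IDs are the ones named in this thread.
example = "PomBase:SPCC548.03c.1|PomBase:SPCC548.03c.2"
print(split_with_from(example))
```

The feature type (gene, transcript, protein, and so on) would then have to be looked up per identifier rather than inferred from the column.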

ValWood commented May 23, 2022

SPCC548.03c.1 and SPCC548.03c.2 are transcripts.

But it seems that at present we make a special case for alleles where we specify [gene] [allele]

I would not read DB:sequence_id as including transcripts/isoforms
(I assumed this was referring to accession numbers rather than symbols). It might be useful for the docs to be more explicit.

Anyway, I will report back to InterMine that they cannot assume that these identifiers are genes.

v

@cmungall

> But it seems that at present we make a special case for alleles where we specify [gene] [allele]

I had forgotten we have this. It doesn't seem well documented. I don't have the query at hand to see how often this is used, but an ad-hoc check reveals no usages in human or the MODs?

I'm guessing most parsers wouldn't deconstruct this and just treat this as an ID, and URL resolution would fail.
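
As a rough illustration of that guess, the sketch below contrasts a generic "split on the first colon" parser with one that knows about the bracketed form proposed earlier in this thread. Both function names and the regular expression are hypothetical and written only for this example; they are not taken from any existing GO or InterMine parser.

```python
import re

# Hypothetical pattern for the bracketed form discussed in this thread;
# assumed for illustration, not a published GAF grammar.
BRACKETED = re.compile(
    r"^(?P<db>[^:\[\]]+):(?P<gene>[^\[\]]+)\[(?P<isoform>[^\[\]]+)\]$"
)


def generic_split(curie):
    """What a generic parser might do: split once on the first colon."""
    prefix, _, local_id = curie.partition(":")
    return prefix, local_id


def bracket_aware_split(curie):
    """Return (db, gene, isoform) if the bracketed form matches, else None."""
    match = BRACKETED.match(curie)
    if match is None:
        return None
    return match.group("db"), match.group("gene"), match.group("isoform")


value = "DB:gene_symbol[isoform_symbol]"
print(generic_split(value))        # brackets stay inside the local ID
print(bracket_aware_split(value))  # requires syntax-specific handling
```

With the generic split, the whole bracketed string ends up as the local ID, so any URL built from it would not point at a real record.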

I think if we do want to refer to isoforms, we should use an isoform ID the same way we would anywhere else, e.g. c17, with no ad-hoc syntaxes.

ValWood commented May 24, 2022

That makes sense.
