ongoing flip-flopping of annotation #678

Closed
ValWood opened this issue May 9, 2018 · 25 comments
Labels
waiting waiting for external change (e.g. GO fix)

Comments

@ValWood
Member

ValWood commented May 9, 2018

OK, I don't know what the issue is, but we seem to be loading some version of the ontology which is different from the main GO obo. Or something is occurring so that some relationships are not recognised. I suspect the former, because this changes on a daily basis.

summary:

The number of annotations to "conjugation with cellular fusion" keeps changing, even though we are not changing the annotations.

previously: 108
today: 101
see the history here
geneontology/go-ontology#15641 (comment)

For some reason, some gene products which are annotated to terms which are descendants of "conjugation with cellular fusion" do not slim to "conjugation with cellular fusion".

Here is an example
lsc1 https://www.pombase.org/gene/SPBC530.13

has

GO:1904788 "positive regulation of induction of conjugation with cellular fusion by regulation of transcription from RNA polymerase II promoter"

but this does not always slim to "conjugation with cellular fusion"
(see the GO slim overview section)

@ValWood
Member Author

ValWood commented May 9, 2018

and, even though you can "ascend" to this term page:
https://www.pombase.org/term/GO:0000747
from https://www.pombase.org/term/GO:1904788
lsc1 is not listed here

@ValWood
Member Author

ValWood commented May 9, 2018

It's odd that sometimes the parent is there and sometimes not, though.....

@mah11
Member

mah11 commented May 9, 2018

we seem to be loading some version of the ontology which is different from the main GO obo

As far as I can tell, this is not the explanation. Kim says we load go-basic.obo, and when I checked yesterday the version in https://curation.pombase.org/dumps/latest_build/ontologies/ was identical to the go-basic.obo in the GO repository for the same date.

I can check again when Canto returns, but if we were loading different content, wouldn't that affect the "Parents" on the ontology term pages too?

@kimrutherford
Member

I've had a look at this and I'm very confused. We load the transitive closure into Chado and then that is used by the website code to propagate the annotations upward. In Chado for today's load GO:1904788 has GO:0000747 as a parent via regulates. In yesterday's load Chado doesn't have that inferred relation.

(pombase-dev has today's update and the main site is still on yesterday's)
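To make the propagation step concrete, here is a minimal sketch of how one missing closure row changes a slim count. It assumes the closure rows have the four whitespace-separated columns shown in the owltools output in this thread (subject, relation, path length, object). The function names and the set of relations followed are hypothetical; this is not the actual PomBase website code.

```python
from collections import defaultdict


def parse_closure(lines):
    """Parse 'subject relation distance object' rows into (subject, relation, object) triples."""
    triples = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        subject, relation, _distance, obj = fields
        triples.add((subject, relation, obj))
    return triples


def genes_per_term(direct_annotations, closure, relations=("is_a", "part_of", "regulates")):
    """Map each term to the genes annotated to it directly or through the closure.

    The default relations are an assumption, not PomBase's documented configuration.
    """
    ancestors = defaultdict(set)
    for subject, relation, obj in closure:
        if relation in relations:
            ancestors[subject].add(obj)
    genes = defaultdict(set)
    for gene, term in direct_annotations:
        genes[term].add(gene)
        for ancestor in ancestors[term]:
            genes[ancestor].add(gene)
    return genes


# lsc1 is annotated to GO:1904788; the closure row decides whether it counts for GO:0000747.
annotations = [("lsc1", "GO:1904788")]
with_row = parse_closure(["GO:1904788   regulates       5       GO:0000747"])
without_row = parse_closure([])
print(len(genes_per_term(annotations, with_row)["GO:0000747"]))     # 1
print(len(genes_per_term(annotations, without_row)["GO:0000747"]))  # 0
```

With the inferred row present lsc1 counts towards GO:0000747, and without it the gene silently drops out, which would match the changing totals described above.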

We create the transitive closure with owltools. I ran it on the command line for today and yesterday's go-basic.obo files and got different results.

For today it reports:

GO:1904788   regulates       5       GO:0000747

(The 5 is the length of the path that owltools found, but 5 is not the shortest path - odd)

Yesterday's output doesn't have that line.

Strangely, both today and yesterday have:

GO:1900406   regulates       3       GO:0000747

and GO:1904788 is_a GO:1900406

Perhaps there is a bug in owltools?

I'll dig deeper tomorrow after a sleep. I'll download the latest owltools to see if that makes a difference.
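The gap described above can be checked mechanically: if A is_a B and B regulates C, the closure should also contain A regulates C. The sketch below looks for such implied links that are absent. It assumes the relations are available as (subject, relation, object) triples, and the function name is hypothetical.

```python
def missing_inherited_links(closure, relation="regulates"):
    """Return links implied by 'A is_a B' plus 'B <relation> C' that are absent from closure."""
    by_subject = {}
    for subject, rel, obj in closure:
        by_subject.setdefault(subject, set()).add((rel, obj))

    missing = set()
    for subject, links in by_subject.items():
        for rel, parent in links:
            if rel != "is_a":
                continue
            # Every <relation> link of the is_a parent should be inherited by the child.
            for parent_rel, target in by_subject.get(parent, ()):
                if parent_rel == relation and (relation, target) not in links:
                    missing.add((subject, relation, target))
    return missing


# The situation reported for yesterday's load: the parent has the link, the child does not.
yesterday = {
    ("GO:1904788", "is_a", "GO:1900406"),
    ("GO:1900406", "regulates", "GO:0000747"),
}
print(missing_inherited_links(yesterday))
# {('GO:1904788', 'regulates', 'GO:0000747')}
```

Running a check like this after each load would flag a bad closure before the slim counts change on the site.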

@mah11
Member

mah11 commented May 9, 2018

That is bizarre. The go-basic.obo files from the last 4 builds have identical paths involving GO:1904788 (3 of the 4 are completely identical, and the only difference from the most recent file is that two MF terms got merged).

@ValWood
Member Author

ValWood commented May 9, 2018

I knew there was a problem, but it isn't super urgent to fix, but we should get to the bottom of it......

@kimrutherford
Member

That is bizarre. The go-basic.obo files from the last 4 builds have identical paths involving GO:1904788

Yep, that's strange. I've just tried the latest version of OWLTools and the results are the same. Is there anything else we can try or check before I create an owltools issue?

@mah11
Member

mah11 commented May 10, 2018

I'm pretty much stumped. It sounds like you can feed the same file to owltools and get different results on different days. I got nuthin'

@ValWood
Member Author

ValWood commented May 10, 2018

OK just more spooky craziness!

@ValWood
Member Author

ValWood commented May 14, 2018

@cmungall says: can you open a ticket on the OWLTools tracker and tag him? He will take a look.

@mah11
Member

mah11 commented May 14, 2018

guessing that Chris means OWLTools - https://github.com/owlcollab/owltools/issues

@ValWood
Member Author

ValWood commented May 14, 2018

yep.
I would do it but I'm not sure how to describe the problem

@cmungall

It's OK, let's try to sort out the problem here, but in general you have a higher chance of getting my attention on another tracker. Unfortunately the current owltools developer's last day is today, but I can handle this.

@kimrutherford can I confirm that your pipeline does something like:

owltools go-basic.obo --save-closure-for-chado foo.tsv

@kimrutherford
Member

owltools go-basic.obo --save-closure-for-chado foo.tsv

Yep, that's what we run.

These are the OBO files I was testing in my comment above - #678 (comment). The output of owltools changed quite a lot between these two files but we can't see why:

https://curation.pombase.org/kmr44/go-basic-2018-05-07.obo
https://curation.pombase.org/kmr44/go-basic-2018-05-08.obo

@ValWood
Member Author

ValWood commented May 15, 2018

Look, this was yesterday:

[screenshot: yesterday]

and actin cytoskeleton organization was 128 yesterday.

We did not change anything.

[screenshot: today]

It flip-flops like this every couple of days..... I really think it is a difference in the GO file

@mah11
Member

mah11 commented May 15, 2018

But there aren't any differences in the GO files that would account for changing counts of GO:0000747 annotations. Literally the only difference between the go-basic.obo files from yesterday's and today's builds is a typo correction in a synonym of a MF term. Two letters transposed. Sometimes the GO files have been identical.

For the files Kim linked above (go-basic-2018-05-07.obo and go-basic-2018-05-08.obo) this is the complete diff:

2c2
< data-version: releases/2018-05-04
---
> data-version: releases/2018-05-07
265459c265459
< name: 2-octaprenyl-6-methoxy-1,4-benzoquinone methyltransferase activity
---
> name: 2-octaprenyl-6-methoxy-1,4-benzoquinone methylase activity
265460a265461
> alt_id: GO:0102005
265463c265464,265466
< synonym: "2-octaprenyl-6-methoxy-1,4-benzoquinone methylase activity" EXACT []
---
> synonym: "2-octaprenyl-6-methoxy-1,4-benzoquinone methyltransferase activity" EXACT []
> xref: EC:2.1.1.201
> xref: MetaCyc:2-OCTAPRENYL-METHOXY-BENZOQ-METH-RXN
438631,438639d438633
< id: GO:0102005
< name: 2-octaprenyl-6-methoxy-1,4-benzoquinone methylase activity
< namespace: molecular_function
< def: "Catalysis of the reaction: 6-methoxy-2-octaprenylhydroquinone + S-adenosyl-L-methionine <=> H+ + S-adenosyl-L-homocysteine + 5-methoxy-2-methyl-3-octaprenylhydroquinone." [EC:2.1.1.201]
< xref: EC:2.1.1.201
< xref: MetaCyc:2-OCTAPRENYL-METHOXY-BENZOQ-METH-RXN
< is_a: GO:0008168 ! methyltransferase activity
<
< [Term]

But Kim gets different results from OWLTools for the 05-07 vs 05-08 files.
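Since the OBO diff is tiny while the closure output changes a lot, it may help to diff the two closure files directly. This is a minimal sketch, assuming the four-column row layout shown earlier in the thread. The function names are hypothetical, and the path-length column is deliberately ignored so that only gained or lost links are reported.

```python
def closure_rows(text):
    """Collect (subject, relation, object) triples from closure text, dropping the distance column."""
    rows = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:
            subject, relation, _distance, obj = fields
            rows.add((subject, relation, obj))
    return rows


def closure_diff(old_text, new_text):
    """Return (removed, added) links between two closure outputs, each sorted."""
    old, new = closure_rows(old_text), closure_rows(new_text)
    return sorted(old - new), sorted(new - old)


# Example using the two rows quoted earlier in this thread.
old_text = "GO:1900406   regulates       3       GO:0000747\n"
new_text = old_text + "GO:1904788   regulates       5       GO:0000747\n"
removed, added = closure_diff(old_text, new_text)
print(removed)  # []
print(added)    # [('GO:1904788', 'regulates', 'GO:0000747')]
```

Comparing sets rather than lines means a change in row order alone would not show up as a difference.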

@ValWood
Member Author

ValWood commented May 16, 2018

So strange, any ideas @cmungall ?

@ValWood
Member Author

ValWood commented May 16, 2018

and back again today??????

[screenshot: today]

@ValWood
Member Author

ValWood commented May 31, 2018

Summary, transitive closure was inconsistent over the same input file.
No alternative....
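One way to confirm that the closure varies for a single input file is to run the same command several times and compare hashes of the output. The sketch below assumes the command line quoted earlier in this thread (`owltools go-basic.obo --save-closure-for-chado foo.tsv`). The helper names are hypothetical, and rows are sorted before hashing so that a difference in row order alone does not count as a mismatch.

```python
import hashlib
import subprocess
import tempfile
from pathlib import Path


def closure_digests(make_command, runs=3):
    """Run the command `runs` times and return the SHA-256 of each run's sorted output rows.

    `make_command` takes an output path and returns the argument list to execute.
    """
    digests = []
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as tmp:
            out_path = Path(tmp) / "closure.tsv"
            subprocess.run(make_command(str(out_path)), check=True)
            rows = sorted(out_path.read_text().splitlines())
            digests.append(hashlib.sha256("\n".join(rows).encode()).hexdigest())
    return digests


def owltools_command(obo_path):
    """Build the command line quoted in this thread for a given OBO file."""
    return lambda out_path: ["owltools", obo_path, "--save-closure-for-chado", out_path]
```

With owltools on the PATH, `len(set(closure_digests(owltools_command("go-basic.obo")))) == 1` would indicate that repeated runs agree. More than one distinct digest would reproduce the inconsistency without waiting for the next nightly load.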

@kimrutherford
Member

I've made an owltools issue about this: owlcollab/owltools#256

@ValWood
Member Author

ValWood commented Jun 5, 2018

good, it still happens every couple of days...

[screenshot: 2 June]

[screenshot: 3 June]

@ValWood
Member Author

ValWood commented Apr 18, 2019

Ah this is the same issue that I just put on the GO tracker. I thought this was all done and dusted....
https://github.com/geneontology/go-ontology

@kimrutherford
Member

Hi Val.

The load worked last night including the improvements from geneontology/go-ontology#17171 (comment)

For now I haven't updated the main site so it still shows the load from Monday night. http://dev.pombase.kmr.nz has the updated results from Tuesday night's load which I think are an improvement. I'll hold back the update until you've had a look.

Looking at the BP slim, the counts for actin cytoskeleton organization are 99 vs 101:
https://www.pombase.org/term_genes/GO:0030036
http://dev.pombase.kmr.nz/term_genes/GO:0030036

mitotic cytokinesis has increased quite a lot:
https://www.pombase.org/term_genes/GO:0000281
http://dev.pombase.kmr.nz/term_genes/GO:0000281

protein-containing complex assembly has increased a bit:
https://www.pombase.org/term_genes/GO:0065003
http://dev.pombase.kmr.nz/term_genes/GO:0065003

@ValWood
Member Author

ValWood commented Oct 14, 2020

Yes, I compared the slims too and all classes are either identical or increased with the new code.

old
transcription, DNA-templated | GO:0006351 | 385
new
transcription, DNA-templated | GO:0006351 | 390
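The "identical or increased" comparison can be scripted along these lines. This is a minimal sketch, assuming slim rows in the `name | GO id | count` layout shown above; the function names are hypothetical.

```python
def parse_slim_counts(text):
    """Parse 'name | GO id | count' rows into a {GO id: count} mapping."""
    counts = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3 and parts[2].isdigit():
            counts[parts[1]] = int(parts[2])
    return counts


def decreased_terms(old_text, new_text):
    """Return {GO id: (old count, new count)} for every term whose count went down."""
    old, new = parse_slim_counts(old_text), parse_slim_counts(new_text)
    return {
        term: (count, new.get(term, 0))
        for term, count in old.items()
        if new.get(term, 0) < count
    }


old_text = "transcription, DNA-templated | GO:0006351 | 385"
new_text = "transcription, DNA-templated | GO:0006351 | 390"
print(decreased_terms(old_text, new_text))  # {}
```

An empty result means no slim class lost annotations between the two loads.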

I'm very happy, thanks
@cmungall
@kimrutherford
@balhoff

@kimrutherford
Member

The improvements are now live on the main site.
