flip flopping of annotation numbers #17171

Closed
ValWood opened this issue Apr 18, 2019 · 48 comments

Comments

@ValWood
Contributor

ValWood commented Apr 18, 2019

pombase/pombase-chado#722

There is something very sinister going on here:
We didn't change anything. Look at the totals for last week, vs this week.
This is an extreme example of the random flip-flopping I keep 'going on' about. I'm convinced it is some arbitrary effect of owl tools traversing different regulates paths. Beyond that I don't know what I am talking about. But what I am seeing is a big issue.

Compare how it was:
[image: transcription]

and how it is today:
[image: flip flopping]

Note that, despite the number of annotations to "regulation of transcription" being identical, the number of annotations to transcription changes radically.

@ValWood
Contributor Author

ValWood commented Apr 18, 2019

@balhoff @cmungall needs troubleshooting. I'm sure there is a problem. Annotation numbers jumping all over the place. Something to do with randomly different propagation over the regulates relationship.

Cheers

Val

@ValWood
Contributor Author

ValWood commented Apr 18, 2019

Actually I came across this, tidying our trackers
pombase/pombase-chado#678
It's the same issue, and spawned an OWLtools ticket back in June 2018 owlcollab/owltools#256

@ValWood
Contributor Author

ValWood commented Apr 21, 2019

OK, this is what we had late last week:
[image: transcription]

This is today:
[image: today]

The changes are RADICAL. The only number that stays the same is:

495 | (regulation of transcription, DNA-templated (GO:0006355) OR transcription, DNA-templated (GO:0006351))

81 | ((regulation of transcription, DNA-templated (GO:0006355) OR transcription, DNA-templated (GO:0006351)) NOT transcription, DNA-templated (GO:0006351))
jumped up from 30!

414 | transcription, DNA-templated (GO:0006351)
down from 465

305 | (regulation of transcription, DNA-templated (GO:0006355) AND transcription, DNA-templated (GO:0006351))
down from 356
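
The four counts are tied together by simple set arithmetic, so they can be checked for internal consistency. The sketch below is only an illustration using the numbers quoted above; the labels R (regulation of transcription, GO:0006355) and T (transcription, GO:0006351) and the variable names are assumptions made for the example.

```python
# Counts quoted above, as (last week, today).
# R = regulation of transcription (GO:0006355)
# T = transcription, DNA-templated (GO:0006351)
counts = {
    "R or T": (495, 495),
    "R not T": (30, 81),   # (R OR T) NOT T
    "T": (465, 414),
    "R and T": (356, 305),
}


def regulation_total(week):
    """|R| = |R and T| + |R not T| for week 0 (last week) or 1 (today)."""
    return counts["R and T"][week] + counts["R not T"][week]


for week, label in enumerate(("last week", "today")):
    # |R or T| must equal |T| + |R not T| if the counts are consistent.
    assert counts["R or T"][week] == counts["T"][week] + counts["R not T"][week]
    print(label, "regulation total:", regulation_total(week))

# How many dropped out of T between the two snapshots.
moved = counts["T"][0] - counts["T"][1]
print("dropped out of T:", moved)
```

Under these numbers the total for regulation of transcription is 386 in both weeks, and every change is a shift of 51. That is consistent with 51 entries staying under regulation of transcription but no longer being counted under transcription.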

We really need to find the cause of this urgently. Imagine the knock-on effect in analyses.....

Note that, as far as I am aware we did not make any changes in annotations to transcription during this time.

these are the only 3 papers we approved:
https://www.pombase.org/reference/PMID:30773398
https://www.pombase.org/reference/PMID:30853434
https://www.pombase.org/reference/PMID:30925937

@balhoff
Member

balhoff commented Apr 22, 2019

Hi Val, I think you are using the go-basic snapshot for this, correct? Are the differences connected to different snapshots? Or is it possible a different release of owltools is being used at different times?

Are we talking about output from the --save-closure-for-chado command as in owlcollab/owltools#256?

@ValWood
Contributor Author

ValWood commented Apr 23, 2019

Hi @balhoff I think so, but @kimrutherford would need to confirm.

I believe owlcollab/owltools#256 is the same issue.

@kimrutherford

Hi Jim.

> I think you are using the go-basic snapshot for this, correct?

We're using http://purl.obolibrary.org/obo/go/snapshot/go-basic.obo

> Or is it possible a different release of owltools is being used at different times?

We haven't changed our version of OWLTools since December. Would it help to upgrade to a newer version?

> Are we talking about output from the --save-closure-for-chado command

Yep!

Please let me know if I can help track down the problem. I'm about to head to bed (I'm in New Zealand) but I'll do some investigating tomorrow to see if I can narrow things down.

@balhoff
Member

balhoff commented Apr 23, 2019

> We haven't changed our version of OWLTools since December. Would it help to upgrade to a newer version?

I don't think so. I'm just trying to understand what would be different between the runs with different output. I.e. could ontology snapshots be differing so widely from release to release?

@ValWood
Contributor Author

ValWood commented Apr 23, 2019

I don't think it's anything to do with the ontology snapshots. This is only a hunch, but in some branches where I see this effect I know there have been no changes. We have also seen the effect when we have run twice on the same ontology version (I think). There is a random element to the observations. I think it must be something to do with incomplete or arbitrary path following.

@ValWood
Contributor Author

ValWood commented Apr 23, 2019

pombase/pombase-chado#678
might be useful...

@ukemi
Contributor

ukemi commented Apr 23, 2019

Then should we move this ticket to the annotation tracker if it's not an ontology problem?

@ValWood
Contributor Author

ValWood commented Apr 23, 2019

It isn't an annotation problem though. So it is probably better on this tracker until the cause is known. It might be an owl tools problem but at the moment we don't know quite what causes it...

@kimrutherford

I've compared the output of owltools --save-closure-for-chado for the go-basic.obo files over the last few weeks and found some oddities.

The output using go-basic.obo from 2019-04-13 has these inferred relations:

GO:0030702      RO:0002211      5       GO:0006351
GO:0030702      RO:0002212      4       GO:0006351

Those relations are missing from the output for the OBO file from 2019-04-17 then they re-appear in the 2019-04-23 output. Maybe that would explain the flip-flop that Val saw?

The two terms are:

  • chromatin silencing at centromere (GO:0030702)
  • transcription, DNA-templated (GO:0006351)

RO:0002211 is regulates and RO:0002212 is negatively regulates.
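
A comparison like this can be scripted. The sketch below assumes the closure output has four whitespace-separated columns (subject, relation, distance, object), as in the two lines shown above. The function names are hypothetical and not part of owltools.

```python
def load_closure(lines):
    """Parse closure rows into a set of (subject, relation, object) triples.

    The distance column is ignored so that only the presence or absence
    of an inferred relation is compared.
    """
    triples = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        subject, relation, _distance, obj = fields
        triples.add((subject, relation, obj))
    return triples


def diff_closures(old_lines, new_lines):
    """Return (missing, added) triples, each sorted for stable output."""
    old, new = load_closure(old_lines), load_closure(new_lines)
    return sorted(old - new), sorted(new - old)


# Example input: the two rows above, compared with an output that lacks them.
old_rows = [
    "GO:0030702\tRO:0002211\t5\tGO:0006351",
    "GO:0030702\tRO:0002212\t4\tGO:0006351",
]
new_rows = []
missing, added = diff_closures(old_rows, new_rows)
for triple in missing:
    print("missing:", "\t".join(triple))
```

With real data, open file handles can be passed in place of the lists, since iterating over a file yields its lines.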

@kimrutherford

I did some more digging and noticed that I get different results from the same version of owltools when run on a different machine. One machine has OpenJDK 11.0.2, the other has 1.8.0.

The flip-flopping between go-basic.obo versions happens on both machines but in opposite directions. On one machine (with v11.0.2) go-basic.obo from 2019-04-13 has the inferred relations above but on the other (with 1.8.0) it doesn't.

> I believe owlcollab/owltools#256 is the same issue

I think so.

@balhoff
Member

balhoff commented Apr 24, 2019

> I did some more digging and noticed that I get different results from the same version of owltools when run on a different machine. One machine has OpenJDK 11.0.2, the other has 1.8.0.

This is getting interesting! And that is with the same input ontology?

I wonder if some owltools dependencies are out of date and not reliable on Java 11. In the past there have been mysterious classpath loading issues related to OWL API which affected parsers. Could you try using go-basic.owl instead of .obo? I would expect it to be more robust.

@balhoff
Member

balhoff commented Apr 24, 2019

I tried today's go-basic.obo using Java 8 on both my Mac laptop and a Linux server—different output! The lines you mentioned are missing from the Linux version.

@kimrutherford

> And that is with the same input ontology?

It's the same owltools and the same go-basic.obo with different versions of Java.

> I wonder if some owltools dependencies are out of date and not reliable on Java 11.

That wouldn't explain why the inferred relations are flip-flopping when using JDK 1.8.0

> Could you try using go-basic.owl instead of .obo? I would expect it to be more robust.

I just grabbed go-basic.obo and go-basic.owl from here: http://skyhook.berkeleybop.org/release/ontology/

I get a different output from the two files and from the two Java versions.

With OpenJDK 1.8.0 and the OBO file the inferred relations are in the output, with the OWL file the two relations are missing from the output.

With OpenJDK 11.0.2, the output for the OBO file doesn't contain the 2 inferred relations. The output for the OWL file does include them.

@balhoff
Member

balhoff commented Apr 24, 2019

If I run owltools differently, using the same machine (laptop) and the same Java, I get different output:

java -jar owltools-runner-all.jar go-basic.obo --save-closure-for-chado closure.txt (missing lines)

Correction—owltools-runner-all.jar was an old artifact and was just confusing the issue; running from the current jar seems to work on both Mac and Linux

./owltools go-basic.obo --save-closure-for-chado closure.txt (included lines)

These are two different packaged forms built by the owltools build.sh. I wonder if the issue is similar to this: ontodev/robot#98

Update—if I save the ontology in OWL functional syntax, I get the opposite result with the two ways of running. Furthermore the relations go back to the old label style in the output instead of IDs... 😑

@ValWood
Contributor Author

ValWood commented Apr 24, 2019

Thank you! This has been driving me crazy for about a year!

@pgaudet
Contributor

pgaudet commented Apr 24, 2019

Sounds like something we should document somewhere?

@balhoff
Member

balhoff commented Apr 24, 2019

@pgaudet this is definitely an owltools bug. It's pretty mysterious but I'm trying to figure it out.

@kimrutherford can you please test all your scenarios with this owltools: https://build.berkeleybop.org/job/owltools/1423/artifact/OWLTools-Runner/target/owltools

I made a change to the way the Java code is executed.

@kimrutherford

> @kimrutherford can you please test all your scenarios with this owltools

Hi Jim.

I've tried that owltools and the results are still inconsistent. For example the output for go-basic.owl is different to that for go-basic.obo - the two inferred relations above are in the OBO file output but not in the OWL file output.

And the output from the OBO files changes if I change OpenJDK version and it flip-flops between go-basic snapshots.

@balhoff
Member

balhoff commented Apr 25, 2019

@kimrutherford thanks for trying. It had worked consistently in the environments I was trying. I will send you one or two more configurations later today, if you don't mind some more testing.

@balhoff
Member

balhoff commented Apr 26, 2019

@kimrutherford @ValWood this seems to be more complicated than I thought. I am continuing to work on it but not sure how long it will take.

@balhoff
Member

balhoff commented May 20, 2019

@kimrutherford can you test another version of owltools? (sorry for disappearing for a little while) This version has the synchronized usages in this code removed, and has consistent output for me: https://www.dropbox.com/s/18pwwv0jd6o3nt7/owltools?dl=0

@kimrutherford

I tested the same go-basic.obo files as above, from 2019-04-16, 2019-04-17 and 2019-04-18.

The output from processing the 2019-04-17 OBO file has these inferred relations:

GO:0030702      RO:0002211      5       GO:0006351
GO:0030702      RO:0002212      4       GO:0006351

but the outputs for the files from 2019-04-16 and 2019-04-18 don't have those lines.

That was with Java v11. I also tried v8 and in that case, 2019-04-16 and 2019-04-17 are the same as v11 but the output for 2019-04-18 was different: it did have the two inferred relations.

@pgaudet
Contributor

pgaudet commented Aug 21, 2019

@ValWood do you still have problems?

@kimrutherford

The inferred relations still come and go. It last changed on Aug 15th.

@ValWood
Contributor Author

ValWood commented Apr 21, 2020

This is still a problem, and it looks really bad, and introduces a lot of noise into the annotation and potentially into any analyses.

I'm assuming the same bug affects GO annotation numbers internally too?

I'm mentioning it again because it is quite disconcerting to see the numbers bouncing around on a daily basis when you made no annotation changes... It seems quite important to address?

@balhoff
Member

balhoff commented Apr 21, 2020

@ValWood this functionality is implemented in owltools, with a sort of ad hoc reasoning method, and no one is actively working on owltools anymore. The approach used in owltools is fine, but it would be more reliable and understandable to me if it was redone with a more modern approach using an OWL reasoner. Would you be able to define the general requirements for this, in case it would be simpler to just rewrite?

@ValWood
Contributor Author

ValWood commented Apr 21, 2020

@kimrutherford will need to describe how/what we use and why.

I can only describe the problem I see in the output. I don't even know if the issue has been fully traced. My feeling is that different paths are followed arbitrarily in some instances if there are multiple choices, but this only occurs when there are 'regulation of regulation' terms.

Note that this issue does not affect every term, but it always affects

  • transcription, DNA-templated (GO:0006351) (~10 gene products pop in and out; oscillates between 420 and 430)

  • conjugation with cellular fusion (GO:0000747) (~10 gene products pop in and out; oscillates between 96 and 105)

These two particular "slim" term annotation numbers oscillate all the time. Most others don't - I suspect this is because we do not have any "regulation of regulation" type annotations for these terms.

@ValWood
Contributor Author

ValWood commented Apr 21, 2020

Should we be using different tools?

@kimrutherford

> @kimrutherford will need to describe how/what we use and why.

We use the output of owltools --save-closure-for-chado to populate the cvtermpath table in Chado. We then use that table downstream anywhere in PomBase where we need all the ancestors or descendants of a term. eg. on our query page and on term pages: https://www.pombase.org/term/GO:0006351

Hope that helps.

We'd be happy to move to another tool.

@ValWood
Contributor Author

ValWood commented Sep 4, 2020

This seems to be a clear example of the 'flip-flopping' issue:
pombase/pombase-chado#789
Is a fix for this on the horizon?

Oscillation of annotation numbers in the absence of annotation changes isn't a great feature of GO :(

@ValWood
Contributor Author

ValWood commented Sep 23, 2020

Is there any news on this ticket? This is an important issue because it makes any analyses done with GO irreproducible. The annotation numbers are arbitrarily dependent on the day that the analysis was performed.

@kimrutherford

We'd be happy to use a different tool. We've been using owltools because it has the convenient --save-closure-for-chado flag but we're not tied to it.

(Perhaps this isn't the right issue tracker for this? It's not a GO problem)

@ValWood
Contributor Author

ValWood commented Sep 23, 2020

But doesn't GO also use owl tools for inferences? If so it is a GO problem...

@balhoff
Member

balhoff commented Oct 1, 2020

@ValWood we have been trying to eliminate use of owltools, although it continues to be used here and there. I think it may be possible to create a robot command to give you the output you need. Am I right that you want a table of (possibly inferred) relations between GO terms? Which relations do you want to see? Is it only part of, regulates, positively regulates, negatively regulates? is_a also? Do you want all redundant inferences? Meaning, if A regulates B, and B is_a C, do you want a line stating that A regulates C? If X part_of Y, and Y part_of Z, do you want a line X part_of Z?
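
To make the question about redundant inferences concrete, here is a minimal sketch of a closure computed to a fixed point. The composition rules are a simplified, illustrative subset chosen to match the two examples above; they are not the rules used by owltools, robot, or GO itself, and the function names are hypothetical.

```python
def compose(first, second):
    """Relation implied by `x first y` and `y second z`, or None.

    Simplified, illustrative rules only (an assumption for this sketch).
    """
    if first == "is_a":
        return second        # is_a followed by R gives R
    if second == "is_a":
        return first         # R followed by is_a gives R
    if first == second == "part_of":
        return "part_of"     # part_of is transitive
    return None


def closure(asserted):
    """Return all asserted and inferred (subject, relation, object) triples.

    Iterates until nothing new is added, so the result does not depend
    on the order in which the edges are visited.
    """
    inferred = set(asserted)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(inferred):
            for (b2, r2, c) in list(inferred):
                if b != b2:
                    continue
                rel = compose(r1, r2)
                if rel is not None and (a, rel, c) not in inferred:
                    inferred.add((a, rel, c))
                    changed = True
    return sorted(inferred)


edges = [
    ("A", "regulates", "B"),
    ("B", "is_a", "C"),
    ("X", "part_of", "Y"),
    ("Y", "part_of", "Z"),
]
for triple in closure(edges):
    print(*triple)
```

The output includes the redundant lines "A regulates C" and "X part_of Z" alongside the four asserted edges. Because every rule is applied until nothing changes and the result is sorted, the same input gives the same output regardless of edge order.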

@balhoff
Member

balhoff commented Oct 1, 2020

Should have done it first, but now that I've looked into the owltools output myself, I think you want all the redundant inferences. Stay tuned.

@ValWood
Contributor Author

ValWood commented Oct 2, 2020

I'm not sure. We would need to wait for Kim to comment and he is on holiday this week...

@balhoff
Member

balhoff commented Oct 2, 2020

Okay @ValWood @kimrutherford here is a replacement you can use: https://github.com/balhoff/relation-graph It should always return the same results for the same input. :-)

Download this archive: https://github.com/balhoff/relation-graph/releases/download/v1.1/relation-graph-1.1.tgz

In there is a bin folder and a lib folder. Keep those next to each other wherever you install this. In bin there are scripts to run for either Unix or Windows.

Download http://purl.obolibrary.org/obo/go.owl. Personally I would use the OWL file instead of the OBO file; either will work.

Run it like this (requires Java):

./bin/relation-graph --ontology-file go.owl --non-redundant-output-file filtered.ttl --redundant-output-file closure.ttl --mode rdf --output-subclasses true --reflexive-subclasses false --equivalence-as-subclass false

The output file you want is the one specified in --redundant-output-file (closure.ttl). The "nonredundant" one is not that useful for this (and is not perfectly nonredundant). The output is RDF, but you should be able to munge it with a shell script if you want to make it look like the owltools output. http://www.w3.org/2000/01/rdf-schema#subClassOf is the same as is_a.

I can follow up to help with a transformation shell script if needed.

@ValWood
Contributor Author

ValWood commented Oct 2, 2020

Thanks so much for this Jim, it's very much appreciated! I'm sure that Kim will let you know if he has any questions. I don't even understand what we need, or how you know what we need.

Have a good weekend.
val

@balhoff
Member

balhoff commented Oct 2, 2020

No problem! For Kim's info, here is how the output will look:

<http://purl.obolibrary.org/obo/GO_0018235> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.obolibrary.org/obo/GO_0008152> .
<http://purl.obolibrary.org/obo/GO_0018235> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.obolibrary.org/obo/GO_0008150> .
<http://purl.obolibrary.org/obo/GO_0018235> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.obolibrary.org/obo/GO_0018205> .
<http://purl.obolibrary.org/obo/GO_0018235> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.obolibrary.org/obo/GO_0006807> .
<http://purl.obolibrary.org/obo/GO_0018235> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.obolibrary.org/obo/GO_0043170> .
<http://purl.obolibrary.org/obo/GO_0030291> <http://purl.obolibrary.org/obo/BFO_0000051> <http://purl.obolibrary.org/obo/GO_0019901> .
<http://purl.obolibrary.org/obo/GO_0008427> <http://purl.obolibrary.org/obo/BFO_0000051> <http://purl.obolibrary.org/obo/GO_0019901> .
<http://purl.obolibrary.org/obo/GO_0042556> <http://purl.obolibrary.org/obo/BFO_0000051> <http://purl.obolibrary.org/obo/GO_0019901> .
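
For readers who want a table closer to the owltools format, the lines above could be converted roughly as follows. This is a sketch, not the transformation script mentioned earlier: it assumes one triple per line with full IRIs in angle brackets, exactly as in the sample, and the function names are hypothetical. The output has no distance column, since none appears in the sample.

```python
import re

OBO_PREFIX = "http://purl.obolibrary.org/obo/"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# One triple per line: <subject> <predicate> <object> .
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def to_curie(iri):
    """Turn an OBO PURL such as .../obo/GO_0008152 into GO:0008152."""
    if iri == SUBCLASS_OF:
        return "is_a"
    if iri.startswith(OBO_PREFIX):
        return iri[len(OBO_PREFIX):].replace("_", ":", 1)
    return iri  # leave anything unrecognised untouched


def triples_to_rows(lines):
    """Yield tab-separated 'subject relation object' rows."""
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match:
            yield "\t".join(to_curie(iri) for iri in match.groups())


sample = [
    "<http://purl.obolibrary.org/obo/GO_0018235> "
    "<http://www.w3.org/2000/01/rdf-schema#subClassOf> "
    "<http://purl.obolibrary.org/obo/GO_0008152> .",
    "<http://purl.obolibrary.org/obo/GO_0030291> "
    "<http://purl.obolibrary.org/obo/BFO_0000051> "
    "<http://purl.obolibrary.org/obo/GO_0019901> .",
]
for row in triples_to_rows(sample):
    print(row)
```

Lines that do not match the one-triple-per-line pattern are skipped silently, so a real loader would probably want to count or report them.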

kimrutherford added a commit to pombase/pombase-chado that referenced this issue Oct 13, 2020
@kimrutherford

That's great Jim. Thanks very much!

I've installed relation-graph and it works for all the ontologies we use. So I've modified our loading code to understand the output and we'll be doing a full load test tonight (UK time).

I'll leave this issue open until Val and Midori have had a look at the results.

Thanks again.

kimrutherford added a commit to pombase/pombase-chado that referenced this issue Oct 13, 2020
@balhoff
Member

balhoff commented Oct 13, 2020

Sounds good.

@kimrutherford

Thanks very much Jim. It's working very well!:
pombase/pombase-chado#678 (comment)

@balhoff
Member

balhoff commented Oct 14, 2020

Great news! Thanks.
