inferred gaf (F-P) links, correct location and possible bugs #524

Closed
ValWood opened this issue Feb 2, 2018 · 25 comments

Comments

@ValWood
Contributor

ValWood commented Feb 2, 2018

We are currently getting the file from here:
http://build.berkeleybop.org/job/gaf-check-pombase/lastSuccessfulBuild/artifact/gene_association.pombase.inf.gaf

I don't know if this is the correct place. Can you confirm?

I notice a number of issues with this file.

The major one so far is that it contains many IC annotations to process terms with an F in the aspect column.

See, for example:
PomBase SPBC2F12.13 klp5 GO:1990758 PMID:21664573 IC GO:0005515 F kinesin-8 family plus-end microtubule motor Klp5 sot1 protein taxon:4896 20161018 GOC-OWL part_of(GO:1990758)

(there are many more)
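This kind of error can be caught mechanically. In GAF 2.x, column 5 holds the GO ID and column 9 the aspect (P, F or C), so a line is suspect whenever column 9 disagrees with the namespace of the term in column 5. The Python sketch below assumes tab-separated GAF input and a hypothetical `go_aspect` dictionary standing in for a real ontology lookup; `aspect_mismatch` is an illustrative name, not part of any GO tool.

```python
from typing import Dict, Optional

# Hypothetical stand-in for an ontology lookup (GO ID -> expected aspect).
# A real check would derive this from the GO ontology files.
GO_ASPECT: Dict[str, str] = {
    "GO:1990758": "P",  # a process term, per the example above
    "GO:0006412": "P",  # translation
    "GO:0005515": "F",  # protein binding
}

REQUIRED_COLUMNS = 15  # GAF 2.x: columns 16 and 17 are optional


def aspect_mismatch(line: str, go_aspect: Dict[str, str]) -> Optional[str]:
    """Return a description of the problem, or None if the line looks consistent."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < REQUIRED_COLUMNS:
        return "too few columns"
    go_id, aspect = cols[4], cols[8]  # column 5 and column 9 (0-based indexes)
    expected = go_aspect.get(go_id)
    if expected is None:
        return None  # term not in the lookup table: no judgement possible
    if aspect != expected:
        return f"{go_id} is aspect {expected} but column 9 says {aspect}"
    return None
```

Run on the klp5 line above (tab-separated, with an empty qualifier column), it would return `GO:1990758 is aspect P but column 9 says F`.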

I'll document the other problems once we have established that we are using the correct file.

@ValWood ValWood changed the title inferred gaf (F-P) links inferred gaf (F-P) links, correct location and possible bugs Feb 2, 2018
@mah11
Contributor

mah11 commented Feb 2, 2018

Doug's comments on geneontology/go-annotation#1683 make it pretty clear that the world is meant to switch to files with "prediction.gaf" names, e.g. the pombase-prediction.gaf files linked to his first comment there. In another comment Doug also says that "prediction.gaf" files are intended to replace ".inf.gaf" files, and should have the same contents.

But there are some big problems:

  • At least for PomBase, the "prediction.gaf" files contain ONLY inferred cellular component annotations. Although this is supposed to be the new place to find biological process annotations inferred via MF-BP links, there are no BP annotations in the file at all.
  • Files using the ".inf.gaf" names are still being produced by Jenkins (e.g. http://build.berkeleybop.org/job/gaf-check-pombase/lastSuccessfulBuild/artifact/gene_association.pombase.inf.gaf). This looks inconsistent with the notion that "prediction.gaf" is the way of the future, and is very confusing. Furthermore, these "inf.gaf" files have problems of their own:
    • They DO NOT have the same content as the "prediction.gaf" files. Instead, "inf.gaf" files do contain BP annotations, presumably inferred via MF-BP links. However, they are missing some MF-BP inferences that we expect, per PomBase inferred GAF missing annotations #2226.
    • Val noted above that there are annotation lines that have a BP GO ID in column 5 but 'F' in column 9.

In geneontology/go-annotation#1683 (comment) Doug asked for examples of annotations we would expect to see inferred. So here are a couple:

  • The GO:0005484 part_of GO:0061025 example in the original ticket 1683 summary is still an issue for PomBase. In a comment Tanya found the same for the TAIR file.
  • Another example, where the MF has IDA evidence, is PomBase jmj2/SPAC1002.05c. It's annotated to MF GO:0034647, which is part_of BP GO:0034721. But the GO:0034721 annotation is NOT in either pombase-prediction.gaf or gene_association.pombase.inf.gaf.
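The F->P inference being asked for can be sketched as follows: for each MF annotation, every BP term that the MF term is part_of yields a candidate BP annotation for the same gene, unless the gene already has it. This Python sketch only follows direct part_of links (no transitive closure or subclass reasoning), and the `mf_part_of_bp` mapping and tuple layouts are hypothetical; it is not how OWLTools implements propagation. The IC evidence code and the source MF term in the with/from field mirror the shape of the klp5 line in Val's comment.

```python
from typing import Dict, Iterable, Set, Tuple

# (gene, GO ID, aspect) for existing annotations
Annotation = Tuple[str, str, str]
# (gene, inferred GO ID, evidence code, with/from, aspect) for inferred lines
Inferred = Tuple[str, str, str, str, str]


def infer_bp_from_mf(
    annotations: Iterable[Annotation],
    mf_part_of_bp: Dict[str, Set[str]],
) -> Set[Inferred]:
    """Infer BP annotations from MF annotations via direct MF part_of BP links."""
    annotations = list(annotations)
    already_annotated = {(gene, go_id) for gene, go_id, _ in annotations}
    inferred: Set[Inferred] = set()
    for gene, mf_term, aspect in annotations:
        if aspect != "F":
            continue  # only molecular function annotations seed F->P inference
        for bp_term in mf_part_of_bp.get(mf_term, set()):
            if (gene, bp_term) in already_annotated:
                continue  # skip inferences the gene already has
            # The new line takes the aspect of the inferred (BP) term: "P".
            inferred.add((gene, bp_term, "IC", mf_term, "P"))
    return inferred
```

With the jmj2 example, `infer_bp_from_mf([("jmj2", "GO:0034647", "F")], {"GO:0034647": {"GO:0034721"}})` returns the single GO:0034721 annotation that is missing from both files.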

@dougli1sqrd
Contributor

So just to be clear, the inf and prediction gaf files are the same file, just renamed. OWLTools is, for some reason, no longer producing the other predicted lines. We're getting to the bottom of why that is.

@cmungall
Member

cmungall commented Feb 3, 2018

Thanks for your patience. Here is where we are:

filenames and locations

file names:

You are correct about the filename transition from old to new: .inf.gaf -> -prediction.gaf

locations:

As Eric stated in geneontology/go-annotation#1683 (comment):

The http://build.berkeleybop.org jobs are the old pipeline. Apologies if this is confusing while we're in transition.

lack of inferences

The aspect in the generated GAFs may not be correct; see separate ticket: #524
Although the aspect is redundant with the GO ID, and software looks at the latter, this must be fixed. It also creates a very confusing situation for spot-checking the GAFs.

We believe the other issues are due to a recent ontology change. We hope to have this figured out soon.

UPDATE

I am still a bit baffled. There is some change in the ontology that caused us to drop inferences. However, when I try to reduce it to a small test case, the inferences always succeed. Further details are in owlcollab/owltools#238; we will update this ticket when we have resolved the underlying problem.

@cmungall
Member

cmungall commented Feb 3, 2018

OK GOT IT

We have two part_ofs in the GO import chain: the official BFO:0000050 part_of that is used directly in GO, and so#part_of, which comes from the so_import module.

While these have unambiguous URIs as far as the OWLAPI is concerned, in c16 (the GAF annotation extension column) we made the decision to use labels like part_of. The OWLTools code naively does a lookup and takes the first URI, not expecting there to be more than one (bad assumption! always code defensively!).

This means that the c16 expressions were being mapped to the wrong URI, so they weren't matching definitions in the ontology - this explains the lack of deepening 'IC' inferences.
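The defensive alternative is to refuse an ambiguous label rather than silently take the first match. A minimal Python sketch of that idea (not the OWLTools code, which is Java): build a label-to-URIs index and raise if a label resolves to more than one URI, unless the caller narrows the candidates by URI prefix. The function and exception names are hypothetical, and the so#part_of URI below is an assumption about how the SO property expands.

```python
from typing import Dict, List

BFO_PART_OF = "http://purl.obolibrary.org/obo/BFO_0000050"
SO_PART_OF = "http://purl.obolibrary.org/obo/so#part_of"  # assumed expansion of so#part_of


class AmbiguousLabelError(ValueError):
    """Raised when a label matches more than one URI."""


def build_label_index(labels: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a URI -> label mapping into label -> [URIs], keeping every match."""
    index: Dict[str, List[str]] = {}
    for uri, label in labels.items():
        index.setdefault(label, []).append(uri)
    return index


def resolve_label(index: Dict[str, List[str]], label: str, uri_prefix: str = "") -> str:
    """Return the single URI for a label, failing loudly instead of guessing."""
    candidates = [u for u in index.get(label, []) if u.startswith(uri_prefix)]
    if not candidates:
        raise KeyError(f"no URI with label {label!r} and prefix {uri_prefix!r}")
    if len(candidates) > 1:
        raise AmbiguousLabelError(f"label {label!r} matches {sorted(candidates)}")
    return candidates[0]
```

With both properties loaded, `resolve_label(index, "part_of")` raises instead of returning whichever URI happened to come first, while passing the prefix `"http://purl.obolibrary.org/obo/BFO_"` returns the BFO:0000050 URI.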

This was hard to trace: it only manifests in some contexts.

This doesn't necessarily explain everything, but I suspect the lack of F->P is related.

No fixes until Monday, but I can rest easier now that we have found this.

@cmungall
Member

cmungall commented Feb 3, 2018

Yep, this explains lack of F->P too - see BasicAnnotationPropagator.getDirectLinkedClasses

cmungall added a commit to geneontology/go-ontology that referenced this issue Feb 3, 2018
this is because SO has fake part-ofs that are confusing both to curators
and unfortunately to software (this will be fixed)
see owlcollab/owltools#238

this is a partial fix to geneontology/go-site#524
however, the software needs to be more robust
@ValWood
Contributor Author

ValWood commented Feb 3, 2018

Fabulous! Let me know when it all settles down and I'll run through my checks.
Have a good weekend!

@cmungall
Member

cmungall commented Feb 5, 2018

The fix should have propagated, and you should have had a shiny new report...

But the pipeline encountered a new class of error in one of the GAFs: geneontology/go-annotation#1797

The pipeline 'fails fast' in these scenarios. We will take this class of aspect error and instead strip the line and report, rather than hard failing. @dougli1sqrd
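Stripping and reporting, instead of failing fast, amounts to partitioning the input into kept lines and a list of rejects. A minimal Python sketch, assuming a hypothetical per-line `check` function that returns an error message or None (an aspect check would be one such function); GAF header lines starting with `!` pass through unchecked.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A check returns an error message for a bad line, or None for a good one.
Check = Callable[[str], Optional[str]]
# (1-based line number, error message, offending line)
Reject = Tuple[int, str, str]


def strip_and_report(lines: Iterable[str], check: Check) -> Tuple[List[str], List[Reject]]:
    """Keep lines that pass `check`; collect failures instead of raising."""
    kept: List[str] = []
    rejects: List[Reject] = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("!"):
            kept.append(line)  # GAF header/comment lines are not checked
            continue
        problem = check(line)
        if problem is None:
            kept.append(line)
        else:
            rejects.append((lineno, problem, line))
    return kept, rejects
```

The trade-off is that one malformed upstream line no longer blocks every other group's products, at the cost of someone needing to read the reject report.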

We should have this resolved soon

@tberardini
Contributor

When all the inferred reports are bright and shiny, please send out a message to the whole GOC with their new location. Then everyone who is still pointing their loading scripts at any other location can update accordingly. Thanks.

@ValWood
Contributor Author

ValWood commented Feb 8, 2018

OK, what is the current status?
The current file only contains component annotations:
http://snapshot.geneontology.org/annotations/pombase-prediction.gaf

pombase/pombase-chado#659 (comment)
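A quick way to see which aspects a prediction GAF contains is to tally column 9. A minimal Python sketch, assuming the file is served as uncompressed, tab-separated GAF text (as the URLs in this thread were); the function names are hypothetical. Column 9 can itself be wrong, as Kim points out below, so this only reports what the file claims.

```python
import urllib.request
from collections import Counter
from typing import Iterable, List


def fetch_lines(url: str) -> List[str]:
    """Download a plain-text GAF and split it into lines (no gzip handling)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8").splitlines()


def tally_aspects(lines: Iterable[str]) -> Counter:
    """Count the values in column 9 (aspect) of GAF data lines."""
    counts: Counter = Counter()
    for line in lines:
        if not line or line.startswith("!"):
            continue  # skip blank and header lines
        cols = line.split("\t")
        if len(cols) >= 9:
            counts[cols[8]] += 1
    return counts
```

Pointed at the URL above, `tally_aspects(fetch_lines(url))` would show at a glance whether anything other than "C" is present.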

@cmungall
Member

cmungall commented Feb 9, 2018

Sorry for the late response; we were at a meeting for the last 2 days.

@ValWood, can you check this file: http://snapshot.geneontology.org/annotations/pombase-prediction.gaf [See below -sjc]

The changes to the ontology have percolated through now. I have not thoroughly checked the file, but all flavors of inference appear to be present.

Don't bookmark this URL yet. We are still finalizing the URLs for the reporting part of the new pipeline. @dougli1sqrd will provide full details when he is back on Monday.

@kimrutherford
Contributor

only contains component annotations

Sort of - the Aspect column has "C" in a lot of cases where the term isn't a component term. Like:

PomBase  SPAC1006.07  tif1  GO:0006412  PMID:10462529  IMP  C  translation initiati...

GO:0006412 is a process term.

@ValWood
Contributor Author

ValWood commented Mar 20, 2018

Is the file ready to be checked? We are still not sure whether we should be using the file from the new location yet.

Once this is clear I can check.
We might be able to close:
geneontology/go-annotation#1674 closed
geneontology/go-annotation#1544 closed
geneontology/go-annotation#1489 closed
geneontology/go-annotation#1427
geneontology/go-annotation#1398 closed
#2226
geneontology/go-annotation#1576 closed
Sorry I really didn't realise I had submitted so many tickets about this!

@kltm
Member

kltm commented Mar 20, 2018

@ValWood This should be the forever home of the prediction GAF in the new pipeline:
http://snapshot.geneontology.org/products/annotations/pombase-prediction.gaf

@tberardini
Contributor

@ValWood When you've had a chance to proof this file:
http://snapshot.geneontology.org/products/annotations/pombase-prediction.gaf

please report back. :) I'll switch our pull of MFBP data to the analogous location for the TAIR file if yours passes your stringent tests.

http://snapshot.geneontology.org/products/annotations/tair-prediction.gaf

@tberardini
Contributor

Note to self: cronjob 14 is the one that needs updating

@ValWood
Contributor Author

ValWood commented Mar 22, 2018

OK, we will switch to the new file location, since it has the same contents as the old file that we are currently using.

I will check the file contents and report any issues clearly in new tickets. As I do this I'll close off any duplicate issues, because there are often multiple issues per ticket, and long threads.

For me, this ticket can close if people are happy that the new location is the correct file to use.

There are still data issues, more soon....

@ValWood
Contributor Author

ValWood commented Mar 22, 2018

OK @tberardini here is my summary:

There still appears to be an issue of missing annotations and reduced file size; see Midori’s ticket:
#2226
I don’t know what caused the large size drop, but I have recorded some examples of annotations that we would expect to see but do not.

There is a shiny new ticket for redundant annotations:
#576
I think this will be fixed by
geneontology/go-annotation#1427, the ticket for redundancy.
These tickets were closed as duplicates of this one (clearly this was an obsession):
geneontology/go-annotation#1576
geneontology/go-annotation#1674
geneontology/go-annotation#1544
geneontology/go-annotation#1489
geneontology/go-annotation#1398

There are problems with evidence code transfer, discussed in Rachel’s ticket here:
geneontology/go-annotation#1487

Some standard GO syntax/consistency checks are required:
#578

We are currently filtering the component annotations for PomBase because we think the evidence codes are misleading (geneontology/go-annotation#1487).

I think otherwise the files are good to use, but the redundancy is annoying and it does not contain all of the annotations we would expect.

We can close this ticket because the issues are all covered in clearer tickets.

  • @cmungall, is the only remaining action here to let everyone know that the file location has changed?

@ValWood
Contributor Author

ValWood commented May 11, 2018

Can this be closed? We only need to let everyone know about the file location move.

@ValWood
Contributor Author

ValWood commented Jul 4, 2018

What is the new file location?

@ValWood
Contributor Author

ValWood commented Nov 8, 2018

Maybe this can be closed?

@pgaudet
Contributor

pgaudet commented Nov 9, 2018

@ValWood what was the action? Which file do you want the location of?

@ValWood
Contributor Author

ValWood commented Nov 9, 2018

Actually, rereading the thread, maybe there is an action required.

http://build.berkeleybop.org/job/gaf-check-pombase/lastSuccessfulBuild/artifact/gene_association.pombase.inf.gaf
is still produced/present but is out of date. Could these be removed/archived?

Seth said:
This should be the forever home of the prediction GAF in the new pipeline:
http://snapshot.geneontology.org/products/annotations/pombase-prediction.gaf
so this answered the question about location.

The monthly release is at:
http://current.geneontology.org/annotations/pombase-prediction.gaf

I think all of the problems with contents are in other tickets.
Assigning to Doug to deal with the legacy files.

@ValWood
Contributor Author

ValWood commented Nov 9, 2018

Assigning to Doug to deal with the legacy files.

I don't have permissions to do this. Could you, @pgaudet?

@ValWood
Contributor Author

ValWood commented Nov 9, 2018

Chris covered new file locations at the meeting. It might be good to let people know about these changes via the mailing list. At least if the old files are removed it will force people to update.....

@ValWood
Contributor Author

ValWood commented May 28, 2019

OK, I think we can close this?

@ValWood ValWood closed this as completed May 28, 2019
8 participants