QC - new GOA-GOC data exchange pipeline #94

Open
3 of 20 tasks
pgaudet opened this issue Oct 7, 2024 · 4 comments

Comments

@pgaudet
Contributor

pgaudet commented Oct 7, 2024

The data coming from the GOA pipeline is on AmiGO staging. This ticket looks at the differences across the two datasets.

GOA ftp: https://ftp.ebi.ac.uk/pub/contrib/goa/panther_proteomes/
GOA error reports for external groups: https://ftp.ebi.ac.uk/pub/contrib/goa/reports/
Stats (GOC release): https://docs.google.com/spreadsheets/d/1asamlC32E8HDGCqUVaE1O3hp4jp-Y6Z7nt6PrK_6jGw/edit?gid=0#gid=0

Need to check all sources

  • dictybase
  • ecocyc (already loaded via GOA; GOC only loads GPI)
  • fb
  • genedb
  • goa
  • japonicusdb
  • mgi
  • noctua
  • paint
  • pombase
  • pseudocap
  • reactome
  • rgd
  • rnacentral (GOC only loads GPI)
  • sgd
  • sgn
  • tair
  • wb
  • xenbase
  • zfin
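
One way to check each source is to count annotations per Assigned_By value in the GAF output of both pipelines and list the sources whose counts differ. A minimal sketch, assuming tab-separated GAF 2.x input where column 15 is Assigned_By; the function names are hypothetical and not part of either pipeline:

```python
from collections import Counter

ASSIGNED_BY_COL = 14  # GAF column 15 (Assigned_By), zero-based index


def count_by_source(gaf_lines):
    """Count annotation rows per Assigned_By value, skipping '!' header lines."""
    counts = Counter()
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > ASSIGNED_BY_COL:
            counts[cols[ASSIGNED_BY_COL]] += 1
    return counts


def diff_counts(goc, goa):
    """Return {source: (goc_count, goa_count)} for sources whose counts differ."""
    return {
        source: (goc.get(source, 0), goa.get(source, 0))
        for source in sorted(set(goc) | set(goa))
        if goc.get(source, 0) != goa.get(source, 0)
    }
```

Passing an open file handle for each pipeline's GAF to `count_by_source` and feeding both results to `diff_counts` gives a per-source list of discrepancies to work through.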

Known differences between GOA and GOC pipeline:

  • ND filtered by GOA
  • interontology links are instantiated
  • do_not_annotate terms are filtered
@pgaudet
Contributor Author

pgaudet commented Oct 16, 2024

Decrease in ND annotations, since GOA filters out ND annotations when other annotations exist
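
The filtering rule could be sketched as follows. This is an illustration only, not GOA's actual implementation: it assumes "other annotations" means non-ND annotations for the same gene product and the same GO aspect, and the record keys (`id`, `aspect`, `evidence`) are hypothetical.

```python
def drop_redundant_nd(annotations):
    """Drop ND annotations when the same gene product and aspect has a non-ND annotation.

    Assumption: grouping by (gene product, aspect); the real GOA rule may differ.
    """
    # Collect every (gene product, aspect) pair that has at least one non-ND annotation.
    annotated = {
        (a["id"], a["aspect"]) for a in annotations if a["evidence"] != "ND"
    }
    # Keep non-ND rows, and ND rows only where nothing else exists for that pair.
    return [
        a
        for a in annotations
        if a["evidence"] != "ND" or (a["id"], a["aspect"]) not in annotated
    ]
```

Running this over the staging and current datasets and comparing ND counts before and after would show how much of the decrease this rule alone accounts for.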

@pgaudet
Contributor Author

pgaudet commented Oct 16, 2024

The pombe taxon is different between PomBase and UniProt

@pgaudet
Contributor Author

pgaudet commented Oct 16, 2024

SGD:
AmiGO staging: 54,426 annotations assigned by SGD
QuickGO: 54,733 annotations assigned by SGD

S. cerevisiae:
AmiGO staging: 117,992 annotations
QuickGO: 146,440 annotations (RNA, complexes and SP Reviewed proteins)
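
For scale, the gaps between the two sites can be worked out from the counts above. A small sketch; the helper name is hypothetical:

```python
def count_gap(staging, quickgo):
    """Return (absolute gap, gap as a percentage of the QuickGO count)."""
    gap = quickgo - staging
    return gap, 100.0 * gap / quickgo


sgd_gap, sgd_pct = count_gap(54_426, 54_733)
yeast_gap, yeast_pct = count_gap(117_992, 146_440)

print(f"Assigned by SGD: {sgd_gap} fewer on AmiGO staging ({sgd_pct:.1f}%)")
print(f"S. cerevisiae: {yeast_gap} fewer on AmiGO staging ({yeast_pct:.1f}%)")
```

The SGD-assigned gap is 307 annotations (about 0.6%), while the S. cerevisiae gap is 28,448 (about 19.4%).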

@pgaudet
Contributor Author

pgaudet commented Oct 17, 2024

Human data:

  • Large increase in IEAs:
    • amigo-current: 73k human IEAs (protein only), 121k (all entity types)
    • amigo-staging: 235k (protein only), 283k (all entity types)
    • QuickGO: 1M; reference proteomes only: 418k

pgaudet changed the title from "Major changes - GOA-GOC joint pipeline" to "QC - new GOA-GOC data exchange pipeline" on Oct 25, 2024
Projects
Status: 2024-10 GOA-GOC new pipeline test