Extend UniProt export #173
@realmarcin to get this over the finish line, I need your help getting a more precise relation between protein and Rhea codes. Please see #168 (comment) and help me fill in this chart.
src/pyobo/sources/uniprot/uniprot.py
Outdated
for go_function_ref in _parse_go(go_functions):
    term.append_relationship(enables, go_function_ref)
for _go_component_ref in _parse_go(go_components):
    pass  # TODO what is the right relation?
@realmarcin the model you gave didn't make an explicit differentiation between GO processes, components, and locations. Can you help me find a better relation between a protein and a cellular component implied by a GO annotation appearing in its UniProt record?
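For context, a minimal sketch of how the three GO columns could be parsed and paired with relations. It assumes the UniProt TSV cells look like `name [GO:0000000]; name [GO:0000000]`, and it uses the default GO annotation relations from the Relation Ontology (enables, involved in, located in) as one candidate answer, not as what this PR settled on. `parse_go_cell`, `go_relationships`, and `RELATION_BY_ASPECT` are hypothetical names and are not part of PyOBO.

```python
import re

# Matches the GO identifier inside square brackets, e.g. "[GO:0005524]".
GO_ID_RE = re.compile(r"\[(GO:\d{7})\]")

# Candidate relation per GO aspect as (Relation Ontology CURIE, label).
# Assumption: these mirror the default GO annotation qualifiers.
RELATION_BY_ASPECT = {
    "function": ("RO:0002327", "enables"),
    "process": ("RO:0002331", "involved in"),
    "component": ("RO:0001025", "located in"),
}


def parse_go_cell(cell):
    """Extract GO identifiers from one semicolon-delimited TSV cell."""
    return GO_ID_RE.findall(cell or "")


def go_relationships(functions, processes, components):
    """Yield (relation CURIE, relation label, GO identifier) triples."""
    cells = {"function": functions, "process": processes, "component": components}
    for aspect, cell in cells.items():
        curie, label = RELATION_BY_ASPECT[aspect]
        for go_id in parse_go_cell(cell):
            yield curie, label, go_id
```

Keeping the aspect-to-relation table separate from the parsing makes it easy to swap a relation once the maintainers agree on one. Component terms that denote protein complexes may call for a part-of relation instead of located in, so the table is a starting point only.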
src/pyobo/sources/uniprot/uniprot.py
Outdated
term.append_relationship(
    # FIXME this needs a different relation,
    # see https://github.com/biopragmatics/pyobo/pull/168#issuecomment-1918680152
    participates_in,
This is also a blocker for the PR
This should cover it, but let us know if any more are needed and what you think of the relations: protein -> reaction (participates_in)
)
BASE_URL = "https://rest.uniprot.org/uniprotkb/stream"
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"
QUERY = "(*) AND (reviewed:true)"
Hi @cthoyt -- thanks for merging the PR, this looks great and satisfies our request!
The only thing I noticed is that you are setting reviewed=True, which will return only the SwissProt subset of reviewed annotations. For our purposes we do not use that flag but request all the data, including unreviewed entries. As long as the reviewed status is returned, it's very easy to split the data as needed.
The unreviewed annotations are also required to make complete proteomes and satisfy genome function ingest requirements as you would get from other resources.
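As a rough illustration of the split described above: if the export includes the review status of each record, rows can be routed into two sets after download. The sketch assumes a TSV with a `Reviewed` column holding `reviewed` or `unreviewed`; the column name and its values are assumptions about the UniProt TSV output, and `split_by_review_status` is a hypothetical helper.

```python
import csv
import io


def split_by_review_status(tsv_text, column="Reviewed"):
    """Partition TSV rows into (reviewed, unreviewed) lists of dicts."""
    reviewed, unreviewed = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Assumption: the status cell reads "reviewed" or "unreviewed".
        status = (row.get(column) or "").strip().lower()
        (reviewed if status == "reviewed" else unreviewed).append(row)
    return reviewed, unreviewed
```

Rows with a missing or unexpected status land in the unreviewed list, which errs on the side of not labeling anything as curated by mistake.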
In a follow-up PR we can extend this to enable generation of OBO artifacts from arbitrary queries. I would be happy to review an external PR for this, or I will get to it when I have some free time.
Closes #172
This PR adds several additional fields to the uniprot export:
- ft_binding - binding interactions
- xref_proteomes - the uniprot.proteome reference(s?) from which a record is derived
- cc_function - a nice high-level description of the function of the protein
- go - GO terms associated with the protein (from all 3 hierarchies)
- xref_geneid - NCBI Gene term from which a protein is produced

It also refactors the way the query URLs are constructed to be more easily extensible.
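A minimal sketch of what extensible URL construction could look like; it is not the PR's actual code. It reuses the stream endpoint and the query string that appear in the diff, takes the five field names from the description, and assumes the endpoint accepts `query`, `fields`, and `format` parameters. `build_stream_url`, the extra `accession` field, and the bare `(*)` query for the unfiltered case are illustrative assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "https://rest.uniprot.org/uniprotkb/stream"

# Field names from the PR description, plus "accession" as an assumed
# identifier column so each row can be tied back to a protein.
FIELDS = [
    "accession",
    "ft_binding",
    "xref_proteomes",
    "cc_function",
    "go",
    "xref_geneid",
]


def build_stream_url(fields=FIELDS, reviewed_only=True, fmt="tsv"):
    """Assemble a stream URL from a field list and an optional review filter."""
    # Assumption: dropping the reviewed clause returns all entries.
    query = "(*) AND (reviewed:true)" if reviewed_only else "(*)"
    params = {"query": query, "fields": ",".join(fields), "format": fmt}
    return f"{BASE_URL}?{urlencode(params)}"
```

With the fields held in a list, adding a column is a one-line change, and the `reviewed_only` switch leaves room for the unreviewed use case raised in review.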