Extend UniProt export #173

Merged
merged 9 commits into from
Apr 1, 2024

cthoyt
Member

@cthoyt cthoyt commented Mar 24, 2024

Closes #172

This PR adds several additional fields to the UniProt export:

  1. ft_binding - binding interactions
  2. xref_proteomes - the uniprot.proteome reference(s?) from which a record is derived
  3. cc_function - a nice high-level description of the function of the protein
  4. go - GO terms associated with the protein (from all 3 hierarchies)
  5. xref_geneid - the NCBI Gene term from which a protein is produced

It also refactors the way the query URLs are constructed so that they are more easily extensible.
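
As an illustration of the kind of extensible URL construction described above, here is a minimal sketch. It assumes the export downloads from UniProt's REST `stream` endpoint using the `query`, `format`, and `fields` parameters. The helper name `build_stream_url` and the `accession` column are assumptions made for this example, not the code in this PR; the remaining field names come from the list above.

```python
from urllib.parse import urlencode

BASE_URL = "https://rest.uniprot.org/uniprotkb/stream"

# Columns to request; extending the export means adding one entry here.
FIELDS = [
    "accession",  # assumed identifier column, not named in the PR description
    "ft_binding",
    "xref_proteomes",
    "cc_function",
    "go",
    "xref_geneid",
]


def build_stream_url(query: str, fields: list[str], fmt: str = "tsv") -> str:
    """Build a download URL from a query and a list of fields (hypothetical helper)."""
    params = {"query": query, "format": fmt, "fields": ",".join(fields)}
    # urlencode percent-escapes parentheses, colons, and commas in the values.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_stream_url("(*) AND (reviewed:true)", FIELDS)
```

Keeping the fields in a plain list means a new column is a one-line change rather than an edit to a hand-written URL string.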

@cthoyt
Member Author

cthoyt commented Mar 24, 2024

@realmarcin to get this over the finish line, I need your help getting a more precise relation between proteins and Rhea codes. Please see #168 (comment) and help me fill in this chart.

for go_function_ref in _parse_go(go_functions):
    term.append_relationship(enables, go_function_ref)
for _go_component_ref in _parse_go(go_components):
    pass  # TODO what is the right relation?
Member Author

@realmarcin the model you gave didn't explicitly differentiate between GO processes, components, and locations. Can you help me find a better relation between a protein and a cellular component implied by a GO annotation appearing in its UniProt record?

term.append_relationship(
    # FIXME this needs a different relation,
    # see https://github.com/biopragmatics/pyobo/pull/168#issuecomment-1918680152
    participates_in,
Member Author

This is also a blocker for the PR.

@realmarcin

> @realmarcin to get this over the finish line, I need your help getting a more precise relation between proteins and Rhea codes. Please see #168 (comment) and help me fill in this chart.

This should cover it, but let us know if any more are needed and what you think of the relations:

protein -> reaction (participates_in)
UniProt -> EC (enables)
UniProt -> GO molecular function (enables)
UniProt -> GO biological process (enables)
UniProt -> GO cellular component (located_in)
GO molecular function -> EC (related_to)
Rhea -> GO molecular function (related_to)
Rhea -> GO biological process (part_of)
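
For reference, the chart above could be captured as a small lookup table so that the export code picks a relation by subject and object type. This is only a sketch: the relation names are plain strings copied from the chart, and `RELATIONS` and `get_relation` are hypothetical names, not part of PyOBO.

```python
from typing import Optional

# (subject type, object type) -> relation name, transcribed from the chart.
RELATIONS = {
    ("protein", "reaction"): "participates_in",
    ("UniProt", "EC"): "enables",
    ("UniProt", "GO molecular function"): "enables",
    # The chart's process rows; GO names this aspect "biological process".
    ("UniProt", "GO biological process"): "enables",
    ("UniProt", "GO cellular component"): "located_in",
    ("GO molecular function", "EC"): "related_to",
    ("Rhea", "GO molecular function"): "related_to",
    ("Rhea", "GO biological process"): "part_of",
}


def get_relation(subject_type: str, object_type: str) -> Optional[str]:
    """Return the proposed relation for a pair of types, or None if the chart has no entry."""
    return RELATIONS.get((subject_type, object_type))
```

Returning `None` for pairs that are not in the chart makes the gaps explicit instead of silently falling back to a generic relation.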

@cthoyt cthoyt merged commit 786f474 into main Apr 1, 2024
8 checks passed
@cthoyt cthoyt deleted the update-uniprot branch April 1, 2024 15:40
)
BASE_URL = "https://rest.uniprot.org/uniprotkb/stream"
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"
QUERY = "(*) AND (reviewed:true)"

Hi @cthoyt -- thanks for merging the PR, this looks great and satisfies our request!

The only thing I noticed is that you are setting reviewed:true, which will return only the Swiss-Prot subset of reviewed annotations. For our purposes we do not use that flag but request all the data, including unreviewed. As long as the reviewed status is returned, it's very easy to split the data as needed.

The unreviewed annotations are also required to make complete proteomes and satisfy genome function ingest requirements as you would get from other resources.
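
One way to serve both use cases would be to make the reviewed filter optional when the query string is built. The sketch below is an assumption about how that could look, not existing PyOBO code: `build_query` is a hypothetical helper, and `reviewed:false` is assumed to mirror the `reviewed:true` syntax used in the current `QUERY` constant.

```python
from typing import Optional


def build_query(reviewed: Optional[bool] = None, base: str = "(*)") -> str:
    """Build a UniProt query string with an optional reviewed filter (hypothetical helper).

    reviewed=None requests everything, True only reviewed entries,
    and False only unreviewed entries.
    """
    if reviewed is None:
        return base
    flag = "true" if reviewed else "false"
    return f"{base} AND (reviewed:{flag})"


swiss_prot_only = build_query(reviewed=True)
everything = build_query()
```

With `reviewed=True` this reproduces the `QUERY` value in the diff above, so the current behavior could remain the default while callers who need complete proteomes request everything.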

Member Author

In a follow-up PR, we can extend this to enable generation of OBO artifacts from arbitrary queries. I would be happy to review an external PR for this, or I will get to it when I have some free time.

Development

Successfully merging this pull request may close these issues.

possibilities to extend content for OBODB UniProt