possibilities to extend content for OBODB UniProt #172

realmarcin · 2024-03-21T19:06:20Z

Greetings from LBNL!

We would like to ingest microbial protein function from UniProt for KG-Microbe and a host-associated microbiome KG. This will also serve the UniProt protein ingest for KG-Hub KGs. We started by using the UniProt REST API to download a few fields for all proteins from our microbial taxon set (~100k).

Here is the repo for our UniProt2S3 download Jenkins pipeline:
https://github.com/Knowledge-Graph-Hub/uniprot2s3

We would be happy with just a minimal set of UniProt API fields, which covers the semantic namespaces for CHEBI, GO, EC, and Rhea:
fields: ["organism_id", "id", "accession", "protein_name", "ec", "ft_binding", "go", "xref_proteomes", "rhea", "reviewed"]
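For readers who want to try the request, here is a minimal sketch of how a query for these fields could be assembled. The field names are the ones listed above; the endpoint URL, the `organism_id:` query syntax, the `size` parameter, and the helper name `build_search_url` are assumptions based on the public UniProt REST API and are not taken from this thread or from the uniprot2s3 pipeline.

```python
from urllib.parse import urlencode

# Field names copied from the request above.
FIELDS = [
    "organism_id", "id", "accession", "protein_name", "ec",
    "ft_binding", "go", "xref_proteomes", "rhea", "reviewed",
]

# Assumption: the public UniProtKB search endpoint (not stated in this thread).
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(taxon_id: int, fields=FIELDS, size: int = 500) -> str:
    """Build a TSV search URL for the proteins of one organism (hypothetical helper)."""
    params = {
        "query": f"organism_id:{taxon_id}",  # assumed query syntax
        "fields": ",".join(fields),          # comma-separated return fields
        "format": "tsv",
        "size": size,                        # assumed page-size parameter
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # 562 is the NCBI Taxonomy ID for Escherichia coli.
    print(build_search_url(562))
```

The sketch only builds the URL and does not perform the download, so it can be checked without network access. Repeating it over ~100k taxa is the part that would need paging and rate limiting.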

This data will then be a perfect complement to your obo-db-ingest Rhea resource, which we already found very easy to use. If this same process could work for UniProt data, a lot of people would be happy, as it's challenging to get it otherwise (e.g., we started exploring DAT files, which would require bespoke parsing).

Would you consider adding these few extra fields (minimal set) to your [UniProtGetter](https://github.com/biopragmatics/pyobo/blob/78b34bc85cccae4ec7a47ba777eed37130c4e48e/src/pyobo/sources/uniprot/uniprot.py#L25C7-L25C20) class?

@hrshdhgd @cmungall @bsantan

cthoyt · 2024-03-21T19:24:49Z

Hi @realmarcin, this is definitely possible. What are the relationships you want to use for each field?

realmarcin · 2024-03-21T20:45:46Z

Here is our schema diagram -- all biolink conformant. Let me know if you have any thoughts or if it looks good!

And here is the slide in case text is helpful:
https://docs.google.com/presentation/d/1VIT06ROr-WusqJuvya8rj8kpUvLbH0gY-E66JVwYD5Y/edit#slide=id.g26c476712ee_1_0

cthoyt · 2024-03-22T13:54:20Z

Thanks @realmarcin for the share, but PyOBO uses RO relations wherever possible. Luckily, RO has a high overlap with Biolink, so most of what's in this diagram can be translated to RO.

Also, it would be helpful if you could explain what each of the fields you want is. I don't know what ft_binding or xref_proteomes are, what kind of data is in them, or how I should use them.

realmarcin · 2024-03-23T18:29:44Z

Hi @cthoyt -- here is a gdoc with metadata and explanation of the different fields in the UniProt API request. Let me know if this answers your questions.
https://docs.google.com/document/d/1OEZvDgGu1xOvHRTUDWEvbz3bGFrp4s_u6qXx8y35ZGk/edit?usp=sharing


cmungall · 2024-03-25T14:53:44Z

@realmarcin - I don't think it is biolink conformant, but no worries :-)

Can I simplify the ask here?

The existing pyobo and obo-db-ingest for UniProt is very useful. But it's hardcoded for getting reviewed (Swiss-Prot) entries only. A number of groups have written duplicative ingest code for UniProt, using DAT files, using SPARQL, etc. I think we should converge on pyobo. But I am told that the REST call doesn't scale for including, say, all GCRPs. If we can solve the general strategy, then I think we can make it such that people can get the nodes and edges they need (many of which should not be put in the obo, see biopragmatics/obo-db-ingest#13).


cthoyt added a commit that referenced this issue Mar 24, 2024

Add additional fields to UniProt export

4f79902

Closes #172

cthoyt mentioned this issue Mar 24, 2024

Extend UniProt export #173

Merged

cthoyt closed this as completed in #173 Apr 1, 2024

cthoyt added a commit that referenced this issue Apr 1, 2024

Extend UniProt export (#173)

786f474

Closes #172
