Imported ComplexPortal IDs not resolving as SGD IDs #364

Open
suzialeksander opened this issue Mar 21, 2024 · 8 comments

Comments

@suzialeksander

The model on the right (MOT2) was edited by SGD curators; the model on the left (CDC20) is as imported. SGD would prefer the "CPX-# Scer" label form, consistent with the other yeast gene products. After a quick discussion with @dustine32, this might be a straightforward find & replace.

[Image: screenshot of the two models side by side, not preserved]

@suzialeksander
Author

Tagging @vanaukenk to see whether this looks like a simple "fix the SGD GPI" change or whether it might be a larger issue.

@dustine32
Contributor

To clarify, the "CPX-1306 Scer" in the right-side, edited model is the resolved label for NEO class SGD:S000218180, which is the SGD ID tying back to ComplexPortal:CPX-1306.

To update the left-side, as-imported model, we would need some lookup to map ComplexPortal:CPX-756 to its SGD-namespace NEO class, SGD:S000217886. It sounds like the SGD GPI could be this lookup.

Note that there are some ComplexPortal IDs in NEO, but these example complex classes exist in NEO only under their SGD namespaces.

@vanaukenk

@suzialeksander @dustine32

So, the idea here is to take the existing ComplexPortal entries, strip the ComplexPortal prefix, match the remaining unique ID to column 3 of SGD's GPI file (version 1.2?), and then replace any ComplexPortal CURIEs in the Noctua models with the corresponding SGD CURIEs so that the name resolves properly for display?

@suzialeksander - going forward, will SGD include the ComplexPortal CURIEs as dbxrefs on the SGD protein_complex entries in the GPI file?
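
A minimal sketch of the lookup proposed in this comment, assuming a tab-separated GPI 1.2 file in which column 1 is the database, column 2 the object ID, and column 3 the symbol (e.g. `CPX-532`). The function names `symbol_index` and `sgd_curie_for` are hypothetical and not part of any existing Noctua or SGD tooling; the sample row is one of the rows quoted in this thread.

```python
# One row in GPI 1.2 layout (tab-separated). In practice the text would be
# read from SGD's GPI file; the file location is not assumed here.
SAMPLE_GPI = (
    "!gpi-version: 1.2\n"
    "SGD\tS000217570\tCPX-532\tAdaptor complex AP-1\t"
    "APL2:APL4:APM1:APS1|EBI-11896492|Adaptor complex AP-1\t"
    "protein_complex\ttaxon:559292\t\tComplexPortal:CPX-532\t\n"
)


def symbol_index(gpi_text):
    """Map column 3 (symbol) to a 'DB:ID' CURIE built from columns 1 and 2."""
    index = {}
    for line in gpi_text.splitlines():
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header comments
        cols = line.split("\t")
        if len(cols) < 3:
            continue  # not a usable data row
        index[cols[2]] = f"{cols[0]}:{cols[1]}"
    return index


def sgd_curie_for(complexportal_curie, index):
    """Strip the 'ComplexPortal:' prefix and look the bare ID up by symbol."""
    prefix, _, local_id = complexportal_curie.partition(":")
    if prefix != "ComplexPortal":
        return None
    return index.get(local_id)


index = symbol_index(SAMPLE_GPI)
print(sgd_curie_for("ComplexPortal:CPX-532", index))
```

Matching on column 3 only works while the symbol is exactly the bare ComplexPortal ID; IDs with no matching row come back as `None` rather than being guessed.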

@srengel

srengel commented Apr 9, 2024

@vanaukenk the ComplexPortal CURIEs are already in column 9 of the SGD GPI. Should they be somewhere else?

Some example rows from our current GPI:

SGD	S000217570	CPX-532	Adaptor complex AP-1	APL2:APL4:APM1:APS1|EBI-11896492|Adaptor complex AP-1	protein_complex	taxon:559292		ComplexPortal:CPX-532	
SGD	S000217571	CPX-533	Adaptor complex AP-1R	APL2:APL4:APM2:APS1|EBI-11896583|Adaptor complex AP-1R	protein_complex	taxon:559292		ComplexPortal:CPX-533	
SGD	S000217572	CPX-534	Adapter complex AP-2	APL1:APL3:APM4:APS2|EBI-11896755|Adapter complex AP-2	protein_complex	taxon:559292		ComplexPortal:CPX-534	
SGD	S000217573	CPX-535	Adapter complex AP-3	APL5:APL6:APM3:APS3|EBI-11898515|Adapter complex AP-3	protein_complex	taxon:559292		ComplexPortal:CPX-535	
SGD	S000217574	CPX-536	cAMP-dependent protein kinase complex variant 1	2xBCY1:2xTPK1|EBI-11963349|cAMP-dependent protein kinase complex variant 1	protein_complex	taxon:559292		ComplexPortal:CPX-536	
SGD	S000217575	CPX-537	cAMP-dependent protein kinase complex variant 2	2xBCY1:2xTPK2|EBI-12003988|cAMP-dependent protein kinase complex variant 2	protein_complex	taxon:559292		ComplexPortal:CPX-537	
SGD	S000217576	CPX-571	cAMP-dependent protein kinase complex variant 3	2xBCY1:2xTPK3|EBI-12424950|cAMP-dependent protein kinase complex variant 3	protein_complex	taxon:559292		ComplexPortal:CPX-571	
SGD	S000217577	CPX-572	cAMP-dependent protein kinase complex variant 4	2xBCY1:TPK1:TPK2|EBI-12424978|cAMP-dependent protein kinase complex variant 4	protein_complex	taxon:559292		ComplexPortal:CPX-572	
SGD	S000217578	CPX-573	cAMP-dependent protein kinase complex variant 5	2xBCY1:TPK1:TPK3|EBI-12425007|cAMP-dependent protein kinase complex variant 5	protein_complex	taxon:559292		ComplexPortal:CPX-573	
SGD	S000217579	CPX-574	cAMP-dependent protein kinase complex variant 6	2xBCY1:TPK2:TPK3|EBI-12425036|cAMP-dependent protein kinase complex variant 6	protein_complex	taxon:559292		ComplexPortal:CPX-574	
SGD	S000217580	CPX-575	Ste12/Dig1/Dig2 transcription regulation complex	DIG1:DIG2:STE12|EBI-12448881|Ste12/Dig1/Dig2 transcription regulation complex	protein_complex	taxon:559292		ComplexPortal:CPX-575	
SGD	S000217581	CPX-576	Tec1/Ste12/Dig1 transcription regulation complex	DIG1:STE12:TEC1|EBI-12453638|Tec1/Ste12/Dig1 transcription regulation complex	protein_complex	taxon:559292		ComplexPortal:CPX-576	
SGD	S000217596	CPX-1150	SWI/SNF chromatin remodelling complex	ARP7:ARP9:RTT102:SNF2:SNF5:SNF6:SNF11:SNF12:SWI1:SWI3:SWP82:TAF14|EBI-15100957|SWI/SNF chromatin remodelling complex	protein_complex	taxon:559292		ComplexPortal:CPX-1150	
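
One way rows like these could be turned into the ComplexPortal-to-SGD mapping discussed in this thread is to read the xrefs in column 9 directly. The sketch below assumes GPI 1.2 layout with pipe-separated xrefs in column 9; the function name `xref_lookup` is hypothetical, and the two sample rows are copied from the rows above.

```python
# Two rows copied from the GPI excerpt in this thread (tab-separated).
SAMPLE_ROWS = [
    "SGD\tS000217570\tCPX-532\tAdaptor complex AP-1\t"
    "APL2:APL4:APM1:APS1|EBI-11896492|Adaptor complex AP-1\t"
    "protein_complex\ttaxon:559292\t\tComplexPortal:CPX-532\t",
    "SGD\tS000217571\tCPX-533\tAdaptor complex AP-1R\t"
    "APL2:APL4:APM2:APS1|EBI-11896583|Adaptor complex AP-1R\t"
    "protein_complex\ttaxon:559292\t\tComplexPortal:CPX-533\t",
]


def xref_lookup(gpi_lines, xref_prefix="ComplexPortal:"):
    """Build {xref CURIE: 'DB:ID'} from column 9 of GPI 1.2 rows."""
    lookup = {}
    for line in gpi_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # row too short to carry an xref column
        target = f"{cols[0]}:{cols[1]}"
        # Assumption: multiple xrefs in one cell are separated by '|'.
        for xref in cols[8].split("|"):
            if xref.startswith(xref_prefix):
                lookup[xref] = target
    return lookup


for old, new in xref_lookup(SAMPLE_ROWS).items():
    print(old, "->", new)
```

Keying on the explicit xref rather than the symbol avoids relying on the symbol staying identical to the ComplexPortal ID.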

@vanaukenk

@srengel - that's correct; the ComplexPortal xrefs should be in column 9 of the GPI.
I was looking at the GPI file available for download on current.geneontology.org, which doesn't have those xrefs because it is derived from the GAF.
Sorry for any confusion!

@suzialeksander
Author

suzialeksander commented Aug 27, 2024

Current example models:
ComplexPortal:CPX form: http://noctua.geneontology.org/editor/graph/gomodel:SGD_S000000240
"CPX- Scer" form: gomodel:SGD_S000000870

@dustine32 does this sound like a fix you can make? And does this sound like a one-off fix, or would something have to be fixed with each load?

@dustine32
Contributor

@suzialeksander This sounds like some form of SPARQL UPDATE query run against the minerva modelstore, though I think @balhoff can correct me on that. I don't think I've ever written a query that sources a lookup file of mappings like ComplexPortal:CPX-1739 -> SGD:S000218211. Maybe we'd first need to inject this lookup (using another query) into the modelstore as xrefs on NEO entities? I could look at the regular ontology update process for reference. This is likely more of a project than a quick fix.

We'd have to schedule this update during a Noctua outage and, of course, we'd test this on noctua-dev's minerva first.
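
As a rough illustration of the SPARQL UPDATE idea, the sketch below generates an update string from a mapping dictionary by inlining the lookup in a `VALUES` block, which is one way to avoid loading the lookup into the store first. It rests on several assumptions that would need checking against the real modelstore: that each model is a named graph, that individuals are typed directly with the NEO class IRI via `rdf:type`, and that the two IRI prefixes in `PREFIXES` are the correct CURIE expansions. The helper names `expand` and `build_update` are hypothetical.

```python
# Assumed CURIE expansions; verify against the prefix map actually in use.
PREFIXES = {
    "ComplexPortal": "https://www.ebi.ac.uk/complexportal/complex/",
    "SGD": "http://identifiers.org/sgd/",
}


def expand(curie):
    """Turn 'Prefix:local' into a bracketed IRI using PREFIXES."""
    prefix, local_id = curie.split(":", 1)
    return f"<{PREFIXES[prefix]}{local_id}>"


def build_update(mapping):
    """Return a SPARQL 1.1 UPDATE that retypes individuals from old to new class."""
    values = "\n".join(
        f"    ({expand(old)} {expand(new)})"
        for old, new in sorted(mapping.items())
    )
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "DELETE { GRAPH ?model { ?individual rdf:type ?old } }\n"
        "INSERT { GRAPH ?model { ?individual rdf:type ?new } }\n"
        "WHERE {\n"
        "  VALUES (?old ?new) {\n"
        f"{values}\n"
        "  }\n"
        "  GRAPH ?model { ?individual rdf:type ?old }\n"
        "}\n"
    )


print(build_update({"ComplexPortal:CPX-1739": "SGD:S000218211"}))
```

This only rewrites direct type assertions; any other triples that mention the old class IRI would not be touched, so the output is a starting point for testing on noctua-dev rather than a finished migration.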

@kltm
Member

kltm commented Aug 28, 2024

There could be a migration (sed on the models on disk, or SPARQL), but these are fiddly. I'd like to be clear on the mapping (file) to be used, or whether it's just a couple of one-offs.
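
For the sed-style option, a minimal sketch of a text-level replacement that matches whole ComplexPortal CURIEs, so that `CPX-756` is never rewritten inside `CPX-7560`, and reports any IDs that have no mapping. It assumes the identifiers appear in CURIE form in the text being migrated; if the models on disk store full IRIs, the pattern would have to be adapted. The function name `migrate_text` is hypothetical.

```python
import re

# Matches a complete ComplexPortal CURIE; \d+ is greedy, so the whole
# numeric ID is consumed before the lookup happens.
CPX_CURIE = re.compile(r"ComplexPortal:CPX-\d+")


def migrate_text(text, mapping):
    """Replace mapped ComplexPortal CURIEs; return (new_text, unmapped_ids)."""
    unmapped = set()

    def swap(match):
        curie = match.group(0)
        if curie in mapping:
            return mapping[curie]
        unmapped.add(curie)  # leave unknown IDs untouched but report them
        return curie

    return CPX_CURIE.sub(swap, text), unmapped


mapping = {"ComplexPortal:CPX-756": "SGD:S000217886"}
new_text, unmapped = migrate_text(
    "type ComplexPortal:CPX-756 ; other ComplexPortal:CPX-7560 .", mapping
)
print(new_text)
print(sorted(unmapped))
```

The set of unmapped IDs gives a quick way to see whether the mapping file covers everything in the models or whether only a few one-off IDs are involved.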
