Discuss how to prefix or fix entities #17

Open
9 of 15 tasks
matentzn opened this issue Jul 26, 2020 · 13 comments
@matentzn
Contributor

matentzn commented Jul 26, 2020

Some entities in the current triplestore are not easily amenable to CURIEfication, but I think some of them are simply mistakes that should be fixed. Here is a preliminary categorisation.

Probably fine, but not prefixable

Maybe we should just ignore these and add the full IRI in the short form and curie fields for SOLR? As an aside, I don't think it's a good idea to have URL parameters in an entity IRI, but hell, why not :P

I would suggest, @dosumis, that you check [x] the ones that you know about, but I think they are all fine (just in case you want to delete one of them). Trick: if you don't know something, navigate to the triple store entity explorer and paste the IRI in (mind the <>).

Entities we should look into (probably typos in IRIs):

Missing URL parts like site or data

http://virtualflybrain.org/neuprint_JRC
http://virtualflybrain.org/neuronbridge
http://virtualflybrain.org/reports/neuprint_JRC_Hemibrain_1point1
http://virtualflybrain.org/reports/Xu2020Neurons

Broken RO relation (in DPO, ticket!)

http://www.obofoundry.org/ro/ro.owl#has_participant

Entities with regular IRI, but hard to prefix

unless you accept / in the short name:
https://doi.org/10.1101/2020.01.10.902478
doi:10.1101/2020.01.10.902478

This is probably the right way, given how Zenodo seems to do it.

DOI:

https://doi.org/10.1002/cne.24877
https://doi.org/10.1101/122952
https://doi.org/10.1101/198648
https://doi.org/10.1101/2020.01.10.902478
https://doi.org/10.1101/2020.01.21.911859
https://doi.org/10.1101/2020.04.17.047167
https://doi.org/10.1101/238147
https://doi.org/10.1101/376384
https://doi.org/10.1101/617936
https://doi.org/10.1101/617977
https://doi.org/10.7554/eLife.53518

creativecommons:

http://creativecommons.org/licenses/by-nc-sa/3.0
http://creativecommons.org/licenses/by-nc-sa/4.0
http://creativecommons.org/licenses/by-sa/4.0/
http://creativecommons.org/licenses/by/4.0
http://creativecommons.org/ns#attributionURL
@matentzn
Contributor Author

By the way, I would suggest always rolling a bespoke VFB ID for each site, because URLs can change so easily.

@dosumis
Member

dosumis commented Jul 28, 2020

DOIs: allowing '/' in short_form is a problem because it breaks assumptions in the short_form generation code (VFB_neo4j). Not sure of the best solution.

@matentzn
Contributor Author

Maybe the best thing to do is not to use DOIs as IRIs in VFB. The only alternative I can think of that does not involve DOI1, DOI2 etc. is what Robbie has been doing: string-replacing the / with __. I am not much in favour of either of these two options, but I understand that minting new IDs might be an overhead. I will leave this ticket up to you! I would suggest you just go through all of them and fix them one by one, minting new IDs wherever necessary. Spring clean.

@matentzn matentzn removed their assignment Jul 28, 2020
@dosumis
Member

dosumis commented Aug 1, 2020

I've gone with http://virtualflybrain.org/reports/ as the base IRI for Licenses & Sites:

image
image

@matentzn IIRC you will fix nesting issues for curies arising from this by enforcing ordering on the curie spec. Is this correct?
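The nesting issue arises because http://virtualflybrain.org/reports/ sits inside any shorter virtualflybrain.org namespace, so the order in which prefixes are tried decides which CURIE comes out. A minimal Python sketch of longest-namespace-first matching follows. The prefix names `vfbsite` and `vfbreport` and the function `contract` are hypothetical, chosen for illustration only, and this is not the actual curie spec or importer code.

```python
# Hypothetical prefix map; names and namespaces are illustrative assumptions.
PREFIXES = {
    "vfbsite": "http://virtualflybrain.org/",
    "vfbreport": "http://virtualflybrain.org/reports/",
}


def contract(iri, prefixes=PREFIXES):
    """Return a CURIE for iri, trying the longest namespace first."""
    # Sorting by namespace length (descending) makes nested namespaces safe:
    # the more specific ".../reports/" is always tested before its parent.
    ordered = sorted(prefixes.items(), key=lambda kv: len(kv[1]), reverse=True)
    for prefix, namespace in ordered:
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    # No namespace matched: fall back to the full IRI.
    return iri


print(contract("http://virtualflybrain.org/reports/Xu2020Neurons"))  # vfbreport:Xu2020Neurons
print(contract("http://virtualflybrain.org/neuronbridge"))           # vfbsite:neuronbridge
```

Without the sort, a map that happens to list the shorter namespace first would yield `vfbsite:reports/Xu2020Neurons`, which puts a `/` into the local part.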

@dosumis
Member

dosumis commented Aug 1, 2020

For DOIs, I'm happy to use virtualflybrain/reports/ as base but we need a standard transformation of ID to short_form that curators can work with when loading content linked to preprints and that we can use for ontology term xrefs to preprints.

I think the two options are / -> _ or / escaped as \/. The latter would be more consistent with curator expectations. @matentzn @Robbie1977 - any preference?
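For comparison, the two options could be sketched in Python roughly as below. The function names are hypothetical and this is not existing VFB code. The reverse mapping assumes the DOI contains exactly one `/` (the one after the registrant prefix), which holds for the DOIs listed in this ticket but not for DOIs in general.

```python
DOI_RESOLVER = "https://doi.org/"  # resolver base used by the IRIs in this ticket


def doi_to_short_form(doi_iri):
    """Option 1: strip the resolver base and replace every '/' with '_'."""
    if doi_iri.startswith(DOI_RESOLVER):
        doi = doi_iri[len(DOI_RESOLVER):]
    else:
        doi = doi_iri
    return doi.replace("/", "_")


def escape_slashes(doi):
    """Option 2: keep the DOI readable but escape each '/' as '\\/'."""
    return doi.replace("/", "\\/")


def short_form_to_doi(short_form):
    """Undo option 1, assuming the original DOI had exactly one '/'."""
    # A DOI prefix such as 10.7554 contains no '_', so the first '_' in the
    # short form is the replaced separator.
    prefix, _, suffix = short_form.partition("_")
    return f"{prefix}/{suffix}"


print(doi_to_short_form("https://doi.org/10.7554/eLife.53518"))  # 10.7554_eLife.53518
print(escape_slashes("10.7554/eLife.53518"))                     # 10.7554\/eLife.53518
print(short_form_to_doi("10.7554_eLife.53518"))                  # 10.7554/eLife.53518
```

Option 1 is lossy for DOIs whose suffix itself contains `/`; option 2 is reversible but introduces a backslash into the identifier.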

@matentzn
Contributor Author

matentzn commented Aug 1, 2020

I am assuming having publication IDs minted is out of the question? Both of these require some kind of hacky development work (if short_form contains '/', replace it with '_', which also needs to be made configurable). If you want to do it, I estimate getting this into both the neo4j2owl importer and SOLR will take about 90 min, with testing etc.

@dosumis
Member

dosumis commented Aug 1, 2020

@matentzn - I don't think I need you to do more on this. I see this being managed on the database side. I think the key question is whether the escaped short_form will work for your short_form generator.

@matentzn
Contributor Author

matentzn commented Aug 1, 2020

Probably not. I am relying on URI get fragment, and having a backslash will probably cause a malformed URI exception. But I am not sure I understand: how will PDB know about this new short form? PDB regenerates short forms from the IRI, so are you going to actually change the IRI from https://doi.org/10.7554/eLife.53518 to https://doi.org/10.7554_eLife.53518 and then have an if doiURI(): replace last instance of _ with / in the Geppetto codebase?
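To make the regeneration concern concrete, here is a rough Python sketch of deriving a short form from an IRI by taking the fragment if there is one and the last path segment otherwise. This is an assumption about the general approach, written for illustration; it is not the actual importer code, and the function name `short_form` is hypothetical.

```python
from urllib.parse import urlsplit


def short_form(iri):
    """Use the fragment if present, otherwise the last path segment."""
    parts = urlsplit(iri)
    if parts.fragment:
        return parts.fragment
    # Drop any trailing slash, then keep whatever follows the last '/'.
    return parts.path.rstrip("/").rsplit("/", 1)[-1]


print(short_form("http://virtualflybrain.org/reports/Xu2020Neurons"))     # Xu2020Neurons
print(short_form("http://www.obofoundry.org/ro/ro.owl#has_participant"))  # has_participant
print(short_form("https://doi.org/10.7554/eLife.53518"))                  # eLife.53518
print(short_form("https://doi.org/10.7554_eLife.53518"))                  # 10.7554_eLife.53518
```

With the unmodified DOI IRI the registrant prefix `10.7554` is lost, so the regenerated short form no longer identifies the DOI; only the rewritten IRI yields a short form that carries the whole identifier.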

@dosumis
Member

dosumis commented Aug 1, 2020

The internal VFB issue is that, with the current schema, we need a short_form for a pub that is either an external pub ID or some minimal transformation of one. This is distinct from whether the IRI is resolvable. For most pubs we use a FlyBase identifier and FlyBase IRI: flybase.org/reports/FBrf0001234, which makes a well-behaved short_form of the FlyBase ID. We indicate that this is resolvable using the tag self_xref: True. For preprints we don't have a FlyBase ID and need to specify identity by DOI. In this case I'll settle for the IRI not being resolvable. We'll deal with specifying a resolvable IRI separately.

@matentzn
Contributor Author

matentzn commented Aug 1, 2020

Ok sounds good :)

@dosumis
Member

dosumis commented Aug 3, 2020

@Robbie1977 - Are you happy with http://virtualflybrain.org/reports/ as a widely used base for internal VFB entities with readable short_forms? I'm not sure I'm completely comfortable with my own decision here. Let's discuss (Briefly!) on the call today.
