remove models with redundant IRIs #158

Open
goodb opened this issue Nov 16, 2020 · 3 comments

@goodb
Contributor

goodb commented Nov 16, 2020

The dev branch of noctua-models contains a number of models that share their IRI with another model.

For example, I see a model named WB_WBGene00011688 and a model named 323f7ea5-6d4b-4d54-a555-386c6df7a9c6, and both have the model IRI http://model.geneontology.org/323f7ea5-6d4b-4d54-a555-386c6df7a9c6.

We can't have multiple models with the same IRI in the minerva triple store. Right now, the model loader loads the first one it sees, reports an error, and skips any other models that use the same IRI.
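To find every affected pair before loading, one option is to group model files by their declared model IRI. The following is a minimal sketch, assuming each `.ttl` file declares its model IRI as the subject of an `owl:Ontology` type statement (e.g. `<http://model.geneontology.org/...> rdf:type owl:Ontology`). The regex is a heuristic rather than a Turtle parser, and the helper names `read_ttl_dir`, `extract_model_iri` and `find_duplicate_iris` are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: the model IRI is declared as "<IRI> rdf:type owl:Ontology" or
# "<IRI> a owl:Ontology". This is a heuristic, not a full Turtle parser.
ONTOLOGY_DECL = re.compile(r"<([^>]+)>\s+(?:a|rdf:type)\s+owl:Ontology\b")


def read_ttl_dir(directory):
    """Hypothetical helper: map each .ttl file name in a directory to its text."""
    return {p.name: p.read_text(encoding="utf-8") for p in Path(directory).glob("*.ttl")}


def extract_model_iri(ttl_text):
    """Return the first IRI declared as an owl:Ontology, or None if none is found."""
    match = ONTOLOGY_DECL.search(ttl_text)
    return match.group(1) if match else None


def find_duplicate_iris(files):
    """Map each model IRI used by more than one file to the (sorted) file names.

    `files` is a dict of file name -> Turtle text.
    """
    by_iri = defaultdict(list)
    for name, text in sorted(files.items()):
        iri = extract_model_iri(text)
        if iri is not None:
            by_iri[iri].append(name)
    return {iri: names for iri, names in by_iri.items() if len(names) > 1}
```

Running `find_duplicate_iris(read_ttl_dir("models"))` against a checkout would list every IRI shared by two or more files, which is the set the loader currently skips after the first hit.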

@dustine32
Contributor

Not sure yet, but my current theory is that the UUID.ttl files were created/exported during the bulk taxon update. Here are the histories for WB_WBGene00011688.ttl and 323f7ea5-6d4b-4d54-a555-386c6df7a9c6.ttl.

The UUID.ttl file has the added model-level in_taxon property, which we want. But since the MOD import code still generates a random UUID for each model on every run, it will probably be difficult to match updated models to their staler versions the next time a fresh batch is cooked. This might also be moot, as I've already implemented writing out multiple models to a single N-Quads (.nq) file.
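Even in a multi-model N-Quads file, the model IRI still has to be unique if it is used as the graph name for that model's triples. The sketch below assumes that layout (one named graph per model, named by the model IRI) and that each triple is already written as N-Triples terms; it is not the actual MOD import code, and `to_nquads` is a hypothetical name.

```python
def to_nquads(models):
    """Serialize models into N-Quads text, one named graph per model.

    `models` is a list of (model_iri, triples) pairs, where each triple is a
    tuple of three terms already in N-Triples syntax (e.g. "<http://...>").
    Raises ValueError if two models share an IRI, since their triples would
    otherwise be merged into the same graph.
    """
    seen = set()
    lines = []
    for model_iri, triples in models:
        if model_iri in seen:
            raise ValueError(f"duplicate model IRI: {model_iri}")
        seen.add(model_iri)
        graph = f"<{model_iri}>"
        for s, p, o in triples:
            # N-Quads line: subject predicate object graph .
            lines.append(f"{s} {p} {o} {graph} .")
    return "".join(line + "\n" for line in lines)
```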

So I'm now looking for confirmation from @goodb and @kltm that I should delete the gene_id.ttl files (e.g. WB_WBGene00011688.ttl) and keep the UUID.ttl files (e.g. 323f7ea5-6d4b-4d54-a555-386c6df7a9c6.ttl). We can figure out later how the next MOD import load will identify which files to delete or replace. Does this sound like a good plan?

@goodb
Contributor Author

goodb commented Nov 16, 2020

@dustine32 I'm not sure about your plans regarding the multiple-models-per-file option. If you are going with one model per gene, I think it would be better to replace the contents of the gene-named files with the correct data (with the taxon) and drop the UUID-named files. As long as we are using GitHub for this, keeping file names stable for the same things across changes is better.
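The replace-and-drop approach suggested here could be planned mechanically. This is a minimal sketch, assuming the input maps each shared model IRI to its file names, that the UUID-named file is the newer copy (the one carrying in_taxon), and that any group not made of exactly one UUID-named and one gene-named file is left for manual review. `plan_dedup` and `apply_plan` are hypothetical helpers.

```python
import re
from pathlib import Path

# Matches file names like 323f7ea5-6d4b-4d54-a555-386c6df7a9c6.ttl
UUID_NAME = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.ttl$")


def plan_dedup(duplicates):
    """Return ([(uuid_file, gene_file), ...], [iri needing manual review, ...]).

    Each pair means: copy uuid_file's content over gene_file, then delete uuid_file.
    """
    pairs, manual = [], []
    for iri, names in sorted(duplicates.items()):
        uuid_files = [n for n in names if UUID_NAME.match(n)]
        gene_files = [n for n in names if not UUID_NAME.match(n)]
        if len(uuid_files) == 1 and len(gene_files) == 1:
            pairs.append((uuid_files[0], gene_files[0]))
        else:
            manual.append(iri)
    return pairs, manual


def apply_plan(directory, pairs):
    """Overwrite each gene-named file with the UUID file's content, then remove the UUID file."""
    d = Path(directory)
    for uuid_file, gene_file in pairs:
        (d / gene_file).write_text((d / uuid_file).read_text(encoding="utf-8"), encoding="utf-8")
        (d / uuid_file).unlink()
```

Because the gene-named path survives, Git records the change as an edit to the existing file, which keeps file history stable across updates.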

@goodb
Contributor Author

goodb commented Nov 16, 2020

@dustine32 if the idea is to shift to using large multi-model nquads files we would want to check on the model loading process. Assuming that is good, then drop all the other previous forms here.
