error importing subontology into snowstorm #2

Open
liquid36 opened this issue Apr 8, 2024 · 13 comments

liquid36 commented Apr 8, 2024

Hi!
I want to use this utility to create a lightweight RF2 ZIP file in order to test my application workflow.

java --add-opens java.base/java.lang=ALL-UNNAMED -Xms4g -jar snomed-subontology-extraction-*-executable.jar  -source-ontology ontology-2024-03-26_11-57-07.owl  -input-subset concepts.list   -output-rf2  -rf2-snapshot-archive SnomedCT_Argentina-EditionRelease_PRODUCTION_20230531T120000Z.zip

concepts.list

89901005
387713003
71388002
138875005

After running this command, I imported the zip file into Snowstorm, but I got the following error:

2024-04-08T13:24:19.229Z  INFO 1 --- [nio-8080-exec-1] o.s.s.rest.config.RestControllerAdvice   : bad request Duplicate concept document found with id 900000000000441003, A:MAIN:1712582554941:Mon Apr 08 13:22:34 UTC 2024 B:MAIN:1712582554941:Mon Apr 08 13:22:34 UTC 2024.

There are no duplicated concepts in the zip files.

@kaicode kaicode self-assigned this Apr 10, 2024
@kaicode
Member

kaicode commented Apr 10, 2024

This is a very unusual error. I recommend deleting the Elasticsearch indices and trying the import again.
The easy way to delete all Elasticsearch indices is using a delete REST request:

curl -XDELETE http://localhost:9200/*

Then restarting Snowstorm will automatically recreate the indices that are needed, ready for the import.

@liquid36
Author

I did that several times, deleting everything and importing again.

What caught my attention is that the importer only recognizes 30 concepts, but the concepts file contains more:

sct2_Concept_Snapshot_INT_20240326.txt

id	effectiveTime	active	moduleId	definitionStatusId
106237007	20110131	1	900000000000012004	900000000000074008
116680003	20110131	1	900000000000012004	900000000000074008
123037004	20020131	1	900000000000207008	900000000000074008
129284003	20020131	1	900000000000207008	900000000000074008
138875005	20020131	1	900000000000207008	900000000000074008
246061005	20110131	1	900000000000012004	900000000000074008
260686004	20110131	1	900000000000012004	900000000000074008
260787004	20020131	1	900000000000207008	900000000000074008
362981000	20020131	1	900000000000207008	900000000000074008
363704007	20110131	1	900000000000012004	900000000000074008
387713003	20220930	1	900000000000207008	900000000000074008
405815000	20110131	1	900000000000012004	900000000000074008
410662002	20110131	1	900000000000012004	900000000000074008
424226004	20110131	1	900000000000012004	900000000000074008
609096000	20130731	1	900000000000012004	900000000000074008
69536005	20020131	1	900000000000207008	900000000000074008
71388002	20020131	1	900000000000207008	900000000000074008
733073007	20170731	1	900000000000012004	900000000000074008
762676003	20180131	1	900000000000012004	900000000000074008
762705008	20180131	1	900000000000012004	900000000000074008
762706009	20180131	1	900000000000012004	900000000000074008
86174004	20020131	1	900000000000207008	900000000000074008
89901005	20020131	1	900000000000207008	900000000000073002
900000000000003001	20020131	1	900000000000012004	900000000000074008
900000000000006009	20020131	1	900000000000012004	900000000000074008
900000000000010007	20020131	1	900000000000012004	900000000000074008
900000000000011006	20020131	1	900000000000012004	900000000000074008
900000000000013009	20020131	1	900000000000012004	900000000000074008
900000000000017005	20020131	1	900000000000012004	900000000000074008
900000000000020002	20020131	1	900000000000012004	900000000000074008
900000000000073002	20020131	1	900000000000012004	900000000000074008
900000000000074008	20020131	1	900000000000012004	900000000000074008
900000000000225001	20020131	1	900000000000012004	900000000000074008
900000000000227009	20020131	1	900000000000012004	900000000000074008
900000000000441003	20020131	1	900000000000012004	900000000000074008
900000000000444006	20020131	1	900000000000012004	900000000000074008
900000000000446008	20020131	1	900000000000012004	900000000000074008
900000000000447004	20020131	1	900000000000012004	900000000000074008
900000000000448009	20020131	1	900000000000012004	900000000000074008
900000000000449001	20020131	1	900000000000012004	900000000000074008
900000000000450001	20020131	1	900000000000012004	900000000000074008
900000000000451002	20020131	1	900000000000012004	900000000000074008
900000000000452009	20020131	1	900000000000012004	900000000000074008
900000000000454005	20020131	1	900000000000012004	900000000000074008
900000000000455006	20020131	1	900000000000012004	900000000000074008
900000000000506000	20020131	1	900000000000012004	900000000000074008
900000000000507009	20020131	1	900000000000012004	900000000000074008
900000000000508004	20020131	1	900000000000012004	900000000000074008
900000000000509007	20020131	1	900000000000012004	900000000000074008
900000000000511003	20020131	1	900000000000012004	900000000000074008
900000000000548007	20020131	1	900000000000012004	900000000000074008
900000000000549004	20020131	1	900000000000012004	900000000000074008
900000000000550004	20020131	1	900000000000012004	900000000000074008

Do you see anything wrong here?

@kaicode
Member

kaicode commented Apr 10, 2024

That concepts file looks fine. If you send me the zip I can debug the import.

@liquid36
Author

This is the zip file

SnomedCt_test.zip

@kaicode
Member

kaicode commented Apr 12, 2024

There are three duplicate entries in the sct2_Concept_Snapshot_INT_20240326.txt file.

$ cut -f1 sct2_Concept_Snapshot_INT_20240326.txt | sort | uniq -d
410662002
762705008
900000000000441003

Perhaps these concepts appear in the concept snapshot files within the SnomedCT_Argentina-EditionRelease_PRODUCTION_20230531T120000Z.zip archive more than once?
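The same check can be run directly against the archive without unpacking it. The following is only a minimal sketch: the helper names `duplicate_ids` and `duplicates_in_archive` are made up for illustration, and it assumes the concept snapshot members have file names starting with `sct2_Concept_Snapshot`, are UTF-8 encoded, and are tab-separated with a header row.

```python
import io
import zipfile
from collections import Counter


def duplicate_ids(lines):
    """Return the ids (first tab-separated column) that occur more than once."""
    rows = iter(lines)
    next(rows, None)  # skip the RF2 header row
    counts = Counter(line.split("\t", 1)[0] for line in rows if line.strip())
    return sorted(cid for cid, n in counts.items() if n > 1)


def duplicates_in_archive(zip_path, name_prefix="sct2_Concept_Snapshot"):
    """Map each matching archive member to its list of duplicated ids.

    name_prefix is an assumption about how the concept files are named.
    """
    report = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Compare against the file name only, ignoring any folder path.
            if member.rsplit("/", 1)[-1].startswith(name_prefix):
                with archive.open(member) as raw:
                    text = io.TextIOWrapper(raw, encoding="utf-8")
                    report[member] = duplicate_ids(text)
    return report


# Example call (the path is a placeholder):
# print(duplicates_in_archive("SnomedCt_test.zip"))
```

Running it on both the source edition archive and the generated archive would show whether the duplicates are already present in the input or only appear in the output.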

@kaicode
Member

kaicode commented Apr 12, 2024

The concept file within the uploaded "SnomedCt_test.zip" archive is very different from the concept file contents that were posted above. The contents above look okay, but the file in the zip contains duplicates.
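To see how two copies of a concept file differ, a small comparison along these lines may help. It is a sketch under stated assumptions: the function names are hypothetical, both files are assumed to be UTF-8 and tab-separated with a header row, and the `member` path inside the archive has to be looked up first (for example with `zipfile.ZipFile(path).namelist()`).

```python
import zipfile


def concept_ids(lines):
    """Collect ids from the first tab-separated column, skipping the header row."""
    rows = iter(lines)
    next(rows, None)
    return [line.split("\t", 1)[0] for line in rows if line.strip()]


def compare_folder_and_zip(folder_file, zip_path, member):
    """Summarise how a concept file on disk differs from one inside a zip."""
    with open(folder_file, encoding="utf-8") as handle:
        folder_ids = concept_ids(handle)
    with zipfile.ZipFile(zip_path) as archive:
        zip_ids = concept_ids(archive.read(member).decode("utf-8").splitlines())
    return {
        "folder_rows": len(folder_ids),
        "zip_rows": len(zip_ids),
        "only_in_folder": sorted(set(folder_ids) - set(zip_ids)),
        "only_in_zip": sorted(set(zip_ids) - set(folder_ids)),
    }
```

A row count that is higher than the number of distinct ids on either side points to duplicate rows in that copy.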

@liquid36
Author

So weird. The snomed-subontology-extraction tool outputs an RF2 folder and a zip file. I thought they were the same, but they aren't.
I posted the contents of the RF2 folder.

What is the difference? Do you know?

@liquid36
Author

Well, I ran the importer again with the contents of the RF2 folder and it worked perfectly.
Thank you very much.

@kaicode
Member

kaicode commented Apr 12, 2024

Great news about the import!
That's strange about the RF2 folder.

@Semohsbi

Hello,

I'm having trouble running a Java command, and I’m not very familiar with Java. Here’s what I tried:

& "C:\Program Files\Java\jdk-17\bin\java" -Xms4g -jar .\snomed-subontology-extraction-2.0.0-executable.jar -source-ontology .\ontology.xml -input-subset .\door.txt -verify-subontology

I got the error shown in the attached screenshot:

screen1

and when I ran this command:

& "C:\Program Files\Java\jdk-17\bin\java" -Xms4g --add-opens java.base/java.lang=ALL-UNNAMED -jar .\snomed-subontology-extraction-2.0.0-executable.jar -source-ontology .\ontology.xml -input-subset .\door.txt -verify-subontology

image

I also tried using TTL and RDF/XML formats for the ontology, but no luck. Any advice on fixing this would be really helpful!

Thanks!

@kaicode
Member

kaicode commented Oct 30, 2024

The first command you tried failed because of a security issue. The second command you ran has the extra parameters to overcome the security issue, but it seems not to have selected anything.

The ontology input format should use functional syntax. This can be generated using the SNOMED OWL Toolkit, see SNOMED to OWL Conversion. You can grab the jar file for that from the snomed-owl-toolkit releases page. For example "snomed-owl-toolkit-5.3.0-executable.jar".

This will produce an owl file with a filename like ontology-2024-10-30_19-45-07.owl, that should be used with the subontology -source-ontology param.
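Before running the extraction, it can be worth checking that the input file really is in functional syntax rather than RDF/XML or Turtle. The following is a rough heuristic only, not how the tool itself validates its input: the function name is hypothetical, and it assumes a functional-syntax document begins (after blank lines and `#` comments) with a `Prefix(` or `Ontology(` line.

```python
def looks_like_functional_syntax(text):
    """Heuristic: the first meaningful line starts with Prefix( or Ontology(."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        return stripped.startswith(("Prefix(", "Ontology("))
    return False  # empty input


# Example call (the file name is a placeholder):
# with open("ontology.owl", encoding="utf-8") as handle:
#     print(looks_like_functional_syntax(handle.read(4096)))
```

An RDF/XML file would typically begin with `<?xml` or `<rdf:RDF`, and a Turtle file with `@prefix`, so both fail this check.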

@Semohsbi

I’m working on a project to create a subontology focused on “Door” using the IFC4 ontology from buildingSMART, available in JSON, TTL, NL, and XML formats (https://standards.buildingsmart.org/IFC/DEV/IFC4/ADD2_TC1/OWL/index.html).

The snomed-subontology-extraction tool requires OWL functional syntax. Is it possible to use these formats with this tool? Would you recommend using the TTL or RDF/XML format from the IFC4 options, or is there another format that would work better?

Additionally, the SNOMED OWL Toolkit needs an RF2 structure, and I’m unsure if IFC4 includes this format. Would I need to convert the ontology into RF2 to use this tool?

Thank you for any guidance!

@kaicode
Member

kaicode commented Oct 31, 2024

The SNOMED Subontology Extraction tool has been created to work with the SNOMED CT Ontology and SNOMED CT RF2 release files only. It is not intended for use with other ontologies. The algorithm expects specific SNOMED CT concepts and axioms to be present and also uses the SNOMED CT attribute hierarchy.

Unfortunately I do not think this tool is suitable for use with the IFC4 ontology.
