FHIR: Output NPM format #511

Open
wants to merge 1 commit into
base: main
4 changes: 2 additions & 2 deletions .gitignore
@@ -2,7 +2,6 @@
.tox/
__pycache__/
.ipynb_checkpoints/
tests/output/
dist/
db/

@@ -25,7 +24,8 @@ notebooks/api-key.txt
.coverage.*
.coverage
coverage.*
tests/input/fhirjson_conf.json
tests/input/*_conf.json
@joeflack4 joeflack4 Apr 6, 2023
fhirjson_conf.json setup (@cmungall review)

@cmungall This gets harder to review as time goes on and the details fade from memory. TL;DR: this is a minor choice. I think the options are (a) dynamically generate .gitignored conf JSONs, or (b) commit them as static files in tests/input/. The existing pattern is (a).

Details 2023/11/26

We were also discussing this in this comment. I set it up to write to a file, which I think followed an existing pattern you were using. I would prefer to save the conf statically as JSON and load it that way.

fhir_conf = {
    "code_system_id": "test",
    "code_system_url": "http://purl.obolibrary.org/obo/go.owl",
    "native_uri_stems": ["http://purl.obolibrary.org/obo/GO_"],
}

output_path = str(OUTPUT_DIR / f"test_dump-{output_format}.out")

Details 2023/04/05

@hrshdhgd I should have made a 'review comment' earlier when tagging you. I like being able to hit 'resolve' to clean the page up a bit.

Regarding fhirjson_conf.json, I located the source. It comes from a (very helpful) unit test that @cmungall added in another PR recently. To keep the CLI clean, the plan is for some dumpers, etc., to pull their parameters from a config file rather than declaring them in cli.py in the normal click way. Right now, fhirjson is the only dumper output_type with such a config.

conf_path = INPUT_DIR / f"{output_format}_conf.json"
with open(conf_path, "w", encoding="utf-8") as f:
    json.dump(conf_object, f)
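
As a rough sketch of the config-file pattern described above (not OAK's actual implementation): the test writes a per-format conf JSON, and a dumper reads it back and passes the values as keyword arguments. `write_conf` and `load_conf` are hypothetical helper names, and a temporary directory stands in for tests/input/.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_conf(conf: Dict[str, Any], input_dir: Path, output_format: str) -> Path:
    """Write a per-format conf file, mirroring the test snippet above."""
    conf_path = input_dir / f"{output_format}_conf.json"
    with open(conf_path, "w", encoding="utf-8") as f:
        json.dump(conf, f)
    return conf_path


def load_conf(conf_path: Path) -> Dict[str, Any]:
    """Hypothetical helper: read the conf back so it can be passed as **kwargs."""
    with open(conf_path, encoding="utf-8") as f:
        return json.load(f)


fhir_conf = {
    "code_system_id": "test",
    "code_system_url": "http://purl.obolibrary.org/obo/go.owl",
    "native_uri_stems": ["http://purl.obolibrary.org/obo/GO_"],
}

# A temporary directory stands in for tests/input/ in this sketch.
with tempfile.TemporaryDirectory() as tmp:
    path = write_conf(fhir_conf, Path(tmp), "fhirjson")
    loaded = load_conf(path)
```

With option (b), only `load_conf` would be needed, since the JSON would already be committed under tests/input/.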

@cmungall It does look like this probably should be in the .gitignore, so I've added the following entry (though let me know if you actually do want to have these committed):
tests/input/*_conf.json

tests/output/

oak_hp.profile
oak_semsimian_hp.profile
2 changes: 1 addition & 1 deletion docs/packages/converters/obo-graph-to-fhir.rst
@@ -5,5 +5,5 @@ OBO Graph to FHIR Converter

.. currentmodule:: oaklib.converters.obo_graph_to_fhir_converter

.. autoclass:: OboGraphToFHIRConverter
.. autoclass:: OboGraphToFhirJsonConverter
:members:
10 changes: 8 additions & 2 deletions src/oaklib/cli.py
@@ -125,7 +125,10 @@
from oaklib.io.rollup_report_writer import write_report
from oaklib.io.streaming_axiom_writer import StreamingAxiomWriter
from oaklib.io.streaming_csv_writer import StreamingCsvWriter
from oaklib.io.streaming_fhir_writer import StreamingFHIRWriter
from oaklib.io.streaming_fhir_writer import (
    StreamingFhirJsonWriter,
    StreamingFhirNpmWriter,
)
from oaklib.io.streaming_info_writer import StreamingInfoWriter
from oaklib.io.streaming_json_writer import StreamingJsonWriter
from oaklib.io.streaming_kgcl_writer import StreamingKGCLWriter
@@ -212,6 +215,7 @@
NL_FORMAT = "nl"
KGCL_FORMAT = "kgcl"
FHIR_JSON_FORMAT = "fhirjson"
FHIR_NPM_FORMAT = "fhirnpm"
HEATMAP_FORMAT = "heatmap"

ONT_FORMATS = [
@@ -222,6 +226,7 @@
    JSON_FORMAT,
    YAML_FORMAT,
    FHIR_JSON_FORMAT,
    FHIR_NPM_FORMAT,
@joeflack4 joeflack4 Apr 6, 2023
CLI & documenting config @joeflack4

I think this may still need:

  • 1. CLI
  • 2. Docs about the FHIR config JSON
Details

Tasks

1. CLI

The CLI can call the dumper for fhirnpm, but the params for it need to be settled first.

2. Docs about the FHIR config JSON

The config for it should probably be documented, as with the fhirjson dumper.

Discussion

Joe 4/5/2023:
I haven't properly set these up yet.

Joe 9/2023:
You may consider the lack of a CLI a blocker. Since I didn't do it back in April, I suspect it's because I wanted your input on the params. If this is needed for merge, I can put the PR back into draft mode, look into it, and let you know if and where I need feedback.

Joe 11/26/2023:
I think I may have time to do this in this PR now.

This comment was marked as duplicate.

    CSV_FORMAT,
    NL_FORMAT,
]
@@ -238,7 +243,8 @@
    JSONL_FORMAT: StreamingJsonWriter,
    YAML_FORMAT: StreamingYamlWriter,
    SSSOM_FORMAT: StreamingSssomWriter,
    FHIR_JSON_FORMAT: StreamingFHIRWriter,
    FHIR_JSON_FORMAT: StreamingFhirJsonWriter,
    FHIR_NPM_FORMAT: StreamingFhirNpmWriter,
    INFO_FORMAT: StreamingInfoWriter,
    NL_FORMAT: StreamingNaturalLanguageWriter,
    KGCL_FORMAT: StreamingKGCLWriter,
82 changes: 78 additions & 4 deletions src/oaklib/converters/obo_graph_to_fhir_converter.py
@@ -4,7 +4,12 @@
- Updates issue: https://github.com/INCATools/ontology-access-kit/issues/369
- Conversion examples: https://drive.google.com/drive/folders/1lwGQ63_fedfWlGlRemq8OeZhZsvIXN01
"""
import json
import logging
import os
import shutil
import tarfile
import tempfile
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union

@@ -52,7 +57,7 @@


@dataclass
class OboGraphToFHIRConverter(DataModelConverter):
class OboGraphToFhirJsonConverter(DataModelConverter):
"""Converts from OboGraph to FHIR.

- An ontology is mapped to a FHIR `CodeSystem <https://build.fhir.org/codesystem.html>`_.
@@ -86,7 +91,7 @@ def dump(
        Dump an OBO Graph Document to a FHIR CodeSystem.

        :param source: Source serialization.
        :param target: Target serialization.
        :param target: Target outpath.
        :param kwargs: Additional keyword arguments passed to :ref:`convert`.
        """
        cs = self.convert(
@@ -119,11 +124,11 @@ def convert(

        To use:

        >>> from oaklib.converters.obo_graph_to_fhir_converter import OboGraphToFHIRConverter
        >>> from oaklib.converters.obo_graph_to_fhir_converter import OboGraphToFhirJsonConverter
        >>> from oaklib.datamodels.obograph import GraphDocument
        >>> from linkml_runtime.dumpers import json_dumper
        >>> from linkml_runtime.loaders import json_loader
        >>> converter = OboGraphToFHIRConverter()
        >>> converter = OboGraphToFhirJsonConverter()
        >>> graph = json_loader.load("tests/input/hp_test.json", target_class=GraphDocument)
        >>> code_system = converter.convert(graph)
        >>> print(json_dumper.dumps(code_system))
@@ -205,6 +210,7 @@ def _convert_graph(
        predicate_period_replacement: bool = False,
    ) -> CodeSystem:
        target.id = source.id
        target.version = source.meta.version
        edges_by_subject = index_graph_edges_by_subject(source)
        logging.info(f"Converting graph to obo: {source.id}, nodes={len(source.nodes)}")
        self.predicates_to_export = set()
@@ -286,3 +292,71 @@ def _convert_meta(self, source: Node, concept: Concept):
value=synonym.val,
)
)


@dataclass
class OboGraphToFhirNpmConverter(OboGraphToFhirJsonConverter):
    """Converts an OBO Graph to a FHIR NPM package.

    Plays the same role as OboGraphToFhirJsonConverter, but also packages the outputs.
    """

    def dump(
@joeflack4 joeflack4 Apr 5, 2023
OboGraphToFhirNpmConverter structure and pattern differences (@cmungall review)

Details

@cmungall The review I want here concerns how the FHIR output is starting to deviate from what is standard in OAK. It is no longer 1 file in, 1 file out: FHIR will now output a number of files, and with the fhirnpm format they are bundled into 1 archive (.tgz).

@cmungall I followed your suggestion and made a fhirnpm output_type. Given the architecture of OAK, I think an OboGraphToFhirNpmConverter is expected, and it seemed to make sense to subclass it from OboGraphToFhirJsonConverter.

Non-standard dump() and alternative idea

All it needs to do is override dump() and do some extra things. But this might violate your intentions of standardization around the dump() methods in these classes. Is it OK how I've done this? Or do you want me to create some kind of intermediate method, say, package() which comes after the inherited OboGraphToFhirJsonConverter.convert() and then returns back to dump()? If so, perhaps package() could do everything up to creating the actual zip file.
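
To make the package() alternative concrete, here is a minimal sketch under stated assumptions. `JsonConverterSketch` and `NpmConverterSketch` are stand-ins rather than the real OAK classes, plain dicts stand in for GraphDocument and CodeSystem, and the manifest and .index.json steps are left out for brevity. The idea is that package() writes everything up to, but not including, the archive, and dump() only adds the tarball step.

```python
import json
import tarfile
import tempfile
from pathlib import Path
from typing import Any, Dict


class JsonConverterSketch:
    """Stand-in for OboGraphToFhirJsonConverter; not the real OAK class."""

    def convert(self, source: Dict[str, Any], **kwargs) -> Dict[str, Any]:
        # The real converter builds a CodeSystem; a dict is enough for the sketch.
        return {"resourceType": "CodeSystem", "id": kwargs.get("code_system_id")}


class NpmConverterSketch(JsonConverterSketch):
    """Stand-in for OboGraphToFhirNpmConverter with a hypothetical package() step."""

    def package(self, code_system: Dict[str, Any], package_dir: Path) -> Path:
        """Write the package contents to package_dir, without creating the archive."""
        package_dir.mkdir(parents=True, exist_ok=True)
        cs_path = package_dir / f"CodeSystem-{code_system['id']}.json"
        cs_path.write_text(json.dumps(code_system), encoding="utf-8")
        return package_dir

    def dump(self, source: Dict[str, Any], target: str, **kwargs) -> str:
        """Convert, package, then archive into the target directory."""
        cs = self.convert(source, **kwargs)
        outpath = Path(target) / f"CodeSystem-{cs['id']}.tgz"
        with tempfile.TemporaryDirectory() as tmp:
            package_dir = self.package(cs, Path(tmp) / "package")
            with tarfile.open(str(outpath), "w:gz") as tar:
                tar.add(str(package_dir), arcname="package")
        return str(outpath)
```

One upside of this split is that package() can be tested without unpacking an archive.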

Required dump() params

Some other non-standard changes that I've made to OboGraphToFhirNpmConverter.dump() given its special case:

  • target is now a required param, and in this case represents the output directory. I could change this to be the output path of the zip file. However, it seems that it must be required, because the other methods print the output to the terminal, and I imagine we don't want to do that with a zip file.
  • manifest_path is now a required parameter. If you like my idea of creating a package() method, I could pass this down and require it there instead.
  • kwargs["code_system_id"]: I didn't make this a required param (yet), but it is technically required.
  • kwargs["code_system_url"]: I didn't make this a required param (yet), but it is technically required.
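
Since code_system_id and code_system_url are technically required but arrive through kwargs, one option is an explicit check before conversion. This is only a sketch: `check_required` is a hypothetical helper, not something that exists in OAK.

```python
from typing import Any, Dict, Iterable

# Keys that the NPM dump() reads from kwargs, per the bullets above.
REQUIRED_KEYS = ("code_system_id", "code_system_url")


def check_required(kwargs: Dict[str, Any], required: Iterable[str] = REQUIRED_KEYS) -> None:
    """Raise a readable error instead of a bare KeyError deep inside dump()."""
    missing = [key for key in required if not kwargs.get(key)]
    if missing:
        raise ValueError(f"Missing required parameter(s): {', '.join(missing)}")


# Passes silently when both keys are present.
check_required(
    {"code_system_id": "test", "code_system_url": "http://purl.obolibrary.org/obo/go.owl"}
)
```

Without such a check, a missing key would surface as a KeyError at the point where dump() builds the CodeSystem filename.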

PyCharm signature method warning

I have a warning:

Signature of method 'OboGraphToFhirNpmConverter.dump()' does not match signature of the base method in class 'OboGraphToFhirJsonConverter'

It's unhappy that I made target a required param by removing = None. But I don't see why this is invalid / bad. The code seems to run fine.

I can't seem to put a JetBrains noinspection comment to ignore the warning, either.
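
A likely reason for the warning, sketched with stand-in classes (`Base` and `Child` are illustrative, not the OAK converters): code written against the base signature may omit target, and that call fails once a subclass makes it required. The code runs fine only as long as every caller passes target.

```python
from typing import Optional


class Base:
    def dump(self, source: str, target: Optional[str] = None) -> str:
        # target is optional here, so callers may leave it out.
        return f"{source} -> {target or 'stdout'}"


class Child(Base):
    def dump(self, source: str, target: str) -> str:
        # target is required here, which narrows the base signature.
        return f"{source} -> {target}"


def dump_without_target(converter: Base, source: str) -> str:
    """A call that is valid according to Base's signature."""
    return converter.dump(source)


base_result = dump_without_target(Base(), "g")

error = None
try:
    dump_without_target(Child(), "g")
except TypeError as exc:
    # The subclass cannot be used everywhere the base class can.
    error = str(exc)
```

An alternative that keeps the signatures compatible is to leave `target: Optional[str] = None` in the override and raise a clear error when it is None.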

This comment was marked as duplicate.

        self,
        source: GraphDocument,
        target: str,
        manifest_path: str,
        **kwargs,
    ) -> str:
        """
        Dump an OBO Graph Document to a FHIR NPM package.

        :param source: Source serialization.
        :param target: Target directory to save the output.
        :param manifest_path: Path to a manifest JSON. Required fields: 'name', 'version', 'description', and 'author'.
            See: https://confluence.hl7.org/display/FHIR/NPM+Package+Specification
        :param kwargs: Additional keyword arguments passed to :ref:`convert`.
        :return: Path to the written .tgz archive.
        """
        cs = self.convert(
            source,
            **kwargs,
        )
        cs_filename = "CodeSystem-" + kwargs["code_system_id"] + ".json"

        outpath = os.path.join(target, cs_filename.replace(".json", ".tgz"))

        # Create directory structure
        temp_dir = tempfile.mkdtemp()
        package_dir = os.path.join(temp_dir, "package")
        os.mkdir(package_dir)

        # Save FHIR resources
        cs_str = json_dumper.dumps(cs, inject_type=False)
        with open(os.path.join(package_dir, cs_filename), "w", encoding="UTF-8") as f:
            f.write(cs_str)

        # Save manifest package.json
        shutil.copyfile(manifest_path, os.path.join(package_dir, "package.json"))

        # Create and save .index.json
        package_index = {
            "index-version": 1,
            "files": [
                {
                    "filename": cs_filename,
                    "resourceType": "CodeSystem",
                    "id": kwargs["code_system_id"],
                    "url": kwargs["code_system_url"],
                    "version": cs.version,
                },
            ],
        }
        with open(os.path.join(package_dir, ".index.json"), "w", encoding="UTF-8") as f:
            json.dump(package_index, f)

        # Save the gzipped tarball and remove the temp dir
        with tarfile.open(outpath, "w:gz") as tar:
            tar.add(package_dir, arcname="package")
        shutil.rmtree(temp_dir)

        return outpath
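
For reviewers who want to check the archive layout, a small stdlib-only sketch follows. It builds a toy package with the same package/ layout as dump() above and reads it back. `build_package` and `inspect_package` are hypothetical helper names, and the file contents are placeholders rather than real converter output.

```python
import json
import os
import tarfile
import tempfile
from typing import Any, Dict, List, Tuple


def build_package(package_dir: str, outpath: str) -> None:
    """Archive package_dir as a gzipped tarball rooted at 'package/'."""
    with tarfile.open(outpath, "w:gz") as tar:
        tar.add(package_dir, arcname="package")


def inspect_package(tgz_path: str) -> Tuple[List[str], Dict[str, Any]]:
    """Return the member names and the parsed .index.json of a package."""
    with tarfile.open(tgz_path, "r:gz") as tar:
        names = tar.getnames()
        index = json.load(tar.extractfile("package/.index.json"))
    return names, index


# TemporaryDirectory cleans up even if an exception is raised part-way through.
with tempfile.TemporaryDirectory() as tmp:
    package_dir = os.path.join(tmp, "package")
    os.mkdir(package_dir)
    toy_index = {
        "index-version": 1,
        "files": [
            {"filename": "CodeSystem-test.json", "resourceType": "CodeSystem", "id": "test"}
        ],
    }
    with open(os.path.join(package_dir, ".index.json"), "w", encoding="utf-8") as f:
        json.dump(toy_index, f)
    with open(os.path.join(package_dir, "CodeSystem-test.json"), "w", encoding="utf-8") as f:
        json.dump({"resourceType": "CodeSystem", "id": "test"}, f)

    tgz_path = os.path.join(tmp, "CodeSystem-test.tgz")
    build_package(package_dir, tgz_path)
    names, loaded_index = inspect_package(tgz_path)
```

The sketch uses tempfile.TemporaryDirectory rather than mkdtemp() plus rmtree(), so the scratch directory is removed even when a step fails. The same swap could be considered for dump().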
8 changes: 6 additions & 2 deletions src/oaklib/interfaces/dumper_interface.py
@@ -5,7 +5,10 @@
from linkml_runtime.dumpers import json_dumper

from oaklib.converters.obo_graph_to_cx_converter import OboGraphToCXConverter
from oaklib.converters.obo_graph_to_fhir_converter import OboGraphToFHIRConverter
from oaklib.converters.obo_graph_to_fhir_converter import (
    OboGraphToFhirJsonConverter,
    OboGraphToFhirNpmConverter,
)
from oaklib.converters.obo_graph_to_obo_format_converter import (
OboGraphToOboFormatConverter,
)
@@ -18,7 +21,8 @@

OBOGRAPH_CONVERTERS = {
    "obo": OboGraphToOboFormatConverter,
    "fhirjson": OboGraphToFHIRConverter,
    "fhirjson": OboGraphToFhirJsonConverter,
    "fhirnpm": OboGraphToFhirNpmConverter,
    "owl": OboGraphToRdfOwlConverter,
    "turtle": OboGraphToRdfOwlConverter,
    "rdf": OboGraphToRdfOwlConverter,
27 changes: 24 additions & 3 deletions src/oaklib/io/streaming_fhir_writer.py
@@ -4,15 +4,15 @@

from linkml_runtime.dumpers import json_dumper

from oaklib.converters.obo_graph_to_fhir_converter import OboGraphToFHIRConverter
from oaklib.converters.obo_graph_to_fhir_converter import OboGraphToFhirJsonConverter
from oaklib.datamodels.obograph import GraphDocument
from oaklib.interfaces.obograph_interface import OboGraphInterface
from oaklib.io.streaming_writer import StreamingWriter
from oaklib.types import CURIE


@dataclass
class StreamingFHIRWriter(StreamingWriter):
class StreamingFhirJsonWriter(StreamingWriter):
"""
A writer that emits FHIR CodeSystem objects or Concept objects
"""
@@ -24,10 +24,31 @@ def emit_multiple(self, entities: Iterable[CURIE], **kwargs):
            g = oi.extract_graph(list(entities), include_metadata=True)
            gd = GraphDocument(graphs=[g])
            logging.info(f"Converting {len(g.nodes)} nodes to OBO")
            converter = OboGraphToFHIRConverter()
            converter = OboGraphToFhirJsonConverter()
            converter.curie_converter = oi.converter
            code_system = converter.convert(gd)
            logging.info(f"Writing {len(code_system.concept)} Concepts")
            # TODO: Shouldn't this call OboGraphToFhirJsonConverter.dump()?
            self.file.write(json_dumper.dumps(code_system))
@joeflack4 joeflack4 Apr 5, 2023
StreamingFhirJsonWriter .dumps() bug? @cmungall

Details

This isn't code I wrote. I'm not even sure how StreamingFhirJsonWriter is used, but I noticed it dumps like this:
self.file.write(json_dumper.dumps(code_system))

But I would expect it to do something like this:
OboGraphToFhirJsonConverter.dump(...)

Should I create an issue?
I could maybe try to evaluate it as well.

This comment was marked as duplicate.

        else:
            super().emit_multiple(entities, **kwargs)


# TODO:
@dataclass
class StreamingFhirNpmWriter(StreamingWriter):
@joeflack4 joeflack4 Apr 5, 2023
StreamingFhirNpmWriter (@cmungall question)

Details

@cmungall I didn't even notice these "streaming writers" existed until a short time ago. Do you need me to implement StreamingFhirNpmWriter? Does it make sense to?

edit: If so, I don't have a lot of time, so it would be better for me to do it across multiple PRs and track progress in an issue.

    """
    A writer that emits FHIR CodeSystem objects or Concept objects
    """

    def emit_multiple(self, entities: Iterable[CURIE], **kwargs):
        oi = self.ontology_interface
        if isinstance(oi, OboGraphInterface):
            logging.info("Extracting graph")
            g = oi.extract_graph(list(entities), include_metadata=True)
            gd = GraphDocument(graphs=[g])
            logging.info(f"Converting {len(g.nodes)} nodes to OBO")
            # Placeholder: no converter is wired in yet.
            converter = None
            print(gd, converter)
        else:
            super().emit_multiple(entities, **kwargs)
24 changes: 24 additions & 0 deletions tests/input/fhir_npm_manifest_so.json
@@ -0,0 +1,24 @@
{
  "name": "sequence-ontology",
  "version": "0.1.0",
  "canonical": "http://purl.obolibrary.org/obo/so.owl",
  "title": "Sequence Ontology",
  "description": "The Sequence Ontology is a set of terms and relationships used to describe the features and attributes of biological sequence.",
  "homepage": "http://www.sequenceontology.org/",
  "keywords": [
    "SO",
    "Sequence Ontology"
  ],
  "author": "TIMS",
  "maintainers": [
    {
      "name": "Joe Flack",
      "email": "[email protected]"
    },
    {
      "name": "Shahim Essaid",
      "email": "[email protected]"
    }
  ],
  "license": "MIT"
}
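
The dump() docstring in this PR lists 'name', 'version', 'description', and 'author' as required manifest fields. A minimal sketch of checking a manifest like the one above before packaging follows. `missing_manifest_fields` is a hypothetical helper, not part of OAK, and the inline manifest is an abbreviated copy for illustration.

```python
from typing import Any, Dict, List

# Required fields as listed in the dump() docstring of this PR.
REQUIRED_MANIFEST_FIELDS = ("name", "version", "description", "author")


def missing_manifest_fields(manifest: Dict[str, Any]) -> List[str]:
    """Return the required fields that are absent or empty, in declaration order."""
    return [field for field in REQUIRED_MANIFEST_FIELDS if not manifest.get(field)]


# Abbreviated copy of tests/input/fhir_npm_manifest_so.json.
manifest = {
    "name": "sequence-ontology",
    "version": "0.1.0",
    "description": "The Sequence Ontology is a set of terms and relationships "
    "used to describe the features and attributes of biological sequence.",
    "author": "TIMS",
}

problems = missing_manifest_fields(manifest)
if problems:
    raise ValueError(f"Manifest is missing: {', '.join(problems)}")
```

Running a check like this before copying the manifest to package.json would catch an incomplete manifest before the archive is written.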