Fds 1797 input graph api #1481

Open
wants to merge 125 commits into base: develop

Commits (125):
2527224
WIP: add export_as_graph flag to convert CLI command.
afwillia Mar 25, 2024
70bfc19
Add convert CLI test for exporting graph as a pickle.
afwillia Mar 25, 2024
5b41aa5
Add export_as_graph to help.py and update the CLI click docs.
afwillia Mar 25, 2024
76a5567
Change export_as_graph to output_format and specify graph, jsonld, or…
afwillia Mar 25, 2024
670b5c7
Add tests for CLI convert output_type options.
afwillia Mar 25, 2024
a87d2fe
Change text of CLI convert output_type help.
afwillia Mar 25, 2024
07dbd70
Test that jsonld is created correctly by convert CLI
afwillia Mar 25, 2024
421c59a
Add type hints to convert, clean up formatting issues from sonarcloud…
afwillia Mar 26, 2024
2f5abf1
Add convert CLI test for not specifying --output_type
afwillia Mar 26, 2024
119cf40
Merge branch 'Sage-Bionetworks:develop' into FDS-1796-schema-convert-…
afwillia Apr 2, 2024
c6c5cb8
parameterize the convert CLI test.
afwillia Apr 3, 2024
dd8b2cd
remove unused test code for the convert CLI.
afwillia Apr 3, 2024
6ec81db
run black on the updated files.
afwillia Apr 3, 2024
bbe6c98
Use logger.info and logger.error instead of click.echo.
afwillia Apr 3, 2024
b99d9eb
Trim pickle from end of filename, in addition to csv and jsonld.
afwillia Apr 9, 2024
1189107
add output_path alias for output_jsonld
afwillia Apr 9, 2024
86f7af8
Add combination of arguments to convert CLI tests
afwillia Apr 10, 2024
d3c3769
add export_graph to schema_utils.py to dump a pickle file
afwillia Apr 10, 2024
215cc86
use export_graph util to save pickle file instead of handling that lo…
afwillia Apr 10, 2024
242c885
add docstrings to graph_export
afwillia Apr 10, 2024
a1cfcce
Update tests to run different combinations of output_jsonld and outpu…
afwillia Apr 10, 2024
1c67b4a
Add data_model_graph_pickle to generator
afwillia Apr 11, 2024
a986b39
Add ata_model_graph_pickle to metadata.py
afwillia Apr 11, 2024
0390b69
Add data_model_graph to attributes_explorer
afwillia Apr 11, 2024
37bbf74
Add data_model_graph_pickle to tangled_tree
afwillia Apr 11, 2024
441b865
Add convert CLI test case where output_jsonld is pickle and output_pa…
afwillia Apr 18, 2024
fe4fffb
move tests into one function
afwillia Apr 18, 2024
8c64981
Merge branch 'develop' into FDS-1796-schema-convert-export-pickle
afwillia Apr 19, 2024
f20f88b
Run black on commands.py and schema_utils.py
afwillia Apr 19, 2024
d56c965
fix single-letter variable names for pylint error
afwillia Apr 19, 2024
4596560
Merge branch 'develop' into FDS-1843-use-graph
afwillia May 7, 2024
b5dd8e4
Add data_model_graph argument to ManifestGenerator.create_manifests
afwillia May 8, 2024
c8da900
use pickle.load to read pickle file
afwillia May 8, 2024
601b5d0
Add data model graph pickle to attributes_explorer
afwillia May 8, 2024
c6d01de
Add data model graph pickle to tangled_tree
afwillia May 8, 2024
1f57cf1
return 0 to resolve a pylint error.
afwillia Jun 5, 2024
99e860b
Merge branch 'develop' into FDS-1796-schema-convert-export-pickle
afwillia Jun 5, 2024
d89dc10
remove extra function def from merge
afwillia Jun 5, 2024
c9786dc
run black on utils/schema_utils.py
afwillia Jun 5, 2024
38b0690
Return 0 for all convert, but echo the filepath created
afwillia Jun 5, 2024
77979c7
Merge branch 'develop' into FDS-1843-use-graph
afwillia Jun 5, 2024
9907b4e
Fix mix-up between parsed data model and graph data model
afwillia Jun 6, 2024
481ebd3
Add data model pickle parsing to tanlged_tree
afwillia Jun 6, 2024
f63231a
Add data_model_pickle file to tests
afwillia Jun 6, 2024
f3fdc41
Add data model pickle file
afwillia Jun 6, 2024
032771f
run black
afwillia Jun 6, 2024
bf0e9ba
fix a couple sonarcloud issues with the graph_data_model variable
afwillia Jun 6, 2024
bc59218
Add data_model_graph_pickle to metadata.py and tests
afwillia Jun 6, 2024
5973a75
run black on generator.py
afwillia Jun 6, 2024
bf1a449
remove type check from networkx
afwillia Jun 6, 2024
69bb182
fix pylint issues
afwillia Jun 6, 2024
d93aa9d
remove pickle from metadata test because it wasn't created with data_…
afwillia Jun 7, 2024
aef4f62
create two metadata_model objects with different data_model_label gra…
afwillia Jun 7, 2024
08b2ae5
add display_label graph pickle
afwillia Jun 7, 2024
a088d63
Merge branch 'develop' into FDS-1843-use-graph
afwillia Jun 26, 2024
68672c3
Add multiple combinations of jsonld and pickle to test_viz attribute_…
afwillia Jun 28, 2024
de07c55
Fix missing variables in tangled_tree when jsonld and pickle are supp…
afwillia Jun 28, 2024
77f5ec4
Add combinations of jsonld and pickle to tangled_tree tests
afwillia Jun 28, 2024
5934ed5
add graph_data_model argument description
afwillia Jun 28, 2024
51313bd
add data_model_graph_pickle description to MetadataModel
afwillia Jun 28, 2024
e4e1368
run black on test_viz.py
afwillia Aug 26, 2024
9a78a18
Update schematic/manifest/generator.py
afwillia Aug 26, 2024
05b4fe0
Add read_pickle function to read a binary pickle file
afwillia Aug 26, 2024
0964501
Use read_pickle instead of importing pickle and opening file
afwillia Aug 26, 2024
697cc3e
Add note for pickle files that don't fit in memory. Not sure if this …
afwillia Aug 26, 2024
774dc91
Use read_pickle to load the pickle file in metadata.py
afwillia Aug 26, 2024
bb7c450
read_pickle instead of import pickle in attributes_explorer
afwillia Aug 26, 2024
8f5bc1a
set self.graph_data_model and update it if data_model_grapher is not …
afwillia Aug 26, 2024
35c13dd
use if ... is not None instead of if not ...
afwillia Aug 26, 2024
f8c14d4
use if ... is not None instead of if not ...
afwillia Aug 26, 2024
13c9ffc
add an input to read_pickle
afwillia Aug 26, 2024
9205d6d
run black on io_utils.py
afwillia Aug 27, 2024
172e1b1
use read_pickle to read pickle file. Set default data_model_labels
afwillia Aug 27, 2024
d4d3a30
set default data_model_labels
afwillia Aug 27, 2024
730b227
alter logic for setting graph_data_model from pickle or DataModelGrap…
afwillia Aug 27, 2024
795e263
fix pickle reading logic in tangled tree
afwillia Aug 27, 2024
0ddb552
fix logic checking for None parameters in attributes_explorer.py
afwillia Aug 27, 2024
537af6f
import "import pickle" should be placed before "from schematic import…
afwillia Aug 27, 2024
dcb2f65
Use literal for output_type parameter and remove unnecessary try/exce…
afwillia Aug 27, 2024
bb4ee3a
use logger.exception instead of logger.error
afwillia Aug 27, 2024
5504586
move return statement inside the output_type check block
afwillia Aug 27, 2024
b88ee86
run black
afwillia Aug 27, 2024
2bf450b
import Literal from typing
afwillia Aug 27, 2024
7adaf44
add tests for export_graph
afwillia Aug 27, 2024
c2a06f6
run black
afwillia Aug 27, 2024
1e5bd7d
add graph_url spec to manifest/generate endpoint
afwillia Aug 27, 2024
c849f92
add function to download temp pickle file from url. Add data_model_pi…
afwillia Aug 27, 2024
8041b07
use data_model_graph_pickle argument in manifest/generate endpoint
afwillia Aug 27, 2024
0870fa1
download temp pickle in manifest/generate endpoint
afwillia Aug 27, 2024
6802753
check that files are created in schema convert
afwillia Aug 27, 2024
23b7e1d
update docstring and use if ... is not None instead of if not ...
afwillia Aug 27, 2024
fa5e308
use if ... is None instead of if ... is not None
afwillia Aug 27, 2024
3dd6659
Merge branch 'FDS-1796-schema-convert-export-pickle' into FDS-1797-in…
afwillia Aug 28, 2024
f28a0d7
Merge branch 'FDS-1843-use-graph' into FDS-1797-input-graph-api
afwillia Aug 28, 2024
afbe448
add graph_url to validate and submission endpoints
afwillia Aug 28, 2024
e4d0b7c
add graph_url to validate and submission endpoints
afwillia Aug 28, 2024
a7d7366
add pickled htan data model for testing
afwillia Aug 28, 2024
ee9cb9f
remove data model graph parameter from validation and submission. It'…
afwillia Aug 28, 2024
cec7aa6
remove graph_url from validation and submission endpoints
afwillia Aug 28, 2024
92b66fd
Merge branch 'develop' into FDS-1797-input-graph-api
afwillia Aug 28, 2024
b52bfdc
run black on routes.py
afwillia Aug 28, 2024
8a64c39
set graph_data_model to None before looking for graph_url
afwillia Aug 28, 2024
5bd3fc0
add DataModelParser back to tangled_tree.py
afwillia Aug 28, 2024
9c2edde
change logger.info to click.echo
afwillia Sep 10, 2024
58c1429
Check if pickle file contains a networkx MultiDiGraph object
afwillia Sep 10, 2024
487e31d
explain options to schema convert output_type
afwillia Sep 10, 2024
e13f693
update data_model_graph_pickle docstring for ManifestGenerator
afwillia Sep 10, 2024
3d5da40
don't return 0 at the end of schema convert
afwillia Sep 10, 2024
a331725
raise a valueError instead of logger.error for passing a bad filepath…
afwillia Sep 10, 2024
3ed1177
return 0 at the end of schema convert to avoid error 'error: Return v…
afwillia Sep 10, 2024
b6728d3
Raise an actual exception and valueError for bad filepath to schema c…
afwillia Sep 10, 2024
b7460ea
add more information to schema convert when saving the graph as a pic…
afwillia Sep 10, 2024
53e639b
add argument and return types to get_temp_pickle
afwillia Sep 10, 2024
498851b
run the black code formatter
afwillia Sep 10, 2024
58aef40
turn a single line that pylint states is too long into two shorter li…
afwillia Sep 10, 2024
03f7497
swap the f in logger.echo to avoid 'Using an f-string that does not h…
afwillia Sep 10, 2024
bc64305
Merge branch 'develop' into FDS-1797-input-graph-api
afwillia Sep 10, 2024
2a7ea55
message where graph is saved in schema convert cli
afwillia Sep 10, 2024
275fbae
check for FileNotFound error instead of generic exception
afwillia Sep 10, 2024
89cd4ce
change test variables to camel case
afwillia Sep 10, 2024
0f4091a
Merge branch 'FDS-1797-input-graph-api' of https://github.com/Sage-Bi…
afwillia Sep 10, 2024
3197abf
document schema convert tests
afwillia Sep 10, 2024
abd1630
turn get_temp_ functions into one single function
afwillia Sep 10, 2024
445aad5
remove unhelpful comments and whitespace
afwillia Sep 10, 2024
0c4fad7
clarify test cases for attributes explorer
afwillia Sep 10, 2024
7 changes: 7 additions & 0 deletions schematic/help.py
@@ -219,6 +219,13 @@
"output_jsonld": (
"Path to where the generated JSON-LD file needs to be outputted."
),
"output_type": (
"Output format to export the schema. "
"Options are 'jsonld', 'graph', 'all'. Default is 'jsonld'."
"'jsonld' will output the schema as a JSON-LD file. "
"'graph' will output an nx graph object of the schema as a pickle file."
"'all' will output both the JSON-LD file and the graph object."
),
"data_model_labels": DATA_MODEL_LABELS_HELP,
}
}
29 changes: 22 additions & 7 deletions schematic/manifest/generator.py
@@ -32,6 +32,8 @@
DisplayLabelType,
extract_component_validation_rules,
)
from schematic.utils.df_utils import update_df, load_df
[Review comment — Contributor] This is already loaded on line 25.

from schematic.utils.io_utils import read_pickle
from schematic.utils.validate_utils import rule_in_rule_list

logger = logging.getLogger(__name__)
@@ -1660,11 +1662,15 @@ def create_manifests(
title: Optional[str] = None,
strict: Optional[bool] = True,
use_annotations: Optional[bool] = False,
graph_data_model: Optional[nx.MultiDiGraph] = None,
data_model_graph_pickle: Optional[str] = None,
) -> Union[List[str], List[pd.DataFrame]]:
"""Create multiple manifests

Args:
path_to_data_model (str): str path to data model
data_model_graph_pickle (str, optional): path to pickled networkx MultiDiGraph object. Defaults to None.
graph_data_model (str, optional): An networkx MultiDiGraph object. Defaults to None.
[Review comment — Contributor] This doesn't match the type parameter above.

data_types (list): a list of data types
access_token (str, optional): synapse access token. Required when getting an existing manifest. Defaults to None.
dataset_ids (list, optional): a list of dataset ids when generating an existing manifest. Defaults to None.
@@ -1722,16 +1728,25 @@ def create_manifests(
"Please check your submission and try again."
)

data_model_parser = DataModelParser(path_to_data_model=path_to_data_model)
if graph_data_model is None:
[Review comment — Collaborator] The logic from 1685 to 1703 can be further improved. Essentially you want to cover the following cases:

  1. if the graph data model is provided, use the graph data model
  2. if the pickle file is provided, get the graph from the pickle file and use that
  3. if neither of these two is provided, parse the data model to get the graph.

So instead of starting with `if graph_data_model is None`, it will be easier if you start with `if graph_data_model`. Also, please consider wrapping this part in its own function; that way testing can be easier.

if data_model_graph_pickle:
"""What if pickle file does not fit in memory?"""
[Review comment — Collaborator] Is this comment meant to be addressed in this PR? If not, can a new ticket be created so that we don't lose track?

[Review comment — Contributor] Yep. Also, this is formatted like a docstring with triple quotes instead of starting with a hash like a comment.

[Review comment — Contributor] Generally things like this or # TODO are fine if they are just temporary, but they should be removed before submitting a PR. At that point they should be fixed, or have a Jira issue created as Lingling said.

graph_data_model = read_pickle(data_model_graph_pickle)
else:
data_model_parser = DataModelParser(
path_to_data_model=path_to_data_model
)

# Parse Model
parsed_data_model = data_model_parser.parse_model()
# Parse Model
parsed_data_model = data_model_parser.parse_model()

# Instantiate DataModelGraph
data_model_grapher = DataModelGraph(parsed_data_model, data_model_labels)
# Instantiate DataModelGraph
data_model_grapher = DataModelGraph(
parsed_data_model, data_model_labels
)

# Generate graph
graph_data_model = data_model_grapher.graph
# Generate graph
graph_data_model = data_model_grapher.graph

# Gather all returned result urls
all_results = []
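The three-case precedence the reviewer describes (an in-memory graph wins, then a pickled graph on disk, then parsing the data model from scratch) can be factored into the small helper the review asks for, which also makes it unit-testable. A minimal sketch — `resolve_graph_data_model` is a hypothetical name, and the fallback parse step (DataModelParser/DataModelGraph in the PR) is abstracted as a caller-supplied `build_graph` callable:

```python
import pickle
from typing import Any, Callable, Optional


def resolve_graph_data_model(
    graph_data_model: Optional[Any],
    data_model_graph_pickle: Optional[str],
    build_graph: Callable[[], Any],
) -> Any:
    """Resolve a graph using the precedence suggested in the review."""
    # 1. An in-memory graph takes precedence.
    if graph_data_model is not None:
        return graph_data_model
    # 2. Otherwise, try a pickled graph on disk.
    if data_model_graph_pickle is not None:
        with open(data_model_graph_pickle, "rb") as fle:
            return pickle.load(fle)
    # 3. Fall back to parsing the data model from scratch.
    return build_graph()
```

With a helper like this, `create_manifests` collapses to a single call, and each branch can be tested in isolation.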
26 changes: 18 additions & 8 deletions schematic/models/metadata.py
@@ -19,6 +19,7 @@
# we shouldn't need to expose Synapse functionality explicitly
from schematic.store.synapse import SynapseStorage
from schematic.utils.df_utils import load_df
from schematic.utils.io_utils import read_pickle

logger = logging.getLogger(__name__)

@@ -41,12 +42,14 @@ def __init__(
inputMModelLocation: str,
inputMModelLocationType: str,
data_model_labels: str,
data_model_graph_pickle: Optional[str] = None,
) -> None:
"""Instantiates a MetadataModel object.

Args:
inputMModelLocation: local path, uri, synapse entity id (e.g. gs://, syn123, /User/x/…); present location
inputMModelLocationType: specifier to indicate where the metadata model resource can be found (e.g. 'local' if file/JSON-LD is on local machine)
data_model_graph_pickle: filepath to a data model graph stored as pickle file.
"""
# extract extension of 'inputMModelLocation'
# ensure that it is necessarily pointing to a '.jsonld' file
@@ -59,17 +62,24 @@
self.inputMModelLocation = inputMModelLocation
self.path_to_json_ld = inputMModelLocation

data_model_parser = DataModelParser(path_to_data_model=self.inputMModelLocation)
# Parse Model
parsed_data_model = data_model_parser.parse_model()
# Use graph, if provided. Otherwise parse data model for graph.
if data_model_graph_pickle:
self.graph_data_model = read_pickle(data_model_graph_pickle)
self.dmge = DataModelGraphExplorer(self.graph_data_model)
else:
data_model_parser = DataModelParser(
path_to_data_model=self.inputMModelLocation
)
# Parse Model
parsed_data_model = data_model_parser.parse_model()

# Instantiate DataModelGraph
data_model_grapher = DataModelGraph(parsed_data_model, data_model_labels)
# Instantiate DataModelGraph
data_model_grapher = DataModelGraph(parsed_data_model, data_model_labels)

# Generate graph
self.graph_data_model = data_model_grapher.graph
# Generate graph
self.graph_data_model = data_model_grapher.graph

self.dmge = DataModelGraphExplorer(self.graph_data_model)
self.dmge = DataModelGraphExplorer(self.graph_data_model)

# check if the type of MModel file is "local"
# currently, the application only supports reading from local JSON-LD files
75 changes: 48 additions & 27 deletions schematic/schemas/commands.py
@@ -3,7 +3,7 @@
import logging
import time
import re
from typing import get_args, Optional, Any
from typing import get_args, Optional, Any, Literal

import click
import click_log # type: ignore
@@ -17,7 +17,7 @@

from schematic.utils.schema_utils import DisplayLabelType
from schematic.utils.cli_utils import query_dict
from schematic.utils.schema_utils import export_schema
from schematic.utils.schema_utils import export_schema, export_graph
from schematic.help import schema_commands

logger = logging.getLogger("schematic")
@@ -59,9 +59,21 @@ def schema() -> None: # use as `schematic model ...`
metavar="<OUTPUT_PATH>",
help=query_dict(schema_commands, ("schema", "convert", "output_jsonld")),
)
@click.option("--output_path", help="Alias for --output_jsonld")
[Review comment — Collaborator] I think you copied this from output_jsonld but forgot to change?

@click.option(
[Review comment — Collaborator] I think the CLI part still needs some more work and design thinking. When I am reading this, I think --output_jsonld, --output_type, and --output_path are very confusing to the users. Can we simplify this by combining --output_jsonld and --output_path into a single --output_path option, or just --output? I am seeing:

    if output_path:
        output_jsonld = output_path

What is the purpose of adding the output_path parameter if it is just using the value from output_jsonld?

"--output_type",
"-ot",
type=click.Choice(["jsonld", "graph", "all"], case_sensitive=False),
default="jsonld",
help=query_dict(schema_commands, ("schema", "convert", "output_type")),
)
def convert(
schema: Any, data_model_labels: DisplayLabelType, output_jsonld: Optional[str]
) -> None:
schema: Any,
data_model_labels: DisplayLabelType,
output_jsonld: Optional[str],
output_type: Optional[Literal["jsonld", "graph", "all"]],
output_path: Optional[str],
) -> int:
"""
Running CLI to convert data model specification in CSV format to
data model in JSON-LD format.
@@ -80,19 +92,19 @@ def convert(
data_model_parser = DataModelParser(schema)

# Parse Model
logger.info("Parsing data model.")
click.echo("Parsing data model.")
parsed_data_model = data_model_parser.parse_model()

# Convert parsed model to graph
# Instantiate DataModelGraph
data_model_grapher = DataModelGraph(parsed_data_model, data_model_labels)

# Generate graphschema
logger.info("Generating data model graph.")
click.echo("Generating data model graph.")
graph_data_model = data_model_grapher.graph

# Validate generated data model.
logger.info("Validating the data model internally.")
click.echo("Validating the data model internally.")
data_model_validator = DataModelValidator(graph=graph_data_model)
data_model_errors, data_model_warnings = data_model_validator.run_checks()

@@ -114,40 +126,49 @@
for warning in war:
logger.warning(warning)

logger.info("Converting data model to JSON-LD")
if output_path:
[Review comment — Collaborator, @linglp, Sep 10, 2024] So many if/else branches are added here to handle different cases of output_path and output_jsonld; it is hard for the user to know which one takes precedence. It is better in my opinion to simplify the parameters and clean up the logic here.

output_jsonld = output_path

if output_jsonld is None:
output_file_no_ext = re.sub("[.](jsonld|csv|pickle)$", "", schema)
else:
output_file_no_ext = re.sub("[.](jsonld|csv|pickle)$", "", output_jsonld)

click.echo(
"By default, the JSON-LD output will be stored alongside the first "
f"input CSV or JSON-LD file. In this case, it will appear here: '{output_jsonld}'. "
"You can use the `--output_jsonld` argument to specify another file path."
)

if output_type in ["graph", "all"]:
output_graph = output_file_no_ext + ".pickle"
click.echo(f"Saving data model graph to '{output_graph}'.")
export_graph(graph_data_model, output_graph)
if output_type == "graph":
return 0

click.echo("Converting data model to JSON-LD")
jsonld_data_model = convert_graph_to_jsonld(graph=graph_data_model)

# output JSON-LD file alongside CSV file by default, get path.
if output_jsonld is None:
if not ".jsonld" in schema:
csv_no_ext = re.sub("[.]csv$", "", schema)
output_jsonld = csv_no_ext + ".jsonld"
else:
output_jsonld = schema

logger.info(
"By default, the JSON-LD output will be stored alongside the first "
f"input CSV or JSON-LD file. In this case, it will appear here: '{output_jsonld}'. "
"You can use the `--output_jsonld` argument to specify another file path."
)
output_jsonld = output_file_no_ext + ".jsonld"

# saving updated schema.org schema
try:
export_schema(jsonld_data_model, output_jsonld)
click.echo(
f"The Data Model was created and saved to '{output_jsonld}' location."
)
except: # pylint: disable=bare-except
click.echo(
(
f"The Data Model could not be created by using '{output_jsonld}' location. "
"Please check your file path again"
)
)
except Exception as exc:
raise ValueError(
f"The Data Model could not be created by using '{output_jsonld}' location. "
"Please check your file path again"
) from exc

# get the end time
end_time = time.time()

# get the execution time
elapsed_time = time.strftime("%M:%S", time.gmtime(end_time - start_time))
click.echo(f"Execution time: {elapsed_time} (M:S)")
return 0
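The path handling the reviewers discuss (trim a known extension from `--output_path` if given, otherwise from the input schema, then emit one file per requested `--output_type`) can be isolated into a pure function, which makes the precedence testable without invoking the CLI. A sketch — `planned_outputs` is a hypothetical helper name, but the regex mirrors the one in the diff:

```python
import re
from typing import List, Optional


def planned_outputs(schema: str, output_path: Optional[str], output_type: str) -> List[str]:
    """Derive the files `schema convert` would write.

    --output_path (if given) takes precedence over the input schema path;
    a trailing .jsonld/.csv/.pickle extension is trimmed before appending
    the extension for each requested output format.
    """
    base = re.sub(r"[.](jsonld|csv|pickle)$", "", output_path or schema)
    targets = []
    if output_type in ("graph", "all"):
        targets.append(base + ".pickle")
    if output_type in ("jsonld", "all"):
        targets.append(base + ".jsonld")
    return targets
```

Factoring the logic out this way also sidesteps the `output_jsonld` vs. `output_path` aliasing question: the CLI option merely feeds one string into the helper.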
8 changes: 8 additions & 0 deletions schematic/utils/io_utils.py
@@ -3,6 +3,7 @@
from typing import Any
import json
import urllib.request
import pickle
from schematic import LOADER


@@ -40,3 +41,10 @@ def load_schemaorg() -> Any:
data_path = "data_models/schema_org.model.jsonld"
schema_org_path = LOADER.filename(data_path)
return load_json(schema_org_path)


def read_pickle(file_path: str) -> Any:
[Review comment — Collaborator] Here, I think the read_pickle function assumes that the pickle file can be correctly loaded. But what if it doesn't? Could you raise a meaningful error message here?

[Review comment — Collaborator] Can a test for read_pickle also be added?

"""Read pickle file"""
with open(file_path, "rb") as fle:
data = pickle.load(fle)
return data
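Both review comments could be addressed at once with a defensive variant that wraps loading errors in a meaningful message and optionally validates the unpickled type. A sketch only — `read_pickle_checked` and its `expected_type` parameter are illustrative names, not part of the PR:

```python
import pickle
from typing import Any


def read_pickle_checked(file_path: str, expected_type: type = object) -> Any:
    """Load a pickle file, raising a meaningful error on failure."""
    try:
        with open(file_path, "rb") as fle:
            data = pickle.load(fle)
    except (OSError, pickle.UnpicklingError) as exc:
        # Surface a single, descriptive error instead of a raw traceback.
        raise ValueError(f"Could not load a pickle from '{file_path}': {exc}") from exc
    if not isinstance(data, expected_type):
        raise ValueError(
            f"'{file_path}' contained a {type(data).__name__}, "
            f"expected a {expected_type.__name__}."
        )
    return data
```

Callers that need a graph could pass `expected_type=nx.MultiDiGraph`, folding in the isinstance check that the attributes_explorer diff performs inline.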
21 changes: 21 additions & 0 deletions schematic/utils/schema_utils.py
@@ -7,6 +7,7 @@
import os
import string
from typing import Literal, Union, Optional
import pickle

import inflection

@@ -500,3 +501,23 @@ def get_json_schema_log_file_path(data_model_path: str, source_node: str) -> str
prefix = prefix_root
json_schema_log_file_path = f"{prefix}.{source_node}.schema.json"
return json_schema_log_file_path


def export_graph(schema: dict, file_path: str) -> None:
"""Write object to a pickle file.
Args:
schema, dict: A data model graph to export
file_path, str: File to create
"""
try:
with open(file_path, "wb") as file:
pickle.dump(schema, file)
logger.info(
"The data model graph was created and saved "
f"to a pickle file located at: '{file_path}'."
)
except SystemExit as error:
logger.exception(
f"The graph failed to save to '{file_path}'. Please check your file path again."
)
raise error
31 changes: 21 additions & 10 deletions schematic/visualization/attributes_explorer.py
@@ -5,12 +5,13 @@
from typing import Optional, no_type_check
import numpy as np
import pandas as pd
import networkx as nx # type: ignore

from schematic.schemas.data_model_parser import DataModelParser
from schematic.schemas.data_model_graph import DataModelGraph, DataModelGraphExplorer
from schematic.schemas.data_model_json_schema import DataModelJSONSchema
from schematic.utils.schema_utils import DisplayLabelType
from schematic.utils.io_utils import load_json
from schematic.utils.io_utils import load_json, read_pickle

logger = logging.getLogger(__name__)

@@ -22,34 +23,44 @@ class AttributesExplorer:
def __init__(
self,
path_to_jsonld: str,
data_model_labels: DisplayLabelType,
data_model_labels: DisplayLabelType = "class_label",
data_model_grapher: Optional[DataModelGraph] = None,
data_model_graph_explorer: Optional[DataModelGraphExplorer] = None,
parsed_data_model: Optional[dict] = None,
graph_data_model: Optional[nx.MultiDiGraph] = None,
data_model_graph_pickle: Optional[str] = None,
) -> None:
self.path_to_jsonld = path_to_jsonld

self.jsonld = load_json(self.path_to_jsonld)
if graph_data_model is not None:
self.graph_data_model = graph_data_model
elif data_model_graph_pickle is not None:
data_model_graph = read_pickle(data_model_graph_pickle)
if not isinstance(data_model_graph, nx.MultiDiGraph):
raise ValueError(
"The data model graph must be a networkx MultiDiGraph object."
)
self.graph_data_model = data_model_graph

# Parse Model
if not parsed_data_model:
if parsed_data_model is None:
data_model_parser = DataModelParser(
path_to_data_model=self.path_to_jsonld,
)
parsed_data_model = data_model_parser.parse_model()

# Instantiate DataModelGraph
if not data_model_grapher:
if data_model_grapher is None:
data_model_grapher = DataModelGraph(parsed_data_model, data_model_labels)

# Generate graph
self.graph_data_model = data_model_grapher.graph
# Generate graph
self.graph_data_model = data_model_grapher.graph
[Review comment — Contributor] Will self.graph_data_model always be assigned a value? The logic here is kind of confusing, and it's not clear it always will be.

# Instantiate Data Model Graph Explorer
if not data_model_graph_explorer:
self.dmge = DataModelGraphExplorer(self.graph_data_model)
else:
if data_model_graph_explorer is not None:
self.dmge = data_model_graph_explorer
else:
self.dmge = DataModelGraphExplorer(self.graph_data_model)

# Instantiate Data Model Json Schema
self.data_model_js = DataModelJSONSchema(
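One defensive answer to the question raised above is to pick the first available graph source and fail fast when there is none, so the attribute can never be left unset. A sketch with a hypothetical helper name:

```python
from typing import Any, Optional


def ensure_graph_assigned(
    graph_data_model: Optional[Any],
    pickled_graph: Optional[Any],
    grapher_graph: Optional[Any],
) -> Any:
    """Return the first available graph source, or raise.

    Raising here guarantees __init__ either assigns self.graph_data_model
    or fails loudly, instead of leaving the attribute undefined.
    """
    for candidate in (graph_data_model, pickled_graph, grapher_graph):
        if candidate is not None:
            return candidate
    raise ValueError("No graph source was provided; cannot build AttributesExplorer.")
```

`__init__` would then do `self.graph_data_model = ensure_graph_assigned(...)` once, making the assignment unconditional by construction.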