-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fds 1797 input graph api #1481
base: develop
Are you sure you want to change the base?
Fds 1797 input graph api #1481
Conversation
…gic in the CLI function
…th is pickle. Also test both parameters are empty.
… to schema convert
…alue expected [return-value]'
…nes of satisfactory length
…ave any interpolated variables'
…onetworks/schematic into FDS-1797-input-graph-api
@@ -40,3 +41,10 @@ def load_schemaorg() -> Any: | |||
data_path = "data_models/schema_org.model.jsonld" | |||
schema_org_path = LOADER.filename(data_path) | |||
return load_json(schema_org_path) | |||
|
|||
|
|||
def read_pickle(file_path: str) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I think the read_pickle function assumes that the pickle file can be correctly loaded. But what if it doesn't? Could you raise a meaningful error message here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can test for read_pickle also be added?
data_model_parser = DataModelParser(path_to_data_model=path_to_data_model) | ||
if graph_data_model is None: | ||
if data_model_graph_pickle: | ||
"""What if pickle file does not fit in memory?""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this comment meant to be addressed in this PR? If not, can a new ticket be created so that we don't lose track?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, also this is formatted like a DOC string with triple quotes instead of starting with a hash like a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally things like this or # TODO
are fine if they are just temporary, but should be removed before submitting a PR. At thet point they should be fixed, or have a Jira issue created as Lingling said.
@@ -59,9 +59,21 @@ def schema() -> None: # use as `schematic model ...` | |||
metavar="<OUTPUT_PATH>", | |||
help=query_dict(schema_commands, ("schema", "convert", "output_jsonld")), | |||
) | |||
@click.option("--output_path", help="Alias for --output_jsonld") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you copied this from output_jsonld
but forgot to change?
@@ -59,9 +59,21 @@ def schema() -> None: # use as `schematic model ...` | |||
metavar="<OUTPUT_PATH>", | |||
help=query_dict(schema_commands, ("schema", "convert", "output_jsonld")), | |||
) | |||
@click.option("--output_path", help="Alias for --output_jsonld") | |||
@click.option( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the CLI part still needs some more work and design thinking. When I am reading this, I think --output_jsonld
, output_type
, and output_path
are very confusing to the users.
Can we simplify this by combining --output_jsonld and --output_path into a single --output_path
option or just --output
?
I am seeing:
if output_path:
output_jsonld = output_path
what is the purpose of adding output_path parameter if it is just using the value from output_jsonld?
@@ -114,40 +126,50 @@ def convert( | |||
for warning in war: | |||
logger.warning(warning) | |||
|
|||
logger.info("Converting data model to JSON-LD") | |||
if output_path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so many if else
are added here to handle different cases of output_path
and output_jsonld
.. it is hard for the user to know which one take precedence. It is better in my opinion to simplify the parameter and clean up the logic here.
tests/test_cli.py
Outdated
def test_schema_convert_cli(self, runner, output_path, output_type): | ||
model = "tests/data/example.model.csv" | ||
label_type = "class_label" | ||
expected = 0 | ||
|
||
output_path = helpers.get_data_path("example.model.jsonld") | ||
resultOne = runner.invoke(schema, ["convert", model]) | ||
|
||
label_type = "class_label" | ||
assert resultOne.exit_code == expected | ||
# check output_path file is created then remove it | ||
assert os.path.exists(output_path) | ||
|
||
resultTwo = runner.invoke( | ||
schema, ["convert", model, "--output_path", output_path] | ||
) | ||
|
||
assert resultTwo.exit_code == expected | ||
# check output_path file is created then remove it | ||
assert os.path.exists(output_path) | ||
|
||
resultThree = runner.invoke( | ||
schema, ["convert", model, "--output_type", output_type] | ||
) | ||
|
||
assert resultThree.exit_code == expected | ||
# check output_path file is created then remove it | ||
assert os.path.exists(output_path) | ||
|
||
resultFour = runner.invoke( | ||
schema, | ||
[ | ||
"convert", | ||
model, | ||
"--output_type", | ||
output_type, | ||
"--output_jsonld", | ||
output_path, | ||
], | ||
) | ||
|
||
assert resultFour.exit_code == expected | ||
# check output_path file is created then remove it | ||
assert os.path.exists(output_path) | ||
|
||
result = runner.invoke( | ||
schema, | ||
[ | ||
"convert", | ||
data_model_csv_path, | ||
model, | ||
"--output_jsonld", | ||
output_path, | ||
"--data_model_labels", | ||
label_type, | ||
], | ||
) | ||
|
||
assert result.exit_code == 0 | ||
assert result.exit_code == expected | ||
# check output_path file is created then remove it | ||
assert os.path.exists(output_path) | ||
|
||
expected_substr = ( | ||
"The Data Model was created and saved to " f"'{output_path}' location." | ||
resultFive = runner.invoke( | ||
schema, | ||
[ | ||
"convert", | ||
model, | ||
"--output_jsonld", | ||
"tests/data/example.model.pickle", | ||
"--output_path", | ||
"tests/data/example.model.pickle", | ||
], | ||
) | ||
|
||
assert expected_substr in result.output | ||
assert resultFive.exit_code == expected | ||
# check output_path file is created then remove it | ||
assert os.path.exists(output_path) | ||
|
||
resultSix = runner.invoke( | ||
schema, ["convert", model, "--output_jsonld", "", "--output_path", ""] | ||
) | ||
|
||
assert resultSix.exit_code == expected | ||
# check output_path file is created then remove it | ||
assert os.path.exists(output_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- 1 to Tom's suggestion. Please split this up so that it is more clear what has been tested.
return metadata_model | ||
|
||
|
||
def metadata_model_display(helpers, data_model_labels): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this is called metadata_model_display
.. If you are using data_model_labels
, and this could potentially be both "class_label" and "display_label", why is this named "display"?
@@ -34,7 +48,10 @@ class TestMetadataModel: | |||
) | |||
def test_get_component_requirements(self, helpers, as_graph, data_model_labels): | |||
# Instantiate MetadataModel | |||
meta_data_model = metadata_model(helpers, data_model_labels) | |||
if data_model_labels == "class_label": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what logic is being tested here? Could you add comments in the code?
Quality Gate passedIssues Measures |
@@ -32,6 +32,8 @@ | |||
DisplayLabelType, | |||
extract_component_validation_rules, | |||
) | |||
from schematic.utils.df_utils import update_df, load_df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already loaded on line 25
) -> Union[List[str], List[pd.DataFrame]]: | ||
"""Create multiple manifests | ||
|
||
Args: | ||
path_to_data_model (str): str path to data model | ||
data_model_graph_pickle (str, optional): path to pickled networkx MultiDiGraph object. Defaults to None. | ||
graph_data_model (str, optional): An networkx MultiDiGraph object. Defaults to None. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't match the type parameter above
# Generate graph | ||
self.graph_data_model = data_model_grapher.graph | ||
# Generate graph | ||
self.graph_data_model = data_model_grapher.graph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will self.graph_data_model always be assigned a value? The logic here is kind of confusion, and it's nto clear it always will be.
@afwillia another concern that I have about the PR is that I am seeing a lot of:
and also a lot of classes have both Here's an example of what I meant: #1499 |
@afwillia another point is that if you change parameter |
This PR adds a
graph_url
parameter to the manifest/generate, validate, and submit API endpoints.graph_url
is a URL to a pickled data model graph. Supplying it should make the request run faster. It depends on PR1425 and PR1396 which I have already merged into this branch to facilitate development.graph_url
will be added to other endpoints that also acceptschema_url
, but the three endpoints above are the most relevant to DCA and the current improvement sprint.Linking #1425 , #1396