Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fds 1797 input graph api #1481

Open
wants to merge 125 commits into
base: develop
Choose a base branch
from
Open

Fds 1797 input graph api #1481

wants to merge 125 commits into from

Conversation

afwillia
Copy link
Contributor

@afwillia afwillia commented Aug 28, 2024

This PR adds a graph_url parameter to the manifest/generate, validate, and submit API endpoints. graph_url is a URL to a pickled data model graph. Supplying it should make the request run faster. It depends on PR1425 and PR1396 which I have already merged into this branch to facilitate development.

graph_url will be added to other endpoints that also accept schema_url, but the three endpoints above are the most relevant to DCA and the current improvement sprint.

Linking #1425 , #1396

…th is pickle. Also test both parameters are empty.
@@ -40,3 +41,10 @@ def load_schemaorg() -> Any:
data_path = "data_models/schema_org.model.jsonld"
schema_org_path = LOADER.filename(data_path)
return load_json(schema_org_path)


def read_pickle(file_path: str) -> Any:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I think the read_pickle function assumes that the pickle file can be correctly loaded. But what if it doesn't? Could you raise a meaningful error message here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can test for read_pickle also be added?

data_model_parser = DataModelParser(path_to_data_model=path_to_data_model)
if graph_data_model is None:
if data_model_graph_pickle:
"""What if pickle file does not fit in memory?"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this comment meant to be addressed in this PR? If not, can a new ticket be created so that we don't lose track?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, also this is formatted like a DOC string with triple quotes instead of starting with a hash like a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally things like this or # TODO are fine if they are just temporary, but should be removed before submitting a PR. At thet point they should be fixed, or have a Jira issue created as Lingling said.

@@ -59,9 +59,21 @@ def schema() -> None: # use as `schematic model ...`
metavar="<OUTPUT_PATH>",
help=query_dict(schema_commands, ("schema", "convert", "output_jsonld")),
)
@click.option("--output_path", help="Alias for --output_jsonld")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you copied this from output_jsonld but forgot to change?

@@ -59,9 +59,21 @@ def schema() -> None: # use as `schematic model ...`
metavar="<OUTPUT_PATH>",
help=query_dict(schema_commands, ("schema", "convert", "output_jsonld")),
)
@click.option("--output_path", help="Alias for --output_jsonld")
@click.option(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the CLI part still needs some more work and design thinking. When I am reading this, I think --output_jsonld, output_type, and output_path are very confusing to the users.

Can we simplify this by combining --output_jsonld and --output_path into a single --output_path option or just --output?

I am seeing:

    if output_path:
        output_jsonld = output_path

what is the purpose of adding output_path parameter if it is just using the value from output_jsonld?

@@ -114,40 +126,50 @@ def convert(
for warning in war:
logger.warning(warning)

logger.info("Converting data model to JSON-LD")
if output_path:
Copy link
Collaborator

@linglp linglp Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so many if else are added here to handle different cases of output_path and output_jsonld.. it is hard for the user to know which one take precedence. It is better in my opinion to simplify the parameter and clean up the logic here.

Comment on lines 61 to 142
def test_schema_convert_cli(self, runner, output_path, output_type):
model = "tests/data/example.model.csv"
label_type = "class_label"
expected = 0

output_path = helpers.get_data_path("example.model.jsonld")
resultOne = runner.invoke(schema, ["convert", model])

label_type = "class_label"
assert resultOne.exit_code == expected
# check output_path file is created then remove it
assert os.path.exists(output_path)

resultTwo = runner.invoke(
schema, ["convert", model, "--output_path", output_path]
)

assert resultTwo.exit_code == expected
# check output_path file is created then remove it
assert os.path.exists(output_path)

resultThree = runner.invoke(
schema, ["convert", model, "--output_type", output_type]
)

assert resultThree.exit_code == expected
# check output_path file is created then remove it
assert os.path.exists(output_path)

resultFour = runner.invoke(
schema,
[
"convert",
model,
"--output_type",
output_type,
"--output_jsonld",
output_path,
],
)

assert resultFour.exit_code == expected
# check output_path file is created then remove it
assert os.path.exists(output_path)

result = runner.invoke(
schema,
[
"convert",
data_model_csv_path,
model,
"--output_jsonld",
output_path,
"--data_model_labels",
label_type,
],
)

assert result.exit_code == 0
assert result.exit_code == expected
# check output_path file is created then remove it
assert os.path.exists(output_path)

expected_substr = (
"The Data Model was created and saved to " f"'{output_path}' location."
resultFive = runner.invoke(
schema,
[
"convert",
model,
"--output_jsonld",
"tests/data/example.model.pickle",
"--output_path",
"tests/data/example.model.pickle",
],
)

assert expected_substr in result.output
assert resultFive.exit_code == expected
# check output_path file is created then remove it
assert os.path.exists(output_path)

resultSix = runner.invoke(
schema, ["convert", model, "--output_jsonld", "", "--output_path", ""]
)

assert resultSix.exit_code == expected
# check output_path file is created then remove it
assert os.path.exists(output_path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • 1 to Tom's suggestion. Please split this up so that it is more clear what has been tested.

return metadata_model


def metadata_model_display(helpers, data_model_labels):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is called metadata_model_display.. If you are using data_model_labels, and this could potentially be both "class_label" and "display_label", why is this named "display"?

@@ -34,7 +48,10 @@ class TestMetadataModel:
)
def test_get_component_requirements(self, helpers, as_graph, data_model_labels):
# Instantiate MetadataModel
meta_data_model = metadata_model(helpers, data_model_labels)
if data_model_labels == "class_label":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what logic is being tested here? Could you add comments in the code?

Copy link

sonarcloud bot commented Sep 10, 2024

@@ -32,6 +32,8 @@
DisplayLabelType,
extract_component_validation_rules,
)
from schematic.utils.df_utils import update_df, load_df
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already loaded on line 25

) -> Union[List[str], List[pd.DataFrame]]:
"""Create multiple manifests

Args:
path_to_data_model (str): str path to data model
data_model_graph_pickle (str, optional): path to pickled networkx MultiDiGraph object. Defaults to None.
graph_data_model (str, optional): An networkx MultiDiGraph object. Defaults to None.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't match the type parameter above

# Generate graph
self.graph_data_model = data_model_grapher.graph
# Generate graph
self.graph_data_model = data_model_grapher.graph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will self.graph_data_model always be assigned a value? The logic here is kind of confusion, and it's nto clear it always will be.

@linglp
Copy link
Collaborator

linglp commented Sep 11, 2024

@afwillia another concern that I have about the PR is that I am seeing a lot of:

        if data_model_graph_pickle:
            self.graph_data_model = read_pickle(data_model_graph_pickle)

and also a lot of classes have both graph_data_model parameter and data_model_graph_pickle parameter. If we always get graph_data_model from the pickle file, then wouldn't it make sense to just keep one parameter?

Here's an example of what I meant: #1499

@linglp
Copy link
Collaborator

linglp commented Sep 11, 2024

@afwillia another point is that if you change parameter create_manifests, I am thinking most likely that you will need to change test_api.py because test_api.py test the process of generating a manifest by hitting the manifest/generate endpoint, and in this case, it needs to test the process of generating a manifest using a pickle file. But I am not seeing any changes totest_api.py. Can changes be added to test_api.py too?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants