feat(JA): Implementation for Job Attachments diff command #465

leongdl · 2024-10-09T18:42:58Z

What was the problem/requirement? (What/Why)

Customers have requested APIs and CLI interfaces to use Job Attachments.
We designed new CLI + APIs:
- manifest snapshot
- manifest diff
- manifest download
- manifest upload
- attachment upload
- attachment download.

What was the solution? (How)

In this PR, we are adding the implementation for manifest diff.

What is the impact of this change?

This new CLI + API will allow users to compare a Job Attachment manifest against the latest files in the local file system. Customers can then decide whether to create a new manifest file.
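As a rough illustration of what such a comparison produces, here is a minimal sketch that classifies paths by comparing two path-to-hash mappings. It is not the library's implementation: `diff_hashes` and the plain dictionaries are hypothetical, and the `FileStatus` enum below only mirrors the member names that appear later in this PR (its values are assumed).

```python
from enum import Enum
from typing import Dict, List, Tuple


class FileStatus(Enum):
    # Member names mirror the ones used in this PR; the values are assumptions.
    UNCHANGED = "UNCHANGED"
    NEW = "NEW"
    MODIFIED = "MODIFIED"
    DELETED = "DELETED"


def diff_hashes(
    manifest: Dict[str, str], current: Dict[str, str]
) -> List[Tuple[str, FileStatus]]:
    """Classify every path seen in either mapping (hypothetical helper)."""
    results = []
    for path in sorted(set(manifest) | set(current)):
        if path not in manifest:
            status = FileStatus.NEW  # on disk, not in the manifest
        elif path not in current:
            status = FileStatus.DELETED  # in the manifest, gone from disk
        elif manifest[path] != current[path]:
            status = FileStatus.MODIFIED  # same path, different hash
        else:
            status = FileStatus.UNCHANGED
        results.append((path, status))
    return results
```

A caller could then inspect the NEW, MODIFIED, and DELETED entries to decide whether a fresh manifest is worth creating.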

How was this change tested?

Manual testing

deadline manifest diff --root ./src --manifest ~/work/manifest/diff-cli-2024-10-08T16-40-40.manifest

deadline manifest diff --root ./src --manifest /Users/leongdl/work/manifest/diff-cli-2024-10-08T16-40-40.manifest --json

Have you run the unit tests?
- Yes, hatch run test passes.
Have you run the integration tests?
- Yes, hatch run integ:test passes.

================================================================================ 16 passed, 2 skipped, 4 warnings in 68.45s (0:01:08) ================================================================================

Have you made changes to the download or asset_sync modules? If so, then it is highly recommended
that you ensure that the docker-based unit tests pass.
Not applicable. Diff only performs manifest operations.

Was this change documented?

Are relevant docstrings in the code base updated?
- Yes
Has the README.md been updated? If you modified CLI arguments, for instance.
- README and examples will be added once the feature stabilizes.

Does this PR introduce new dependencies?

This library is designed to be integrated into third-party applications that have bespoke and customized deployment environments. Adding dependencies will increase the chance of library version conflicts and incompatibilities. Please evaluate the addition of new dependencies. See the Dependencies section of DEVELOPMENT.md for more details.

This PR adds one or more new dependency Python packages. I acknowledge I have reviewed the considerations for adding dependencies in DEVELOPMENT.md.
This PR does not add any new dependencies.

Is this a breaking change?

No.

Does this change impact security?

Yes, this will be security reviewed.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

leongdl · 2024-10-10T18:21:19Z

src/deadline/job_attachments/api/manifest.py

@@ -29,6 +36,37 @@
 """


+def _glob_files(


Refactored away to share code.

Could this be in _glob.py?

No. This function is used only by the CLIs, so I am not moving it into a shared module. It is shared code for processing the CLI input.

The shared function or API is _process_glob_inputs below.

leongdl · 2024-10-10T23:50:13Z

test/unit/deadline_client/cli/test_cli_manifest_diff.py

+
+
+@pytest.mark.skip("Random Failure with no credentials on Github")
+class TestSnapshot:


There is something odd about calling CLI commands in unit tests. As integration tests they pass, but I am not sure why. I'm going to skip these for now, and once submissions open I may move them over as integ tests.

These look like integ tests to me. I'd imagine a CLI test would test the wrapper and mock the call to the API (such as _manifest_diff).

Yes, somehow there is something odd with them being unit tests. I will move them to be integ tests once the code is merged. I didn't want to merge in some functionality with potentially flaky integ tests. (I'll follow up after this PR)

godobyte · 2024-10-16T00:42:38Z

src/deadline/client/cli/_groups/manifest_group.py

+@click.option(
+    "--force-rehash",
+    default=False,
+    is_flag=True,
+    help="Rehash all files to compare using file hashes.",
+)


I don't see this being added in for snapshot command, do you mind adding it before the next release?

Yes, I will. This PR is specifically for diff. My next PR will also add it to snapshot for equivalence.

godobyte · 2024-10-16T00:47:20Z

src/deadline/client/cli/_groups/manifest_group.py

+    if not os.path.isfile(manifest):
+        raise NonValidInputError(f"Specified manifest file {manifest} does not exist. ")
+
+    if not os.path.isdir(root):


nit - we should be consistent with option naming root or root-dir for attachment and manifest commands.

I agree! Let's do a quick pass after we merge to catch these inconsistencies.

Root or root dir?

I'd suggest root. Let's do a pass with stakeholders for the naming, and update them all together.

We agreed on root (the name used in this PR) and may also do a cleanup pass.

godobyte · 2024-10-16T00:49:46Z

src/deadline/client/cli/_groups/manifest_group.py

+        raise NonValidInputError(f"Specified root directory {root} does not exist. ")
+
+    # Perform the diff.
+    differences = _manifest_diff(


nit - no typing for differences.

godobyte · 2024-10-16T00:52:18Z

src/deadline/client/cli/_groups/manifest_group.py

+    if json:
+        logger.json(dataclasses.asdict(differences), indent=4)


nit - json function already does json check?

What do you mean? logger.json takes in a dictionary so it prints out JSON.

The json function in clicklogger already checks whether JSON output is enabled, so this line doesn't need to sit under another json check.

Oh! That's what you meant. Yeah, I recognize that in this case, but given there's a branch for the pretty print option, it was easier to exit early here.

The pretty print does a lot more unnecessary work, so I wanted to avoid running that code for no reason.

godobyte · 2024-10-16T00:53:04Z

src/deadline/client/cli/_groups/manifest_group.py

+    if json:
+        logger.json(dataclasses.asdict(differences), indent=4)
+    else:
+        logger.echo(f"\n{root}")


nit - could we add more description for this log?

godobyte · 2024-10-16T01:19:30Z

src/deadline/job_attachments/_diff.py

+    COLORS = {
+        "MODIFIED": "\033[93m",  # yellow
+        "NEW": "\033[92m",  # green
+        "DELETED": "\033[91m",  # red
+        "UNCHANGED": "\033[90m",  # grey
+        "RESET": "\033[0m",  # base color
+        "DIRECTORY": "\033[80m",  # grey
+    }
+
+    # Tooltips:
+    TOOLTIPS = {
+        FileStatus.NEW: " +",  # added files
+        FileStatus.DELETED: " -",  # deleted files
+        FileStatus.MODIFIED: " M",  # modified files
+        FileStatus.UNCHANGED: "",  # unchanged files
+    }


Could we model file status related configurations better?

One idea: https://stackoverflow.com/questions/59916345/adding-a-property-to-an-enum

I agree. Let me look into this refactor after we complete all the CLIs and name cleanups.
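One way the suggestion could look, as a minimal sketch only: the enum carries its own color and tooltip, so the separate COLORS and TOOLTIPS dictionaries are no longer needed. The tuple values, the `label`, `color`, and `tooltip` attribute names, and the `render` helper are all assumptions made for illustration; the real FileStatus enum may define its values differently.

```python
from enum import Enum

RESET = "\033[0m"  # base color, as in the COLORS mapping above


class FileStatus(Enum):
    # Hypothetical layout: each member's value is (label, ANSI color, tooltip).
    UNCHANGED = ("UNCHANGED", "\033[90m", "")
    NEW = ("NEW", "\033[92m", " +")
    MODIFIED = ("MODIFIED", "\033[93m", " M")
    DELETED = ("DELETED", "\033[91m", " -")

    def __init__(self, label: str, color: str, tooltip: str) -> None:
        # Enum unpacks the tuple value into these arguments for each member.
        self.label = label
        self.color = color
        self.tooltip = tooltip


def render(path: str, status: FileStatus) -> str:
    """Return the path colored by status, followed by its tooltip."""
    return f"{status.color}{path}{status.tooltip}{RESET}"
```

Note that this changes each member's `.value` to a tuple, so any code that serializes or compares raw values would need to be checked before adopting it.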

godobyte · 2024-10-16T01:21:19Z

src/deadline/job_attachments/_diff.py

+
+    directory_tree = build_directory_tree(all_files)
+    print_tree(directory_tree)
+    logger.info("")


Is this on purpose?

Yes - this is part of the "pretty print" to print an empty line.

godobyte · 2024-10-16T01:27:03Z

src/deadline/job_attachments/api/manifest.py

+    with open(manifest) as input_file:
+        manifest_data_str = input_file.read()
+        local_manifest_object = decode_manifest(manifest_data_str)


Should we add file check and error handling here?

decode_manifest raises errors, so if this fails, it will just fail the program outright. There's no need to add redundant error checks.

godobyte · 2024-10-16T01:31:21Z

src/deadline/job_attachments/api/manifest.py

+        # Hash based compare manifests.
+        differences: List[Tuple[FileStatus, BaseManifestPath]] = compare_manifest(
+            reference_manifest=local_manifest_object, compare_manifest=directory_manifest_object
+        )
+        # Map to output datastructure.
+        for item in differences:
+            if item[0] == FileStatus.MODIFIED:
+                output.modified.append(item[1].path)
+            elif item[0] == FileStatus.NEW:
+                output.new.append(item[1].path)
+            elif item[0] == FileStatus.DELETED:
+                output.deleted.append(item[1].path)
+
+    else:
+        # File based comparisons.
+        fast_diff: List[Tuple[str, FileStatus]] = _fast_file_list_to_manifest_diff(
+            root=root, current_files=input_files, diff_manifest=local_manifest_object, logger=logger
+        )
+        for fast_diff_item in fast_diff:
+            if fast_diff_item[1] == FileStatus.MODIFIED:
+                output.modified.append(fast_diff_item[0])
+            elif fast_diff_item[1] == FileStatus.NEW:
+                output.new.append(fast_diff_item[0])
+            elif fast_diff_item[1] == FileStatus.DELETED:
+                output.deleted.append(fast_diff_item[0])


The comparison for both cases looks very similar; maybe we could refactor or use a helper?

Yes and no. The loop is the same, but the data structure it processes is different, so I don't see many refactoring options here.

Maybe something like?

def process_output(status, path, output):
    if status == FileStatus.MODIFIED:
        output.modified.append(path)
    elif status == FileStatus.NEW:
        output.new.append(path)
    elif status == FileStatus.DELETED:
        output.deleted.append(path)
    else:
        raise InvalidStatusError()

Sure, I can do that cleanup but I wanted to avoid nested functions :(
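A module-level helper would avoid nesting. The sketch below shows how both loops could share one function, since only the tuple unpacking order differs between the two result types. `ManifestPath`, `DiffOutput`, and `record_status` are hypothetical stand-ins, and the `FileStatus` values are assumed. Unlike the suggestion above, this version silently ignores UNCHANGED entries, which matches the behavior of the original loops.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, NamedTuple


class FileStatus(Enum):
    # Stand-in for the FileStatus enum referenced in the diff; values assumed.
    UNCHANGED = 0
    NEW = 1
    MODIFIED = 2
    DELETED = 3


class ManifestPath(NamedTuple):
    # Hypothetical stand-in for BaseManifestPath; only `path` is used here.
    path: str


@dataclass
class DiffOutput:
    # Hypothetical stand-in for the output structure built in the PR.
    new: List[str] = field(default_factory=list)
    modified: List[str] = field(default_factory=list)
    deleted: List[str] = field(default_factory=list)


def record_status(output: DiffOutput, status: FileStatus, path: str) -> None:
    """Append path to the list matching status; UNCHANGED is ignored."""
    if status == FileStatus.MODIFIED:
        output.modified.append(path)
    elif status == FileStatus.NEW:
        output.new.append(path)
    elif status == FileStatus.DELETED:
        output.deleted.append(path)


# Hash-based results are (status, manifest path) tuples.
differences = [
    (FileStatus.NEW, ManifestPath("a.txt")),
    (FileStatus.UNCHANGED, ManifestPath("b.txt")),
]
# Fast results are (path, status) tuples, so only the unpacking order differs.
fast_diff = [("c.txt", FileStatus.DELETED), ("d.txt", FileStatus.MODIFIED)]

output = DiffOutput()
for status, manifest_path in differences:
    record_status(output, status, manifest_path.path)
for path, status in fast_diff:
    record_status(output, status, path)
```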

godobyte · 2024-10-16T01:33:59Z

test/unit/deadline_client/cli/test_cli_manifest_diff.py

+
+
+@pytest.mark.skip("Random Failure with no credentials on Github")
+class TestSnapshot:


These look like integ tests to me. I'd imagine a CLI test would test the wrapper and mock the call to the API (such as _manifest_diff).

benl-2023 · 2024-10-16T21:18:24Z

src/deadline/job_attachments/_diff.py

+        if return_root_relative_path:
+            return relative_path
+        else:
+            return full_path


nit: this could be a one-liner with a ternary operator.

True :) Very pythonic.
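For reference, the one-liner form would look roughly like this. The surrounding function is a hypothetical wrapper written for illustration; only the `return_root_relative_path`, `relative_path`, and `full_path` names come from the diff.

```python
import os


def resolve_path(
    root: str, relative_path: str, return_root_relative_path: bool = True
) -> str:
    """Return either the root-relative path or the joined full path."""
    full_path = os.path.join(root, relative_path)
    # Conditional expression replaces the four-line if/else.
    return relative_path if return_root_relative_path else full_path
```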

godobyte · 2024-10-17T03:03:09Z

src/hello

Is this file needed?

Signed-off-by: David Leong <[email protected]>

sonarqubecloud · 2024-10-17T04:25:05Z

Quality Gate failed

Failed conditions
3.0% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

leongdl mentioned this pull request Oct 9, 2024

[WIP] feat(JA CLI): Job Attachment Manifest CLIs. Snapshot, Diff, Download,… #446

Closed

leongdl force-pushed the ja-diff branch 6 times, most recently from d0f2413 to 5671bc9 Compare October 10, 2024 05:35

godobyte self-assigned this Oct 10, 2024

leongdl commented Oct 10, 2024

leongdl force-pushed the ja-diff branch 7 times, most recently from c087a00 to b27f11b Compare October 10, 2024 23:11

leongdl commented Oct 10, 2024

leongdl force-pushed the ja-diff branch from b27f11b to 643d6fa Compare October 11, 2024 01:55

godobyte reviewed Oct 16, 2024

leongdl force-pushed the ja-diff branch from 643d6fa to c310e8f Compare October 16, 2024 21:08

leongdl marked this pull request as ready for review October 16, 2024 21:08

leongdl requested a review from a team as a code owner October 16, 2024 21:08

benl-2023 approved these changes Oct 16, 2024

godobyte reviewed Oct 17, 2024

godobyte approved these changes Oct 17, 2024

leongdl enabled auto-merge (squash) October 17, 2024 04:23

feat(JA): Implementation for Job Attachments diff command

50e9280

Signed-off-by: David Leong <[email protected]>

leongdl force-pushed the ja-diff branch from c310e8f to 50e9280 Compare October 17, 2024 04:24

leongdl merged commit 97e8bc4 into aws-deadline:mainline Oct 17, 2024
17 of 18 checks passed

godobyte mentioned this pull request Oct 18, 2024

feat(JA): Add force-rehash option to snapshot command. Integ test refactoring. #477

Merged

2 tasks

This was referenced Nov 14, 2024

chore(release): 0.49.0 #500

Closed

chore(release): 0.49.0 #503

Merged


