Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(JA): Implementation for Job Attachments diff command #465

Merged
merged 1 commit into from
Oct 17, 2024

Conversation

leongdl
Copy link
Contributor

@leongdl leongdl commented Oct 9, 2024

What was the problem/requirement? (What/Why)

  • Customers have requested APIs and CLI interfaces to use Job Attachments.
  • We designed new CLI + APIs:
    • manifest snapshot
    • manifest diff
    • manifest download
    • manifest upload
    • attachment upload
    • attachment download.

What was the solution? (How)

  • In this PR, we are adding the implementation for manifest diff

What is the impact of this change?

  • This new CLI + API will allow users to compare a Job Attachment manifest against the latest files in the local file system. Then customers can make a decision to make a new manifest file or not.

How was this change tested?

  • Manual testing
deadline manifest diff --root ./src --manifest ~/work/manifest/diff-cli-2024-10-08T16-40-40.manifest 
deadline manifest diff --root ./src --manifest /Users/leongdl/work/manifest/diff-cli-2024-10-08T16-40-40.manifest --json
  • Have you run the unit tests?
    • Yes, hatch run test passes.
  • Have you run the integration tests?
    • Yes hatch run integ:test passes.
================================================================================ 16 passed, 2 skipped, 4 warnings in 68.45s (0:01:08) ================================================================================
  • Have you made changes to the download or asset_sync modules? If so, then it is highly recommended
    that you ensure that the docker-based unit tests pass.
  • Not applicable, Diff only does manifest operations.

Was this change documented?

  • Are relevant docstrings in the code base updated?

    • Yes
  • Has the README.md been updated? If you modified CLI arguments, for instance.

    • Readme and examples will be added once the feature stablizes.

Does this PR introduce new dependencies?

This library is designed to be integrated into third-party applications that have bespoke and customized deployment environments. Adding dependencies will increase the chance of library version conflicts and incompatabilities. Please evaluate the addition of new dependencies. See the Dependencies section of DEVELOPMENT.md for more details.

  • This PR adds one or more new dependency Python packages. I acknowledge I have reviewed the considerations for adding dependencies in DEVELOPMENT.md.
  • This PR does not add any new dependencies.

Is this a breaking change?

  • No.

Does this change impact security?

  • Yes, this will be security reviewed.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@@ -29,6 +36,37 @@
"""


def _glob_files(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored away to share code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be in _glob.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - because this function is local only to the CLIs and only used by the CLIs, so I am not moving this to a shared code. It is shared code to process the CLI input.

The shared function or API is _process_glob_inputs below.

@leongdl leongdl force-pushed the ja-diff branch 7 times, most recently from c087a00 to b27f11b Compare October 10, 2024 23:11


@pytest.mark.skip("Random Failure with no credentials on Github")
class TestSnapshot:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is something odd with calling CLI commands in unit test. As integration tests they pass but I am not sure why. I'm going to comment this out for now and then once submissions open I may move this over as an integ test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like integ tests to me. I'd imagine cli test tests the wrapper and mock call to the api (such as _manifest_diff).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, somehow there is something odd with them being unit tests. I will move them to be integ tests once the code is merged. I didn't want to merge in some functionality with potentially flaky integ tests. (I'll follow up after this PR)

Comment on lines +145 to +151
@click.option(
"--force-rehash",
default=False,
is_flag=True,
help="Rehash all files to compare using file hashes.",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this being added in for snapshot command, do you mind adding it before the next release?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I will, this PR is specifically for diff. My Next PR will also add it to snapshot for equivalence.

if not os.path.isfile(manifest):
raise NonValidInputError(f"Specified manifest file {manifest} does not exist. ")

if not os.path.isdir(root):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - we should be consistent with option naming root or root-dir for attachment and manifest commands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree! Lets do a quick pass after we merge in to catch these inconsistencies.

Root or root dir?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest root. Let's do a pass with stakeholders for the naming, and update them altogether.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed with root (this PR's name) and also will do a pass potentially to cleanup.

raise NonValidInputError(f"Specified root directory {root} does not exist. ")

# Perform the diff.
differences = _manifest_diff(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - no typing for differences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix.

Comment on lines +185 to +187
if json:
logger.json(dataclasses.asdict(differences), indent=4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - json function already does json check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? logger.json takes in a dictionary so it prints out JSON.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The json function in clicklogger already check if json, this line doesn't need to be under the json check again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Thats what you meant. Yeah I recognize that in this case, but given there's a line for the pretty print option it was easier to already early escape it here.

The pretty print does alot more un-necessary work so I wanted to avoid running that code for no reason.

if json:
logger.json(dataclasses.asdict(differences), indent=4)
else:
logger.echo(f"\n{root}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - could we add more description for this log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Comment on lines +213 to +229
COLORS = {
"MODIFIED": "\033[93m", # yellow
"NEW": "\033[92m", # green
"DELETED": "\033[91m", # red
"UNCHANGED": "\033[90m", # grey
"RESET": "\033[0m", # base color
"DIRECTORY": "\033[80m", # grey
}

# Tooltips:
TOOLTIPS = {
FileStatus.NEW: " +", # added files
FileStatus.DELETED: " -", # deleted files
FileStatus.MODIFIED: " M", # modified files
FileStatus.UNCHANGED: "", # unchanged files
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we model file status related configurations better?

Some idea - https://stackoverflow.com/questions/59916345/adding-a-property-to-an-enum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree- let me look into this refactor after we complete all the CLIs and name cleanups.


directory_tree = build_directory_tree(all_files)
print_tree(directory_tree)
logger.info("")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this on purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - this is part of the "pretty print" to print an empty line.

Comment on lines +210 to +212
with open(manifest) as input_file:
manifest_data_str = input_file.read()
local_manifest_object = decode_manifest(manifest_data_str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add file check and error handling here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decode_manifest raises errors, so if this failed, it will just fail the program outright. There's no need to do more error checks redundant.

Comment on lines +224 to +248
# Hash based compare manifests.
differences: List[Tuple[FileStatus, BaseManifestPath]] = compare_manifest(
reference_manifest=local_manifest_object, compare_manifest=directory_manifest_object
)
# Map to output datastructure.
for item in differences:
if item[0] == FileStatus.MODIFIED:
output.modified.append(item[1].path)
elif item[0] == FileStatus.NEW:
output.new.append(item[1].path)
elif item[0] == FileStatus.DELETED:
output.deleted.append(item[1].path)

else:
# File based comparisons.
fast_diff: List[Tuple[str, FileStatus]] = _fast_file_list_to_manifest_diff(
root=root, current_files=input_files, diff_manifest=local_manifest_object, logger=logger
)
for fast_diff_item in fast_diff:
if fast_diff_item[1] == FileStatus.MODIFIED:
output.modified.append(fast_diff_item[0])
elif fast_diff_item[1] == FileStatus.NEW:
output.new.append(fast_diff_item[0])
elif fast_diff_item[1] == FileStatus.DELETED:
output.deleted.append(fast_diff_item[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comparison for both cases look very similar, maybe we could refactor or use helper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no, the loop is the same, but the data structure it processes is the different. So I dont' see much refactoring options here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like?

def process_output(status, path, output):
    if status == FileStatus.MODIFIED:
        output.modified.append(path)
    elif status == FileStatus.NEW:
        output.new.append(path)
    elif status == FileStatus.DELETED:
        output.deleted.append(path)
    else:
        raise InvalidStatusError()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can do that cleanup but I wanted to avoid nested functions :(



@pytest.mark.skip("Random Failure with no credentials on Github")
class TestSnapshot:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like integ tests to me. I'd imagine cli test tests the wrapper and mock call to the api (such as _manifest_diff).

@leongdl leongdl marked this pull request as ready for review October 16, 2024 21:08
@leongdl leongdl requested a review from a team as a code owner October 16, 2024 21:08
Comment on lines +139 to +142
if return_root_relative_path:
return relative_path
else:
return full_path

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could be 1 liner with ternary operator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True :) Very pythonic.

src/hello Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file needed?

@leongdl leongdl enabled auto-merge (squash) October 17, 2024 04:23
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
3.0% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

@leongdl leongdl merged commit 97e8bc4 into aws-deadline:mainline Oct 17, 2024
17 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants