Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extract dynamic capabilities #1644

Conversation

yelhamer
Copy link
Collaborator

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

@google-cla
Copy link

google-cla bot commented Jul 17, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@yelhamer yelhamer changed the base branch from master to dynamic-feature-extraction July 17, 2023 10:55
@yelhamer yelhamer marked this pull request as draft July 17, 2023 11:30
@yelhamer
Copy link
Collaborator Author

will work on fixing the protobuf mypy issues and vverbose, then will remove the draft status. feel free to review in the meanwhile

Copy link
Collaborator

@mr-tz mr-tz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good so far!

capa/main.py Outdated Show resolved Hide resolved
capa/main.py Show resolved Hide resolved
@yelhamer yelhamer force-pushed the find-dynamic-capabilities branch 4 times, most recently from a51bf13 to 4e4b123 Compare July 19, 2023 13:00
@yelhamer
Copy link
Collaborator Author

Couldn't fix the mypy issues for some reason...

I tried adding the exclusions using pyproject.toml, mypy.ini, and pre-commit config files. All worked locally, but for some reason only tests/_test_proto,py was being excluded in the Github actions tests.

I'll take a look at this more afterwards, in the meantime, the tests succeed locally so I think a review would be nice (so that I could address the comments and the mypy issues together in future commits). The result_document.py tests might fail if you try to run them locally, and that's because I haven't updated all of the test result document (rd) data nor opened a PR to capa-testfiles, will do so after the dynamic statement PR gets merged (I'll be opening that shortly).

Here are the test results:

image

Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see inline comments.

overall, looks good! i'd like to run it locally and see how things work.

do you think we should separate StaticResultDocument from DynamicResultDocument? or, let the code switch on doc.meta.analysis?

capa/features/extractors/base_extractor.py Outdated Show resolved Hide resolved
capa/features/extractors/base_extractor.py Outdated Show resolved Hide resolved
capa/features/extractors/base_extractor.py Outdated Show resolved Hide resolved
capa/features/extractors/ida/extractor.py Outdated Show resolved Hide resolved
capa/features/extractors/ida/extractor.py Outdated Show resolved Hide resolved
capa/features/extractors/dnfile/extractor.py Outdated Show resolved Hide resolved
Comment on lines +456 to +460
meta = {
"feature_counts": feature_counts,
}

return matches, meta
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably introduce a dedicated type for the meta variable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it okay to introduce a type with just 2 attributes max ("feature_counts" and "library_functions") ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, its fine to introduce trivial types. the major benefit is that mypy can help is identify bugs, as well as serving as a single place for documenting what the data in each field means. otherwise, we're left marking up the various dict construction statements.

capa/features/extractors/base_extractor.py Outdated Show resolved Hide resolved
capa/features/extractors/binja/extractor.py Outdated Show resolved Hide resolved
capa/features/extractors/ida/extractor.py Outdated Show resolved Hide resolved
@williballenthin
Copy link
Collaborator

with the dynamic branch being sync'd, there will now be conflicts here to be fixed. you should try to merge dynamic-feature-extraction into this branch and update the PR.

Co-authored-by: Moritz <[email protected]>
Copy link
Collaborator

@mr-tz mr-tz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good progress, please see my inline comments and can you add a few tests on the dynamic rendering and calling capa main (test_main) with CAPE input?

Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some not-very-hard things to fix, and then i think we should go ahead and merge this. i think the logic looks quite good!

capa/features/extractors/base_extractor.py Outdated Show resolved Hide resolved
capa/features/extractors/base_extractor.py Outdated Show resolved Hide resolved
capa/features/extractors/base_extractor.py Outdated Show resolved Hide resolved
capa/features/extractors/binja/extractor.py Outdated Show resolved Hide resolved
capa/features/extractors/cape/extractor.py Outdated Show resolved Hide resolved
capa/features/extractors/ida/extractor.py Outdated Show resolved Hide resolved
capa/main.py Outdated Show resolved Hide resolved
capa/main.py Outdated Show resolved Hide resolved
tests/test_binja_features.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see a few inline changes, then im happy for this to be merged.

capa/features/extractors/base_extractor.py Outdated Show resolved Hide resolved
capa/features/extractors/base_extractor.py Show resolved Hide resolved
capa/features/extractors/base_extractor.py Show resolved Hide resolved
capa/features/extractors/ida/extractor.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@mr-tz mr-tz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be good to merge, not sure what's wrong with the failing 3.11 test

@williballenthin williballenthin merged commit f4bdff0 into mandiant:dynamic-feature-extraction Jul 21, 2023
13 of 15 checks passed
@yelhamer yelhamer added breaking-change introduces a breaking change that should be released in a major version gsoc Work related to Google Summer of Code project. dynamic related to dynamic analysis flavor labels Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change introduces a breaking change that should be released in a major version dynamic related to dynamic analysis flavor gsoc Work related to Google Summer of Code project.
Projects
Status: done
Development

Successfully merging this pull request may close these issues.

update main logic to recognize CAPE analysis reports and invoke the right extractor
3 participants