-
Notifications
You must be signed in to change notification settings - Fork 536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
match BinExport2 instructions using patterns #2353
match BinExport2 instructions using patterns #2353
Conversation
add pruning phase to expression tree building to remove known-bad branches. This might address some of the data we're seeing due to: NationalSecurityAgency/ghidra#6821 Also introduces a --instruction optional argument to dump the details of a specific instruction.
d15bca5
to
99767ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work and I think a big improvement over the nested ifs before - the tests give further confidence that this works as expected!
return | ||
|
||
instruction: BinExport2.Instruction = be2.instruction[ii.instruction_index] | ||
# guaranteed to be simple int/reg operands | ||
# so we don't have to realize the tree/list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, great comments here throughout!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great @williballenthin . I've left some small comments for your review but overall this is a big improvement to the nested checks that we were using. LGTM 🚀
tests/test_binexport_accessors.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all of the tests! This helps breakdown what the various functions are supposed to do.
children_tree_indexes: List[int] = expression_tree[tree_index] | ||
|
||
if expression.type == BinExport2.Expression.OPERATOR: | ||
if len(children_tree_indexes) == 0 and expression.symbol in ("lsl", "lsr"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to handle other extended register operands? see https://devblogs.microsoft.com/oldnewthing/20220727-00/?p=106907
# - #int | ||
# - #int0 | ||
# and a limited set of supported operators. | ||
assert re.match(r"^(reg|#int)[0-9]?(\(stack\)|\(not-stack\))?$", o) or o in ("[", ",", "!", "+", "*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any concern with a performance hit from repeated regex matching here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only used when parsing the patterns, which we expect to happen once at startup, O(1). I figured it wasn't worth optimizing that path (such a tiny impact) versus line count/context here. Would have to move the regex declaration out of the function, which makes it harder to follow as a human.
""" | ||
ldr|ldrb|ldrh|ldrsb|ldrsh|ldrex|ldrd|str|strb|strh|strex|strd reg, [reg(not-stack), #int] ; capture #int | ||
ldr|ldrb|ldrh|ldrsb|ldrsh|ldrex|ldrd|str|strb|strh|strex|strd reg, [reg(not-stack), #int]! ; capture #int | ||
ldr|ldrb|ldrh|ldrsb|ldrsh|ldrex|ldrd|str|strb|strh|strex|strd reg, [reg(not-stack)], #int ; capture #int | ||
ldp|ldpd|stp|stpd reg, reg, [reg(not-stack), #int] ; capture #int | ||
ldp|ldpd|stp|stpd reg, reg, [reg(not-stack), #int]! ; capture #int | ||
ldp|ldpd|stp|stpd reg, reg, [reg(not-stack)], #int ; capture #int | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat 🚀
99767ff
to
7302be5
Compare
Introduce intruction pattern matching to declaratively describe the instructions and operands that we want to extract. While there's a bit more code, its much more thoroughly tested, and is less brittle than the prior if/else/if/else/if/else implementation.
7302be5
to
5d2ebcb
Compare
This PR against the BinExport2 PR introduces a way to match instructions and extract expression values. We use this mini-language to specify how to extract features from BinExport2 instructions, such as Number, Offset, nzxor, etc. While there's a fair amount of code (~500 lines, with tests) to implement this matching language, it is:
To understand all the changes here, I'd recommend reading:
lsl #0
expressions that have no effect. We prune these from the expression tree.Checklist