Improving x86 GCC Support #256
-
Further investigation revealed that I was misguided. The precise value was an offset, not an absolute value, so it makes sense that a binary operation cannot be computed over it exactly. In the function |
-
One potentially heavyweight but general way to make bitwise binary operations between relative values and constants less conservative would be to use an SMT solver. Some preliminary experiments suggest Z3 can find a suitable interval quite quickly. Here's a small script I wrote to demonstrate this:
This returns |
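To illustrate the kind of interval in question without an SMT solver, here is a minimal brute-force sketch (plain enumeration, not the Z3 script mentioned above). It assumes the stack pointer is `base + offset` with an unknown 32-bit `base`, and asks how far `(base + offset) & 0xFFFFFFF0` can end up from `base`. The function names and the sampled range of bases are hypothetical choices made for illustration.

```python
MASK = 0xFFFFFFF0
WORD = 1 << 32


def to_signed(value):
    """Interpret a value modulo 2**32 as a signed 32-bit integer."""
    value %= WORD
    return value - WORD if value >= WORD // 2 else value


def aligned_offset_interval(offset, mask=MASK, bases=range(4096)):
    """Bounds of ((base + offset) & mask) - base over the sampled bases.

    Only the low four bits of the base matter for this mask, so a small
    sample of bases already covers every possible residue.
    """
    deltas = set()
    for base in bases:
        result = ((base + offset) % WORD) & mask
        deltas.add(to_signed(result - base))
    return min(deltas), max(deltas)


print(aligned_offset_interval(-4))
```

Under these assumptions the new offset always lies between `offset - 15` and `offset`, which is far tighter than the full signed 32-bit range.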
-
Two things: First, just making the handling of Second, while an SMT solver like Z3 is definitely a powerful and useful tool, its use also comes with a very high computational cost. Since we already fight long analysis times even without an SMT solver in the mix, integrating one could make the cwe_checker too slow even for the analysis of very small binaries.
-
This makes sense! I'll look at ways to cheaply track stack pointer behaviour precisely using extra knowledge about the binary. I presume this will essentially involve making some assumptions (as mentioned in the comment in the test) about the alignment of the unknown base address in the ESP approximation?
-
Apologies for the delay in responding; I've not had time to look at this for a week or so. Thought I'd better update on where I'd got up to. I had a crack at solving this problem but found that an |
-
So I tried to implement a solution that essentially produces a new stack frame (following the lead of the |
-
Sorry for the late and short answer, I will have time for longer answers next week. What you could try is to build the special case into the update_def function, for example by testing for it directly in update_def (before the standard handling happens). Even better would be to test for it in the |
-
I did some more debugging, and the core of the problem seems to be my call to ‘replace_abstract_id’ with T as the offset. My intention was to have pre-existing memory objects addressed relative to the new ID with T as the offset, since the difference between the old ID and the new ID is unknown. Instead, the call (unexpectedly) creates a new abstract object that has T stored as its associated offset. When I then manually make a stack object with 0 as an offset (mimicking the update_call behaviour), it is joined with the object that ‘replace_abstract_id’ created. The 0 and T join to make T, ruining the new stack memory object. I don’t understand the behaviour of ‘replace_abstract_id’.
What you say makes sense and I need to think about this some more, but I’m struggling to understand how such a normalisation is possible in general. If the number of stack pushes and pops in a function is not predictable, would such a normal form exist? That is, suppose some function _init_ aligns the stack with an AND ESP, 0xFFFFFFF0, but then pushes 4 bytes twice in branch A and four times in branch B. Both branches then call main. In the branch A case, an AND ESP, 0xFFFFFFF0 in main would translate into SUB ESP, 0x8, but in the branch B case into SUB ESP, 0x0. So no normalised form for main exists, as the calling context dictates the offset. Is there a way around this problem? Apologies if I’ve misunderstood!
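The two-branch scenario can be checked with a few lines of arithmetic. This is only a sketch: the aligned address 0xBFFF0000 and the helper name are made up for illustration, and like the description above it ignores the return address pushed by the call itself.

```python
MASK = 0xFFFFFFF0


def and_as_sub_constant(esp, mask=MASK):
    """Constant k such that `AND ESP, mask` equals `SUB ESP, k` for this ESP."""
    return esp - (esp & mask)


aligned = 0xBFFF0000              # hypothetical ESP right after the first AND
esp_branch_a = aligned - 2 * 4    # branch A: two 4-byte pushes
esp_branch_b = aligned - 4 * 4    # branch B: four 4-byte pushes

print(hex(and_as_sub_constant(esp_branch_a)))  # 0x8
print(hex(and_as_sub_constant(esp_branch_b)))  # 0x0
```

The same AND instruction corresponds to two different SUB constants, depending only on which path reached it.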
On 29 Nov 2021, at 6:00 pm, Enkelmann wrote:
I finally had time to think about your approach some more. One possible error could be that you did not explicitly create an abstract object for your new stack frame, since abstract objects do not get created automatically when you call State::store_value for a new abstract ID. This would at least explain why no information is stored for your new stack frame.
Another issue with your approach is that changing the stack frame ID midway through a function probably breaks a lot of assumptions about the stack pointer (some of which may be implicit and not properly documented yet). For example, this could create a situation where the fixpoint algorithm would have to merge two stack frames for the same function that have different abstract IDs. The code is not designed to handle such situations right now. So your approach may work for the special case of the main function of gcc-compiled binaries, but it probably creates more problems than it solves for all other occurrences of lost stack pointer offsets.
For that reason I think only solutions that enable us to track the stack pointer offset exactly are worthwhile right now. This would imply knowing the alignment of the stack pointer at function start (I would assume that information related to that should be contained in the System V ABI) and then using this knowledge to translate the INT_AND operation to an INT_SUB operation with a constant offset. Actually, it just occurred to me that one could also do this prior to the Pointer Inference analysis as some kind of intermediate representation normalization pass. This way one would not need to raise the complexity of the Pointer Inference code just for this special case.
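A rough sketch of what such a normalization pass could look like, under loudly stated assumptions: instructions are modelled as hypothetical `(op, register, constant)` tuples rather than cwe_checker's real intermediate representation, and `entry_residue` (the value of ESP modulo 16 at function entry) is assumed to be known. The value 12 used below assumes the stack was 16-byte aligned before a CALL pushed a 4-byte return address, which may not hold for a given binary.

```python
ALIGN = 16
ALIGN_MASK = 0xFFFFFFF0


def normalize_stack_alignment(instructions, entry_residue):
    """Rewrite `AND ESP, 0xFFFFFFF0` into `SUB ESP, k` with a constant k.

    `instructions` is a list of hypothetical (op, register, constant) tuples.
    `entry_residue` is the assumed value of ESP modulo 16 at function entry.
    """
    residue = entry_residue % ALIGN
    normalized = []
    for op, reg, value in instructions:
        if reg == "ESP":
            if op == "AND" and value == ALIGN_MASK:
                # The AND clears exactly the low bits tracked in `residue`.
                op, value = "SUB", residue
            if op == "SUB":
                residue = (residue - value) % ALIGN
            elif op == "ADD":
                residue = (residue + value) % ALIGN
            else:
                raise ValueError(f"cannot track ESP through {op}")
        normalized.append((op, reg, value))
    return normalized


# A simplified prologue: PUSH EBP is modelled as SUB ESP, 4.
prologue = [("SUB", "ESP", 4), ("AND", "ESP", ALIGN_MASK), ("SUB", "ESP", 0x10)]
print(normalize_stack_alignment(prologue, entry_residue=12))
```

The pass only works while the residue stays known, so any ESP update it cannot model has to abort the rewrite rather than guess.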
-
I've done some work experimenting with such a normalisation. I've built a prototype least-significant-bits abstraction and applied it to one of the example ELFs. It seems to work for tracking the alignment of ESP; however, it will not solve the problem. Unfortunately it seems the entry point |
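For readers unfamiliar with the idea, a least-significant-bits abstraction can be sketched roughly as follows. This is not the prototype described above, only a minimal illustration with hypothetical names: it tracks the low four bits of a value, or `None` when they are unknown.

```python
from dataclasses import dataclass
from typing import Optional

BITS = 4
MOD = 1 << BITS


@dataclass(frozen=True)
class LowBits:
    """Known low BITS bits of a value, or None if they are unknown."""
    value: Optional[int]

    def add(self, constant):
        if self.value is None:
            return self
        return LowBits((self.value + constant) % MOD)

    def and_mask(self, mask):
        if self.value is None:
            # Unknown bits become known only if the mask clears all of them.
            return LowBits(0) if mask % MOD == 0 else self
        return LowBits(self.value & (mask % MOD))

    def join(self, other):
        # Merging two control-flow paths keeps the bits only if they agree.
        return self if self == other else LowBits(None)


entry = LowBits(12)                              # assumed residue at entry
after_push = entry.add(-4)                       # a 4-byte push
print(after_push.and_mask(0xFFFFFFF0))           # LowBits(value=0)
print(LowBits(8).join(LowBits(0)))               # LowBits(value=None)
```

The join shows the limitation: as soon as two paths reach the same point with different residues, the alignment information is lost.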
-
Hello!
I'm looking at improving support for x86 GCC output (both ELF and mingw32 cross-compiled). Similar to work on CWE416 for x64 PEs, I thought I'd document my progress on debugging/patching to invite insights or comments as I go. My main focus for improving this support is on trying to get to a point where the CWE415 and CWE416 GCC x86 acceptance tests pass. Currently they're disabled with the following explanatory comment:
My first step was to understand exactly how this plays out and attempt to reproduce the explained issue. Looking through the code in Ghidra for all AND instructions I found two:
AND ESP, 0xfffffff0 in _start at 0x10445
AND ESP, 0xfffffff0 in main at 0x10581
I then instrumented cwe_checker to spit out extra information, including the Tid of handled DEF instructions and details about each assignment. I found that these two x86 instructions contained, respectively, these two analysed register assignments in p-code:
In both cases the stack pointer is not set to top, but it is not tracked precisely either. In fact the interval Digit(2147483648) to Digit(2147483647) represents the entire signed 32-bit range. This is strange because before the operation the stack pointer is set to a single value, so the result should also be a single value, shouldn't it?