Improving x86 GCC Support #256

JamesShaker · 2021-11-09T04:35:03Z

Hello!

I'm looking at improving support for x86 GCC output (both ELF and mingw32 cross-compiled). As with the work on CWE416 for x64 PEs, I thought I'd document my debugging and patching progress here to invite insights or comments as I go. My main goal is to get the CWE415 and CWE416 GCC x86 acceptance tests to pass. They are currently disabled with the following explanatory comment:

```
// The analysis loses track of the stack pointer offset in the main() function
// because of a "INT_AND ESP 0xfffffff0" instruction.
// We would need knowledge about alignment guarantees for the stack pointer at the start of main() to fix this.
```

My first step was to understand exactly how this plays out and to reproduce the described issue. Searching the code in Ghidra for AND instructions, I found two:

- AND ESP, 0xfffffff0 in _start at 0x10445
- AND ESP, 0xfffffff0 in main at 0x10581

I then instrumented the cwe_checker to print extra information, including the Tid of each handled Def and details about each assignment. This showed that these two x86 instructions correspond, respectively, to the following two analysed register assignments in p-code:

```
HANDLING DEF Tid { id: "instr_00010445_2", address: "00010445" }
ASSIGN
OLD VAL: Some(DataDomain { size: ByteSize(4), relative_values: {AbstractIdentifier(AbstractIdentifierData { time: Tid { id: "sub_00010440", address: "00010440" }, location: Register(Variable { name: "ESP", size: ByteSize(4), is_temp: false }) }): IntervalDomain { interval: Interval { start: ApInt { len: BitWidth(32), digits: [Digit(4)] }, end: ApInt { len: BitWidth(32), digits: [Digit(4)] }, stride: 0 }, widening_upper_bound: None, widening_lower_bound: None, widening_delay: 0 }}, absolute_value: None, contains_top_values: false })
TO: Variable { name: "ESP", size: ByteSize(4), is_temp: false }
EXPR: BinOp { op: IntAnd, lhs: Var(Variable { name: "ESP", size: ByteSize(4), is_temp: false }), rhs: Const(ApInt { len: BitWidth(32), digits: [Digit(4294967280)] }) }
VALUE: DataDomain { size: ByteSize(4), relative_values: {AbstractIdentifier(AbstractIdentifierData { time: Tid { id: "sub_00010440", address: "00010440" }, location: Register(Variable { name: "ESP", size: ByteSize(4), is_temp: false }) }): IntervalDomain { interval: Interval { start: ApInt { len: BitWidth(32), digits: [Digit(2147483648)] }, end: ApInt { len: BitWidth(32), digits: [Digit(2147483647)] }, stride: 1 }, widening_upper_bound: None, widening_lower_bound: None, widening_delay: 0 }}, absolute_value: Some(IntervalDomain { interval: Interval { start: ApInt { len: BitWidth(32), digits: [Digit(2147483648)] }, end: ApInt { len: BitWidth(32), digits: [Digit(2147483647)] }, stride: 1 }, widening_upper_bound: None, widening_lower_bound: None, widening_delay: 0 }), contains_top_values: false }
VALUE IS: Not Top

HANDLING DEF Tid { id: "instr_00010581_2", address: "00010581" }
ASSIGN
OLD VAL: Some(DataDomain { size: ByteSize(4), relative_values: {AbstractIdentifier(AbstractIdentifierData { time: Tid { id: "sub_0001057d", address: "0001057d" }, location: Register(Variable { name: "ESP", size: ByteSize(4), is_temp: false }) }): IntervalDomain { interval: Interval { start: ApInt { len: BitWidth(32), digits: [Digit(0)] }, end: ApInt { len: BitWidth(32), digits: [Digit(0)] }, stride: 0 }, widening_upper_bound: None, widening_lower_bound: None, widening_delay: 0 }}, absolute_value: None, contains_top_values: false })
TO: Variable { name: "ESP", size: ByteSize(4), is_temp: false }
EXPR: BinOp { op: IntAnd, lhs: Var(Variable { name: "ESP", size: ByteSize(4), is_temp: false }), rhs: Const(ApInt { len: BitWidth(32), digits: [Digit(4294967280)] }) }
VALUE: DataDomain { size: ByteSize(4), relative_values: {AbstractIdentifier(AbstractIdentifierData { time: Tid { id: "sub_0001057d", address: "0001057d" }, location: Register(Variable { name: "ESP", size: ByteSize(4), is_temp: false }) }): IntervalDomain { interval: Interval { start: ApInt { len: BitWidth(32), digits: [Digit(2147483648)] }, end: ApInt { len: BitWidth(32), digits: [Digit(2147483647)] }, stride: 1 }, widening_upper_bound: None, widening_lower_bound: None, widening_delay: 0 }}, absolute_value: Some(IntervalDomain { interval: Interval { start: ApInt { len: BitWidth(32), digits: [Digit(2147483648)] }, end: ApInt { len: BitWidth(32), digits: [Digit(2147483647)] }, stride: 1 }, widening_upper_bound: None, widening_lower_bound: None, widening_delay: 0 }), contains_top_values: false }
VALUE IS: Not Top
```

In both cases the stack pointer is not set to Top, but it is no longer tracked precisely: the interval from Digit(2147483648) to Digit(2147483647) covers the entire signed 32-bit range. This is strange, because before the operation the stack pointer holds a single value (offset 4 in _start and offset 0 in main), so the result should also be a single value, shouldn't it?
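
The ApInt digits in the log are printed as unsigned 32-bit integers. The small helper below (a hypothetical convenience function, not part of the cwe_checker) reinterprets them as two's-complement values. This makes it easier to see that the interval spans the whole signed range and that the AND operand 4294967280 is simply -16:

```python
def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


# Digits copied from the debug output of the two INT_AND assignments.
start = to_signed32(2147483648)  # interval start
end = to_signed32(2147483647)    # interval end
mask = to_signed32(4294967280)   # the AND operand

print(start, end)                  # -2147483648 2147483647
print(mask, hex(4294967280))       # -16 0xfffffff0
print(end - start + 1 == 1 << 32)  # True: the interval contains every 32-bit value
```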

JamesShaker · 2021-11-09T04:35:09Z

Further investigation revealed that I was mistaken. The precisely known value was an offset, not an absolute value. Since the result of an AND depends on the unknown low bits of the base address, it makes sense that the operation cannot be computed exactly on the offset alone. In the function preserve_relative_targets_for_binop every offset is conservatively set to Top. That explains why the stack pointer is not Top but still holds all values: the offset is Top, but the base is preserved, so the value as a whole is not marked as Top.

To fix this, we could strengthen our assumptions about ESP so that we can use absolute values (which seems like a bad idea?), make clever assumptions about the behaviour of operations based on known alignment (as the comment suggests), or improve how the offset is handled in the INT_AND operation. Currently this handling is more conservative than it needs to be; in particular, for an operand like 0xFFFFFFF0 we should be able to do better than setting the offset to Top. I'm not certain this alone will be sufficient to resolve the issue. Suppose we adjust the system so that in this special case the offset lies between -0xF and 0xF (that is, between 4294967281 and 15 as unsigned 32-bit values). Would this really provide enough accuracy to find values on the stack? I need to think more about this and/or try it and see whether it fixes the problem.

JamesShaker · 2021-11-09T06:28:23Z

One potential heavyweight but general solution for making bitwise binary operations less conservative, when a relative value is combined with a constant, would be to use an SMT solver. Some preliminary experiments suggest that Z3 can find a suitable interval quite quickly. Here's a small script I wrote to demonstrate this:

```python
from z3 import *

# PARAMETERS
binop_const_val = 0xFFFFFFF0  # the mask of AND ESP, 0xfffffff0
min_offset = 0x0              # known offset interval before the AND
max_offset = 0x0

base = BitVec('base', 32)      # unknown base address (ESP at function entry)
offset = BitVec('offset', 32)  # offset relative to that base

# Switch to an absolute value before the binop, then revert to a relative value
abs_val = base + offset
new_abs_val = abs_val & binop_const_val
new_offset = new_abs_val - base

new_min_offset = BitVec('new_min_offset', 32)
new_max_offset = BitVec('new_max_offset', 32)

# The offset lies in the given interval (<= on BitVecs is a signed comparison)
def gof(offset):
    return And(offset <= max_offset, min_offset <= offset)

# Upper and lower bounds on the offset interval after the binop
def is_ub(ub):
    return ForAll([base, offset], Implies(gof(offset), new_offset <= ub))

def is_lb(lb):
    return ForAll([base, offset], Implies(gof(offset), lb <= new_offset))

# Least upper bound (supremum) and greatest lower bound (infimum) of the new offset
def is_sup(sup):
    aub = BitVec('alt_upper_bound', 32)
    return And(is_ub(sup), ForAll([aub], Implies(is_ub(aub), sup <= aub)))

def is_inf(inf):
    alb = BitVec('alt_lower_bound', 32)
    return And(is_lb(inf), ForAll([alb], Implies(is_lb(alb), alb <= inf)))

# Solve for the tightest new abstraction
s = Solver()
s.add(is_sup(new_max_offset))
s.add(is_inf(new_min_offset))

result = s.check()
if result == sat:
    print(s.model())
else:
    print(result)
    print("Something went wrong :(")
```

This returns [new_max_offset = 0, new_min_offset = 4294967281], i.e. the interval [-15, 0] in signed terms, as we would expect. Of course, this may still not solve our core problem, and it would be a large change to the project's dependencies. It would likely also have performance ramifications. Furthermore, it's unclear how useful this would be outside this use case (fixing alignment), and we're not even certain it would help here!
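
For masks of the form 0xFFFFFFF0, which clear the k lowest bits, the same bound can likely be computed without an SMT solver. If the absolute value is base + offset, the AND subtracts (base + offset) mod 2^k, an unknown amount in [0, 2^k - 1], so an offset interval [lo, hi] becomes [lo - (2^k - 1), hi]. The functions below are a hypothetical sketch of such a transfer function (not cwe_checker code), assuming the lower end does not underflow the signed range:

```python
from typing import Optional, Tuple


def low_bits_cleared(mask: int, width: int = 32) -> Optional[int]:
    """Return k if `mask` clears exactly the k lowest bits (0xFFFFFFF0 -> 4), else None."""
    inverted = ~mask & ((1 << width) - 1)  # 0xFFFFFFF0 -> 0x0000000F
    if inverted & (inverted + 1):          # not of the form 2**k - 1
        return None
    return inverted.bit_length()


def and_offset_interval(lo: int, hi: int, mask: int,
                        width: int = 32) -> Optional[Tuple[int, int]]:
    """Signed offset interval of `(base + offset) & mask - base` for an unknown base.

    The AND removes an unknown amount in [0, 2**k - 1] from the absolute value.
    Returns None (i.e. fall back to Top) for masks of any other shape.
    Assumes lo - (2**k - 1) does not underflow the signed range.
    """
    k = low_bits_cleared(mask, width)
    if k is None:
        return None
    return (lo - ((1 << k) - 1), hi)


print(and_offset_interval(0, 0, 0xFFFFFFF0))  # (-15, 0): the interval Z3 found
print(and_offset_interval(4, 4, 0xFFFFFFF0))  # (-11, 4): the case in _start
```

This only costs a few integer operations per INT_AND, which fits the constraint of keeping the analysis itself cheap.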

Enkelmann · 2021-11-09T06:58:42Z

Two things:

First, just making the handling of INT_AND less conservative will not solve our problem here. Even if the offset is only a very small interval, it will still completely break the handling of stack accesses in the PointerInference analysis (at least in the current implementation). In my opinion, a solution that does not let us track the effect of the AND ESP, 0xfffffff0 instruction on the stack pointer exactly is not worth pursuing.
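
To illustrate why even a small offset interval hurts, here is a deliberately simplified toy model of a stack frame (hypothetical; it is not how the PointerInference actually stores stack objects). When a 4-byte write may land at any of several offsets, a sound analysis cannot overwrite a single cell; every cell that might overlap the write has to be weakened. A real weak update would join the old and new values, which for different values gives Top anyway, so the toy simply uses Top:

```python
from typing import Dict, Optional

TOP = None  # stands in for "value unknown"


class ToyStackFrame:
    """Toy model: maps a byte offset (relative to the frame base) to a 4-byte value."""

    def __init__(self) -> None:
        self.cells: Dict[int, Optional[str]] = {}

    def write(self, lo: int, hi: int, value: str, size: int = 4) -> None:
        """Write `value` at some offset in [lo, hi]."""
        if lo == hi:
            # Exact offset: strong update. Partially overlapping cells are invalidated.
            for off in list(self.cells):
                if off != lo and abs(off - lo) < size:
                    self.cells[off] = TOP
            self.cells[lo] = value
        else:
            # Imprecise offset: any cell that might overlap the write becomes Top.
            for off in list(self.cells):
                if lo - size < off < hi + size:
                    self.cells[off] = TOP

    def read(self, lo: int, hi: int) -> Optional[str]:
        """Read at some offset in [lo, hi]; only a read at an exact offset can succeed."""
        return self.cells.get(lo, TOP) if lo == hi else TOP


frame = ToyStackFrame()
frame.write(-8, -8, "malloc_ptr")  # ESP offset known exactly
print(frame.read(-8, -8))          # malloc_ptr
frame.write(-15, 0, "other")       # ESP offset only known to lie in [-15, 0]
print(frame.read(-8, -8))          # None, i.e. Top: the stored pointer is lost
```

Even with the tight interval [-15, 0], the pointer stored earlier is lost, which is the kind of effect that would keep the CWE416 check from ever seeing the freed pointer.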

Second, while an SMT solver like Z3 is definitely a powerful and useful tool, using it also comes with a very high computational cost. Since we already struggle with long analysis times even without an SMT solver in the mix, integrating one could make the cwe_checker too slow even for the analysis of very small binaries.
I think the way to go is to only use computationally cheap operations in the cwe_checker itself and accept the inaccuracies that come with it. Then one can use the computationally expensive but powerful hammer of SMT solvers (especially together with symbolic execution!) and apply it only to those parts of a binary where the cwe_checker found possible bugs. This could (sort of) combine the advantages of the cwe_checker approach with the advantages of the symbolic execution approach while also mitigating the disadvantages of both approaches.

JamesShaker · 2021-11-09T07:57:53Z

This makes sense! I'll look for cheap ways to track the stack pointer's behaviour precisely using extra knowledge about the binary. I presume this will essentially involve making some assumptions (as mentioned in the comment in the test) about the alignment of the unknown base address in the ESP approximation?

Enkelmann Nov 9, 2021

Yes. It may be possible to solve this in a rather hacky way by teaching the PointerInference the alignment of the stack pointer base address just for the main function. If you are looking for a conceptually sound way to track alignment information (e.g. by adding an extra alignment field to the DataDomain), the necessary changes to the cwe_checker could turn out to be quite complex. So, just as a little warning: depending on your solution, the conceptually sound way may not be easy to implement.

JamesShaker · 2021-11-22T22:21:28Z

Apologies for the delayed response; I haven't had time to look at this for a week or so. I thought I'd better post an update on where I've got to.

I had a crack at solving this problem, but unfortunately found that an alignment field on the DataDomain doesn't work. I implemented a system in which AND ESP, 0xFFFFFFF0 sets the alignment to 4 (since the 4 least significant bits are definitely zero), with the intention that this would allow the next AND ESP, 0xFFFFFFF0 to be ignored (through some added logic on binary operations). The issue is that when the stack is pushed or popped, the alignment drops to 2, so we can no longer ignore AND ESP, 0xFFFFFFF0, as it may now (based on our new alignment of 2) zero out originally non-zero bits. In the concrete world the AND ESP, 0xFFFFFFF0 may still effectively be a NOP, because there may be enough PUSH or POP instructions that the stack has already been re-aligned by the time main performs it. Unfortunately, my enrichment of the abstraction cannot profit from this behaviour (if the behaviour even exists).

One option to catch this case would be to switch to a known_lsb field that tracks what we know about the least significant bits. This becomes somewhat complex, and I'm not convinced it'll work. Instead, I might investigate giving up on precisely tracking the effect of AND ESP, 0xFFFFFFF0 and treating it like a call, using a completely new AbstractIdentifier as the base address. This means we lose access to the old relative values, but given that the register is (essentially, if not actually) becoming Top anyway (the offsets become Top even if the base addresses don't), is this really a concern? I might experiment with this. Very happy to receive any insights or advice!
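
A toy comparison of the two abstractions described above (both hypothetical sketches, not cwe_checker code). The alignment abstraction only records how many low bits are known to be zero, so a single 4-byte PUSH caps it at 2 bits. A known_lsb-style abstraction records the actual values of the low four bits, so after AND ESP, 0xFFFFFFF0 followed by a known number of pushes it still knows them exactly and can tell when a second AND would be a no-op:

```python
from dataclasses import dataclass
from typing import Optional

BITS = 4               # 16-byte alignment: only the low 4 bits matter
LOW = (1 << BITS) - 1


def trailing_zeros(x: int) -> int:
    """Trailing zero bits of x, capped at BITS."""
    return BITS if (x & LOW) == 0 else (x & -x).bit_length() - 1


@dataclass(frozen=True)
class Alignment:
    """Alignment abstraction: number of low bits known to be zero."""
    zero_bits: int

    def add(self, const: int) -> "Alignment":
        return Alignment(min(self.zero_bits, trailing_zeros(const)))

    def and_mask(self) -> "Alignment":
        return Alignment(BITS)  # AND with 0xFFFFFFF0 zeroes the low 4 bits

    def and_mask_is_noop(self) -> bool:
        return self.zero_bits >= BITS


@dataclass(frozen=True)
class KnownLowBits:
    """known_lsb-style abstraction: exact value of the low BITS bits, or None if unknown."""
    low: Optional[int]

    def add(self, const: int) -> "KnownLowBits":
        # The low bits of a sum depend only on the low bits of the operands.
        return KnownLowBits(None if self.low is None else (self.low + const) & LOW)

    def and_mask(self) -> "KnownLowBits":
        return KnownLowBits(0)

    def and_mask_is_noop(self) -> bool:
        return self.low == 0


# State right after AND ESP, 0xfffffff0, followed by four 4-byte PUSHes (ESP -= 4 each).
align, lsb = Alignment(0).and_mask(), KnownLowBits(None).and_mask()
for _ in range(4):
    align, lsb = align.add(-4), lsb.add(-4)
print(align.zero_bits, lsb.low)                          # 2 0
print(align.and_mask_is_noop(), lsb.and_mask_is_noop())  # False True
```

The extra precision only helps while every change to ESP is a known constant; once an unknown amount is added, the known bits are lost in both abstractions.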

JamesShaker · 2021-11-25T05:09:39Z

So I tried to implement a solution that essentially creates a new stack frame (following the lead of the update_call function) whenever update_def processes a Def::Assign that overwrites project.stack_pointer with Top, or with relative_values that all have Top offsets. This works to some extent. The call to free that, in the CWE416 test, should later lead to a use-after-free now successfully finds the location of its parameter on the stack (in the new stack frame), since ESP is being tracked properly. However, when this value is read, Top is returned, even though a non-Top value was definitely stored there.

The problem is that State::write_to_address calls State::store_value, which calls Object_list::set_value on the State's memory. When this function looks up the 'zero offset' of our new stack's abstract object in the AbstractObjectList, it gets Top, so the memory offset passed to Object::value_access::set_value (which relies on the object's 'zero offset') is also Top. This causes Object::value_access::set_value to mark all memory as Top, I believe on line 72 of src/cwe_checker_lib/src/analysis/pointer_inference/object/value_access.rs. Thus the call to free finds the spot on the stack, but when the value is loaded, the original malloc-ed pointer is no longer there. I need to investigate further to understand why this is happening.

Enkelmann Nov 29, 2021

I finally had time to think about your approach some more. One possible error could be that you did not explicitly create an abstract object for your new stack frame, since abstract objects do not get created automatically when you call State::store_value for a new abstract ID. This would at least explain why no information is stored for your new stack frame.

Another issue with your approach is that changing the stack frame ID midway through a function probably breaks a lot of assumptions about the stack pointer (some of which may be implicit and not properly documented yet). For example, this could create a situation where the fixpoint algorithm would have to merge two stack frames for the same function that have different abstract IDs. The code is not designed to handle such situations right now. So your approach may work for the special case of the main function of gcc-compiled binaries, but it probably creates more problems than it solves for all other occurrences of lost stack pointer offsets.

For that reason I think only solutions that enable us to track the stack pointer offset exactly are worthwhile right now. This would imply knowing the alignment of the stack pointer at function start (I would assume that the relevant information is contained in the System V ABI), and then using this knowledge to translate the INT_AND operation into an INT_SUB operation with a constant offset. Actually, it just occurred to me that one could also do this prior to the Pointer Inference analysis, as some kind of intermediate representation normalization pass. This way one would not need to raise the complexity of the Pointer Inference code just for this special case.
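
A rough sketch of such a normalization pass over a toy, hypothetical IR (tuples instead of the cwe_checker's real Def and Expression types). It tracks ESP as a constant offset from its value at function entry and, given an assumed residue of the entry ESP modulo 16, replaces ESP = ESP & 0xFFFFFFF0 with the equivalent ESP = ESP - c. The residue is exactly the ABI knowledge mentioned above. As far as I know, the i386 System V psABI states that (%esp + 4) is a multiple of 16 at a function's entry point, which would mean a residue of 12 rather than 0, but this should be double-checked against the ABI document and the compilers in question. If ESP is changed in a way the sketch does not understand, it stops rewriting:

```python
from typing import List, Optional, Tuple

# Toy IR instruction: (operation, register, constant), e.g. ("sub", "ESP", 4).
Instr = Tuple[str, str, int]


def low_bits_cleared(mask: int) -> Optional[int]:
    """k if `mask` clears exactly the k lowest of 32 bits (0xFFFFFFF0 -> 4), else None."""
    inverted = ~mask & 0xFFFFFFFF
    return None if inverted & (inverted + 1) else inverted.bit_length()


def normalize_stack_and(body: List[Instr], entry_residue: int) -> List[Instr]:
    """Rewrite ESP &= mask into ESP -= c, assuming ESP at entry == entry_residue (mod 16)."""
    offset: Optional[int] = 0  # current ESP minus ESP at function entry, if known
    out: List[Instr] = []
    for op, reg, val in body:
        if reg == "ESP" and offset is not None:
            if op == "sub":
                offset -= val
            elif op == "add":
                offset += val
            elif op == "and" and (k := low_bits_cleared(val)) is not None and k <= 4:
                # ESP & mask == ESP - (ESP mod 2**k), and ESP mod 2**k is known here.
                c = (entry_residue + offset) % (1 << k)
                out.append(("sub", "ESP", c))
                offset -= c
                continue
            else:
                offset = None  # unsupported effect on ESP: stop rewriting
        out.append((op, reg, val))
    return out


# A made-up main-like prologue: AND at offset 0, then a SUB for local variables.
main_like = [("and", "ESP", 0xFFFFFFF0), ("sub", "ESP", 0x14)]
for residue in (0, 12):
    print(residue, normalize_stack_and(main_like, residue))
# 0 [('sub', 'ESP', 0), ('sub', 'ESP', 20)]
# 12 [('sub', 'ESP', 12), ('sub', 'ESP', 20)]
```

The two outputs show why the exact alignment convention matters: both rewrites are "exact", but only one of them matches what the binary does at runtime.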

Enkelmann · 2021-11-25T06:56:08Z

Sorry for the late and short answer; I will have time for longer answers next week. What you could try is to build the special case into the update_def function, for example by testing for it directly in update_def (before the standard handling happens). Even better would be to test for it in the State::eval function whenever an Int_and operation is evaluated. This way you could prevent the stack pointer from ever containing a Top offset. As long as you know that the code is x86 (which IIRC you can look up in the Project struct), you know what alignment the zero offset of the stack pointer should have, which should allow an exact computation of the effect of the Int_and operation.

JamesShaker Nov 25, 2021

No worries at all! I'll hang in there until next week. I guess what I'm struggling to understand is why the new stack frame's abstract object has Top as its zero offset. I understand it having a Top offset relative to the previous stack frame, but why is its zero offset Top relative to itself? (Please, please don't feel any pressure to reply until next week, happy to wait!)

JamesShaker · 2021-11-30T01:23:34Z

I did some more debugging, and the core of the problem seems to be that I call replace_abstract_id with Top as the offset (intending to re-address pre-existing memory objects relative to the new ID with a Top offset, since the difference between the old ID and the new ID is unknown). This (unexpectedly) creates a new abstract object that has Top stored as its associated offset. When I then manually create a stack object with 0 as its offset (mimicking the update_call behaviour), it is joined with the object that replace_abstract_id created. Joining 0 and Top gives Top, which ruins the new stack memory object. I don't understand replace_abstract_id's behaviour here.

What you say makes sense and I need to think about it some more, but I'm struggling to see how such a normalisation is possible in general. If the number of stack pushes and pops in a function is not predictable, would such a normal form exist? Suppose some function _init_ aligns the stack with AND ESP, 0xFFFFFFF0, but then pushes 4 bytes twice in branch A and four times in branch B, and both branches then call main. In the branch A case, an AND ESP, 0xFFFFFFF0 in main would translate into SUB ESP, 0x8, but in the branch B case into SUB ESP, 0x0. So no normalised form for main exists, as the calling context dictates the offset. Is there a way around this problem? Apologies if I've misunderstood!
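
The counterexample can be checked with a few lines of arithmetic. Under the same simplification as in the text (ignoring the return address pushed by the call), the effect of AND ESP, 0xFFFFFFF0 is ESP - (ESP mod 16), and that difference depends on which branch was taken. The aligned address below is an arbitrary example value:

```python
ALIGNED_ESP = 0xBFFFF000  # arbitrary 16-byte-aligned ESP right after the AND in _init_


def and_as_sub(esp: int, mask: int = 0xFFFFFFF0) -> int:
    """Return c such that esp & mask == esp - c."""
    return esp - (esp & mask)


for branch, pushes in (("A", 2), ("B", 4)):
    esp_at_main = ALIGNED_ESP - 4 * pushes  # each PUSH moves ESP down by 4 bytes
    print(branch, hex(and_as_sub(esp_at_main)))
# A 0x8
# B 0x0
```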

Enkelmann Nov 30, 2021

Regarding the replace_abstract_id behaviour: When it renames an ID which has an associated memory object, the result is that the memory object is referenced by the new ID afterwards (with the offset behavior you encountered). It is built so that it can preserve offsets whenever possible. What you could try:

1. use replace_abstract_id with a Top offset (since you do not know the target of old stack pointers relative to your new zero offset anyway)
2. then remove any content of the abstract object (you cannot access it anyway, since you don't know the positions relative to the new zero offset)
3. manually set the offset adjustment of the memory object to zero
4. manually set the stack pointer to point to the zero offset of the memory object again

Since your approach breaks some assumptions that the code was built around, you may encounter some more things where the code does not exactly do as you would expect.

Regarding the normalization: You are correct, such a normalization is not possible in general. But we can detect cases where the normalization is safe to do and at least change these. The other side is that compilers usually do not create code where the stack pointer value depends on the (intraprocedural) execution path, since this would complicate compilation quite a bit without yielding performance gains in the compiled program. And in cases where the stack pointer depends on the execution path, it is usually restored via the stack base pointer and not via an INT_AND instruction. So I would expect cases where we cannot use this normalization to only occur in rare cases of handwritten assembly.
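
One cheap way to detect the cases where the normalization is safe (a hypothetical sketch, not existing cwe_checker code): compute the stack pointer offset at the start of each basic block with a simple forward worklist analysis over a flat lattice, where two different offsets arriving from different predecessors become Top. An INT_AND on ESP would then only be rewritten in blocks whose entry offset is a single known constant. The sketch assumes the net ESP change of each block is already known as a constant:

```python
from collections import deque
from typing import Dict, List, Optional

TOP = None  # offset differs between paths (or is otherwise unknown)


def join(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Flat lattice join: equal constants stay, anything else becomes Top."""
    return a if a == b else TOP


def stack_offsets(succs: Dict[str, List[str]], delta: Dict[str, int],
                  entry: str) -> Dict[str, Optional[int]]:
    """ESP offset (relative to function entry) at the start of each reachable block.

    succs: control flow graph as successor lists.
    delta: net constant change of ESP inside each block (assumed known).
    """
    offsets: Dict[str, Optional[int]] = {entry: 0}
    work = deque([entry])
    while work:
        block = work.popleft()
        start = offsets[block]
        out = TOP if start is TOP else start + delta[block]
        for succ in succs.get(block, []):
            new = out if succ not in offsets else join(offsets[succ], out)
            if succ not in offsets or new != offsets[succ]:
                offsets[succ] = new
                work.append(succ)
    return offsets


# Diamond: branch "a" pushes 8 bytes, branch "b" pushes 16, both reach "join_blk".
succs = {"entry": ["a", "b"], "a": ["join_blk"], "b": ["join_blk"]}
delta = {"entry": 0, "a": -8, "b": -16, "join_blk": 0}
print(stack_offsets(succs, delta, "entry"))
# {'entry': 0, 'a': 0, 'b': 0, 'join_blk': None}
```

An AND in join_blk would be left alone here, while the same diamond with equal pushes on both branches yields a constant offset and can be normalized. Since each block's value can only change from unset to a constant to Top, the worklist terminates quickly.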

JamesShaker · 2021-12-24T08:25:54Z

I've done some experiments with such a normalisation. I've built a prototype least-significant-bits abstraction and applied it to one of the example ELFs. It seems to track the alignment of ESP correctly, but it will not solve the problem. Unfortunately, the entry point _start calls __libc_start_main, which then calls main. Of course, this is not visible to the analysis when libc is dynamically linked, so without extra knowledge the alignment analysis cannot let us normalise away the INT_AND operations. :(

Enkelmann Jan 3, 2022

Thank you for trying anyway! :-) It still might be possible to do this properly:
First, it is acceptable to match on the name of the function for a fix. So if a function is named main, we can simply assume that it is the main function and act accordingly. I think that correctly teaching the cwe_checker how __libc_start_main works would be too complicated, at least for now. I have also used the workaround of matching function names in the cwe_checker before.
Second, according to Wikipedia all functions using the x86-cdecl calling convention have a 16-byte aligned stack (although I have not cross-checked with official documentation of the calling convention yet). We may be able to just assume the alignment for all functions! When function stacks are not aligned they are unlikely to use an AND ESP, ... operation, so this may even work in cases when the assumption is wrong.

JamesShaker Jan 4, 2022

So if I've understood this correctly, you're saying we can just match on the function main and remove the early AND ESP,0xFFF.. instruction? As for other, non-main functions, I haven't checked super thoroughly, but my basic investigations suggest that they don't use alignment operations (i.e. AND ESP, ...). If I'm mistaken and they do, then perhaps we could make the alignment assumption once for main but use the LSB abstraction for all other functions?

Enkelmann Jan 4, 2022

Yes, we can just match on the function main.
I also have not seen usage of AND ESP, ... outside of the main function yet, so I would restrict the fix to the main function for now and see whether that is already enough to fix the problem.
