
Fix & improve Propagate Control Flow normalization pass #462

Merged: vobst merged 11 commits into fkie-cad:master on May 15, 2024

Conversation

@vobst (Collaborator) commented May 3, 2024

Issue #461 uncovered a problem with the Propagate Control Flow normalization pass: in rare cases, a basic block could be removed without re-targeting calls that return to it.

This PR fixes the issue by allowing the re-targeting of call returns. Besides that, it includes a number of improvements to the optimization pass at large. In particular:

  • It adds support for re-targeting jumps without a known condition. Currently, a jump can only be re-targeted if some condition is known to be true whenever the branch is taken. That is necessary to resolve conditional branches; unconditional branches, however, can be optimized away without a known condition.
  • It adds support for re-targeting call returns. This is what fixes #461 (panic in graph.rs because unwrap() of None). Note that we cannot assume that conditions remain true across calls due to possible side effects, so the first change is required to make this work (see the sketch after this list).
  • It adds support for deriving the block precondition from multiple incoming edges. Currently, a block precondition can only be derived if the block has a single incoming edge that stems from a conditional jump. However, we can still derive a precondition if all incoming edges carry the same condition.
  • It adds support for remembering both the precondition and the branch condition when re-targeting conditional jumps. Currently, we only remember the branch condition, but it turns out that there are cases where computing and remembering the precondition helps as well.
  • It updates the docs and makes minor code-style modernizations.
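
To make the first two bullets above more concrete, here is a minimal, self-contained sketch of the underlying idea. The types and names are hypothetical stand-ins and this is not the actual pass implementation: without a known condition, a jump target (or a call's return target) may only skip blocks that contain no defs and end in a single unconditional jump.

use std::collections::{HashMap, HashSet};

/// Hypothetical, highly simplified stand-in for a block terminator.
enum Terminator {
    /// A single unconditional jump to another block.
    Branch(String),
    /// Anything else (conditional jump, call, return, ...).
    Other,
}

/// Follow a chain of blocks that contain no defs and end in a single
/// unconditional jump, and return the first "real" block at its end. A jump,
/// or the return site of a call, that points at `target` can be re-targeted
/// there even though no branch condition is known. Blocks are represented as
/// `(number of defs, terminator)` pairs keyed by their name.
fn skip_trivial_blocks(target: &str, blocks: &HashMap<String, (usize, Terminator)>) -> String {
    let mut current = target.to_string();
    let mut seen = HashSet::new(); // guards against loops of trivial blocks
    while seen.insert(current.clone()) {
        match blocks.get(&current) {
            // No defs and a single unconditional jump: the block can be skipped.
            Some((0, Terminator::Branch(next))) => current = next.clone(),
            _ => break,
        }
    }
    current
}

fn main() {
    let blocks = HashMap::from([
        // `blk_a` contains no defs and just jumps to `blk_b`.
        ("blk_a".to_string(), (0, Terminator::Branch("blk_b".to_string()))),
        // `blk_b` contains two defs, so it cannot be skipped.
        ("blk_b".to_string(), (2, Terminator::Other)),
    ]);
    // A call that "returns" to `blk_a` can be re-targeted to return to `blk_b`.
    assert_eq!(skip_trivial_blocks("blk_a", &blocks), "blk_b");
}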

Testing

There is a unit test for each of the proposed changes that make the optimization more aggressive. Furthermore, the optimizations performed by the new pass were manually spot-checked (random samples) for three programs on three different architectures (MIPS, ARM32, AMD64).

Measurements

The following table compares this PR to the current optimization pass. It shows the number of basic block nodes and jump edges in the CFG of the unoptimized readelf IR program for different host architectures. Next to that are the fractions of basic blocks and jump edges that the current (master) and the proposed (PR) optimization pass can remove, respectively; the impr. columns give the ratio of the PR fraction to the master fraction (e.g., 6.20 / 2.35 ≈ 2.64 for arm64 basic blocks). In short, the improved pass can remove roughly 1.8x more of the CFG than the old one.

One interesting observation is that the optimization performs worst for MIPS, which was the architecture for which it was originally introduced.

| arch      | BB (unopt.) | Jumps (unopt.) | dBB [%] (master) | dJumps [%] (master) | dBB [%] (PR) | dJumps [%] (PR) | impr. BB | impr. Jumps |
|-----------|-------------|----------------|------------------|---------------------|--------------|-----------------|----------|-------------|
| arm64     | 35,462      | 40,296         | 2.35             | 2.07                | 6.20         | 5.46            | 2.64     | 2.63        |
| armel     | 50,323      | 60,404         | 3.26             | 4.01                | 6.77         | 6.93            | 2.08     | 1.73        |
| armhf     | 50,385      | 61,036         | 3.58             | 4.49                | 6.89         | 7.22            | 1.92     | 1.61        |
| mipsel    | 27,784      | 31,661         | 0.63             | 0.58                | 0.69         | 0.64            | 1.09     | 1.09        |
| amd64     | 48,556      | 55,543         | 3.20             | 2.80                | 5.48         | 4.84            | 1.71     | 1.73        |
| avrg.     | 42,502.00   | 49,788.00      | 2.60             | 2.79                | 5.21         | 5.02            | 1.89     | 1.76        |
| std. dev. | 10,321.60   | 13,143.66      | 1.19             | 1.56                | 2.58         | 2.64            | 0.56     | 0.56        |

@vobst marked this pull request as ready for review May 3, 2024 15:03
@Enkelmann self-requested a review May 3, 2024 15:38

@Enkelmann (Contributor) left a comment

Except for some very small nitpicks, looks good to me.

By the way: the reason that it performs so badly on MIPS is probably a known error that we still have when parsing conditional assignment instructions on MIPS. Since the whole control flow propagation exists mainly to improve the control flow in the presence of conditional assignments, it cannot do much if those instructions are not parsed correctly.

Edit: The bug fix might not be complete; see my other comment. You can probably check with a somewhat elaborate unit test whether I am right or not.

@Enkelmann (Contributor) commented
I might have an idea why the old code did not catch the case: We use the CFG to check for incoming edges when removing blocks, right? But if a callee has no return instruction, then the CFG may not contain an edge to the return site. But the TID of the return site is nevertheless referenced in the call instruction. Your code solves that by retargeting the return site. In theory it might be possible that the return site cannot be retargeted, while all other edges to the return site are retargeted nevertheless. So I think the bug still persists, it is just less likely now...

@vobst (Collaborator, Author) commented May 3, 2024

> I might have an idea why the old code did not catch the case: We use the CFG to check for incoming edges when removing blocks, right? But if a callee has no return instruction, then the CFG may not contain an edge to the return site. But the TID of the return site is nevertheless referenced in the call instruction. Your code solves that by retargeting the return site. In theory it might be possible that the return site cannot be retargeted, while all other edges to the return site are retargeted nevertheless. So I think the bug still persists, it is just less likely now...

Good catch! It is indeed not too difficult to construct this situation. Essentially a case where a call to a non-returning function is not recognized as such and "returns" to a conditional block that can be optimized away. See below for a concrete example.

I see a couple of options for how to proceed:

a.) Add another, preceding normalization pass that recognizes calls that have Some return TID and for which no artificial return nodes are generated in the CFG for some reason. Set the return TID to None for those calls, essentially marking them non-returning.
b.) Keep track of all call-returns manually in the propagate CF pass and make sure that their target is only removed if they are re-targeted as well.
c.) It is already quite unlikely to hit the original case (at least it took us quite a while to notice) and will probably take much longer for a binary to emerge that triggers the remaining edge case. So just ignore it.

b.) and c.) are both unsatisfactory. For a.), it would be interesting to know whether you have a feeling for any bad surprises waiting down the line. At first sight it seems like the return target information is worthless if the artificial return nodes are not generated, so we might as well throw it away entirely (indirect calls and call-other are handled differently and are not affected, right?). The question is rather why Ghidra generated it in the first place.

  sub_1                                                 sub_2           
 ┌───────────────────────────────────────────────┐      ┌─────────────┐ 
 │                                               │      │             │ 
 │                               ┌────────────┐  │      │ ┌────────┐  │ 
 │                               │            │  │      │ │        │  │ 
 │                 ┌─────┐       │ call sub_2 │  │      │ │  loop  │  │ 
 │                 │     │       │            │  │      │ │        │  │ 
 │                 │  C  │       └────────────┘  │      │ └───┬───▲┘  │ 
 │                 │     │                       │      │     │   │   │ 
 │                 └┬───┬┘              │ return │      │     └───┘   │ 
 │                  │   │               │        │      │             │ 
 │                  │   │                        │      └─────────────┘ 
 │                  │   │            ┌─────┐     │                      
 │                  │   │   C        │     │     │                      
 │                  │   └───────────►│  C  │     │                      
 │ ┌─────┐ not C    │                │     │     │                      
 │ │     ◄──────────┘                └─┬─┬─┘     │                      
 │ │ E_1 │               not C         │ │       │                      
 │ │     ◄─────────────────────────────┘ │       │                      
 │ └─────┘                               │ C     │                      
 │                                       │       │                      
 │                                   ┌───▼───┐   │                      
 │                                   │       │   │                      
 │                                   │  E_2  │   │                      
 │                                   │       │   │                      
 │                                   └───────┘   │                      
 │                                               │                      
 └───────────────────────────────────────────────┘                      

In this example, the first C branch can be re-targeted, which means that the second conditional block becomes orphaned, as the call does not contribute an incoming edge due to the missing return statement.

This unit test uses the above example to show that we can still trigger this bug under those conditions.

#[test]
fn call_return_to_cond_jump_removed() {
    let sub_1 = Sub {
        name: "sub_1".to_string(),
        calling_convention: None,
        blocks: vec![
            mock_condition_block("cond_blk_1", "cond_blk_2", "end_blk_1"),
            mock_block_with_defs_and_call("call_blk", "sub_2", "cond_blk_2"),
            mock_condition_block("cond_blk_2", "end_blk_2", "end_blk_1"),
            mock_block_with_defs("end_blk_1", "end_blk_1"),
            mock_block_with_defs("end_blk_2", "end_blk_2"),
        ],
    };
    let sub_1 = Term {
        tid: Tid::new("sub_1"),
        term: sub_1,
    };
    let sub_2 = Sub {
        name: "sub_2".to_string(),
        calling_convention: None,
        blocks: vec![mock_block_with_defs("loop_block", "loop_block")],
    };
    let sub_2 = Term {
        tid: Tid::new("sub_2"),
        term: sub_2,
    };
    let mut project = Project::mock_arm32();
    project.program.term.subs =
        BTreeMap::from([(Tid::new("sub_1"), sub_1), (Tid::new("sub_2"), sub_2)]);

    let cfg_before_normalization = graph::get_program_cfg(&project.program);
    cfg_before_normalization.print_compact_json();

    propagate_control_flow(&mut project);
    // construction of CFG would panic now
    //graph::get_program_cfg(&project.program);
    let expected_blocks = vec![
        // `cond_blk_1` can be re-targeted.
        mock_condition_block("cond_blk_1", "end_blk_2", "end_blk_1"),
        // `call_blk` can not be re-targeted since no condition is known.
        mock_block_with_defs_and_call("call_blk", "sub_2", "cond_blk_2"),
        // `cond_blk_2` was removed since cond_blk_1 was re-targeted.
        // Note: `call_blk` did not contribute an incoming edge since the
        // callee does not return.
        mock_block_with_defs("end_blk_1", "end_blk_1"),
        mock_block_with_defs("end_blk_2", "end_blk_2"),
    ];

    assert_eq!(
        &project.program.term.subs[&Tid::new("sub_1")].term.blocks[..],
        &expected_blocks[..]
    );
}

PS: How do you like the updated ToJsonCompact impl? ;)

{
  "edge_counts": {
    "block": 6,
    "call": 1,
    "call_combine": 1,
    "cr_call_stub": 0,
    "cr_return_stub": 0,
    "extern_call_stub": 0,
    "jump": 7,
    "return_combine": 0,
    "total": 15
  },
  "edges": {
    "0 -> 1": "Block",
    "1 -> 4": "Jump",
    "1 -> 6": "Jump",
    "10 -> 11": "Block",
    "11 -> 10": "Jump",
    "12 -> 10": "Call",
    "2 -> 3": "Block",
    "3 -> 12": "CallCombine",
    "4 -> 5": "Block",
    "5 -> 6": "Jump",
    "5 -> 8": "Jump",
    "6 -> 7": "Block",
    "7 -> 6": "Jump",
    "8 -> 9": "Block",
    "9 -> 8": "Jump"
  },
  "node_counts": {
    "blk_end": 6,
    "blk_start": 6,
    "call_return": 0,
    "call_source": 1,
    "total": 13
  },
  "nodes": {
    "0": "BlkStart @ cond_blk_1 (sub sub_1)",
    "1": "BlkEnd @ cond_blk_1 (sub sub_1)",
    "10": "BlkStart @ loop_block (sub sub_2)",
    "11": "BlkEnd @ loop_block (sub sub_2)",
    "12": "CallSource @ loop_block (sub sub_2) (caller @ call_blk (sub sub_1))",
    "2": "BlkStart @ call_blk (sub sub_1)",
    "3": "BlkEnd @ call_blk (sub sub_1)",
    "4": "BlkStart @ cond_blk_2 (sub sub_1)",
    "5": "BlkEnd @ cond_blk_2 (sub sub_1)",
    "6": "BlkStart @ end_blk_1 (sub sub_1)",
    "7": "BlkEnd @ end_blk_1 (sub sub_1)",
    "8": "BlkStart @ end_blk_2 (sub sub_1)",
    "9": "BlkEnd @ end_blk_2 (sub sub_1)"
  }
}
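
For readers skimming the dump: there is no CallReturn node and no return-stub edge for the call in call_blk, so the only edge into cond_blk_2 is the conditional jump coming from cond_blk_1. A tiny, hypothetical sanity check of that observation (assuming the dump above is saved to a file named cfg_dump.json and that serde_json is available; neither is part of this PR) could look like this:

use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical file containing the compact CFG dump shown above.
    let dump: Value = serde_json::from_str(&std::fs::read_to_string("cfg_dump.json")?)?;
    // No call-return nodes and no return-stub edges: the call in `call_blk`
    // never "returns" in the CFG because `sub_2` only loops.
    assert_eq!(dump["node_counts"]["call_return"], 0);
    assert_eq!(dump["edge_counts"]["cr_return_stub"], 0);
    // Hence the only edge into `cond_blk_2` (BlkStart node 4) is the conditional
    // jump from `cond_blk_1` (BlkEnd node 1). Once that jump is re-targeted,
    // `cond_blk_2` is orphaned even though `call_blk` still references it.
    assert_eq!(dump["edges"]["1 -> 4"], "Jump");
    Ok(())
}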

@Enkelmann (Contributor) commented

Your updated ToJsonCompact format looks very good!

Right now, your option a) would probably result in the best CFG that we can generate. Just make sure to mention it prominently in the doc-comment of the normalization pass. Maybe also add an INFO-level log message for each function (or call?) deemed non-returning, just to make it easier to spot real-world binaries where suspiciously many functions are marked as non-returning.

@vobst (Collaborator, Author) commented May 8, 2024

> Your updated ToJsonCompact format looks very good!

Nice, I included that in the first new commit.

> Right now, your option a) would probably result in the best CFG that we can generate. Just make sure to mention it prominently in the doc-comment of the normalization pass. Maybe also add an INFO-level log message for each function (or call?) deemed non-returning, just to make it easier to spot real-world binaries where suspiciously many functions are marked as non-returning.

I decided to split up the existing remove_references_to_nonexisting_tids_and_retarget_non_returning_calls pass into three passes:

  1. Unconditionally add the artificial sinks. (As far as I can see there is no problem with having them when they are not needed and it simplifies the code.)
  2. remove_references_to_nonexisting_tids: The part of the code in remove_references_to_nonexisting_tids_and_retarget_non_returning_calls that was responsible for that; it can now assume that artificial sinks always exist.
  3. retarget_non_returning_calls_to_artifical_sink: handles calls to external functions AND functions without a return instruction; it can now assume that artificial sinks always exist (see the sketch below).
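
To illustrate the third pass, here is a sketch under simplified, hypothetical types and names; it is not the actual cwe_checker code. Every call whose callee lacks a return instruction gets its return target redirected to the caller's own artificial sink block, and an INFO-level message is emitted so that binaries with suspiciously many non-returning functions stand out.

use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a call jump: the callee and the block the call
/// is supposed to return to, if any.
struct Call {
    callee: String,
    return_target: Option<String>,
}

/// Redirect the returns of calls to non-returning callees to the caller's own
/// artificial sink block (one sink per function).
fn retarget_non_returning_calls(
    calls: &mut BTreeMap<String, Vec<Call>>, // caller -> its calls
    returning_fns: &BTreeSet<String>,        // callees that have a return instruction
) {
    for (caller, caller_calls) in calls.iter_mut() {
        let sink = format!("artificial_sink_{caller}");
        for call in caller_calls.iter_mut() {
            if call.return_target.is_some() && !returning_fns.contains(&call.callee) {
                // The real pass would emit an INFO-level log message here.
                println!(
                    "INFO: call to non-returning `{}` in `{caller}` retargeted to `{sink}`",
                    call.callee
                );
                call.return_target = Some(sink.clone());
            }
        }
    }
}

fn main() {
    // The situation from the example above: `sub_1` calls `sub_2`, which only
    // loops and never returns, yet the call claims to return to `cond_blk_2`.
    let mut calls = BTreeMap::from([(
        "sub_1".to_string(),
        vec![Call {
            callee: "sub_2".to_string(),
            return_target: Some("cond_blk_2".to_string()),
        }],
    )]);
    let returning_fns = BTreeSet::new(); // `sub_2` has no return instruction
    retarget_non_returning_calls(&mut calls, &returning_fns);
    assert_eq!(
        calls["sub_1"][0].return_target.as_deref(),
        Some("artificial_sink_sub_1")
    );
}

Using one artificial sink per function, rather than a single global one, is what keeps the block-to-sub mapping unique (compare the last commit in the list below).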

Here are some stats about how often this new pass triggers. All in all, they suggest that it is pretty rare. I included some reasons for false positives in the doc comment.

| $arch-$binary        | functions | non-returning functions | calls | non-returning calls |
|----------------------|-----------|---------------|-------|---------------|
| arm64-readelf        | 429       | 0             | 8497  | 0             |
| armel-readelf        | 365       | 0             | 8046  | 0             |
| armhf-readelf        | 369       | 0             | 7939  | 0             |
| mipsel-readelf       | 346       | 9             | 5213  | 27            |
| amd64-readelf        | 681       | 7             | 10094 | 0             |
| armel-ls             | 235       | 0             | 1190  | 0             |
| mipsel-ls            | 220       | 7             | 1409  | 14            |
| arm64-ls             | 348       | 0             | 1551  | 0             |
| x86-ls               | 338       | 10            | 1460  | 134           |
| armhf-ls             | 162       | 5             | 731   | 3             |
| amd64-ls             | 322       | 4             | 1400  | 0             |
| ppc64el-ls           | 435       | 1             | 2443  | 0             |
| x86-netfs.ko         | 37        | 6             | 79    | 0             |
| powerpc64le-netfs.ko | 109       | 10            | 364   | 0             |
| mips64r2el-netfs.ko  | 52        | 0             | 720   | 0             |
| mips32r2el-netfs.ko  | 54        | 0             | 790   | 0             |
| amd64-netfs.ko       | 62        | 0             | 310   | 0             |
| arm64-netfs.ko       | 55        | 0             | 194   | 0             |
| armhf-netfs.ko       | 54        | 0             | 227   | 0             |

However, these stats may also be a bit biased. On the IoT ip executable from the original issue we get:

total_fn = 366
total_non_ret_fn = 11
total_calls_with_ret = 3469
total_retargeted_calls = 414

Maybe it is something with the compiler and its settings. In general, it does no harm to retarget those calls, as they cannot be analyzed anyway; however, maybe we can address some of the root causes at another point.

vobst pushed a commit to vobst/cwe_checker that referenced this pull request May 8, 2024
The original fix for Issue fkie-cad#461 in Commit ("lib/ir/project: propagate
control flow for call returns") was incomplete.

The original problem was due to a call to a function without a return
instruction "returning" to a block that could be optimized away in the
propagate control flow pass. Retargeting the call return can only solve
the issue when the return block can be retargeted (and the retarget is
not optimized away), which is not the case for condition blocks.

Thus, always retarget returns from calls to functions without a ret
to the artificial sink.

Link: fkie-cad#462 (comment)
Signed-off-by: Valentin Obst <[email protected]>
vobst pushed a commit to vobst/cwe_checker that referenced this pull request May 8, 2024
…tion

Add a test to verify that retargeting returns from calls to non-returning
functions is indeed solving the problem this pass has with "dangling"
references to return sites.

Link: fkie-cad#462 (comment)
Signed-off-by: Valentin Obst <[email protected]>
@vobst requested a review from Enkelmann May 8, 2024 09:34

@Enkelmann (Contributor) left a comment

Apart from the issue you mentioned yourself, everything looks good to me.

@vobst (Collaborator, Author) commented May 14, 2024

> Apart from the issue you mentioned yourself, everything looks good to me.

Thanks for the review! Then I'll now clean up the commit log and resolve the merge conflicts with the benchmarking PR.

@vobst (Collaborator, Author) left a comment

Just a few small nits.

Valentin Obst added 11 commits May 14, 2024 21:31
This patch does two things:

1. It allows the re-targeting of jumps for which no known true condition
   is available. Without a known condition, only blocks that consist of
   a single, unconditional jump can be skipped.
2. It allows the re-targeting of call returns in the same way that we
   already do it for unconditional jumps. For calls we never have a
   known condition as side-effects may invalidate any knowledge we have
   after the execution of all DEFs in the block.

Example:

Before the optimization we might have code like this:

  BLK [blk_0040a9c4]
    DEF [instr_0040a9c4_0] ra:4 = 0x40a9cc:4
    JMP [instr_0040a9c4_1] call sub_00403f80 ret blk_0040a9cc
  BLK [blk_0040a9cc]
    JMP [instr_0040a9cc_1] Jump to blk_0040a9d0
  BLK [blk_0040a9d0]
    DEF [instr_0040a9d0_0] a0:4 = ((0x43:4 << 0x10:4) + 0xffffb730:4)
    JMP [instr_0040a9d0_1] Jump to blk_0040a9d4

whereas after the optimization it becomes:

  BLK [blk_0040a9c4]
    DEF [instr_0040a9c4_0] ra:4 = 0x40a9cc:4
    JMP [instr_0040a9c4_1] call sub_00403f80 ret blk_0040a9d0
  BLK [blk_0040a9d0]
    DEF [instr_0040a9d0_0] a0:4 = ((0x43:4 << 0x10:4) + 0xffffb730:4)
    JMP [instr_0040a9d0_1] Jump to blk_0040a9d4

Fixes: 2487aac ("remove dead code originating from control flow propagation (fkie-cad#384)")
Closes: fkie-cad#461
Reported-by: https://github.com/ElDavoo
Signed-off-by: Valentin Obst <[email protected]>
…g edges

If a basic block has multiple incoming edges that are all conditioned on
the same condition, use this condition when retargeting the control flow
transfer at the end of the block.

Signed-off-by: Valentin Obst <[email protected]>
Remember precondition and branch condition when retargeting a block that
ends with a conditional jump.

Signed-off-by: Valentin Obst <[email protected]>
No functional changes. Hopefully.

Signed-off-by: Valentin Obst <[email protected]>
No functional changes.

Signed-off-by: Valentin Obst <[email protected]>
The original fix for Issue fkie-cad#461 in Commit ("lib/ir/project: propagate
control flow for call returns") was incomplete.

The original problem was due to a call to a function without a return
instruction "returning" to a block that could be optimized away in the
propagate control flow pass. Retargeting the call return can only solve
the issue when the return block can be retargeted (and the retarget is
not optimized away), which is not the case for condition blocks.

Thus, always retarget returns from calls to functions without a ret
to the artificial sink.

Link: fkie-cad#462 (comment)
Signed-off-by: Valentin Obst <[email protected]>
…tion

Add a test to verify that retargeting returns from calls to non-returning
functions is indeed solving the problem this pass has with "dangling"
references to return sites.

Link: fkie-cad#462 (comment)
Signed-off-by: Valentin Obst <[email protected]>
The pass that retargets "returns" from non-returning functions runs
after the block-to-sub mapping has been made unique. This invariant is
relied upon by later analyses.

Currently, the pass does not uphold this invariant since it always
retargets to the same global artificial sink block.
Modify the pass s.t. it preserves a unique block-to-sub mapping by
retargeting returns directly to the Sub's artificial sink.

Signed-off-by: Valentin Obst <[email protected]>

@vobst (Collaborator, Author) left a comment

Looks like the force push did not break anything.

@vobst merged commit 2e04828 into fkie-cad:master May 15, 2024
6 checks passed