
Fix & improve Propagate Control Flow normalization pass #462

Merged: vobst merged 11 commits into fkie-cad:master on May 15, 2024

Conversation

@vobst (Collaborator) commented May 3, 2024

Issue #461 uncovered a problem with the Propagate Control Flow normalization pass: in rare cases, a basic block could be removed without re-targeting calls that return to it.

This PR fixes the issue by allowing the re-targeting of call returns. Besides that, it includes a number of improvements to the optimization pass at large. In particular:

  • It adds support for re-targeting jumps without a known condition. Currently, a jump can only be re-targeted if some condition is known to be true whenever the branch is taken. That is necessary to resolve conditional branches; unconditional branches, however, can be optimized away without a known condition.
  • It adds support for re-targeting call returns. This is what fixes #461 (panic in graph.rs because unwrap() of None). Note that we cannot assume that conditions remain true across calls due to possible side effects, so the first change is required to make this work (see the sketch after this list).
  • It adds support for deriving the block precondition from multiple incoming edges. Currently, a block precondition can only be derived if the block has a single incoming edge that stems from a conditional jump. However, we can still derive a precondition if all incoming edges carry the same condition.
  • It adds support for remembering both the precondition and the branch condition when re-targeting conditional jumps. Currently, we only remember the branch condition, but it turns out that there are cases where computing and remembering the precondition helps as well.
  • It updates the docs and makes minor code-style modernizations.
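
To make the first two bullets above more concrete, here is a minimal, self-contained sketch of the underlying idea. The types and names are hypothetical stand-ins and this is not the actual pass implementation: without a known condition, a jump target (or a call's return target) may only skip blocks that contain no defs and end in a single unconditional jump.

use std::collections::{HashMap, HashSet};

/// Hypothetical, highly simplified stand-in for a block terminator.
enum Terminator {
    /// A single unconditional jump to another block.
    Branch(String),
    /// Anything else (conditional jump, call, return, ...).
    Other,
}

/// Follow a chain of blocks that contain no defs and end in a single
/// unconditional jump, and return the first "real" block at its end. A jump,
/// or the return site of a call, that points at `target` can be re-targeted
/// there even though no branch condition is known. Blocks are represented as
/// `(number of defs, terminator)` pairs keyed by their name.
fn skip_trivial_blocks(target: &str, blocks: &HashMap<String, (usize, Terminator)>) -> String {
    let mut current = target.to_string();
    let mut seen = HashSet::new(); // guards against loops of trivial blocks
    while seen.insert(current.clone()) {
        match blocks.get(&current) {
            // No defs and a single unconditional jump: the block can be skipped.
            Some((0, Terminator::Branch(next))) => current = next.clone(),
            _ => break,
        }
    }
    current
}

fn main() {
    let blocks = HashMap::from([
        // `blk_a` contains no defs and just jumps to `blk_b`.
        ("blk_a".to_string(), (0, Terminator::Branch("blk_b".to_string()))),
        // `blk_b` contains two defs, so it cannot be skipped.
        ("blk_b".to_string(), (2, Terminator::Other)),
    ]);
    // A call that "returns" to `blk_a` can be re-targeted to return to `blk_b`.
    assert_eq!(skip_trivial_blocks("blk_a", &blocks), "blk_b");
}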

Testing

There is a unit test for each of the proposed changes that make the optimization more aggressive. Furthermore, the optimizations performed by the new pass were manually spot-checked (random samples) for three programs on three different architectures (MIPS, ARM32, AMD64).

Measurements

The following table compares this PR to the current optimization pass. It shows the number of basic block nodes and jump edges in the CFG of the unoptimized readelf IR program for different host architectures. Next to that are the fractions of basic blocks and jump edges that the current (master) and the proposed (PR) optimization pass can remove, respectively; the impr. columns give the ratio of the PR fraction to the master fraction (e.g., 6.20 / 2.35 ≈ 2.64 for arm64 basic blocks). In short, the improved pass can remove roughly 1.8x more of the CFG than the old one.

One interesting observation is that the optimization performs worst for MIPS, which was the architecture for which it was originally introduced.

| arch      | BB (unopt.) | Jumps (unopt.) | dBB [%] (master) | dJumps [%] (master) | dBB [%] (PR) | dJumps [%] (PR) | impr. BB | impr. Jumps |
|-----------|-------------|----------------|------------------|---------------------|--------------|-----------------|----------|-------------|
| arm64     | 35,462      | 40,296         | 2.35             | 2.07                | 6.20         | 5.46            | 2.64     | 2.63        |
| armel     | 50,323      | 60,404         | 3.26             | 4.01                | 6.77         | 6.93            | 2.08     | 1.73        |
| armhf     | 50,385      | 61,036         | 3.58             | 4.49                | 6.89         | 7.22            | 1.92     | 1.61        |
| mipsel    | 27,784      | 31,661         | 0.63             | 0.58                | 0.69         | 0.64            | 1.09     | 1.09        |
| amd64     | 48,556      | 55,543         | 3.20             | 2.80                | 5.48         | 4.84            | 1.71     | 1.73        |
| avrg.     | 42,502.00   | 49,788.00      | 2.60             | 2.79                | 5.21         | 5.02            | 1.89     | 1.76        |
| std. dev. | 10,321.60   | 13,143.66      | 1.19             | 1.56                | 2.58         | 2.64            | 0.56     | 0.56        |

@vobst marked this pull request as ready for review May 3, 2024 15:03
@Enkelmann self-requested a review May 3, 2024 15:38

@Enkelmann (Contributor) left a comment

Except for some very small nitpicks, looks good to me.

By the way: the reason that it performs so badly on MIPS is probably a known error that we still have when parsing conditional assignment instructions on MIPS. Since the whole control flow propagation exists mainly to improve the control flow in the presence of conditional assignments, it cannot do much if those instructions are not parsed correctly.

Edit: The bug fix might not be complete; see my other comment. You can probably check with a somewhat elaborate unit test whether I am right or not.

@Enkelmann (Contributor) commented
I might have an idea why the old code did not catch the case: We use the CFG to check for incoming edges when removing blocks, right? But if a callee has no return instruction, then the CFG may not contain an edge to the return site. But the TID of the return site is nevertheless referenced in the call instruction. Your code solves that by retargeting the return site. In theory it might be possible that the return site cannot be retargeted, while all other edges to the return site are retargeted nevertheless. So I think the bug still persists, it is just less likely now...

@vobst (Collaborator, Author) commented May 3, 2024

> I might have an idea why the old code did not catch the case: We use the CFG to check for incoming edges when removing blocks, right? But if a callee has no return instruction, then the CFG may not contain an edge to the return site. But the TID of the return site is nevertheless referenced in the call instruction. Your code solves that by retargeting the return site. In theory it might be possible that the return site cannot be retargeted, while all other edges to the return site are retargeted nevertheless. So I think the bug still persists, it is just less likely now...

Good catch! It is indeed not too difficult to construct this situation. Essentially a case where a call to a non-returning function is not recognized as such and "returns" to a conditional block that can be optimized away. See below for a concrete example.

I see a couple of options for how to proceed:

a.) Add another, preceding normalization pass that recognizes calls that have Some return TID and for which no artificial return nodes are generated in the CFG for some reason. Set the return TID to None for those calls, essentially marking them non-returning.
b.) Keep track of all call-returns manually in the propagate CF pass and make sure that their target is only removed if they are re-targeted as well.
c.) It is already quite unlikely to hit the original case (at least it took us quite a while to notice) and will probably take much longer for a binary to emerge that triggers the remaining edge case. So just ignore it.

b.) and c.) are both unsatisfactory. For a.), it would be interesting to know whether you have a feeling for any bad surprises waiting down the line. At first sight it seems like the return target information is worthless if the artificial return nodes are not generated, so we might as well throw it away entirely (indirect calls and call-other are handled differently and are not affected, right?). The question is rather why Ghidra generated it in the first place.

  sub_1                                                 sub_2           
 ┌───────────────────────────────────────────────┐      ┌─────────────┐ 
 │                                               │      │             │ 
 │                               ┌────────────┐  │      │ ┌────────┐  │ 
 │                               │            │  │      │ │        │  │ 
 │                 ┌─────┐       │ call sub_2 │  │      │ │  loop  │  │ 
 │                 │     │       │            │  │      │ │        │  │ 
 │                 │  C  │       └────────────┘  │      │ └───┬───▲┘  │ 
 │                 │     │                       │      │     │   │   │ 
 │                 └┬───┬┘              │ return │      │     └───┘   │ 
 │                  │   │               │        │      │             │ 
 │                  │   │                        │      └─────────────┘ 
 │                  │   │            ┌─────┐     │                      
 │                  │   │   C        │     │     │                      
 │                  │   └───────────►│  C  │     │                      
 │ ┌─────┐ not C    │                │     │     │                      
 │ │     ◄──────────┘                └─┬─┬─┘     │                      
 │ │ E_1 │               not C         │ │       │                      
 │ │     ◄─────────────────────────────┘ │       │                      
 │ └─────┘                               │ C     │                      
 │                                       │       │                      
 │                                   ┌───▼───┐   │                      
 │                                   │       │   │                      
 │                                   │  E_2  │   │                      
 │                                   │       │   │                      
 │                                   └───────┘   │                      
 │                                               │                      
 └───────────────────────────────────────────────┘                      

In this example, the first C branch can be re-targeted, which means that the second conditional block becomes orphaned, as the call does not contribute an incoming edge due to the missing return statement.

This unit test uses the above example to show that we can still trigger this bug under those conditions.

#[test]
fn call_return_to_cond_jump_removed() {
    let sub_1 = Sub {
        name: "sub_1".to_string(),
        calling_convention: None,
        blocks: vec![
            mock_condition_block("cond_blk_1", "cond_blk_2", "end_blk_1"),
            mock_block_with_defs_and_call("call_blk", "sub_2", "cond_blk_2"),
            mock_condition_block("cond_blk_2", "end_blk_2", "end_blk_1"),
            mock_block_with_defs("end_blk_1", "end_blk_1"),
            mock_block_with_defs("end_blk_2", "end_blk_2"),
        ],
    };
    let sub_1 = Term {
        tid: Tid::new("sub_1"),
        term: sub_1,
    };
    let sub_2 = Sub {
        name: "sub_2".to_string(),
        calling_convention: None,
        blocks: vec![mock_block_with_defs("loop_block", "loop_block")],
    };
    let sub_2 = Term {
        tid: Tid::new("sub_2"),
        term: sub_2,
    };
    let mut project = Project::mock_arm32();
    project.program.term.subs =
        BTreeMap::from([(Tid::new("sub_1"), sub_1), (Tid::new("sub_2"), sub_2)]);

    let cfg_before_normalization = graph::get_program_cfg(&project.program);
    cfg_before_normalization.print_compact_json();

    propagate_control_flow(&mut project);
    // construction of CFG would panic now
    //graph::get_program_cfg(&project.program);
    let expected_blocks = vec![
        // `cond_blk_1` can be re-targeted.
        mock_condition_block("cond_blk_1", "end_blk_2", "end_blk_1"),
        // `call_blk` can not be re-targeted since no condition is known.
        mock_block_with_defs_and_call("call_blk", "sub_2", "cond_blk_2"),
        // `cond_blk_2` was removed since cond_blk_1 was re-targeted.
        // Note: `call_blk` did not contribute an incoming edge since the
        // callee does not return.
        mock_block_with_defs("end_blk_1", "end_blk_1"),
        mock_block_with_defs("end_blk_2", "end_blk_2"),
    ];

    assert_eq!(
        &project.program.term.subs[&Tid::new("sub_1")].term.blocks[..],
        &expected_blocks[..]
    );
}

PS: How do you like the updated ToJsonCompact impl? ;)

{
  "edge_counts": {
    "block": 6,
    "call": 1,
    "call_combine": 1,
    "cr_call_stub": 0,
    "cr_return_stub": 0,
    "extern_call_stub": 0,
    "jump": 7,
    "return_combine": 0,
    "total": 15
  },
  "edges": {
    "0 -> 1": "Block",
    "1 -> 4": "Jump",
    "1 -> 6": "Jump",
    "10 -> 11": "Block",
    "11 -> 10": "Jump",
    "12 -> 10": "Call",
    "2 -> 3": "Block",
    "3 -> 12": "CallCombine",
    "4 -> 5": "Block",
    "5 -> 6": "Jump",
    "5 -> 8": "Jump",
    "6 -> 7": "Block",
    "7 -> 6": "Jump",
    "8 -> 9": "Block",
    "9 -> 8": "Jump"
  },
  "node_counts": {
    "blk_end": 6,
    "blk_start": 6,
    "call_return": 0,
    "call_source": 1,
    "total": 13
  },
  "nodes": {
    "0": "BlkStart @ cond_blk_1 (sub sub_1)",
    "1": "BlkEnd @ cond_blk_1 (sub sub_1)",
    "10": "BlkStart @ loop_block (sub sub_2)",
    "11": "BlkEnd @ loop_block (sub sub_2)",
    "12": "CallSource @ loop_block (sub sub_2) (caller @ call_blk (sub sub_1))",
    "2": "BlkStart @ call_blk (sub sub_1)",
    "3": "BlkEnd @ call_blk (sub sub_1)",
    "4": "BlkStart @ cond_blk_2 (sub sub_1)",
    "5": "BlkEnd @ cond_blk_2 (sub sub_1)",
    "6": "BlkStart @ end_blk_1 (sub sub_1)",
    "7": "BlkEnd @ end_blk_1 (sub sub_1)",
    "8": "BlkStart @ end_blk_2 (sub sub_1)",
    "9": "BlkEnd @ end_blk_2 (sub sub_1)"
  }
}
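
For readers skimming the dump: there is no CallReturn node and no return-stub edge for the call in call_blk, so the only edge into cond_blk_2 is the conditional jump coming from cond_blk_1. A tiny, hypothetical sanity check of that observation (assuming the dump above is saved to a file named cfg_dump.json and that serde_json is available; neither is part of this PR) could look like this:

use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical file containing the compact CFG dump shown above.
    let dump: Value = serde_json::from_str(&std::fs::read_to_string("cfg_dump.json")?)?;
    // No call-return nodes and no return-stub edges: the call in `call_blk`
    // never "returns" in the CFG because `sub_2` only loops.
    assert_eq!(dump["node_counts"]["call_return"], 0);
    assert_eq!(dump["edge_counts"]["cr_return_stub"], 0);
    // Hence the only edge into `cond_blk_2` (BlkStart node 4) is the conditional
    // jump from `cond_blk_1` (BlkEnd node 1). Once that jump is re-targeted,
    // `cond_blk_2` is orphaned even though `call_blk` still references it.
    assert_eq!(dump["edges"]["1 -> 4"], "Jump");
    Ok(())
}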

@Enkelmann (Contributor) commented

Your updated ToJsonCompact format looks very good!

Right now, your option a) would probably result in the best CFG that we can generate. Just make sure to mention it prominently in the doc-comment of the normalization pass. Maybe also add an INFO-level log message for each function (or call?) deemed non-returning, just to make it easier to spot real-world binaries where suspiciously many functions are marked as non-returning.

@vobst (Collaborator, Author) commented May 8, 2024

> Your updated ToJsonCompact format looks very good!

Nice, I included that in the first new commit.

> Right now, your option a) would probably result in the best CFG that we can generate. Just make sure to mention it prominently in the doc-comment of the normalization pass. Maybe also add an INFO-level log message for each function (or call?) deemed non-returning, just to make it easier to spot real-world binaries where suspiciously many functions are marked as non-returning.

I decided to split up the existing remove_references_to_nonexisting_tids_and_retarget_non_returning_calls pass into three passes:

  1. Unconditionally add the artificial sinks. (As far as I can see there is no problem with having them when they are not needed and it simplifies the code.)
  2. remove_references_to_nonexisting_tids: The part of the code in remove_references_to_nonexisting_tids_and_retarget_non_returning_calls that was responsible for that; it can now assume that artificial sinks always exist.
  3. retarget_non_returning_calls_to_artifical_sink: handles calls to external functions AND functions without a return instruction; it can now assume that artificial sinks always exist (see the sketch below).
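
To illustrate the third pass, here is a sketch under simplified, hypothetical types and names; it is not the actual cwe_checker code. Every call whose callee lacks a return instruction gets its return target redirected to the caller's own artificial sink block, and an INFO-level message is emitted so that binaries with suspiciously many non-returning functions stand out.

use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a call jump: the callee and the block the call
/// is supposed to return to, if any.
struct Call {
    callee: String,
    return_target: Option<String>,
}

/// Redirect the returns of calls to non-returning callees to the caller's own
/// artificial sink block (one sink per function).
fn retarget_non_returning_calls(
    calls: &mut BTreeMap<String, Vec<Call>>, // caller -> its calls
    returning_fns: &BTreeSet<String>,        // callees that have a return instruction
) {
    for (caller, caller_calls) in calls.iter_mut() {
        let sink = format!("artificial_sink_{caller}");
        for call in caller_calls.iter_mut() {
            if call.return_target.is_some() && !returning_fns.contains(&call.callee) {
                // The real pass would emit an INFO-level log message here.
                println!(
                    "INFO: call to non-returning `{}` in `{caller}` retargeted to `{sink}`",
                    call.callee
                );
                call.return_target = Some(sink.clone());
            }
        }
    }
}

fn main() {
    // The situation from the example above: `sub_1` calls `sub_2`, which only
    // loops and never returns, yet the call claims to return to `cond_blk_2`.
    let mut calls = BTreeMap::from([(
        "sub_1".to_string(),
        vec![Call {
            callee: "sub_2".to_string(),
            return_target: Some("cond_blk_2".to_string()),
        }],
    )]);
    let returning_fns = BTreeSet::new(); // `sub_2` has no return instruction
    retarget_non_returning_calls(&mut calls, &returning_fns);
    assert_eq!(
        calls["sub_1"][0].return_target.as_deref(),
        Some("artificial_sink_sub_1")
    );
}

Using one artificial sink per function, rather than a single global one, is what keeps the block-to-sub mapping unique (compare the last commit in the list below).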

Here are some stats about how often this new pass triggers. All in all, they suggest that it is pretty rare. I included some reasons for false positives in the doc comment.

| $arch-$binary        | functions | non-returning functions | calls | non-returning calls |
|----------------------|-----------|---------------|-------|---------------|
| arm64-readelf        | 429       | 0             | 8497  | 0             |
| armel-readelf        | 365       | 0             | 8046  | 0             |
| armhf-readelf        | 369       | 0             | 7939  | 0             |
| mipsel-readelf       | 346       | 9             | 5213  | 27            |
| amd64-readelf        | 681       | 7             | 10094 | 0             |
| armel-ls             | 235       | 0             | 1190  | 0             |
| mipsel-ls            | 220       | 7             | 1409  | 14            |
| arm64-ls             | 348       | 0             | 1551  | 0             |
| x86-ls               | 338       | 10            | 1460  | 134           |
| armhf-ls             | 162       | 5             | 731   | 3             |
| amd64-ls             | 322       | 4             | 1400  | 0             |
| ppc64el-ls           | 435       | 1             | 2443  | 0             |
| x86-netfs.ko         | 37        | 6             | 79    | 0             |
| powerpc64le-netfs.ko | 109       | 10            | 364   | 0             |
| mips64r2el-netfs.ko  | 52        | 0             | 720   | 0             |
| mips32r2el-netfs.ko  | 54        | 0             | 790   | 0             |
| amd64-netfs.ko       | 62        | 0             | 310   | 0             |
| arm64-netfs.ko       | 55        | 0             | 194   | 0             |
| armhf-netfs.ko       | 54        | 0             | 227   | 0             |

However, these stats may also be a bit biased. On the IoT ip executable from the original issue we get:

total_fn = 366
total_non_ret_fn = 11
total_calls_with_ret = 3469
total_retargeted_calls = 414

Maybe it is something with the compiler and its settings. In general, it does no harm to retarget those calls, as they cannot be analyzed anyway; however, maybe we can address some of the root causes at another point.

vobst pushed a commit to vobst/cwe_checker that referenced this pull request May 8, 2024
The original fix for Issue fkie-cad#461 in Commit ("lib/ir/project: propagate
control flow for call returns") was incomplete.

The original problem was due to a call to a function without a return
instruction "returning" to a block that could be optimized away in the
propagate control flow pass. Retargeting the call return can only solve
the issue when the return block can be retargeted (and the retarget is
not optimized away), which is not the case for condition blocks.

Thus, always retarget returns from calls to functions without a ret
to the artificial sink.

Link: fkie-cad#462 (comment)
Signed-off-by: Valentin Obst <[email protected]>
vobst pushed a commit to vobst/cwe_checker that referenced this pull request May 8, 2024
…tion

Add a test to verify that retargeting returns from calls to non-returning
functions is indeed solving the problem this pass has with "dangling"
references to return sites.

Link: fkie-cad#462 (comment)
Signed-off-by: Valentin Obst <[email protected]>
@vobst requested a review from Enkelmann May 8, 2024 09:34

@Enkelmann (Contributor) left a comment

Apart from the issue you mentioned yourself, everything looks good to me.

@vobst (Collaborator, Author) commented May 14, 2024

> Apart from the issue you mentioned yourself, everything looks good to me.

Thanks for the review! Then I'll now clean up the commit log and resolve the merge conflicts with the benchmarking PR.

@vobst (Collaborator, Author) left a comment

Just a few small nits.

Valentin Obst added 11 commits May 14, 2024 21:31
This patch does two things:

1. It allows the re-targeting of jumps for which no known true condition
   is available. Without a known condition, only blocks that consist of
   a single, unconditional jump can be skipped.
2. It allows the re-targeting of call returns in the same way that we
   already do it for unconditional jumps. For calls we never have a
   known condition as side-effects may invalidate any knowledge we have
   after the execution of all DEFs in the block.

Example:

Before the optimization we might have code like this:

  BLK [blk_0040a9c4]
    DEF [instr_0040a9c4_0] ra:4 = 0x40a9cc:4
    JMP [instr_0040a9c4_1] call sub_00403f80 ret blk_0040a9cc
  BLK [blk_0040a9cc]
    JMP [instr_0040a9cc_1] Jump to blk_0040a9d0
  BLK [blk_0040a9d0]
    DEF [instr_0040a9d0_0] a0:4 = ((0x43:4 << 0x10:4) + 0xffffb730:4)
    JMP [instr_0040a9d0_1] Jump to blk_0040a9d4

whereas after the optimization it becomes:

  BLK [blk_0040a9c4]
    DEF [instr_0040a9c4_0] ra:4 = 0x40a9cc:4
    JMP [instr_0040a9c4_1] call sub_00403f80 ret blk_0040a9d0
  BLK [blk_0040a9d0]
    DEF [instr_0040a9d0_0] a0:4 = ((0x43:4 << 0x10:4) + 0xffffb730:4)
    JMP [instr_0040a9d0_1] Jump to blk_0040a9d4

Fixes: 2487aac ("remove dead code originating from control flow propagation (fkie-cad#384)")
Closes: fkie-cad#461
Reported-by: https://github.com/ElDavoo
Signed-off-by: Valentin Obst <[email protected]>
…g edges

If a basic block has multiple incoming edges that are all conditioned on
the same condition, use this condition when retargeting the control flow
transfer at the end of the block.

Signed-off-by: Valentin Obst <[email protected]>
Remember precondition and branch condition when retargeting a block that
ends with a conditional jump.

Signed-off-by: Valentin Obst <[email protected]>
No functional changes. Hopefully.

Signed-off-by: Valentin Obst <[email protected]>
No functional changes.

Signed-off-by: Valentin Obst <[email protected]>
The original fix for Issue fkie-cad#461 in Commit ("lib/ir/project: propagate
control flow for call returns") was incomplete.

The original problem was due to a call to a function without a return
instruction "returning" to a block that could be optimized away in the
propagate control flow pass. Retargeting the call return can only solve
the issue when the return block can be retargeted (and the retarget is
not optimized away), which is not the case for condition blocks.

Thus, always retarget returns from calls to functions without a ret
to the artificial sink.

Link: fkie-cad#462 (comment)
Signed-off-by: Valentin Obst <[email protected]>
…tion

Add a test to verify that retargeting returns from calls to non-returning
functions is indeed solving the problem this pass has with "dangling"
references to return sites.

Link: fkie-cad#462 (comment)
Signed-off-by: Valentin Obst <[email protected]>
The pass that retargets "returns" from non-returning functions runs
after the block-to-sub mapping has been made unique. This invariant is
relied upon by later analyses.

Currently, the pass does not uphold this invariant since it always
retargets to the same global artificial sink block.
Modify the pass s.t. it preserves a unique block-to-sub mapping by
retargeting returns directly to the Sub's artificial sink.

Signed-off-by: Valentin Obst <[email protected]>

@vobst (Collaborator, Author) left a comment

Looks like the force push did not break anything.

@vobst merged commit 2e04828 into fkie-cad:master May 15, 2024
6 checks passed