Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Region identification issues with array accesses #264

Open
Tracked by #270
l-kent opened this issue Oct 30, 2024 · 5 comments
Open
Tracked by #270

Region identification issues with array accesses #264

l-kent opened this issue Oct 30, 2024 · 5 comments
Assignees

Comments

@l-kent
Copy link
Contributor

l-kent commented Oct 30, 2024

For the test correct/initialisation/clang:BAP, the analysis gives regions with incorrect size for accesses %00000367 and %00000359.

000002fd: R8 := 0x11000
00000302: R9 := 0x11000
00000308: R9 := R9 + 0x40
0000030d: R11 := 0x11000
00000314: R10 := pad:64[mem[R11 + 0x30, el]:u32]
0000031a: R10 := pad:64[31:0[R10] + 1]
00000322: mem := mem with [R11 + 0x30, el]:u32 <- 31:0[R10]
00000327: R11 := 0x11000
0000032e: R10 := mem[R11 + 0x38, el]:u64
00000334: #4 := R10 - 2
00000339: VF := extend:65[#4 + 1] <> extend:65[R10] - 1
0000033e: CF := pad:65[#4 + 1] <> pad:65[R10] + 0xFFFFFFFFFFFFFFFF
00000342: ZF := #4 + 1 = 0
00000346: NF := 63:63[#4 + 1]
0000034a: R10 := #4 + 1
00000352: mem := mem with [R11 + 0x38, el]:u64 <- R10
00000359: R8 := pad:64[mem[R8 + 0x40, el]:u32]
0000035f: R8 := pad:64[31:0[R8] + 3]
00000367: mem := mem with [R9 + 4, el]:u32 <- 31:0[R8]
0000036c: R9 := 0x11000
00000373: R8 := pad:64[mem[R9 + 0x34]]
00000379: R8 := pad:64[31:0[R8] + 1]
00000381: mem := mem with [R9 + 0x34] <- 7:0[R8]
00000386: R0 := 0
0000038b: call R30 with noreturn

Here, the access at %00000367 is to the address 0x11044 (69700 in decimal). This is a 32-bit access or 4-byte access. The GRA gives this access the region data_11, which starts at 69700 and has a size of 8 bytes. This is an incorrect size - it should only be 4 bytes as this is the only time this address is accessed. These accesses are both accesses to an array global variable.

The 32-bit (4-byte) access at %00000359 is to the address 0x11040 (69696 in decimal). The GRA gives this access the region data_14, which starts at 69696 and has a size of 8 bytes. This is also an incorrect size - this is also the only time this address is accessed, so it should only have a size of 4 bytes. In addition, if both those sizes were correct, the regions should have been merged due to being overlapping.

For correct/initialisation/clang_pic:BAP, there are some region size issues similar to the previous example, but there is another error in the analysis as well.

This is the relevant parts of the .bir file:

00000305: R9 := 0x10000
0000030c: R9 := mem[R9 + 0xFC8, el]:u64

0000036b: R8 := pad:64[mem[R9, el]:u32]

00000379: mem := mem with [R9 + 4, el]:u32 <- 31:0[R8]

The access at %0000036b is incorrectly given a region with a size of 8 instead of 4 (it only has a 32-bit access) - this is the same region size issue with array accesses.

The access at %00000379 is incorrectly given a region with size 8 and start address 0x10FCC (69580). It seems to have gotten this by doing 0x10000 + 0xFC8 + 4, but the fact that 0x10FC8 (69576) is loaded from memory and the result has 4 added to it has been missed. The correct address is 0x11044 (69700) as mem[0x10FC8] == 0x11040 (69696).

@l-kent
Copy link
Contributor Author

l-kent commented Nov 4, 2024

The access at %00000379 is incorrectly given a region with size 8 and start address 0x10FCC (69580). It seems to have gotten this by doing 0x10000 + 0xFC8 + 4, but the fact that 0x10FC8 (69576) is loaded from memory and the result has 4 added to it has been missed. The correct address is 0x11044 (69700) as mem[0x10FC8] == 0x11040 (69696).

I'm not sure but I think this is likely to be related to the following sections of code:

case assign: Assign =>
val unwrapped = unwrapExprToVar(assign.rhs)
if (unwrapped.isDefined) {
// X1 = X2: [[X1]] = [[X2]]
val X1 = assign.lhs
val X2 = unwrapped.get
unify(IdentifierVariable(RegisterWrapperEqualSets(X1, getDefinition(X1, assign, reachingDefs))), IdentifierVariable(RegisterWrapperEqualSets(X2, getUse(X2, assign, reachingDefs))))
} else {

def unwrapExprToVar(expr: Expr): Option[Variable] = {
expr match {
case variable: Variable =>
Some(variable)
case e: Extract => unwrapExprToVar(e.body)
case e: SignExtend => unwrapExprToVar(e.body)
case e: ZeroExtend => unwrapExprToVar(e.body)
case repeat: Repeat => unwrapExprToVar(repeat.body)
case unaryExpr: UnaryExpr => unwrapExprToVar(unaryExpr.arg)
case binaryExpr: BinaryExpr => // TODO: incorrect
unwrapExprToVar(binaryExpr.arg1)
unwrapExprToVar(binaryExpr.arg2)
case memoryLoad: MemoryLoad => unwrapExprToVar(memoryLoad.index)
case _ =>
None
}
}

In the Steensgaard analysis, if a MemoryLoad is encountered in the RHS of an assignment, it will treat the index of the load as what is being assigned to the LHS. This is incorrect and not sound. The current implementation means that the Steensgaard analysis does not differentiate between the following:

R1 := R2
R1 := mem[R2]

There are also obvious issues relating to the handling of BinaryExpr which is stated to be incorrect.

@yousifpatti
Copy link
Contributor

This issue is related to VSA precisely when it tries to match the region "found" with the data from the relf file and this results in a partial match. The bug was that it takes the biggest size of both (the found region 4, and the partial match 8) which is not correct.

I have addressed this in 4e63388

Once solution checked this issue can be closed

@l-kent
Copy link
Contributor Author

l-kent commented Nov 10, 2024

Can you please place these fixes in their own branch, one that does not have other unrelated changes (such as the SASI VSA), and create a pull request for it?

@l-kent
Copy link
Contributor Author

l-kent commented Nov 11, 2024

You've fixed most of the current issues but that has exposed another underlying issue with the VSA for some cases, including the following:
correct/basic_lock_read/gcc:BAP
correct/basic_lock_read/gcc_pic:BAP
correct/initialisation/clang:BAP
correct/initialisation/clang_O2:BAP
correct/initialisation/clang_pic:BAP
correct/initialisation/gcc:BAP
correct/initialisation/gcc_O2:BAP
correct/initialisation/gcc_pic:BAP

For correct/initialisation/clang:BAP, the issue happens when GlobalRegionAnalysis reaches the line %0000035f in the second iteration (after the VSA). The relevant parts of the .bir are as follows:

000002fd: R8 := 0x11000

00000359: R8 := pad:64[mem[R8 + 0x40, el]:u32]
0000035f: R8 := pad:64[31:0[R8] + 3]

The VSA has that R8 = 69696 (0x11040) at %0000035f, which is incorrect as that address has just been accessed in the previous line. The actual value of R8 at %0000035f is 1 since that is the value that the address 0x11040 is initialised to, but it would be sufficient here if the VSA just had that R8 = top at %0000035f. Currently the VSA gives the completely wrong value, which causes the GRA to create a region from 69696 + 3 (69699) which overrides the correct region at 69696 that we care about in the MMM.

@yousifpatti
Copy link
Contributor

This issue has been addresses here:
#270

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants