-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unbounded proofs of memory safety #146
Unbounded proofs of memory safety #146
Conversation
9a63610
to
2b0e454
Compare
source/core_json.c
Outdated
@@ -102,6 +120,13 @@ static size_t countHighBits( uint8_t c ) | |||
size_t i = 0; | |||
|
|||
while( ( n & 0x80U ) != 0U ) | |||
assigns( i, n ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what/why we are proving here. It's not memory safety because there are no memory accesses. It could be that we are proving that the loop computes the correct value, but then I'd expect to see a post condition after the loop or a function contract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear because both the contract and assertions with more context for this invariant are in the caller function skipUTF8MultiByte
. We have a contract for skipUTF8
, which calls skipUTF8MultiByte
, which finally calls countHighBits
. The computational result from countHighBits
, if not constrained properly, may lead to spurious arithmetic overflows. As I refactor and clean up these invariants, I'll add more documentation for clarification.
@@ -1070,7 +1135,7 @@ static JSONStatus_t skipCollection( const char * buf, | |||
case '[': | |||
depth++; | |||
|
|||
if( depth == JSON_MAX_DEPTH ) | |||
if( depth >= JSON_MAX_DEPTH ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, and the other local changes, is interesting. Was the code wrong before, or just harder to prove correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, it was harder to prove correct. As you can see, depth
is part of our assigns clause. Thus, after havocing this variable and performing the inductive step, the code wasn't considering the case where depth
could be beyond JSON_MAX_DEPTH
(which wouldn't happen if we just iterate over the loop). We might want to change the invariant or make the code more cohesive. In this case, I opt for the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I should move all code alterations to a separate commit with proper documentation.
source/core_json.c
Outdated
@@ -1466,6 +1536,9 @@ static bool skipQueryPart( const char * buf, | |||
while( ( i < max ) && | |||
!isSeparator_( buf[ i ] ) && | |||
!isSquareOpen_( buf[ i ] ) ) | |||
assigns( i ) | |||
loopInvariant( i <= max ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was about to ask what we are proving here again since there is no memory access or post-condition ... but I think I get it now. We are just trying to get rid of all bounded analysis, so even though this loop is not interesting from a memory safety point of view we need it abstracted to we can claim we are doing unbounded memory safety. The invariant "true" would have worked equally well for that I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe we needed the invariant to prove the decreases clause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right both times. We just to cross the line of unbounded analysis and we just want to prove termination :)
@@ -39,7 +39,7 @@ | |||
bool isValidBoundedBuffer( char * buf, | |||
size_t max ) | |||
{ | |||
return ( 0 < max && max < CBMC_MAX_BUFSIZE ) | |||
return ( 0U < max ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this wasn't your change, but that's a curious use of bit-wise and on the next line. What's that about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We added a comment on top of the file about it.
/**
* When to use `&` vs `&&`:
* Prefer `cond1 & cond2` when `cond2` can be evaluated without errors without knowing that `cond1` holds: e.g. `( 0 < i ) & ( i < j )`.
*
* Use `cond1 && cond2` when `cond1` must be established first to ensure that `cond2` can be evaluated without error: e.g.
* `( allocated( p, size ) & ( 0 < i & i < size ) ) && p[i] > 0`.
*/
It mainly help us avoid existing performance issues during instrumentation when dealing with memory predicates and operands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I know this is already all here, and not expecting you to go and change all of these
But I really dislike this coding style. It's rather hard for a human to follow, which is clear since there needs to be a comment on the top of the file explaining it.
Then I feel like a compiler is going to optimize this
return ( start >= old_start ) &&
( ( old_start < max ) ? ( start <= max ) : ( start == old_start ) );
To be, for all intents and purposes, the same as
bool retVal = false;
if ( ( start >= old start ) && ( old_start < max) )
{
retVal = ( start <= max );
}
else
{
retVal = ( start == old_start );
}
return ( retVal );
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we use this style because the style you suggested leads to drastic performance drawbacks when CBMC's back-end is solving the mathematical formula generated from the code. We can focus on this PR to include the loop invariants and I can certainly open a GitHub issue so we can properly discuss what would be a more readable C-style for these specifications.
Signed-off-by: Felipe R. Monteiro <[email protected]>
af58865
to
2b71ebe
Compare
Signed-off-by: Felipe R. Monteiro <[email protected]>
In order to turn all bounded CBMC proofs into unbounded proofs, we must make some small changes to coreJSON. Basically, we want to avoid uninitialized variables (CBMC sees them as non-deterministic values) and make sure the code in loops is resilient enough when working with inputs of arbitrary size. The latter is important because we must annotate all loops with loop invariants. The implementation should work together with these annotations in order to prove memory safety for inputs of arbitrary size. Signed-off-by: Felipe R. Monteiro <[email protected]>
Modify existing CBMC proofs to remove any assumptions over the size of the input (a.k.a. bound). Instrument all loops with Hoare-style loop invariants and assigns clauses (define the modified memory within a loop), which CBMC uses to prove correctness. Annotate all loops using decreases clauses, which CBMC uses to prove termination. Signed-off-by: Felipe R. Monteiro <[email protected]>
2b71ebe
to
69b6d11
Compare
@@ -73,7 +73,7 @@ static void skipSpace( const char * buf, | |||
size_t * start, | |||
size_t max ) | |||
{ | |||
size_t i; | |||
size_t i = 0U; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fairly certain you don't need to mark 0 as 0U?
I believe it's unsigned by default in C
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misra 2012 requires U. signed is default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Literal 0 is an integer constant of type "signed int" (ISO:C90 6.1.3.2 - Semantics).
79b2953
to
daa79b9
Compare
0f43e5c
to
c7f1b4f
Compare
Signed-off-by: Felipe R. Monteiro <[email protected]>
c7f1b4f
to
52918f1
Compare
@n9wxu could I get one more review? |
Description
We modified existing CBMC proofs to remove any assumptions over the size of the input (a.k.a. bound). We need to instrument all loops with Hoare-style loop invariants and assigns clauses (define the modified memory within a loop), which CBMC uses to prove correctness. We also annotate all loops using decreases clauses, which CBMC uses to prove termination.
We include a new proof for
multiSearch
function. That allow us to dramatically improve performance in CI. All CBMC proofs in CI now run under 3 minutes using less than 3GB of RAM (before they took up to 40 minutes and 40GB of RAM).Finally, we clean up all Makefiles.
Test Steps
See proof results.
Checklist:
Related Issue
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.