Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Null Byte Array Domain #1076

Merged
merged 124 commits into from
Dec 12, 2023
Merged

Conversation

nathanschmidt
Copy link
Collaborator

@nathanschmidt nathanschmidt commented Jun 6, 2023

This PR implements a Null Byte Array Domain tracking all indexes of array elements thay may be null and must be null. Additionally, it also memorizes the size of the array. It is always run as part of the AttributeConfiguredArrayDomain, which allowed an easy extension of the cases for the functions listed above in the base analysis.

This domain makes it possible to gain meaningful information for strings, especially concerning the effects of the functions strcpy, strncpy, strcat, strncat, strlen, strstr, strcmp and strncmp. On top, the domain tries to identify possible and sometimes even certain buffer overflows.

TODOs:

  • map all array elements that are read-in to null () resp. not_null ()
  • tackle TODO annotations in code
  • write test cases for both sound uses of the functions and scenarios for which warnings should be triggered

src/cdomains/arrayDomain.ml Fixed Show fixed Hide fixed
src/cdomains/arrayDomain.ml Fixed Show fixed Hide fixed
src/cdomains/arrayDomain.ml Fixed Show fixed Hide fixed
src/cdomains/arrayDomain.ml Fixed Show fixed Hide fixed
src/cdomains/arrayDomain.ml Fixed Show fixed Hide fixed
src/cdomains/arrayDomain.ml Fixed Show fixed Hide fixed
src/cdomains/arrayDomain.ml Fixed Show fixed Hide fixed
@michael-schwarz michael-schwarz requested review from michael-schwarz and sim642 and removed request for sim642 June 7, 2023 09:15
@michael-schwarz
Copy link
Member

I'll review the things in base and the NullByteDomain for now, we agreed that the rest needs some changes.

nathanschmidt and others added 9 commits June 8, 2023 12:38
- Updated null recognition in Compound of valueDomain
- strstr analysis can now detect NULL ptr
- fixed get of AttributeConfiguredArrayDomain
- Null Byte domain can now be called for all wished functions in base
and values are correctly updated
- Base now sets dest to top if string functions receive an array as dest
and a string literal as src
- Added function setting whole array content to top but still memorizing
type and size
- Fixed inverted comparisons in string_copy
- Fixed wrong claim in string_comparison
@michael-schwarz michael-schwarz marked this pull request as ready for review November 28, 2023 20:58
src/common/util/options.schema.json Outdated Show resolved Hide resolved
tests/regression/73-strings/03-string_basics.c Outdated Show resolved Hide resolved
Comment on lines +45 to +47
module MaySet = struct
module M = SetDomain.ToppedSet (IntDomain.BigInt) (struct let topname = "All Null" end)
include M
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing prevents this set from growing unbounded, does it?

Copy link
Member

@michael-schwarz michael-schwarz Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there's no such guarantee. There's some attempt to represent sets that comprise the full range by top, but that is not done consistently.
My ambition at some point was to decouple the set nature of this from the analysis to admit other abstractions here (e.g. our int domains). But I did not get around to doing this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, we already have interval sets and they're simply a generalization of this, so that should be possible but is probably too much work for now.

Comment on lines 2306 to 2308
(* if s string literal, compute strlen in string literals domain *)
if AD.type_of a = charPtrType then
Int (AD.to_string_length a)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Polymorphic comparison of types. But also, I'm not sure how reliable this type-based check is because looks like it's trying to check for StrPtr. But there may be a char* which isn't StrPtr.

src/analyses/base.ml Outdated Show resolved Hide resolved
Comment on lines 2192 to 2203
let address_from_value (v:value) = match v with
| Address a ->
let rec lo = function
| `Index (i, `NoOffset) -> `NoOffset
| `NoOffset -> `NoOffset
| `Field (f, o) -> `Field (f, lo o)
| `Index (i, o) -> `Index (i, lo o) in
let rmLastOffset = function
| Addr.Addr (v, o) -> Addr.Addr (v, lo o)
| other -> other in
AD.map rmLastOffset a
| _ -> raise (Failure "String function: not an address")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can the last index just be ignored unconditionally. I find it odd because this looks like Offset.remove_offset but that one also consistently removes fields.

Also, this is just called address_from_value and exposed in quite large scope, which is prone to misuse. If the logic is indeed exactly as needed, then this should have a different name or be withint string_manipulation or whereever.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that the framework has tacked on a [0] offset to string pointer somewhere, but we do want the raw pointer here. You are probably right about naming and moving to a different scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be, but there must be a more reliable way to know when this needs to be dropped. Because this also drops such indices when they explicitly occur in the program, rather than being inserted by CIL, and then we would likely do something wrong. For example, what about an array of strings (i.e. multi-dimensional array of chars).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not investigated this in detail, but agree someone should ideally do so.

Comment on lines 26 to +27
val invalidate_value: VDQ.t -> typ -> t -> t
val invalidate_abstract_value: t -> t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a different thing here? If we're dealing with strings, surely we know the type, no?

@sim642 sim642 self-requested a review December 5, 2023 13:39
@sim642 sim642 added this to the v2.4.0 milestone Dec 12, 2023
@sim642 sim642 merged commit c90eb12 into goblint:master Dec 12, 2023
9 of 10 checks passed
sim642 added a commit that referenced this pull request Dec 12, 2023
sim642 added a commit that referenced this pull request Dec 12, 2023
sim642 added a commit to sim642/opam-repository that referenced this pull request Aug 2, 2024
CHANGES:

* Remove unmaintained analyses: spec, file (goblint/analyzer#1281).
* Add linear two-variable equalities analysis (goblint/analyzer#1297, goblint/analyzer#1412, goblint/analyzer#1466).
* Add callstring, loopfree callstring and context gas analyses (goblint/analyzer#1038, goblint/analyzer#1340, goblint/analyzer#1379, goblint/analyzer#1427, goblint/analyzer#1439).
* Add non-relational thread-modular value analyses with thread IDs (goblint/analyzer#1366, goblint/analyzer#1398, goblint/analyzer#1399).
* Add NULL byte array domain (goblint/analyzer#1076).
* Fix spurious overflow warnings from internal evaluations (goblint/analyzer#1406, goblint/analyzer#1411, goblint/analyzer#1511).
* Refactor non-definite mutex handling to fix unsoundness (goblint/analyzer#1430, goblint/analyzer#1500, goblint/analyzer#1503, goblint/analyzer#1409).
* Fix non-relational thread-modular value analysis unsoundness with ambiguous points-to sets (goblint/analyzer#1457, goblint/analyzer#1458).
* Fix mutex type analysis unsoundness and enable it by default (goblint/analyzer#1414, goblint/analyzer#1416, goblint/analyzer#1510).
* Add points-to set refinement on mutex path splitting (goblint/analyzer#1287, goblint/analyzer#1343, goblint/analyzer#1374, goblint/analyzer#1396, goblint/analyzer#1407).
* Improve narrowing operators (goblint/analyzer#1502, goblint/analyzer#1540, goblint/analyzer#1543).
* Extract automatic configuration tuning for soundness (goblint/analyzer#1369).
* Fix many locations in witnesses (goblint/analyzer#1355, goblint/analyzer#1372, goblint/analyzer#1400, goblint/analyzer#1403).
* Improve output readability (goblint/analyzer#1294, goblint/analyzer#1312, goblint/analyzer#1405, goblint/analyzer#1497).
* Refactor logging (goblint/analyzer#1117).
* Modernize all library function specifications (goblint/analyzer#1029, goblint/analyzer#688, goblint/analyzer#1174, goblint/analyzer#1289, goblint/analyzer#1447, goblint/analyzer#1487).
* Remove OCaml 4.10, 4.11, 4.12 and 4.13 support (goblint/analyzer#1448).
avsm pushed a commit to avsm/opam-repository that referenced this pull request Sep 5, 2024
CHANGES:

* Remove unmaintained analyses: spec, file (goblint/analyzer#1281).
* Add linear two-variable equalities analysis (goblint/analyzer#1297, goblint/analyzer#1412, goblint/analyzer#1466).
* Add callstring, loopfree callstring and context gas analyses (goblint/analyzer#1038, goblint/analyzer#1340, goblint/analyzer#1379, goblint/analyzer#1427, goblint/analyzer#1439).
* Add non-relational thread-modular value analyses with thread IDs (goblint/analyzer#1366, goblint/analyzer#1398, goblint/analyzer#1399).
* Add NULL byte array domain (goblint/analyzer#1076).
* Fix spurious overflow warnings from internal evaluations (goblint/analyzer#1406, goblint/analyzer#1411, goblint/analyzer#1511).
* Refactor non-definite mutex handling to fix unsoundness (goblint/analyzer#1430, goblint/analyzer#1500, goblint/analyzer#1503, goblint/analyzer#1409).
* Fix non-relational thread-modular value analysis unsoundness with ambiguous points-to sets (goblint/analyzer#1457, goblint/analyzer#1458).
* Fix mutex type analysis unsoundness and enable it by default (goblint/analyzer#1414, goblint/analyzer#1416, goblint/analyzer#1510).
* Add points-to set refinement on mutex path splitting (goblint/analyzer#1287, goblint/analyzer#1343, goblint/analyzer#1374, goblint/analyzer#1396, goblint/analyzer#1407).
* Improve narrowing operators (goblint/analyzer#1502, goblint/analyzer#1540, goblint/analyzer#1543).
* Extract automatic configuration tuning for soundness (goblint/analyzer#1369).
* Fix many locations in witnesses (goblint/analyzer#1355, goblint/analyzer#1372, goblint/analyzer#1400, goblint/analyzer#1403).
* Improve output readability (goblint/analyzer#1294, goblint/analyzer#1312, goblint/analyzer#1405, goblint/analyzer#1497).
* Refactor logging (goblint/analyzer#1117).
* Modernize all library function specifications (goblint/analyzer#1029, goblint/analyzer#688, goblint/analyzer#1174, goblint/analyzer#1289, goblint/analyzer#1447, goblint/analyzer#1487).
* Remove OCaml 4.10, 4.11, 4.12 and 4.13 support (goblint/analyzer#1448).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants