From 0ea80195f722fbc86338abe7b8ca16932939f56a Mon Sep 17 00:00:00 2001 From: Vitaliy <42554566+Leosimetti@users.noreply.github.com> Date: Sun, 27 Mar 2022 20:57:39 +0500 Subject: [PATCH 1/3] docs: .md files with general description and description of the third defect type --- .../unjustified_assumption_detection.md | 313 ++++++++++++++++++ docs/general_information.md | 42 +++ 2 files changed, 355 insertions(+) create mode 100644 docs/analysis/unjustified_assumption_detection.md create mode 100644 docs/general_information.md diff --git a/docs/analysis/unjustified_assumption_detection.md b/docs/analysis/unjustified_assumption_detection.md new file mode 100644 index 00000000..778a08e5 --- /dev/null +++ b/docs/analysis/unjustified_assumption_detection.md @@ -0,0 +1,313 @@ +# Odin: Detection Of Unjustified Assumptions In Subclasses + +For this type of defect we detect whether the refactoring of superclasses by inlining can affect the functionality of subclasses in an undesired way. The notion of unjustified assumption in this context refers to assumptions made in subclasses regarding method dependencies in superclasses. + + + +## Brief algorithm description +The following stages and steps take place during program analysis: +#### Setting locators +0. The source code of the program is parsed to build its corresponding AST. +1. Identifiers without explicitly set locators get locators derived and set. +#### Inlining +2. Object hierarchy is analyzed to set apart methods that comply with specific criteria (can be found below) and are not redefined. +3. Every method in the resulting set is paired with its version in which all valid calls (criteria can be found below) to other methods of the containing object are inlined non-recursively. +#### Logic extraction +4. Logical properties (constraints on variables in the form of a logical expression) are extracted for both versions of the method in the pair. +5. The derived expression for the version before inlining is connected with the expression after inlining by the means of implication. Doing so allows us to discover whetner the constraints on the variables have become weaker or stronger. +6. We deduce that the unjustified assumption takes place if the resulting logical expression is false, meaning that there exist some inputs such that they worked before the inlining, and stopped working after. + + +## In-depth description of algorithms + +### Requirements for the methods to be chosen for analysis: + +1. Its value is an object term with void self and attached @ attributes +2. There are no references to its @ attribute in any of its attached attributes. + +### Requirements for the call to be considered valid during inlining: + +1. The called method belongs to the same object as the method containing the call. +2. The call the general form `self.method self ...`. Meaning that `self` is used as both the source for the method and is passed as the first argument. + +### Algorith for setting locators: +0. Explicit locator chains of from (^.^.value) are resolved at parse-time and are converted to corresponding AST-nodes. +1. All predefined EO-keywords are stored in a special structure called 'context' and have locators point to the root of the program. +2. The first step of actual processing extends the existing context by adding all top-level objects. +3. Each top-level object is recursiely explored while keeping keeping track of encountered objects and their depth in the context. +4. The terms that refer to the definitions in the current scope are given 0-locators (\$), while terms contained in the context are given locators by subtracting their depth from current depth. + +Before: +``` +[] > obj + [self] > method + self > @ + [method] > shadowedMethod + method > @ + [] > notShadowedMethod + method > @ + +[] > notShadowedObj + obj > @ + +[] > shadowedObj + [obj] > method + [] > innerMethod + obj > @ + obj > @ + +[] > outer + [] > self + 256 > magic + [] > dummy + [outer] > dummyMethod + outer > @ + outer.self > @ + self "yahoo" > @ + [self] > method + self.magic > @ +``` +After: +``` +[] > obj + [self] > method + $.self > @ + [method] > shadowedMethod + $.method > @ + [] > notShadowedMethod + ^.method > @ +[] > notShadowedObj + ^.obj > @ +[] > shadowedObj + [obj] > method + [] > innerMethod + ^.obj > @ + $.obj > @ +[] > outer + [] > self + 256 > magic + [] > dummy + [outer] > dummyMethod + $.outer > @ + ^.^.^.outer.self > @ + ^.self > @ + "yahoo" + [self] > method + $.self.magic > @ +``` +### Inlining algorithm: +0. All objects are extracted from the AST +1. Every AST-object is converted into the 'Object' data strucutre +2. Each created 'Object' is processed as follows: + 2.1 All calls in the object are processed, while non-call binds are returned as is. + 2.1.1 Replace occurrences of formal parameters in the method body with the argument values. + 2.1.2 If the method being inlined contains local bindings other than `@` attribute, extract them into the `local-attrs` object and put this object adjacent to the call-site. + 2.1.3. References to local attributes are replaced with their counterparts from the `local-attrs` object. + 2.1.4. The call is replaced with the value of the called method's `@` attribute. +3. If the conversion was successful, the modified Object is returned. Otherwise, the list of errors is returned. + +Before: +``` +[] > obj + [self arg1 arg2 arg3] > average3 + arg1.add (arg2.add arg3) > sum + 3 > count + sum.div count > average + [] > @ + ^.sum > sum + ^.count > count + ^.average > average + [self] > call-site + self.average3 self 1 2 3 > @ +``` +After: +``` +[] > obj + [self arg1 arg2 arg3] > average3 + $.arg1.add > sum + $.arg2.add + $.arg3 + 3 > count + $.sum.div > average + $.count + [] > @ + ^.sum > sum + ^.count > count + ^.average > average + [self] > call-site + [] > local_average3 + 1.add > sum + 2.add + 3 + 3 > count + $.sum.div > average + $.count + [] > @ + ^.local_average3.sum > sum + ^.local_average3.count > count + ^.local_average3.average > average +``` + +### Alogrithm for logical expression extraction: +0. The list of recognized terms and their corresponding expressions/properties: + ![](https://i.imgur.com/PBFB7ac.png) +1. All methods that can be called from the target method are collected and stored. +2. Properties and values of each function in the set are derived according to the above rules. This is done to account for their potential calls. The resulting SMT-code has all properties and returned expressions of every functions listed at the beginning of top of the file as function definitions. +3. The target method and its version with inlined calls are recursively processed on field-by-field basis. +4. The acquired logical expressions are connected using the implication operation and assertion in the following manner: `assert(logic-before-inlining => logic-after-inlining)`. +5. The resulting formula is then complemented by ``(check-sat)`` and ``(get-model)`` commands and sent the SMT-solver for further processing. + +Program: +``` +[] > test + [] > parent + [self x] > f + x.sub 5 > y1 + seq > @ + assert (0.less y1) + x + [self y] > g + self.f self y > @ + [self z] > h + z > @ + [] > child + parent > @ + [self y] > f + y > @ + [self z] > h + self.g self z > @ +``` +Resulting SMT-code: +```clojure= +(define-fun value-of-before-f ((var-before-y Int)) Int var-before-y) +(define-fun properties-of-before-f ((var-before-y Int)) Bool (and true true)) +(define-fun value-of-before-g ((var-before-y Int)) Int (value-of-before-f var-before-y)) +(define-fun properties-of-before-g ((var-before-y Int)) Bool (and (properties-of-before-f var-before-y) true)) +(define-fun value-of-before-h ((var-before-z Int)) Int (value-of-before-g var-before-z)) +(define-fun properties-of-before-h ((var-before-z Int)) Bool (and (properties-of-before-g var-before-z) true)) +(define-fun value-of-after-f ((var-after-y Int)) Int var-after-y) +(define-fun properties-of-after-f ((var-after-y Int)) Bool (and true true)) +(define-fun value-of-after-g ((var-after-y Int)) Int var-after-y) +(define-fun properties-of-after-g ((var-after-y Int)) Bool (exists ((var-after-local_f Int)(var-after-local_f-y1 Int)) (and (and (and (and true true) (< 0 var-after-local_f-y1)) true) (and (and (and true true) (= var-after-local_f-y1 (- var-after-y 5))) true)))) +(define-fun value-of-after-h ((var-after-z Int)) Int (value-of-after-f var-after-z)) +(define-fun properties-of-after-h ((var-after-z Int)) Bool (and (properties-of-after-f var-after-z) true)) +(assert (forall ((var-before-y Int)) (exists ((var-after-y Int)) (and (and true (= var-before-y var-after-y)) (=> (and (properties-of-before-f var-before-y) true) (exists ((var-after-local_f Int)(var-after-local_f-y1 Int)) (and (and (and (and true true) (< 0 var-after-local_f-y1)) true) (and (and (and true true) (= var-after-local_f-y1 (- var-after-y 5))) true)))))))) + +(check-sat) +(get-model) +``` + +## Current Limitations: +1. Method pairs for inspection are formed by considering only direct descendants. For example, in case of an inheritance chain A -> B -> C only combinations (A,B) and (B,C) are examined. It would be ideal to examine all possible combinations. So, in the given example an additional pair (A,C) is to be examined. + +2. Presence of methods with mutual recursion causes the program to be unable to detect defects of related to unjistifed assumptions. +3. Current implementation makes it impossible to store boolean values in object fields. Only integer values are supported as of now. +4. No checks are performed on the contents of asserts in EO. As such, it is possible to assert an integer value, which would cause unanticipated behavior. For example, `assert (3.add 5)`. +5. Imported functions are not supported by the analyzer. +6. Methods without arguments are not yet supported. + +## Possible Errors +The error messages that the analyser can fail with at each stage are listed here. + +### Setting locators +#### 1. The analyzer encounters a reference to a non-existent object. +``` +[] > a + [self] > method + b > @ +``` +> `Exception in thread "main" java.lang.Exception: Could not set locator for non-existent object with name b` + + +### Inlining +#### 1. Decoratee is not a valid object. +``` +1 < non-existent +[] > a + non-existent > @ +``` +> `java.lang.Exception: There is no such parent with name "^.non-existent"` + + +#### 2. Locators point too broadly. +``` +[] > b +[] > a + ^.^.b > @ +``` +> `java.lang.Exception: Locator overshoot at ^.^.b ` + + + +#### 3. Attempt to call a method that does not exist. +``` +[] > a + [self] > f + self.non-existent self > @ +``` +> `java.lang.Exception: Inliner: attempt to call non-existent method "non-existent"` + + +#### 4. Attempt to call a method with the wrong number of arguments. +``` +[] > a + [self a b] > f + a.add b > @ + [self] > g + self.f self 1 > @ + [self] > h + self.f self 1 2 3 > @ +``` +> `java.lang.Exception: +Wrong number of arguments given for method f. +Wrong number of arguments given for method f. +` + + + + +### Logic extraction +#### 1. Objects with void attributes as local definitions of methods are not supported. +``` +[self x] > f + 22 > local-def + [void] > local-obj +``` +> `Exception in thread "main" java.lang.Exception: object with void attributes are not supported yet!` . + +#### 2. Attempt to access attributes of applications. +``` + [] > parent + [self x] > f + (self.g 10).@ > res + 10 > @ + [self y] > g + y.add 1 > @ +``` +> `Exception in thread "main" java.lang.Exception: Cannot analyze dot of app: (self.g 10).@` . +#### 3. Using a primitive that is not yet supported. +``` +[self] > method + some-unsupported-primitive 10 > @ +``` +> `Exception in thread "main" java.lang.Exception: Unsupported 1-ary primitive some-unsupported-primitive` . + +``` +[self x] > method + x.some-unsupported-primitive 10 > @ +``` +> `Exception in thread "main" java.lang.Exception: Unsupported 1-ary primitive .some-unsupported-primitive` . +#### 4. Call to a method with no arguments. +``` + [] > parent + [self x] > f + (self.g 10).@ > res + 10 > @ + [self y] > g + y.add 1 > @ +``` +> `Exception in thread "main" java.lang.Exception: Methods with no arguments are not supported` . +#### 5. Some SMT-related issue. +> `Exception in thread "main" java.lang.Exception: SMT solver failed with error: ??? .` + +> `Exception in thread "main" java.lang.Exception: SMT failed to parse the generated program with error: ???` . diff --git a/docs/general_information.md b/docs/general_information.md new file mode 100644 index 00000000..7a5ded50 --- /dev/null +++ b/docs/general_information.md @@ -0,0 +1,42 @@ +# Object Dependency INspector (ODIN) +Tool to inspect EO programs for potential defects. Additionaly, it can be used to analyses programs in Java and C++ if they are first passed through such transpilers as `J2EO` and `C2EO`. + +## Detectable Defects +### 0. Divison by zero +> Instances of division by zero. (implementation is available in polystat) +### 1. Unanticipaded mutual recursion +> Mutual recursion caused by method redefinition during inheritance. + +(refer to [this document](https://github.com/polystat/odin/blob/master/docs/analysis/mutual_recursion_analyzer.md) for more information) +### 2. Unjustifed assumptions about methods of superclasses +> Assumptions made in subclasses regarding method dependencies in superclasses. + +(refer to [this document](https://github.com/polystat/odin/blob/master/docs/analysis/unjustified_assumption_detection.md) for more information) +## Assumptions and Limitations +Some assumptions are made about EO programs and used EO constructs during analysis for all type of defects. Additionally, some constructs and syntax are not supported intentionally. + +### 1. No support for named arguments +The syntax is deemed unnecessary in the scope of analysis, as the target programs generated by `J2EO` and `C2EO` code transpilers do not rely on it. +For example, the following code snippet cannot be processed by Odin: +``` +distance. + point + 5:x + -3:y + point:to + 13:x + 3.9:y +``` + +### 2. No support for multilene attribute access +As with above, the syntax is deemed unnecessary in the scope of analysis, as the target programs generated by `J2EO` and `C2EO` code transpilers do not rely on it. +As such, the following code snippet will not be regonised and process by Odin: +``` +dx.pow 2 + .add + dy.pow 2 + .sqrt > length +``` + +### 3. Multi-file EO programs are only partially supported +While, the imports are recognised by the analyzer in the form of `+alias optional-alias imoprt` statements, the actual code behind them is not used during analysis. More than that, the only instance when imports are considered is during the setting-locators stage of the third defect analysis. Even then, the information is not used in any meaningful way. \ No newline at end of file From 03f848019b3f310bb60bdebd75e794e756f8f599 Mon Sep 17 00:00:00 2001 From: Vitaliy <42554566+Leosimetti@users.noreply.github.com> Date: Sun, 27 Mar 2022 21:10:16 +0500 Subject: [PATCH 2/3] docs: updated links to documentation --- README.md | 6 +++++- ...tion_detection.md => unjustified_assumption_analyzer.md} | 0 docs/general_information.md | 4 ++-- 3 files changed, 7 insertions(+), 3 deletions(-) rename docs/analysis/{unjustified_assumption_detection.md => unjustified_assumption_analyzer.md} (100%) diff --git a/README.md b/README.md index 9c1e07ac..cadb1413 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,11 @@ Odin (object dependency inspector) — a static analyzer for [EO programming lan > The project is still in active development stage and might not be usable or fully documented yet. # Documentation -Documentation for the mutual recursion analyzer is available [here](docs/analysis/mutual_recursion_analyzer.md) +General documentation is available [here](docs/general_information.md). + +Documentation for the mutual recursion analyzer is available [here](docs/analysis/mutual_recursion_analyzer.md). + +Documentation for the unjustified assumption analyzer is available [here](docs/analysis/unjustified_assumption_analyzer.md). # Running diff --git a/docs/analysis/unjustified_assumption_detection.md b/docs/analysis/unjustified_assumption_analyzer.md similarity index 100% rename from docs/analysis/unjustified_assumption_detection.md rename to docs/analysis/unjustified_assumption_analyzer.md diff --git a/docs/general_information.md b/docs/general_information.md index 7a5ded50..477ac177 100644 --- a/docs/general_information.md +++ b/docs/general_information.md @@ -7,11 +7,11 @@ Tool to inspect EO programs for potential defects. Additionaly, it can be used t ### 1. Unanticipaded mutual recursion > Mutual recursion caused by method redefinition during inheritance. -(refer to [this document](https://github.com/polystat/odin/blob/master/docs/analysis/mutual_recursion_analyzer.md) for more information) +(refer to [this document](analysis/mutual_recursion_analyzer.md) for more information) ### 2. Unjustifed assumptions about methods of superclasses > Assumptions made in subclasses regarding method dependencies in superclasses. -(refer to [this document](https://github.com/polystat/odin/blob/master/docs/analysis/unjustified_assumption_detection.md) for more information) +(refer to [this document](analysis/unjustified_assumption_analyzer.md) for more information) ## Assumptions and Limitations Some assumptions are made about EO programs and used EO constructs during analysis for all type of defects. Additionally, some constructs and syntax are not supported intentionally. From f8ab1803358dca8154c85885a25ca720d5c83e1b Mon Sep 17 00:00:00 2001 From: Vitaliy <42554566+Leosimetti@users.noreply.github.com> Date: Sun, 27 Mar 2022 21:13:01 +0500 Subject: [PATCH 3/3] docs: fixed typos --- docs/analysis/unjustified_assumption_analyzer.md | 12 ++++++------ docs/general_information.md | 12 ++++++------ 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/analysis/unjustified_assumption_analyzer.md b/docs/analysis/unjustified_assumption_analyzer.md index 778a08e5..30452fbb 100644 --- a/docs/analysis/unjustified_assumption_analyzer.md +++ b/docs/analysis/unjustified_assumption_analyzer.md @@ -14,7 +14,7 @@ The following stages and steps take place during program analysis: 3. Every method in the resulting set is paired with its version in which all valid calls (criteria can be found below) to other methods of the containing object are inlined non-recursively. #### Logic extraction 4. Logical properties (constraints on variables in the form of a logical expression) are extracted for both versions of the method in the pair. -5. The derived expression for the version before inlining is connected with the expression after inlining by the means of implication. Doing so allows us to discover whetner the constraints on the variables have become weaker or stronger. +5. The derived expression for the version before inlining is connected with the expression after inlining by the means of implication. Doing so allows us to discover whether the constraints on the variables have become weaker or stronger. 6. We deduce that the unjustified assumption takes place if the resulting logical expression is false, meaning that there exist some inputs such that they worked before the inlining, and stopped working after. @@ -30,11 +30,11 @@ The following stages and steps take place during program analysis: 1. The called method belongs to the same object as the method containing the call. 2. The call the general form `self.method self ...`. Meaning that `self` is used as both the source for the method and is passed as the first argument. -### Algorith for setting locators: +### Algorithm for setting locators: 0. Explicit locator chains of from (^.^.value) are resolved at parse-time and are converted to corresponding AST-nodes. 1. All predefined EO-keywords are stored in a special structure called 'context' and have locators point to the root of the program. 2. The first step of actual processing extends the existing context by adding all top-level objects. -3. Each top-level object is recursiely explored while keeping keeping track of encountered objects and their depth in the context. +3. Each top-level object is recursively explored while keeping track of encountered objects and their depth in the context. 4. The terms that refer to the definitions in the current scope are given 0-locators (\$), while terms contained in the context are given locators by subtracting their depth from current depth. Before: @@ -97,7 +97,7 @@ After: ``` ### Inlining algorithm: 0. All objects are extracted from the AST -1. Every AST-object is converted into the 'Object' data strucutre +1. Every AST-object is converted into the 'Object' data structure 2. Each created 'Object' is processed as follows: 2.1 All calls in the object are processed, while non-call binds are returned as is. 2.1.1 Replace occurrences of formal parameters in the method body with the argument values. @@ -148,7 +148,7 @@ After: ^.local_average3.average > average ``` -### Alogrithm for logical expression extraction: +### Algorithm for logical expression extraction: 0. The list of recognized terms and their corresponding expressions/properties: ![](https://i.imgur.com/PBFB7ac.png) 1. All methods that can be called from the target method are collected and stored. @@ -200,7 +200,7 @@ Resulting SMT-code: ## Current Limitations: 1. Method pairs for inspection are formed by considering only direct descendants. For example, in case of an inheritance chain A -> B -> C only combinations (A,B) and (B,C) are examined. It would be ideal to examine all possible combinations. So, in the given example an additional pair (A,C) is to be examined. -2. Presence of methods with mutual recursion causes the program to be unable to detect defects of related to unjistifed assumptions. +2. Presence of methods with mutual recursion causes the program to be unable to detect defects of related to unjustified assumptions. 3. Current implementation makes it impossible to store boolean values in object fields. Only integer values are supported as of now. 4. No checks are performed on the contents of asserts in EO. As such, it is possible to assert an integer value, which would cause unanticipated behavior. For example, `assert (3.add 5)`. 5. Imported functions are not supported by the analyzer. diff --git a/docs/general_information.md b/docs/general_information.md index 477ac177..1d096909 100644 --- a/docs/general_information.md +++ b/docs/general_information.md @@ -1,14 +1,14 @@ # Object Dependency INspector (ODIN) -Tool to inspect EO programs for potential defects. Additionaly, it can be used to analyses programs in Java and C++ if they are first passed through such transpilers as `J2EO` and `C2EO`. +Tool to inspect EO programs for potential defects. Additionally, it can be used to analyse programs in Java and C++ if they are first passed through such transpilers as `J2EO` and `C2EO`. ## Detectable Defects -### 0. Divison by zero +### 0. Division by zero > Instances of division by zero. (implementation is available in polystat) -### 1. Unanticipaded mutual recursion +### 1. Unanticipated mutual recursion > Mutual recursion caused by method redefinition during inheritance. (refer to [this document](analysis/mutual_recursion_analyzer.md) for more information) -### 2. Unjustifed assumptions about methods of superclasses +### 2. Unjustified assumptions about methods of superclasses > Assumptions made in subclasses regarding method dependencies in superclasses. (refer to [this document](analysis/unjustified_assumption_analyzer.md) for more information) @@ -28,9 +28,9 @@ distance. 3.9:y ``` -### 2. No support for multilene attribute access +### 2. No support for multiline attribute access As with above, the syntax is deemed unnecessary in the scope of analysis, as the target programs generated by `J2EO` and `C2EO` code transpilers do not rely on it. -As such, the following code snippet will not be regonised and process by Odin: +As such, the following code snippet will not be recognised and process by Odin: ``` dx.pow 2 .add