From 7596377768721207d2ff40b2ee6150163ce8f549 Mon Sep 17 00:00:00 2001 From: jlanson Date: Fri, 27 Sep 2024 15:19:05 -0400 Subject: [PATCH] feat: RFD8 for describing policy expr internals refactor --- .../rfds/0008-policy-expr-typing-refactor.md | 235 ++++++++++++++++++ 1 file changed, 235 insertions(+) create mode 100644 site/content/rfds/0008-policy-expr-typing-refactor.md diff --git a/site/content/rfds/0008-policy-expr-typing-refactor.md b/site/content/rfds/0008-policy-expr-typing-refactor.md new file mode 100644 index 00000000..7abf2032 --- /dev/null +++ b/site/content/rfds/0008-policy-expr-typing-refactor.md @@ -0,0 +1,235 @@ +--- +title: "Policy Expression Typing" +--- + +## The Current Situation + +Expressions in the policy expression language used by Hipcheck do not get +type-checked until they are evaluated, similar to an interpreted language. +Hipcheck analysis takes multiple orders of magnitude longer to complete than it +takes to evaluate a policy expression. Because of this, it can be frustrating +for people writing policy expressions to wait until after an analysis has run to +find out they had a type error in their expression. It would be nice to be able +to vet the expressions in a policy file as much as possible before running +analysis to avoid these types of situations. + +The current design of policy expressions does not have a robust type system, as +evidenced by the inability to do pre-evaluation type checking, and the inability +to run-time cast between primitive types, which lead to [Issue #449](https://github.com/mitre/hipcheck/issues/449). + +In this bug, if Hipcheck analysis query happened to return a float that +ended in `.0` and was therefore a whole number, when serialized then +deserialized and inserted into a policy expression, it would be treated as an +integer, since JSON does not distinguish between floats and ints. The binary +arithmetic functions (e.g. `add`) in the policy expression standard environment +expect two like-typed primitives (float + float, int + int), and so would throw +a runtime type error when a would-be float operand was late-bound as an integer. + +``` +(lte $ 0.2) , "0.0" --> (lte 0 0.2) --> Error::BadType +``` + +This particular issue was solved by introducing an `upcast` function that could +turn a `Primitive::Int` into a `Primitive::Float`, and then changing the +behavior of `binary_primitive_op()` to detect a "one int, one float" situation, +and dynamically promote the integer operand. + +However, it would be nicer and cleaner to be able to type-check and upcast +primitives across the entire expression before evaluation. For example, allowing +`[0 1 2.0 3]` by upcasting all non-float elements to floats. + +## The Goal + +Briefly, the goal of this RFD is to describe a refactor of the Policy Expression +system that enables "compile"-time type checking as much as possible, and makes +it in general to write Expr-manipulating functionality using the `Visitor` +pattern. + +### What Would Need To Happen + +1. Introduce first-class types and a uniform way to get the type of a given + `Expr/Primitive` +2. Represent the intention to cast a primitive in the Expr struct ecosystem. +3. Type information associated with functions/variables added to an `Env` + instance. +4. `Env` instance available when type-checking to grab information about a + function/variable. +5. Implement `TypeChecker` as a struct that implements a new `ExprVisitor` trait +6. Re-implement expression evaluation as a `impl ExprVisitor` + +## Proposal + +### Extracting `Expr` Variants into Types + +First, we move out variants of `Expr` with fields into their own structs. E.g. + +```rust +Expr::Array(Vec) + +// Becomes + +struct Array { + elts: Vec +} +Expr::Array(Array) +``` + +The benefit of this is that we can implement traits on each `Expr` variant, and +add different `Visitor` functions for each. Without the above example change to +`Array`, you can't write a `fn visit_array(array: ?)` that is different from `fn +visit_expr(expr: Expr)` because there is no internal type to unwrap and +therefore distinguish it. + +We do this same change to `Function`, and `Lambda`. + +### Adding Typing + +We then add a `enum Type` to capture the types of all expressions/primitives as +follows. + +```rust +type PrimitiveType = std::mem::Discriminant; +type ArrayType = Option>; +struct FunctionType { + pub args: Vec>, + pub output: Box, +} +enum Type { + Primitive(PrimitiveType), + Array(ArrayType), + Function(FunctionType), + ... +} +``` + +For primitives, we use the `Discriminant` of the primitive enum. As arrays can +only contain primitives, they also use the `Primitive` discriminant, but since +we can't know at compile time the type of an empty array or one whose only +element is a `JSONPointer`, it is an `Option`. + +A `FunctionType` is a combination of the array of types of its inputs and +outputs. This will be somewhat difficult to resolve with the existing `Env` +system, as many functions do implicit overloading (e.g. the same `add` function +can handle int, float, and span types), so the output type is dependent on the +input type. We should consider dynamically retrieving the type of a function by +passing a `&Env` reference. + +With this `Type` information represented, we can now add and implement the +`Typed` trait: + +```rust +trait Typed { + fn get_type(&self) -> Type; +} +impl Typed for Primitive { + fn get_type(&self) -> Type { + Type::Primitive(std::mem::discriminant(self)) + } +} +impl Typed for Expr { + ... +} +``` + +Notably because of an overloaded function's need to check its own arguments +against the function implementation, we'll either need to augment +`Expr::Function` to contain some reference to the actual underlying function in +`Env` or to be able to query `Env` about that function when `Typed::get_type()` +is being executed. + +### Adding `Cast` Type + +Now to be able to represent a cast operation in the `Expr` ecosystem. Following +the above rules about creating distinct struct, we add the following to the +`Expr` enum: + +```rust +struct Cast { + target: PrimitiveType, + expr: Box, +} + +Expr::Cast(Cast) + +impl Typed for Cast { + fn get_type(&self) -> Type { + Type::Primitive(self.target) + } +} + +``` + +With this tool, when we perform type checking / upcasting as a stage of +expression compilation, `Function` and `Array` instances can insert `Cast` +nodes to "wrap" improperly typed primitives. This would allow `(lte 0 0.2)` to +be evaluated properly, as the function replaces its first operand with +`Cast::new(op2.get_type(), op1)`. There will need to be some `try_upcast() -> +Result` function to determine whether a cast can be done according to the +semantics of the language. + +### The `ExprVisitor` Trait + +To re-organize computation/tranformation of the `Expr` tree generated by a +policy expression program, we can use the `Visitor` pattern. We define a trait +as such: + +```rust +trait ExprVisitor { + fn visit(&self, expr: &mut Expr) -> T; + fn visit_array(&self, arr: &mut Array) -> T; + fn visit_function(&self, f: &mut Function) -> T; + fn visit_primitive(&self, p: &mut Primitive) -> T; + fn visit_cast(&self, c: &mut Cast) -> T; + ... +} +``` + +Note that the separation of different functions to handle different `Expr` +variants is enabled by splitting them out into distinct types wrapped by their +`Expr` variant. + +We can write multiple structs that implement `ExprVisitor`. + +1. `struct InsertCast` which inserts `Cast` nodes where applicable +2. `struct TypeCheck` which returns `Result<()>` to report a type error. +3. `struct JsonInjector` which takes `context` and replaces all `Expr::JsonPointer` accordingly. +4. `struct Executor`. Re-impl the existing `Executor` struct to obey this pattern. + +This would allow us to organize our transformation steps more effectively. When +we first parse a policy expression from file, we can call `InsertCast, +TypeCheck` to make sure everything pre-context-injection is sound, with some +ambiguity allowed around the JSON pointers themselves. Then once analysis +completes we can `JsonInjector/InsertCast/TypeCheck` again, then finally call +`Executor`'s `visit()` function to return the evaluated `Expr`. + +### Checking Function Return and Argument Types + +When you call `Typed::get_type()` on a function, it will return the "function +type," which is a combination of information about the (actual) argument types +and the number of arguments it expects. However, sometimes we want to get the +"evaluated" type of the function, namely what type it returns when executed. +We also want to be able to tell whether the "evaluated" types passed to a +function match the expected types. We need the "evaluated" types of the passed +expressions because they themselves might be functions instead of primitives +or arrays. + +We therefore need a "type checker" function for each function that will run +first on the types of the function arguments to determine if there is a type +error or, if not, what will be returned. This is especially important for +overloaded functions whose return-type may be dictated by the types of the +passed arguments, or determined to be unknown if not enough information is +provided. + +A type checking function has the signature + +```rust +fn(args: &[Type]) -> Result +``` + +Where `ReturnableType` is the "evaluated" type of the function. The +`ReturnableType` must be a separate enum, since only "unknown", primitives, and +arrays are valid return types in the current policy expression language. + +Now that we have the concept of a "checker" function, every function registered +in the `Env` must be associated with a checker function, which will be executed +during type checking and at run-time.