
Add support for dynamic value resolver and the ability for values to resolve to an Any value #47

Closed
wants to merge 3 commits

Conversation

@chirino (Contributor) commented Jun 16, 2024

No description provided.

@chirino mentioned this pull request Jun 16, 2024
@clarkmcc (Owner)

Alright, sorry for the delay, I spent a few hours looking at this PR this morning. There are some concerns that I have with the implementation, for example:

  • Why does set_dynamic_resolver accept an option? If you're going to call this function, you should be forced to provide a handler.
    pub fn set_dynamic_resolver<T: 'static, F: 'static>(&mut self, handler: Option<F>)
    where
        F: Handler<T> + 'static,
  • Why is the variable resolver a Handler rather than a simple function? Each variable resolution requires that we allocate a FunctionContext which allocates some strings, etc. for debugging behind the scenes. This is fine when we do it once per function instantiated in the context, but doing it once per variable resolution is a problem.
  • A preliminary benchmark shows a ~200-250% performance regression when resolving variables when a dynamic resolver is present. In the tests you've provided this isn't as much of a concern since the only variables that need resolving come from the dynamic resolver. But this is not the case for the majority of CEL expressions, where the variables will not be in the dynamic resolver. If the expression references a missing variable, it will cost 200-250% more with this change. I benchmarked an expression with 1, 10, and 100 variable references. In the following graph, the red is the implementation without the variable resolver. [benchmark graph omitted]

From a philosophical perspective, I also have concerns that these changes are a departure from some of the primary goals of CEL. This change would effectively open the door to a class of features that allow users to hook into the CEL interpreter and change how execution works. For example, using this PR, a user could use a dynamic resolver to make an HTTP request in the middle of CEL traversing a variable reference. This would not fulfill the promise of linear time evaluation.

Keep it small & fast. CEL evaluates in linear time, is mutation free, and not Turing-complete. This limitation is a feature of the language design, which allows the implementation to evaluate orders of magnitude faster than equivalently sandboxed JavaScript. https://github.com/google/cel-spec?tab=readme-ov-file

For these reasons I am not excited about merging a PR like this. I'll leave the PR open for a few more days if you'd like to provide any follow-up opinions/insights/feedback.

@clarkmcc closed this Jun 25, 2024
@chirino (Contributor, Author) commented Jul 2, 2024

Sorry for not getting back to you sooner, I was on vacation.

> Alright, sorry for the delay, I spent a few hours looking at this PR this morning. There are some concerns that I have with the implementation, for example:
>
> • Why does set_dynamic_resolver accept an option? If you're going to call this function, you should be forced to provide a handler.
>
>     pub fn set_dynamic_resolver<T: 'static, F: 'static>(&mut self, handler: Option<F>)
>     where
>         F: Handler<T> + 'static,

Good idea.

> • Why is the variable resolver a Handler rather than a simple function? Each variable resolution requires that we allocate a FunctionContext which allocates some strings, etc. for debugging behind the scenes. This is fine when we do it once per function instantiated in the context, but doing it once per variable resolution is a problem.

I was not initially sure what the function signature should look like, and Handler seemed like the more flexible option. But yeah, we can likely simplify.
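For reference, the allocation difference being discussed can be sketched with mock types. `Handler` and `FunctionContext` here are stand-ins, not the real cel-interpreter types: the trait-based path builds a context (allocating a `String`) on every lookup, while a plain closure receives the identifier directly:

```rust
// Mock types illustrating the Handler-vs-plain-function trade-off;
// not the cel-interpreter API.
struct FunctionContext {
    ident: String, // debug info allocated per call in the Handler path
}

trait Handler {
    fn call(&self, ftx: FunctionContext) -> i64;
}

struct MyHandler;
impl Handler for MyHandler {
    fn call(&self, ftx: FunctionContext) -> i64 {
        ftx.ident.len() as i64
    }
}

// Handler path: every lookup constructs a FunctionContext, which
// heap-allocates a String just to carry the identifier.
fn lookup_via_handler(h: &dyn Handler, ident: &str) -> i64 {
    h.call(FunctionContext { ident: ident.to_string() })
}

// Plain-function path: the resolver sees the identifier directly,
// so no per-lookup context (or String) is allocated.
fn lookup_plain<F: Fn(&str) -> i64>(f: &F, ident: &str) -> i64 {
    f(ident)
}

fn main() {
    let h = MyHandler;
    let f = |ident: &str| ident.len() as i64;
    assert_eq!(lookup_via_handler(&h, "foo"), 3);
    assert_eq!(lookup_plain(&f, "foo"), 3);
}
```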

> • A preliminary benchmark shows a ~200-250% performance regression when resolving variables when a dynamic resolver is present. In the tests you've provided this isn't as much of a concern since the only variables that need resolving come from the dynamic resolver. But this is not the case for the majority of CEL expressions, where the variables will not be in the dynamic resolver. If the expression references a missing variable, it will cost 200-250% more with this change. I benchmarked an expression with 1, 10, and 100 variable references. In the following graph, the red is the implementation without the variable resolver.

This PR was shared just to convey the general gist of the requested feature; it was not really optimized for performance. With the changes mentioned above, perf may improve a bit.

> From a philosophical perspective, I also have concerns that these changes are a departure from some of the primary goals of CEL. This change would effectively open the door to a class of features that allow users to hook into the CEL interpreter and change how execution works. For example, using this PR, a user could use a dynamic resolver to make an HTTP request in the middle of CEL traversing a variable reference. This would not fulfill the promise of linear time evaluation.
>
> Keep it small & fast. CEL evaluates in linear time, is mutation free, and not Turing-complete. This limitation is a feature of the language design, which allows the implementation to evaluate orders of magnitude faster than equivalently sandboxed JavaScript. https://github.com/google/cel-spec?tab=readme-ov-file
>
> For these reasons I am not excited about merging a PR like this. I'll leave the PR open for a few more days if you'd like to provide any follow-up opinions/insights/feedback.

The above is true for any user-provided function. We should just document those caveats so that users are not surprised that they don't get linear execution time when they use non-linear functions or dynamic resolvers.

@chirino (Contributor, Author) commented Jul 2, 2024

BTW, where is that benchmark located? I can try to see if it can be optimized a bit.

@clarkmcc (Owner) commented Jul 2, 2024

No problem! Hope it was relaxing.

> The above is true for any user-provided function. We should just document those caveats so that users are not surprised that they don't get linear execution time when they use non-linear functions or dynamic resolvers.

This is also the case in all CEL implementations, so what the language designers likely meant was that the language itself is linear-time, mutation-free, and not Turing-complete. They likely made this claim with full understanding that you can't stop someone from putting an infinite loop in a UDF. The concern I have here is that this is a change to how the language itself works, and that a simple `foo.bar == 10`, if `foo` were dynamically resolved, could succeed one time and fail the next, or take drastically different amounts of time.

Frankly, I'm not excited about departing from the spec on this. It feels like a departure from some of the primary goals of CEL: "This limitation is a feature of the language design, which allows the implementation to evaluate orders of magnitude faster than equivalently sandboxed JavaScript."

As a side note, the PR introduces an Any type which is also a departure from the spec.

I'll be happy to look through the PR if you decide to tweak it, I'm just trying to be fully transparent about where I personally stand on this issue today from a philosophical perspective.


> BTW, where is that benchmark located? I can try to see if it can be optimized a bit.

I had saved a patch for the benchmark, but there were other changes I had to make to get the benchmark working properly, and apparently I don't have a patch for those changes anymore.

Subject: [PATCH] Dynamic resolver benchmark
---
Index: interpreter/benches/runtime.rs
===================================================================
diff --git a/interpreter/benches/runtime.rs b/interpreter/benches/runtime.rs
--- a/interpreter/benches/runtime.rs	
+++ b/interpreter/benches/runtime.rs	
@@ -1,7 +1,9 @@
 use cel_interpreter::context::Context;
-use cel_interpreter::Program;
+use cel_interpreter::{ExecutionError, Program, Value};
 use criterion::{black_box, criterion_group, criterion_main, Criterion};
 use std::collections::HashMap;
+use std::io::BufWriter;
+use std::sync::Arc;
 
 pub fn criterion_benchmark(c: &mut Criterion) {
     let expressions = vec![
@@ -66,5 +68,29 @@
     group.finish();
 }
 
-criterion_group!(benches, criterion_benchmark, map_macro_benchmark);
+pub fn variable_resolution_benchmark(c: &mut Criterion) {
+    let mut group = c.benchmark_group("variable resolution");
+    let sizes = vec![1, 10, 100];
+
+    for size in sizes {
+        let mut expr = String::new();
+        for i in 0..size {
+            expr.push_str(&format!("var_{i}", i = i));
+            if i < size - 1 {
+                expr.push_str("||");
+            }
+        }
+
+        let program = Program::compile(&expr).unwrap();
+        let mut ctx = Context::default();
+        let resolver = move |ident: Arc<String>| Ok(Value::Null);
+        ctx.set_dynamic_resolver(Some(resolver));
+        group.bench_function(format!("variable_resolution_{}", size).as_str(), |b| {
+            b.iter(|| program.execute(&ctx).unwrap())
+        });
+    }
+    group.finish();
+}
+
+criterion_group!(benches, variable_resolution_benchmark);
 criterion_main!(benches);

@chirino (Contributor, Author) commented Jul 3, 2024

Thanks, I added a version of that bench to the PR in #58; it's performing better now. That might be due to the simplification of the dynamic resolver function, or maybe because it constructs the context inside the iteration loop.
