
Add support for dynamic value resolver and the ability for values to resolve to an Any value #47

Closed
wants to merge 3 commits

Conversation

@chirino (Contributor) commented Jun 16, 2024

No description provided.

@chirino mentioned this pull request Jun 16, 2024
@clarkmcc (Owner)

Alright, sorry for the delay, I spent a few hours looking at this PR this morning. There are some concerns that I have with the implementation, for example:

  • Why does set_dynamic_resolver accept an option? If you're going to call this function, you should be forced to provide a handler.
    pub fn set_dynamic_resolver<T: 'static, F: 'static>(&mut self, handler: Option<F>)
    where
        F: Handler<T> + 'static,
  • Why is the variable resolver a Handler rather than a simple function? Each variable resolution requires that we allocate a FunctionContext which allocates some strings, etc. for debugging behind the scenes. This is fine when we do it once per function instantiated in the context, but doing it once per variable resolution is a problem.
  • A preliminary benchmark shows a ~200-250% performance regression when resolving variables when a dynamic resolver is present. In the tests you've provided this isn't as much of a concern since the only variables that need resolving come from the dynamic resolver. But this is not the case for the majority of CEL expressions, where the variables will not be in the dynamic resolver. If the expression references a missing variable, it will cost 200-250% more with this change. I benchmarked an expression with 1, 10, and 100 variable references. In the following graph, the red is the implementation without the variable resolver. [benchmark graph omitted]

From a philosophical perspective, I also have concerns that these changes are a departure from some of the primary goals of CEL. This change would effectively open the door to a class of features that allow users to hook into the CEL interpreter and change how execution works. For example, using this PR, a user could use a dynamic resolver to make an HTTP request in the middle of CEL traversing a variable reference. This would not fulfill the promise of linear time evaluation.

Keep it small & fast. CEL evaluates in linear time, is mutation free, and not Turing-complete. This limitation is a feature of the language design, which allows the implementation to evaluate orders of magnitude faster than equivalently sandboxed JavaScript. https://github.com/google/cel-spec?tab=readme-ov-file

For these reasons I am not excited about merging a PR like this. I'll leave the PR open for a few more days if you'd like to provide any follow-up opinions/insights/feedback.

@clarkmcc closed this Jun 25, 2024
@chirino (Contributor, Author) commented Jul 2, 2024

Sorry for not getting back to you sooner, I was on vacation.

> Alright, sorry for the delay, I spent a few hours looking at this PR this morning. There are some concerns that I have with the implementation, for example:
>
> • Why does set_dynamic_resolver accept an option? If you're going to call this function, you should be forced to provide a handler.
>
>     pub fn set_dynamic_resolver<T: 'static, F: 'static>(&mut self, handler: Option<F>)
>     where
>         F: Handler<T> + 'static,

Good idea.

> • Why is the variable resolver a Handler rather than a simple function? Each variable resolution requires that we allocate a FunctionContext which allocates some strings, etc. for debugging behind the scenes. This is fine when we do it once per function instantiated in the context, but doing it once per variable resolution is a problem.

I was not initially sure what the function signature should look like, and Handler seemed like the more flexible option. But yeah, we can likely simplify.
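For reference, the allocation difference being discussed can be sketched with mock types. `Handler` and `FunctionContext` here are stand-ins, not the real cel-interpreter types: the trait-based path builds a context (allocating a `String`) on every lookup, while a plain closure receives the identifier directly:

```rust
// Mock types illustrating the Handler-vs-plain-function trade-off;
// not the cel-interpreter API.
struct FunctionContext {
    ident: String, // debug info allocated per call in the Handler path
}

trait Handler {
    fn call(&self, ftx: FunctionContext) -> i64;
}

struct MyHandler;
impl Handler for MyHandler {
    fn call(&self, ftx: FunctionContext) -> i64 {
        ftx.ident.len() as i64
    }
}

// Handler path: every lookup constructs a FunctionContext, which
// heap-allocates a String just to carry the identifier.
fn lookup_via_handler(h: &dyn Handler, ident: &str) -> i64 {
    h.call(FunctionContext { ident: ident.to_string() })
}

// Plain-function path: the resolver sees the identifier directly,
// so no per-lookup context (or String) is allocated.
fn lookup_plain<F: Fn(&str) -> i64>(f: &F, ident: &str) -> i64 {
    f(ident)
}

fn main() {
    let h = MyHandler;
    let f = |ident: &str| ident.len() as i64;
    assert_eq!(lookup_via_handler(&h, "foo"), 3);
    assert_eq!(lookup_plain(&f, "foo"), 3);
}
```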

> • A preliminary benchmark shows a ~200-250% performance regression when resolving variables when a dynamic resolver is present. In the tests you've provided this isn't as much of a concern since the only variables that need resolving come from the dynamic resolver. But this is not the case for the majority of CEL expressions, where the variables will not be in the dynamic resolver. If the expression references a missing variable, it will cost 200-250% more with this change. I benchmarked an expression with 1, 10, and 100 variable references. In the following graph, the red is the implementation without the variable resolver.

This PR was shared just to convey the general gist of the requested feature; it was not really optimized for performance. With the changes mentioned above, perf may improve a bit.

> From a philosophical perspective, I also have concerns that these changes are a departure from some of the primary goals of CEL. This change would effectively open the door to a class of features that allow users to hook into the CEL interpreter and change how execution works. For example, using this PR, a user could use a dynamic resolver to make an HTTP request in the middle of CEL traversing a variable reference. This would not fulfill the promise of linear time evaluation.
>
> Keep it small & fast. CEL evaluates in linear time, is mutation free, and not Turing-complete. This limitation is a feature of the language design, which allows the implementation to evaluate orders of magnitude faster than equivalently sandboxed JavaScript. https://github.com/google/cel-spec?tab=readme-ov-file
>
> For these reasons I am not excited about merging a PR like this. I'll leave the PR open for a few more days if you'd like to provide any follow-up opinions/insights/feedback.

The above is true for any user-provided function. We should just document those caveats so that users are not surprised that they don't get linear execution time when they use non-linear functions or dynamic resolvers.

@chirino (Contributor, Author) commented Jul 2, 2024

BTW, where is that benchmark located? I can try to see if it can be optimized a bit.

@clarkmcc (Owner) commented Jul 2, 2024

No problem! Hope it was relaxing.

> The above is true for any user-provided function. We should just document those caveats so that users are not surprised that they don't get linear execution time when they use non-linear functions or dynamic resolvers.

This is also the case in all CEL implementations, so what the language designers likely meant was that the language itself is linear-time, mutation-free, and not Turing-complete. They likely made this claim with full understanding that you can't stop someone from putting an infinite loop in a UDF. The concern I have here is that this is a change to how the language itself works, and that a simple `foo.bar == 10`, if `foo` were dynamically resolved, could succeed one time and fail the next, or take drastically different amounts of time.

Frankly, I'm not excited about departing from the spec on this. It feels like a departure from some of the primary goals of CEL: "This limitation is a feature of the language design, which allows the implementation to evaluate orders of magnitude faster than equivalently sandboxed JavaScript."

As a side note, the PR introduces an Any type which is also a departure from the spec.

I'll be happy to look through the PR if you decide to tweak it, I'm just trying to be fully transparent about where I personally stand on this issue today from a philosophical perspective.


> BTW, where is that benchmark located? I can try to see if it can be optimized a bit.

I had saved a patch for the benchmark, but there were other changes I had to make to get the benchmark working properly, and apparently I don't have a patch for those changes anymore.

Subject: [PATCH] Dynamic resolver benchmark
---
Index: interpreter/benches/runtime.rs
===================================================================
diff --git a/interpreter/benches/runtime.rs b/interpreter/benches/runtime.rs
--- a/interpreter/benches/runtime.rs	
+++ b/interpreter/benches/runtime.rs	
@@ -1,7 +1,9 @@
 use cel_interpreter::context::Context;
-use cel_interpreter::Program;
+use cel_interpreter::{ExecutionError, Program, Value};
 use criterion::{black_box, criterion_group, criterion_main, Criterion};
 use std::collections::HashMap;
+use std::io::BufWriter;
+use std::sync::Arc;
 
 pub fn criterion_benchmark(c: &mut Criterion) {
     let expressions = vec![
@@ -66,5 +68,29 @@
     group.finish();
 }
 
-criterion_group!(benches, criterion_benchmark, map_macro_benchmark);
+pub fn variable_resolution_benchmark(c: &mut Criterion) {
+    let mut group = c.benchmark_group("variable resolution");
+    let sizes = vec![1, 10, 100];
+
+    for size in sizes {
+        let mut expr = String::new();
+        for i in 0..size {
+            expr.push_str(&format!("var_{i}", i = i));
+            if i < size - 1 {
+                expr.push_str("||");
+            }
+        }
+
+        let program = Program::compile(&expr).unwrap();
+        let mut ctx = Context::default();
+        let resolver = move |ident: Arc<String>| Ok(Value::Null);
+        ctx.set_dynamic_resolver(Some(resolver));
+        group.bench_function(format!("variable_resolution_{}", size).as_str(), |b| {
+            b.iter(|| program.execute(&ctx).unwrap())
+        });
+    }
+    group.finish();
+}
+
+criterion_group!(benches, variable_resolution_benchmark);
 criterion_main!(benches);

@chirino (Contributor, Author) commented Jul 3, 2024

Thanks, I added a version of that bench to the PR in #58; it's performing better now. That might be due to the simplification of the dynamic resolver function, or maybe because it constructs the context inside the iteration loop.
