
implement polymorphic property testing with an existing interpreter #9

Open
wucke13 opened this issue Jun 26, 2024 · 3 comments
@wucke13
Collaborator

wucke13 commented Jun 26, 2024

  • What interpreters to use?
    • wasmi
    • wasmtime
  • Add a boilerplate macro?
@george-cosma
Collaborator

Thoughts on polymorphic testing

I've been thinking more about this issue, especially after our discussions on Wednesday. I keep coming back to the question "When will we need to test our implementation against another engine?". To answer this question, I think it's relevant to see what types of tests we might need:

  1. Op code functionality testing with optional internal state analysis. For this we won't need polymorphic testing, since we don't really care whether other engines correctly interpret an op code or not. Moreover, it is irrelevant to us (and somewhat difficult) to analyze the internal state of the other engines. Comparing something like Storage might be doable, but more than that is going to take a lot of work.

There is something to be said about using other engines to CREATE our tests. However, if we use a macro to instantiate & run all engines for a test, should that prototyping code remain in the final code, and does that bother us? How should we proceed with this?

  2. Functionality testing (for example, testing the whole suite of SIMD instructions). It might be interesting to test against another engine, but it's still mostly irrelevant. For this we also have the wasm spec testsuite. Both wasmi and wasmtime use it.

  3. Fuzz testing. For this I believe it actually makes sense to compare our results to the other engines to see if differences appear. In this scenario all we would need to analyze is the final output (or at worst individual results after each function call) to determine "compliance".

In all of these cases, I don't see much more of a need than to instantiate all of the engines, extract results out of them, and assert that they are all equal (as per the final comments in PR #16). However, if we concentrate our efforts on what we will need in the short term for an MVP, then I would consider fuzz testing not to be relevant to us right now.
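To make that concrete, here is a rough sketch of such an equality check, using wasmi as the reference engine. The wasmi calls follow roughly the 0.31 API, and the invoke_func signature mirrors the example further down; both are assumptions, not a final design.

// Sketch only: run the same module in our interpreter and in wasmi,
// then assert that the observable results match.
fn assert_engines_agree(wat: &str) {
    let wasm_bytes = wat::parse_str(wat).unwrap();

    // Our interpreter (signatures as sketched in the boilerplate section below).
    let validation_info = validate(&wasm_bytes).expect("validation failed");
    let mut instance = RuntimeInstance::new(&validation_info).expect("instantiation failed");
    let ours: i32 = instance.invoke_func::<i32, i32>("add_one", 9);

    // wasmi as a reference engine (API roughly as in wasmi 0.31; may differ).
    let engine = wasmi::Engine::default();
    let module = wasmi::Module::new(&engine, &wasm_bytes[..]).unwrap();
    let mut store = wasmi::Store::new(&engine, ());
    let wasmi_instance = wasmi::Linker::<()>::new(&engine)
        .instantiate(&mut store, &module)
        .unwrap()
        .start(&mut store)
        .unwrap();
    let add_one = wasmi_instance
        .get_typed_func::<i32, i32>(&store, "add_one")
        .unwrap();
    let theirs = add_one.call(&mut store, 9).unwrap();

    assert_eq!(ours, theirs);
}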

Boilerplate

As for the boilerplate in creating our own interpreter, I think I can quickly create a small macro that would look something like this:

let wat = r#" ... "#;
init!(wat, instance);

assert_eq!(10, instance.invoke_func::<i32, i32>("add_one", 9));
...

That would just rewrite it as

let wat = r#" ... "#;
let wasm_bytes = wat::parse_str(wat).unwrap();
let validation_info = validate(&wasm_bytes).expect("validation failed");
let mut instance = RuntimeInstance::new(&validation_info).expect("instantiation failed");

assert_eq!(10, instance.invoke_func::<i32, i32>("add_one", 9));
...
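
For what it's worth, a minimal macro_rules! sketch of such an init! macro (assuming the validate and RuntimeInstance calls exactly as shown above) could be:

// Sketch only: expands to exactly the boilerplate above.
macro_rules! init {
    ($wat:expr, $instance:ident) => {
        let wasm_bytes = wat::parse_str($wat).unwrap();
        let validation_info = validate(&wasm_bytes).expect("validation failed");
        let mut $instance =
            RuntimeInstance::new(&validation_info).expect("instantiation failed");
    };
}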

Is there something else we would need if we focus just on our interpreter? For multiple interpreters I've made some macros in this commit. I didn't open a PR for it since I want to first discuss whether we need it.

WAST/Specification test parsing

On the topic of using the wasm spec testsuite, the .wast files in it have a format different from the usual .wat files. The assert_return, invoke, assert_malformed, etc. directives are not supported by the wat crate. I took a look at how wasmtime and wasmi do it, and they both seem to use the wast crate for parsing, and then interpret the result themselves.

After some digging, the .wast files seem to be "script" files, which have their format described here: https://github.com/WebAssembly/spec/tree/master/interpreter#scripts

I was thinking we could use it ourselves, but the downside is that we would have to create a pre-verification step that interprets these new directives. The AST described for these scripts seems quite complicated, but from a quick glance at the .wast files in the spec testsuite, the more complicated directives like assert_exhaustion, script, input, and output are rarely used.
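For a rough idea, the parsing side with the wast crate could look roughly like this (directive and field names vary between wast versions, so this is only a sketch):

use wast::parser::{self, ParseBuffer};
use wast::{Wast, WastDirective};

// Sketch: walk the directives of a .wast script and dispatch on them.
fn run_wast_script(source: &str) -> Result<(), wast::Error> {
    let buffer = ParseBuffer::new(source)?;
    let script: Wast = parser::parse(&buffer)?;

    for directive in script.directives {
        match directive {
            WastDirective::AssertReturn { .. } => {
                // run the invocation on our interpreter and compare the results
            }
            WastDirective::AssertTrap { .. } | WastDirective::AssertMalformed { .. } => {
                // expect validation or execution to fail
            }
            _ => {
                // module definitions, invoke, and the rarely used
                // assert_exhaustion / script / input / output directives
            }
        }
    }
    Ok(())
}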

What do you think? Should we move forward with wast, or should we do this another way?

@wucke13
Collaborator Author

wucke13 commented Jul 5, 2024

> In all of these cases, I don't see much more of a need than to instantiate all of the engines, extract results out of them, and assert that they are all equal (as per the final comments in PR #16). However, if we concentrate our efforts on what we will need in the short term for an MVP, then I would consider fuzz testing not to be relevant to us right now.

That is quite a sensible assessment IMHO.

> Is there something else we would need if we focus just on our interpreter? For multiple interpreters I've made some macros in this commit. I didn't open a PR for it since I want to first discuss whether we need it.

Not in the MVP, AFAICT. Of course one can always grow the scope. But the macro as presented in your example only tests one interpreter; so it is a test-creation convenience, but not polymorphic. It becomes polymorphic the moment we do two different things that should behave equivalently, i.e. call the same wasm code in two interpreters.

> What do you think? Should we move forward with wast, or should we do this another way?

I think WAST is the gold standard. If we pass 100% of the WAST tests, then we can be quite sure that we are doing well. I'm a little wary when I read about scripts and a custom AST. That sounds like a lot of work, and like a lot of potential to invest time into an elaborate approach, only to find a major roadblock with it later. So we should do it, but we should be very careful and thoughtful about how we do it.

@george-cosma
Collaborator

Successful completion of #43 will help with this issue. wast testsuite files are interpreted at runtime, and the arguments' types are only known at runtime.
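To illustrate (names here are made up, not taken from #43): this likely means invoking functions through some dynamically typed value representation rather than through the compile-time generics of invoke_func, e.g.:

// Hypothetical sketch only: a runtime-typed value so a wast directive can be
// executed without knowing the parameter types at compile time.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    I32(i32),
    I64(i64),
    F32(f32),
    F64(f64),
}

// A possible untyped entry point next to the statically typed
// invoke_func::<Params, Returns>():
// fn invoke_dynamic(&mut self, name: &str, args: &[Value]) -> Vec<Value>;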

george-cosma self-assigned this Jul 18, 2024