
implement polymorphic property testing with an existing interpreter #9

Open
wucke13 opened this issue Jun 26, 2024 · 3 comments
@wucke13
Collaborator

wucke13 commented Jun 26, 2024

  • What interpreters to use?
    • wasmi
    • wasmtime
  • Add a boilerplate macro?
@george-cosma
Collaborator

Thoughts on polymorphic testing

I've been thinking more about this issue, especially after our discussions on Wednesday. I keep coming back to the question "When will we need to test our implementation against another engine?". To answer this question, I think it's relevant to see what types of tests we might need:

  1. Op code functionality testing with optional internal state analysis. For this we won't need polymorphic testing, since we don't really care whether other engines correctly interpret an op code or not. Moreover, it is irrelevant to us (and somewhat difficult) to analyze the internal state of the other engines. Comparing something like Storage might be doable, but more than that is going to take a lot of work.

There is something to be said about using other engines to CREATE our tests. However, if we use a macro to instantiate & run all engines for a test, should that prototyping code remain in the final code, and does that bother us? How should we proceed with this?

  2. Functionality testing (for example, testing the whole suite of SIMD instructions). It might be interesting to test against another engine, but it's still mostly irrelevant. For this we also have the wasm spec testsuite. Both wasmi and wasmtime use it.

  3. Fuzz testing. For this I believe it actually makes sense to compare our results to the other engines to see if differences appear. In this scenario all we would need to analyze is the final output (or at worst individual results after each function call) to determine "compliance".

In all of these cases, I don't see much more of a need than to instantiate all of the engines, extract results out of them, and assert that they are all equal (as per the final comments in PR #16). However, if we concentrate our efforts on what we will need in the short term for an MVP, then I would consider fuzz testing not to be relevant to us right now.
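To make that concrete, here is a rough sketch of such an equality check, using wasmi as the reference engine. The wasmi calls follow roughly the 0.31 API, and the invoke_func signature mirrors the example further down; both are assumptions, not a final design.

// Sketch only: run the same module in our interpreter and in wasmi,
// then assert that the observable results match.
fn assert_engines_agree(wat: &str) {
    let wasm_bytes = wat::parse_str(wat).unwrap();

    // Our interpreter (signatures as sketched in the boilerplate section below).
    let validation_info = validate(&wasm_bytes).expect("validation failed");
    let mut instance = RuntimeInstance::new(&validation_info).expect("instantiation failed");
    let ours: i32 = instance.invoke_func::<i32, i32>("add_one", 9);

    // wasmi as a reference engine (API roughly as in wasmi 0.31; may differ).
    let engine = wasmi::Engine::default();
    let module = wasmi::Module::new(&engine, &wasm_bytes[..]).unwrap();
    let mut store = wasmi::Store::new(&engine, ());
    let wasmi_instance = wasmi::Linker::<()>::new(&engine)
        .instantiate(&mut store, &module)
        .unwrap()
        .start(&mut store)
        .unwrap();
    let add_one = wasmi_instance
        .get_typed_func::<i32, i32>(&store, "add_one")
        .unwrap();
    let theirs = add_one.call(&mut store, 9).unwrap();

    assert_eq!(ours, theirs);
}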

Boilerplate

As for the boilerplate in creating our own interpreter, I think I can quickly create a small macro that would look something like this:

let wat = r#" ... "#;
init!(wat, instance);

assert_eq!(10, instance.invoke_func::<i32, i32>("add_one", 9));
...

That would just rewrite it as

let wat = r#" ... "#;
let wasm_bytes = wat::parse_str(wat).unwrap();
let validation_info = validate(&wasm_bytes).expect("validation failed");
let mut instance = RuntimeInstance::new(&validation_info).expect("instantiation failed");

assert_eq!(10, instance.invoke_func::<i32, i32>("add_one", 9));
...
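
For what it's worth, a minimal macro_rules! sketch of such an init! macro (assuming the validate and RuntimeInstance calls exactly as shown above) could be:

// Sketch only: expands to exactly the boilerplate above.
macro_rules! init {
    ($wat:expr, $instance:ident) => {
        let wasm_bytes = wat::parse_str($wat).unwrap();
        let validation_info = validate(&wasm_bytes).expect("validation failed");
        let mut $instance =
            RuntimeInstance::new(&validation_info).expect("instantiation failed");
    };
}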

Is there something else we would need if we focus just on our interpreter? For multiple interpreters I've made some macros in this commit. I didn't open a PR for it since I want to first discuss whether we need it.

WAST/Specification test parsing

On the topic of using the wasm spec testsuite, the .wast files in it have a format different from the usual .wat files. The assert_return, invoke, assert_malformed, etc. directives are not supported by the wat crate. I took a look at how wasmtime and wasmi do it, and they both seem to use the wast crate for parsing, and then interpret the result themselves.

After some digging, the .wast files seem to be "script" files, which have their format described here: https://github.com/WebAssembly/spec/tree/master/interpreter#scripts

I was thinking we could use it ourselves, but the downside is that we would have to create a pre-verification step that interprets these new directives. The AST described for these scripts seems quite complicated, but from a quick glance at the .wast files in the spec testsuite, the more complicated directives like assert_exhaustion, script, input, and output are rarely used.
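For a rough idea, the parsing side with the wast crate could look roughly like this (directive and field names vary between wast versions, so this is only a sketch):

use wast::parser::{self, ParseBuffer};
use wast::{Wast, WastDirective};

// Sketch: walk the directives of a .wast script and dispatch on them.
fn run_wast_script(source: &str) -> Result<(), wast::Error> {
    let buffer = ParseBuffer::new(source)?;
    let script: Wast = parser::parse(&buffer)?;

    for directive in script.directives {
        match directive {
            WastDirective::AssertReturn { .. } => {
                // run the invocation on our interpreter and compare the results
            }
            WastDirective::AssertTrap { .. } | WastDirective::AssertMalformed { .. } => {
                // expect validation or execution to fail
            }
            _ => {
                // module definitions, invoke, and the rarely used
                // assert_exhaustion / script / input / output directives
            }
        }
    }
    Ok(())
}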

What do you think? Should we move forward with wast, or should we do this another way?

@wucke13
Collaborator Author

wucke13 commented Jul 5, 2024

> In all of these cases, I don't see much more of a need than to instantiate all of the engines, extract results out of them, and assert that they are all equal (as per the final comments in PR #16). However, if we concentrate our efforts on what we will need in the short term for an MVP, then I would consider fuzz testing not to be relevant to us right now.

That is quite a sensible assessment IMHO.

> Is there something else we would need if we focus just on our interpreter? For multiple interpreters I've made some macros in this commit. I didn't open a PR for it since I want to first discuss whether we need it.

Not in the MVP, AFAICT. Of course one can always grow the scope. But the macro as presented in your example only tests one interpreter; so it is a test-creation convenience, but not polymorphic. It becomes polymorphic the moment we do two different things that should behave equivalently, i.e. call the same wasm code in two interpreters.

> What do you think? Should we move forward with wast, or should we do this another way?

I think WAST is the gold standard. If we pass 100% of the WAST tests, then we can be quite sure that we are doing well. I'm a little wary when I read about scripts and a custom AST. That sounds like a lot of work, and like a lot of potential to invest time into an elaborate approach, only to find a major roadblock with it later. So we should do it, but we should be very careful and thoughtful about how we do it.

@george-cosma
Collaborator

Successful completion of #43 will help with this issue. wast testsuite files are interpreted at runtime, and the arguments' types are only known at runtime.
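To illustrate (names here are made up, not taken from #43): this likely means invoking functions through some dynamically typed value representation rather than through the compile-time generics of invoke_func, e.g.:

// Hypothetical sketch only: a runtime-typed value so a wast directive can be
// executed without knowing the parameter types at compile time.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    I32(i32),
    I64(i64),
    F32(f32),
    F64(f64),
}

// A possible untyped entry point next to the statically typed
// invoke_func::<Params, Returns>():
// fn invoke_dynamic(&mut self, name: &str, args: &[Value]) -> Vec<Value>;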

george-cosma self-assigned this Jul 18, 2024