
KCL: New simulation test pipeline #4351

Merged: 1 commit merged into main from achalmers/detsim on Oct 30, 2024

Conversation

@adamchalmers (Collaborator) commented Oct 29, 2024

The idea behind this is to test all the various stages of executing KCL separately, i.e.

  • Start with a program
  • Tokenize it
  • Parse those tokens into an AST
  • Recast the AST
  • Execute the AST, outputting
    • a PNG of the rendered model
    • serialized program memory

Each of these steps reads some input and writes some output to disk. The output of one step becomes the input to the next step. These intermediate artifacts are also snapshotted (in the style of expectorate or 2020) to ensure we're aware of any changes to how KCL works. A change could be a bug, or it could be harmless or deliberate, but keeping the snapshots checked into the repo means we can easily track changes.

If any step fails, we snapshot the error, and we don't run the subsequent steps in the pipeline. Each of these steps can be run in parallel (we still limit the number of engine connections that unit tests will run concurrently, but at least all the parser/tokenizer/recaster tests can be parallelized).
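Here's a minimal, self-contained sketch of that shape. The stage type, helper name, and file names are illustrative placeholders, not the actual harness code:

```rust
use std::{fs, io, path::Path};

// Each stage takes the previous stage's output as text and either produces
// new output or fails with an error message.
type Stage = fn(&str) -> Result<String, String>;

fn run_pipeline(dir: &Path, stages: &[(&str, Stage)]) -> io::Result<()> {
    // The first input is the KCL program itself.
    let mut input = fs::read_to_string(dir.join("input.kcl"))?;
    for (name, stage) in stages {
        match stage(&input) {
            Ok(output) => {
                // Snapshot this stage's output; it becomes the next stage's input.
                fs::write(dir.join(format!("{name}.snap")), &output)?;
                input = output;
            }
            Err(err) => {
                // Snapshot the error and skip every subsequent stage.
                fs::write(dir.join(format!("{name}.error.snap")), err)?;
                break;
            }
        }
    }
    Ok(())
}
```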

Note: UUIDs sent back by the engine are currently nondeterministic, so they would break all the snapshot tests. To work around this, the snapshots use a regex filter that replaces anything that looks like a UUID with [uuid] when writing program memory to a snapshot.
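For reference, here's roughly how that redaction can be expressed with `insta`'s regex filters (this assumes the crate's `filters` and `json` features; the helper name and regex are illustrative):

```rust
// Hypothetical helper: snapshot serialized program memory with every
// UUID-shaped string replaced by "[uuid]".
fn snapshot_program_memory(memory: &serde_json::Value) {
    let mut settings = insta::Settings::clone_current();
    settings.add_filter(
        r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}",
        "[uuid]",
    );
    settings.bind(|| {
        insta::assert_json_snapshot!(memory);
    });
}
```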

We run this pipeline on many different KCL programs. Each test gets its own directory, which keeps its inputs (the KCL program), outputs (PNG, program memory snapshot), and intermediate artifacts (AST, token lists, etc.).

I also added a new `just` command to easily generate these tests. You can run `just new-sim-test gear $(cat gear.kcl)` to set up a new gear test directory and generate all the intermediate artifacts for the first time. This doesn't need any macros; it just appends some new lines of normal Rust source code to `tests.rs`, so it's easy to see exactly what the code is doing.
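For a sense of the shape, here's a hypothetical example of the kind of block such a generator could append for a `gear` test. It assumes per-stage helper functions (`tokenize`, `parse`, `unparse`, `execute`) defined elsewhere in the test module; the real generated code may differ:

```rust
mod gear {
    const TEST_NAME: &str = "gear";

    #[test]
    fn tokenize() {
        super::tokenize(TEST_NAME)
    }

    #[test]
    fn parse() {
        super::parse(TEST_NAME)
    }

    #[test]
    fn unparse() {
        super::unparse(TEST_NAME)
    }

    // Execution talks to the engine, so it runs as an async test.
    #[tokio::test(flavor = "multi_thread")]
    async fn execute() {
        super::execute(TEST_NAME).await
    }
}
```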

This uses `insta` for convenient snapshot testing of artifacts as JSON, and our very own 2020 for snapshotting PNGs.

This was heavily inspired by @obi1kenobi's talk at EuroRust 2024 about deterministic simulation testing and how it can reduce both bugs and testing/CI time. Very grateful to him for chatting with me about this over the last couple of weeks.




codecov bot commented Oct 29, 2024

Codecov Report

Attention: Patch coverage is 90.76923% with 12 lines in your changes missing coverage. Please review.

Project coverage is 85.92%. Comparing base (0d52851) to head (e418add).
Report is 1 commit behind head on main.

Files with missing lines                   Patch %   Missing lines
src/wasm-lib/kcl/src/simulation_tests.rs   91.66%    8 ⚠️
src/wasm-lib/kcl/src/executor.rs           66.66%    4 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4351      +/-   ##
==========================================
+ Coverage   85.86%   85.92%   +0.05%     
==========================================
  Files          77       78       +1     
  Lines       27434    27552     +118     
==========================================
+ Hits        23557    23674     +117     
- Misses       3877     3878       +1     
Flag       Coverage Δ
wasm-lib   85.92% <90.76%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown.


@jtran (Collaborator) commented Oct 30, 2024

> UUIDs sent back by the engine are currently nondeterministic, so they would break all the snapshot tests.

There's actually a way to control generated UUIDs since #4101, but I like your way better for this. I guess technically there could be a bug in swapping one UUID for another, but I don't think we should worry about it right now.

@jtran (Collaborator) left a comment


This looks great. I like the generator instead of macros. In the future, I'll probably want to replace program memory with the artifact graph. But this is good for where we are now.

@adamchalmers (Collaborator, Author) commented

> > UUIDs sent back by the engine are currently nondeterministic, so they would break all the snapshot tests.
>
> There's actually a way to control generated UUIDs since #4101, but I like your way better for this. I guess technically there could be a bug in swapping one UUID for another, but I don't think we should worry about it right now.

Yeah but right now the engine is also returning random IDs, so we'd still need the [uuid] filter for engine responses. The goal is to get IDs in both requests and responses to be deterministic, then I can remove the filter.
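As an aside, one common way to make client-side IDs seedable is to derive version-4 UUIDs from a deterministically seeded RNG. A sketch, not necessarily what #4101 does:

```rust
use rand::{rngs::StdRng, RngCore, SeedableRng};
use uuid::Uuid;

// Generates a reproducible sequence of v4 UUIDs from a fixed seed.
struct SeededIds {
    rng: StdRng,
}

impl SeededIds {
    fn new(seed: u64) -> Self {
        Self { rng: StdRng::seed_from_u64(seed) }
    }

    fn next_id(&mut self) -> Uuid {
        let mut bytes = [0u8; 16];
        self.rng.fill_bytes(&mut bytes);
        // Builder sets the version/variant bits so the result is a valid v4 UUID.
        uuid::Builder::from_random_bytes(bytes).into_uuid()
    }
}
```

With the same seed, every run produces the same ID sequence, so request-side IDs would stop churning the snapshots.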

@adamchalmers force-pushed the achalmers/detsim branch 2 times, most recently from 89e0647 to 7a2db76 on October 30, 2024 at 15:43

Commit message:
The idea behind this is to test all the various stages of executing KCL
separately, i.e.

 - Start with a program
 - Tokenize it
 - Parse those tokens into an AST
 - Recast the AST
 - Execute the AST, outputting
   - a PNG of the rendered model
   - serialized program memory

Each of these steps reads some input and writes some output to disk.
The output of one step becomes the input to the next step. These
intermediate artifacts are also snapshotted (like expectorate or 2020)
to ensure we're aware of any changes to how KCL works. A change could
be a bug, or it could be harmless, or deliberate, but keeping it checked
into the repo means we can easily track changes.

Note: UUIDs sent back by the engine are currently nondeterministic, so
they would break all the snapshot tests. So, the snapshots use a regex
filter and replace anything that looks like a uuid with [uuid] when
writing program memory to a snapshot. In the future I hope our UUIDs will
be seedable and easy to make deterministic. At that point, we can stop
filtering the UUIDs.

We run this pipeline on many different KCL programs. Each keeps its
inputs (KCL programs), outputs (PNG, program memory snapshot) and
intermediate artifacts (AST, token lists, etc) in that directory.

I also added a new `just` command to easily generate these tests.
You can run `just new-sim-test gear $(cat gear.kcl)` to set up a new
gear test directory and generate all the intermediate artifacts for the
first time. This doesn't need any macros, it just appends some new lines
of normal Rust source code to `tests.rs`, so it's easy to see exactly
what the code is doing.

This uses `cargo insta` for convenient snapshot testing of artifacts
as JSON, and 20/20 for snapshotting PNGs.

This was heavily inspired by Predrag Gruevski's talk at EuroRust 2024
about deterministic simulation testing, and how it can both reduce bugs
and also reduce testing/CI time. Very grateful to him for chatting with
me about this over the last couple of weeks.
@adamchalmers merged commit 0c6c646 into main on Oct 30, 2024
32 checks passed
@adamchalmers deleted the achalmers/detsim branch on October 30, 2024 at 17:14
@franknoirot mentioned this pull request on Nov 7, 2024