Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce experimental packaging format #278

Merged
merged 18 commits into from
Aug 27, 2024
Merged

Conversation

bitwalker
Copy link
Contributor

This PR is my work-in-progress branch while I work through fixing the last remaining blocker for the alpha release, an unknown bug related to linear memory.

In the process of diagnosing the bug, I'm finding various things which I'm addressing in the form of independent commits. For anything that is significant enough that we want it on next ASAP, I'll open those as separate PRs, but for now, what is in this branch is mostly modifications/improvements to various parts of the compiler and debugger.

@greenhat I'll ping you when this is ready for review, but feel free to scan through it for anything of interest.

@bitwalker bitwalker added the blocker This issue is one of our top priorities label Aug 16, 2024
@bitwalker bitwalker added this to the Alpha milestone Aug 16, 2024
@bitwalker bitwalker self-assigned this Aug 16, 2024
@bitwalker bitwalker changed the title fix: blockers for alpha release feat: introduce experimental packaging format Aug 17, 2024
@bitwalker bitwalker force-pushed the bitwalker/memory-fixes branch 2 times, most recently from 89e9169 to c21bf07 Compare August 19, 2024 02:34
@bitwalker bitwalker marked this pull request as ready for review August 19, 2024 04:38
@bitwalker
Copy link
Contributor Author

@greenhat This PR ended up turning into its own thing, so rather than using it for fixing the get_inputs test, I've renamed it and will start a new branch for get_inputs instead. You can ignore the original PR description, the title and commit messages cover the gist of it instead.

Copy link
Contributor

@greenhat greenhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great!

@@ -14,16 +15,31 @@ use crate::{
Session,
};

static STDLIB: LazyLock<StdLibrary> = LazyLock::new(StdLibrary::default);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa! Good catch! We loaded the libs on every compilation. 🤦

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the test suite and it was suddenly taking ages to run, and I could not figure out what I had changed to make it happen. Turned out it was basically adding the stdlib to the link libraries set of every single module being compiled 😅. There are some changes in miden-assembly/miden-core that should make this way less of an issue, but it turns out that the overhead of hashing is really expensive!

Previous to this commit, we were expecting the rodata segments to be
encoded in a specific order, and placed on the advice stack before
anything else. With this commit, we now simply expect to the advice map
to contain each segment keyed by its commitment hash, which we then move
to the advice stack on demand, and immediately pipe to memory.

This means the order of the segments no longer matters, and the advice
stack is not sensitive to codegen changes or other influences which
might perturb the advice stack or otherwise disrupt our assumptions. It
also sets the stage for us to be able to initialize rodata after a
context switch, as at that point the advice stack will be in an unknown
condition, and using the advice map gives us certainty that we can
arrange to have exactly what we need on the advice stack, when we need
it.

Additionally, I've updated the `midenc debug` input config file, as well
as the usage documentation to reflect this.

The last related change to this, will be emitting the rodata segments to
disk in a convenient form, so that when the compiler emits the program,
it also emits the segments alongside it, making it convenient to run the
debugger against that program (or via the VM directly).
This commit implements a basic package format for Miden programs and
libraries, with the metadata needed to simplify using them together. The
format is experimental, and can be changed at any time, however it does
use a header that lets us identify whether a package was produced with
the same specification or not.

This also modifies the compiler and debugger to emit and consume,
respectively, the package format as their primary artifact. The debugger
uses this, for example, to initialize the VM with the necessary
libraries and rodata segments required by the program.

The package format is described using the structures in
`codegen/masm/src/packaging/package.rs`, and is encoded using a tiny,
but efficient binary format from the `bitcode` crate, using `serde` to
facilitate the lowering of our high-level types to that format. This
made it relatively straightforward to implement the format without
having to mess around with the low-level byte representation. The
`bitcode` crate is designed with stability in mind, though because we
are using it via `serde`, the stability is subject to being violated if
`serde` changes something that causes `bitcode` to serialize a type
differently. For now, this is a non-issue, longer term we might want to
actually implement `bitcode::{Encode, Decode}` for our types if we want
to gain more stability.
Some recent changes caused the `midenc compile` command to emit no
outputs when `--emit` was not provided.

Also fixed:

* Moved some unnecessary debug logs to trace level
* Added more debug logging to the compiler to aid in troubleshooting
* Fixed serialization of packages involving more complex inputs which
  were hitting an issue with fields marked as skippable via serde; this
  is not supported via bitcode, so those attributes were removed
It turns out we were assuming that Rust was laying out its memory using
the default LLVM layout, but rustc actually specifies a larger shadow
stack. We were not properly handling the defined memory in Wasm modules
parsed by the frontend, and so we ended up incorrectly placing global
variables and other data in the middle of memory that rust was using for
either the shadow stack or static data. Because of the size of the
shadow stack region reserved by Rust, this was not causing any
corruption in our tests, but it was causing other assumptions to be
invalid, and miscompilations were the result.

This commit treats the Wasm memory for a parsed module as reserved for
use by the code in that module, which causes midenc to place any global
variables it defines after that region, and computes the start of the
heap as immediately following any such data. It does so by propagating
the page size and reserved number of pages through the IR and to the
backend, taking that information into account as needed.
@bitwalker bitwalker merged commit 7ddd719 into next Aug 27, 2024
4 checks passed
@bitwalker bitwalker deleted the bitwalker/memory-fixes branch August 27, 2024 06:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker This issue is one of our top priorities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants