Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(assembly): track source locations in debug mode #1419

Merged
merged 14 commits into from
Aug 5, 2024

Conversation

bitwalker
Copy link
Contributor

@bitwalker bitwalker commented Jul 29, 2024

This PR introduces a more robust implementation of source code management and source mapping for Miden Assembly instructions assembled to MAST. This is done by piggy-backing on the existing AssemblyOp decorator, extending it with an optional Location, which is only emitted to the serialized MAST when debug mode is enabled and there is source location information available.

A Location consists of a string representing the file path of the source (as known to the assembler), and a span, whose start and end bounds are byte indices in the original source file, representing a contiguous range of valid UTF-8 bytes corresponding to the relevant source code from which the given Miden Assembly operation was derived. This will be pretty much 1:1 when the source is itself MASM code, but when the MASM in question is lowered from a high-level language, such as Rust, the source span may correspond to many instructions, and the connection between a given span and a specific MASM instruction may be non-obvious at times - this is to be expected though, and is simply a result of translating a high-level language to a low-level one.

In addition to the mechanism by which we store debug info in the assembled MAST, there are a number of new primitives used to manage source code and spans related to each node in the AST, and to improve the precision of debug info for high-level languages. Up until now, spans were assumed to be all in the same source file within a procedure (and at the top level, within a module). This doesn't work at all for high-level languages, which due to inlining/macros/etc, may mix spans from many different files together in the same procedure. These new primitives address that. They are defined in miden-core, and exported from the debuginfo module, as well as re-exported from miden-assembly at the top level. To quickly summarize:

  • SourceManager is a trait which defines the core capabilities of source code management needed for various purposes throughout the project, and in the compiler. With a SourceManager in hand, you can create a SourceFile from a string or path to a file on disk (covered in a second), query managed files, and resolve spans to various useful types of metadata, and other helpful functionality.
  • A SourceFile corresponds to a single file of source code (either Miden Assembly, or some higher level language), it has a path/name, and metadata used for diagnostics (e.g. precomputed line offsets so we can resolve spans to file/line/col for display). A SourceFile is assigned a unique SourceId by its owning SourceManager.
  • A SourceSpan is a SourceId paired with a range of byte indices, and uniquely identifies a slice of source code in the file corresponding to the spans SourceId. This is what we associated with AST items, either as a field of the object (and implementing Spanned), or by wrapping them in Span<T>. In order to resolve a SourceSpan to the actual content, or to find out information like filename, we need the SourceManager in which the SourceId was allocated, so for this purpose we require one to be provided when constructing an Assembler or ModuleParser.

Two built-in SourceManager implementations are provided: SingleThreadedSourceManager and MultiThreadedSourceManager, with the key difference between them being whether or not a synchronization primitive is involved. In order to provide such a primitive on all our supported platforms, miden_core::utils::sync::RwLock is provided (which is an alias for parking_lot::RwLock with std enabled, otherwise an implementation based on atomics is used). The key difference essentially boils down to whether you need the SourceManager to be Send + Sync or not. By default, we generally use SingleThreadedSourceManager in this PR, since there aren't any cases in this repo where we are sharing the source manager across threads - however the compiler currently uses a multi-threaded source manager, and I'd like to switch to using the one provided here and unify things a bit.

Serialization

The serialization used here is not particularly efficient, but isn't horrible. In the future though, we may want to have a dedicated table representation for source locations, so that streams of instructions with a shared location can simply reference a single instance of the source location, rather than duplicating the information for every instruction. Additionally, we may want to allow including the actual source file content in a library containing debug information, rather than requiring end users to supply a path to a directory containing the correct files.

Future Changes

One issue to be aware of, is that, in general, Location will be serialized using absolute paths, but we may want to make paths relative when possible, so as to make the paths portable (otherwise, debug info embedded in the MAST is only useful on a machine that has the source code in the exact same filesystem location). Technically, this is in the hands of users of the assembler, by setting the source file paths to relative paths (e.g. --remap-path-prefix in rustc) - but in practice it would be quite awkward to force end users to handle this. Additionally, there are times where an absolute path may be desirable (such as when referencing well-known global paths, e.g. /usr/local/etc/..). In any case, that is not a problem we attempt to solve in this PR.

Related Issues


I was working on wiring up debug info in the compiler, and realized we were missing the last link in the chain (tracking current instruction source locations during execution). With this merged, it will be possible to render diagnostics with source spans when an instruction fails in the VM, so long as:

  • The code was assembled with debug mode enabled
  • The MASM module/procedure had a source file attached
  • The individual MASM instructions in a procedure had a non-default source span attached
  • At execution time, the source file is able to be loaded from disk (or from some kind of source manager abstraction, e.g. CodeMap in miden-diagnostics) using the source file path associated with the instruction of interest. Furthermore, the span we have must be in-bounds of that file, an out-of-bounds span is treated as if we do not have the source available.

If any of those requirements are unmet, then either source locations are disabled entirely during assembly, no source location was available for the given instruction, or we are unable to render a source span because we don't have the file content.

@bitwalker bitwalker added enhancement New feature or request assembly Related to Miden assembly labels Jul 29, 2024
@bitwalker bitwalker self-assigned this Jul 29, 2024
@bitwalker bitwalker force-pushed the bitwalker/asmop-source-spans branch 3 times, most recently from 2306927 to b410f48 Compare July 29, 2024 08:56
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you! I left a couple of non-blocking comments/questions inline. I think the main thing is to create an issue to improve source location tracking in the future (i.e., to make it more efficient and potentially add an option to include source code in serialized libraries/programs).

Also, it seems like we now have two SourceLocation structs: the new one in core and the old one in the assembler. It seems like the old one is actually unused - so, maybe we should remove it.

assembly/src/parser/grammar.lalrpop Outdated Show resolved Hide resolved
core/src/mast/serialization/basic_block_data_decoder.rs Outdated Show resolved Hide resolved
@bitwalker
Copy link
Contributor Author

I'm reworking this PR a bit, as I found that we need more precise spans. Currently, spans are implicitly associated with a source file attached at the procedure level, but this is all but useless for code lowered from Rust, and won't work well once we start inlining code intraprocedurally. At least as far as troubleshooting issues in the compiler goes, this PR won't cut it, so I'd rather hold off on merging it for the moment.

As a result of my findings, I'm refining the way we represent source spans a bit, so that each span is associated with a file via a unique id that is assigned by a source manager, very much like how the compiler works. Once that is done, I'll switch this PR back from draft status and we can re-review it.

@bitwalker bitwalker marked this pull request as draft July 31, 2024 07:36
@bitwalker bitwalker force-pushed the bitwalker/asmop-source-spans branch 4 times, most recently from 78c4532 to 82de180 Compare August 1, 2024 09:12
@bitwalker bitwalker marked this pull request as ready for review August 1, 2024 09:12
@bitwalker
Copy link
Contributor Author

This should be ready for review now - see the original PR description for what is now implemented here. The same mechanism is used to associate debug info with instructions in MAST, but the entire system for managing source spans and similar information, has changed.

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! Not a full review yet, but I did leave some comments inline.

Also, probably not for this PR, but what do you think about "embedding" source files into the MastForest. This way, once a MastForest is compiled in debug mode, all relevant debug info is self-contained in it.

core/src/debuginfo/location.rs Outdated Show resolved Hide resolved
core/src/debuginfo/source_file.rs Show resolved Hide resolved
assembly/src/parser/mod.rs Outdated Show resolved Hide resolved
miden/src/cli/data.rs Outdated Show resolved Hide resolved
miden/src/cli/data.rs Outdated Show resolved Hide resolved
core/src/debuginfo/source_manager.rs Outdated Show resolved Hide resolved
core/src/utils/sync.rs Outdated Show resolved Hide resolved
core/Cargo.toml Outdated Show resolved Hide resolved
core/Cargo.toml Outdated Show resolved Hide resolved
This commit introduces an initial, minimal implementation of source
mapping for Miden Assembly instructions assembled to MAST. This is done
by piggy-backing on the existing `AssemblyOp` decorator, extending it
with an optional `SourceLocation`, which is only emitted to the
serialized MAST when debug mode is disabled _and_ there is source
location information available.

A `SourceLocation` consists of a string representing the file path of
the source (as known to the assembler), and a span, whose start and end
bounds are byte indices in the original source file, representing a
contiguous range of valid UTF-8 bytes corresponding to the relevant
source code from which the given Miden Assembly operation was derived.
This will be pretty much 1:1 when the source is itself MASM code, but
when the MASM in question is lowered from a high-level language, such as
Rust, the source span may correspond to many instructions, and the
connection between a given span and a specific MASM instruction may be
non-obvious at times - this is to be expected though, and is simply a
result of translating a high-level language to a low-level one.

The serialization used here is not particularly efficient, but isn't
horrible. In the future though, we may want to have a dedicated table
representation for source locations, so that streams of instructions
with a shared location can simply reference a single instance of the
source location, rather than duplicating the information for every
instruction. Additionally, we may want to allow including the actual
source file content in a library containing debug information, rather
than requiring end users to supply a path to a directory containing the
correct files.

One issue to be aware of, is that this implementation uses absolute
paths, but we may want to make paths relative when possible, so as to
make the paths portable. Technically, this is in the hands of users of
the assembler, by setting the source file paths to relative paths - but
in practice it would be quite awkard to force end users to handle this.
Additionally, there are times where an absolute path may be desirable
(such as when referencing well-known global paths, e.g.
`/usr/local/etc/..`). In any case, that is not a problem we attempt to
solve in this commit.
@bitwalker bitwalker force-pushed the bitwalker/asmop-source-spans branch from fdb4678 to 4c2540f Compare August 3, 2024 22:32
@bitwalker
Copy link
Contributor Author

@bobbinth I've rebased on next with the changes from #1422, just in case you want to review anything that may have changed. I've followed up on your first review with fixes, let me know if you have any other notes.

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! I left a few more small comments inline. I'll also write a reply to #1419 (comment) in the thread.

assembly/src/parser/lexer.rs Show resolved Hide resolved
assembly/src/library/mod.rs Show resolved Hide resolved
assembly/src/compile.rs Show resolved Hide resolved
miden/src/cli/data.rs Outdated Show resolved Hide resolved
Temporary until we can use the new `lints` section of Cargo.toml once we
have an MSRV of 1.80
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you for adding the tests and comments for the RwLock! Just a few more small comments inline.

Comment on lines +57 to +59
# Enable once the VM reaches Rust 1.80+
#[lints.rust]
#unexpected_cfgs = { level = "warn", check-cfg = ['cfg(loom)'] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do this right after this PR.

assembly/src/library/mod.rs Show resolved Hide resolved
assembly/src/compile.rs Show resolved Hide resolved
@bobbinth bobbinth mentioned this pull request Aug 5, 2024
* Remove `SingleThreadedSourceManager`
* Rename `MultiThreadedSourceManager` to `DefaultSourceManager`
* Replace all uses of `SingleThreadedSourceManager` with
  `DefaultSourceManager`, remove all clippy attributes used to disable
  the lint about wrapping non-Send objects in an Arc
* Give `ProgramFile` two constructors, one for the common case which
  constructs a default source manager, the other for specific use cases
  where the caller owns the source manager and needs to construct the
  `ProgramFile` with it
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good! Thank you!

@bobbinth bobbinth merged commit c7cb4a9 into next Aug 5, 2024
9 checks passed
@bobbinth bobbinth deleted the bitwalker/asmop-source-spans branch August 5, 2024 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assembly Related to Miden assembly enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants