-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(assembly): track source locations in debug mode #1419
Conversation
2306927
to
b410f48
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thank you! I left a couple of non-blocking comments/questions inline. I think the main thing is to create an issue to improve source location tracking in the future (i.e., to make it more efficient and potentially add an option to include source code in serialized libraries/programs).
Also, it seems like we now have two SourceLocation
structs: the new one in core
and the old one in the assembler. It seems like the old one is actually unused - so, maybe we should remove it.
I'm reworking this PR a bit, as I found that we need more precise spans. Currently, spans are implicitly associated with a source file attached at the procedure level, but this is all but useless for code lowered from Rust, and won't work well once we start inlining code intraprocedurally. At least as far as troubleshooting issues in the compiler goes, this PR won't cut it, so I'd rather hold off on merging it for the moment. As a result of my findings, I'm refining the way we represent source spans a bit, so that each span is associated with a file via a unique id that is assigned by a source manager, very much like how the compiler works. Once that is done, I'll switch this PR back from draft status and we can re-review it. |
78c4532
to
82de180
Compare
This should be ready for review now - see the original PR description for what is now implemented here. The same mechanism is used to associate debug info with instructions in MAST, but the entire system for managing source spans and similar information, has changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! Not a full review yet, but I did leave some comments inline.
Also, probably not for this PR, but what do you think about "embedding" source files into the MastForest
. This way, once a MastForest
is compiled in debug mode, all relevant debug info is self-contained in it.
This commit introduces an initial, minimal implementation of source mapping for Miden Assembly instructions assembled to MAST. This is done by piggy-backing on the existing `AssemblyOp` decorator, extending it with an optional `SourceLocation`, which is only emitted to the serialized MAST when debug mode is disabled _and_ there is source location information available. A `SourceLocation` consists of a string representing the file path of the source (as known to the assembler), and a span, whose start and end bounds are byte indices in the original source file, representing a contiguous range of valid UTF-8 bytes corresponding to the relevant source code from which the given Miden Assembly operation was derived. This will be pretty much 1:1 when the source is itself MASM code, but when the MASM in question is lowered from a high-level language, such as Rust, the source span may correspond to many instructions, and the connection between a given span and a specific MASM instruction may be non-obvious at times - this is to be expected though, and is simply a result of translating a high-level language to a low-level one. The serialization used here is not particularly efficient, but isn't horrible. In the future though, we may want to have a dedicated table representation for source locations, so that streams of instructions with a shared location can simply reference a single instance of the source location, rather than duplicating the information for every instruction. Additionally, we may want to allow including the actual source file content in a library containing debug information, rather than requiring end users to supply a path to a directory containing the correct files. One issue to be aware of, is that this implementation uses absolute paths, but we may want to make paths relative when possible, so as to make the paths portable. Technically, this is in the hands of users of the assembler, by setting the source file paths to relative paths - but in practice it would be quite awkard to force end users to handle this. Additionally, there are times where an absolute path may be desirable (such as when referencing well-known global paths, e.g. `/usr/local/etc/..`). In any case, that is not a problem we attempt to solve in this commit.
fdb4678
to
4c2540f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I left a few more small comments inline. I'll also write a reply to #1419 (comment) in the thread.
Temporary until we can use the new `lints` section of Cargo.toml once we have an MSRV of 1.80
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you for adding the tests and comments for the RwLock
! Just a few more small comments inline.
# Enable once the VM reaches Rust 1.80+ | ||
#[lints.rust] | ||
#unexpected_cfgs = { level = "warn", check-cfg = ['cfg(loom)'] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should do this right after this PR.
* Remove `SingleThreadedSourceManager` * Rename `MultiThreadedSourceManager` to `DefaultSourceManager` * Replace all uses of `SingleThreadedSourceManager` with `DefaultSourceManager`, remove all clippy attributes used to disable the lint about wrapping non-Send objects in an Arc * Give `ProgramFile` two constructors, one for the common case which constructs a default source manager, the other for specific use cases where the caller owns the source manager and needs to construct the `ProgramFile` with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good! Thank you!
This PR introduces a more robust implementation of source code management and source mapping for Miden Assembly instructions assembled to MAST. This is done by piggy-backing on the existing
AssemblyOp
decorator, extending it with an optionalLocation
, which is only emitted to the serialized MAST when debug mode is enabled and there is source location information available.A
Location
consists of a string representing the file path of the source (as known to the assembler), and a span, whose start and end bounds are byte indices in the original source file, representing a contiguous range of valid UTF-8 bytes corresponding to the relevant source code from which the given Miden Assembly operation was derived. This will be pretty much 1:1 when the source is itself MASM code, but when the MASM in question is lowered from a high-level language, such as Rust, the source span may correspond to many instructions, and the connection between a given span and a specific MASM instruction may be non-obvious at times - this is to be expected though, and is simply a result of translating a high-level language to a low-level one.In addition to the mechanism by which we store debug info in the assembled MAST, there are a number of new primitives used to manage source code and spans related to each node in the AST, and to improve the precision of debug info for high-level languages. Up until now, spans were assumed to be all in the same source file within a procedure (and at the top level, within a module). This doesn't work at all for high-level languages, which due to inlining/macros/etc, may mix spans from many different files together in the same procedure. These new primitives address that. They are defined in
miden-core
, and exported from thedebuginfo
module, as well as re-exported frommiden-assembly
at the top level. To quickly summarize:SourceManager
is a trait which defines the core capabilities of source code management needed for various purposes throughout the project, and in the compiler. With aSourceManager
in hand, you can create aSourceFile
from a string or path to a file on disk (covered in a second), query managed files, and resolve spans to various useful types of metadata, and other helpful functionality.SourceFile
corresponds to a single file of source code (either Miden Assembly, or some higher level language), it has a path/name, and metadata used for diagnostics (e.g. precomputed line offsets so we can resolve spans to file/line/col for display). ASourceFile
is assigned a uniqueSourceId
by its owningSourceManager
.SourceSpan
is aSourceId
paired with a range of byte indices, and uniquely identifies a slice of source code in the file corresponding to the spansSourceId
. This is what we associated with AST items, either as a field of the object (and implementingSpanned
), or by wrapping them inSpan<T>
. In order to resolve aSourceSpan
to the actual content, or to find out information like filename, we need theSourceManager
in which theSourceId
was allocated, so for this purpose we require one to be provided when constructing anAssembler
orModuleParser
.Two built-in
SourceManager
implementations are provided:SingleThreadedSourceManager
andMultiThreadedSourceManager
, with the key difference between them being whether or not a synchronization primitive is involved. In order to provide such a primitive on all our supported platforms,miden_core::utils::sync::RwLock
is provided (which is an alias forparking_lot::RwLock
withstd
enabled, otherwise an implementation based on atomics is used). The key difference essentially boils down to whether you need theSourceManager
to beSend + Sync
or not. By default, we generally useSingleThreadedSourceManager
in this PR, since there aren't any cases in this repo where we are sharing the source manager across threads - however the compiler currently uses a multi-threaded source manager, and I'd like to switch to using the one provided here and unify things a bit.Serialization
The serialization used here is not particularly efficient, but isn't horrible. In the future though, we may want to have a dedicated table representation for source locations, so that streams of instructions with a shared location can simply reference a single instance of the source location, rather than duplicating the information for every instruction. Additionally, we may want to allow including the actual source file content in a library containing debug information, rather than requiring end users to supply a path to a directory containing the correct files.
Future Changes
One issue to be aware of, is that, in general,
Location
will be serialized using absolute paths, but we may want to make paths relative when possible, so as to make the paths portable (otherwise, debug info embedded in the MAST is only useful on a machine that has the source code in the exact same filesystem location). Technically, this is in the hands of users of the assembler, by setting the source file paths to relative paths (e.g.--remap-path-prefix
inrustc
) - but in practice it would be quite awkward to force end users to handle this. Additionally, there are times where an absolute path may be desirable (such as when referencing well-known global paths, e.g./usr/local/etc/..
). In any case, that is not a problem we attempt to solve in this PR.Related Issues
I was working on wiring up debug info in the compiler, and realized we were missing the last link in the chain (tracking current instruction source locations during execution). With this merged, it will be possible to render diagnostics with source spans when an instruction fails in the VM, so long as:
CodeMap
inmiden-diagnostics
) using the source file path associated with the instruction of interest. Furthermore, the span we have must be in-bounds of that file, an out-of-bounds span is treated as if we do not have the source available.If any of those requirements are unmet, then either source locations are disabled entirely during assembly, no source location was available for the given instruction, or we are unable to render a source span because we don't have the file content.