feat: implement program linker #24
This commit makes some fixes/improvements to the `GlobalVariableTable` data structure in order to accommodate its usage in both `Module` and `Program` contexts correctly.
* Declaring a new global variable no longer supports conflict resolution, as that was intended only for use when linking modules into a `Program`.
* When declaring a new global variable, you can now provide an optional initializer at the same time. The existing `set_initializer` API still exists, but it is no longer necessary to use it when you are declaring and setting the initializer at the same time.
* There is now a separate public API for inserting previously-declared `GlobalVariableData` into the table while performing conflict resolution on symbols with internal or odr linkage. This is used by `Linker` to import globals from a module into the program-wide table.
* An API was added to remove a global variable from the table, which is used by `Linker` to perform garbage collection of unused globals.
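To make the split between declaring and inserting concrete, here is a minimal sketch of how such a table could behave. It is not the code from this PR: the type shapes, the `TableError` enum, and the resolution rule (an odr definition that is identical to the existing one is merged, anything else is a conflict) are assumptions made for illustration, and internal linkage is left out entirely.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified linkage kinds; internal linkage is omitted here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Linkage {
    External,
    Odr,
}

/// Hypothetical stand-in for the real `GlobalVariableData`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GlobalVariableData {
    pub name: String,
    pub linkage: Linkage,
    pub init: Option<Vec<u8>>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TableError {
    NameConflict(String),
}

#[derive(Default)]
pub struct GlobalVariableTable {
    /// Index is the global's id; `None` marks a removed entry.
    globals: Vec<Option<GlobalVariableData>>,
    names: HashMap<String, usize>,
}

impl GlobalVariableTable {
    /// Declare a brand-new global, optionally with an initializer.
    /// No conflict resolution: an existing name is always an error.
    pub fn declare(
        &mut self,
        name: &str,
        linkage: Linkage,
        init: Option<Vec<u8>>,
    ) -> Result<usize, TableError> {
        if self.names.contains_key(name) {
            return Err(TableError::NameConflict(name.to_string()));
        }
        Ok(self.push(GlobalVariableData { name: name.to_string(), linkage, init }))
    }

    /// Import previously-declared data, resolving conflicts.
    /// Assumed rule: identical odr definitions merge into the existing entry.
    pub fn insert(&mut self, data: GlobalVariableData) -> Result<usize, TableError> {
        match self.names.get(&data.name).copied() {
            None => Ok(self.push(data)),
            Some(id) => {
                let existing = self.globals[id].as_ref().expect("named globals are live");
                if data.linkage == Linkage::Odr && *existing == data {
                    Ok(id)
                } else {
                    Err(TableError::NameConflict(data.name))
                }
            }
        }
    }

    /// Remove a global by name, e.g. when it turns out to be unused.
    pub fn remove(&mut self, name: &str) -> Option<GlobalVariableData> {
        let id = self.names.remove(name)?;
        self.globals[id].take()
    }

    pub fn get(&self, name: &str) -> Option<&GlobalVariableData> {
        self.names.get(name).and_then(|&id| self.globals[id].as_ref())
    }

    fn push(&mut self, data: GlobalVariableData) -> usize {
        let id = self.globals.len();
        self.names.insert(data.name.clone(), id);
        self.globals.push(Some(data));
        id
    }
}
```

In this sketch a frontend would only ever call `declare`, while a linker importing a module's globals would call `insert` and later `remove` anything it finds to be unreferenced.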
This commit makes some adjustments to the way data segments are represented to improve the ergonomics of working with them:
* The initializer of a data segment is now stored directly in the `DataSegment`, rather than in a separate `ConstantPool`. This avoids confusion around having multiple constant pools, and makes it easier to examine a given segment without having to have the constant pool on hand. Since data segments are very unlikely to have the same initializers, there is little benefit to using a constant pool for them.
* Provide a similar pair of `declare` and `insert` APIs for the `DataSegmentTable` to what is provided by `GlobalVariableTable`.
* Extend `DataSegmentTable` with APIs for accessing the last segment of the table, and draining the table front-to-back.
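A rough sketch of what a table shaped like this could look like follows. The field names, the `DataSegmentError` enum, and the overlap rule are assumptions for illustration only; the real `DataSegmentTable` may differ in all of them. The points it tries to show are the initializer living inline in the segment, and the `last` / front-to-back `drain` accessors.

```rust
/// Hypothetical data segment with its initializer stored inline.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DataSegment {
    pub offset: u32,
    pub size: u32,
    pub init: Vec<u8>,
    pub readonly: bool,
}

impl DataSegment {
    /// Exclusive end address, widened so `offset + size` cannot overflow.
    fn end(&self) -> u64 {
        self.offset as u64 + self.size as u64
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum DataSegmentError {
    /// The initializer does not fit in the declared size.
    InitTooLarge,
    /// The new segment overlaps the existing segment at `offset`.
    Overlap { offset: u32 },
}

/// Segments are kept sorted by offset.
#[derive(Default)]
pub struct DataSegmentTable {
    segments: Vec<DataSegment>,
}

impl DataSegmentTable {
    /// Declare a new segment from its parts.
    pub fn declare(
        &mut self,
        offset: u32,
        size: u32,
        init: Vec<u8>,
        readonly: bool,
    ) -> Result<(), DataSegmentError> {
        if init.len() as u64 > size as u64 {
            return Err(DataSegmentError::InitTooLarge);
        }
        self.insert(DataSegment { offset, size, init, readonly })
    }

    /// Insert an already-built segment (e.g. one taken from another table),
    /// rejecting it if it overlaps any existing segment.
    pub fn insert(&mut self, segment: DataSegment) -> Result<(), DataSegmentError> {
        let start = segment.offset as u64;
        let end = segment.end();
        if let Some(other) = self
            .segments
            .iter()
            .find(|s| start < s.end() && (s.offset as u64) < end)
        {
            return Err(DataSegmentError::Overlap { offset: other.offset });
        }
        let pos = self.segments.partition_point(|s| s.offset < segment.offset);
        self.segments.insert(pos, segment);
        Ok(())
    }

    /// The segment with the highest offset, if any.
    pub fn last(&self) -> Option<&DataSegment> {
        self.segments.last()
    }

    /// Remove and yield all segments in ascending offset order.
    pub fn drain(&mut self) -> impl Iterator<Item = DataSegment> + '_ {
        self.segments.drain(..)
    }
}
```

Because the initializer travels with the segment, a consumer can drain one table and insert each segment into another without carrying a constant pool alongside it.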
Without having looked at the code, one question I wanted to ask: do we have a way to define "external functions" (besides the ones in the Miden standard library)? By external functions I mean functions for which the compiler knows the signature but not the implementation. Basically, I wonder if the compiler will be able to output something like:
Where |
This commit introduces [Linker], which is used to take a set of [Module] and perform many of the tasks that a typical system linker would handle. In the Miden world, however, the linker gets involved at a slightly different stage of compilation, and does not have the final say in what code will ultimately get executed at runtime (with some exceptions).
Notably, [Linker] is invoked on a set of modules in IR form, rather than in Miden Assembly (MASM) form - in other words, the linker is invoked before modules are lowered to MASM by the codegen backend. This is because one of the critical tasks performed by the linker is the allocation and layout of linear memory for use by the compiled program. To do this, the linker must determine what set of data segments and globals will end up in the output, validate that there are no conflicts and that all referenced symbols have corresponding definitions, and then lay out the data segments and globals in memory so that the codegen backend knows at what address to find a given global, as well as where the start of the heap will fall for use by the program.
The [Linker] is also responsible for garbage collecting unused declarations, but this is dependent on what type of artifact is being produced.
Since MASM is itself an intermediate representation for the Miden VM, the compiler does not actually know whether additional modules will be included in the set provided to the VM at runtime, or even whether all of the emitted modules will be provided to the VM. That said, we've designed the linker to make certain assumptions based on what type of output is being emitted. For example, an executable program is considered "closed", i.e. we assume that no additional code will be introduced other than what is being linked (aside from the Miden standard library). On the other hand, a library program is considered "open", i.e. we assume that the library will be included as part of a larger executable program that will contain modules we don't necessarily control. Despite that, we do make an assumption that any additional modules in that situation will play nicely with the memory layout of the library.
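The layout and garbage-collection steps described above can be sketched roughly as follows. Everything here is an illustrative assumption rather than the PR's actual algorithm: the `OutputKind`, `Global`, and `Layout` types, placing globals immediately after the highest data segment, aligning the heap base to a caller-supplied power of two, and the rule that a closed executable drops unreferenced globals while an open library keeps them all.

```rust
use std::collections::HashSet;

/// Hypothetical artifact kinds: "closed" executable vs "open" library.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputKind {
    Executable,
    Library,
}

/// Hypothetical global description: just enough to lay it out.
#[derive(Debug, Clone)]
pub struct Global {
    pub name: &'static str,
    pub size: u32,
    pub align: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Layout {
    /// Address assigned to each surviving global, in placement order.
    pub globals: Vec<(&'static str, u32)>,
    /// First address available to the heap.
    pub heap_base: u32,
}

/// Round `addr` up to the next multiple of `align` (a power of two).
fn align_up(addr: u32, align: u32) -> u32 {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    (addr + align - 1) & !(align - 1)
}

/// `segments` are `(offset, size)` pairs that were already validated.
/// `referenced` is the set of global names reachable from the linked code.
pub fn layout(
    kind: OutputKind,
    segments: &[(u32, u32)],
    globals: &[Global],
    referenced: &HashSet<&str>,
    heap_align: u32,
) -> Layout {
    // Globals start after the end of the highest data segment.
    let mut next = segments
        .iter()
        .map(|&(offset, size)| offset + size)
        .max()
        .unwrap_or(0);

    let mut placed = Vec::new();
    for global in globals {
        // Assumed GC policy: only a closed program may drop unused globals.
        if kind == OutputKind::Executable && !referenced.contains(global.name) {
            continue;
        }
        let addr = align_up(next, global.align);
        placed.push((global.name, addr));
        next = addr + global.size;
    }

    Layout { globals: placed, heap_base: align_up(next, heap_align) }
}
```

With one 10-byte segment at offset 0 and globals `a` (4 bytes, align 4), `b` (8 bytes, align 8), and `c` (1 byte, align 1), an executable that references only `a` and `c` places them at 12 and 16, whereas a library keeps `b` at 16 and pushes `c` to 24. With a heap alignment of 16, the heap base is 32 in both cases.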
Types, immediates, and immediate instructions are now uniform. The one exception is that we still have the isize/usize types, although the corresponding immediates/instructions have been removed in favor of using more explicit types. In the future we may remove the isize/usize types as well, but for now they may still be useful for things like array indices, where the precise type is not important and the compiler may choose the most efficient type. There are no signed vs unsigned immediates and instructions, matching recent type system changes.
This commit adds a rather comprehensive test of the linker, as well as of a variety of interesting operations in the IR in general (data segments, global variables, memory management, etc.). In the process, I've cleaned up the integration test module a bit, to make it easier to reuse some of the basic module setup across multiple tests. There are no changes to the pre-existing tests aside from moving stuff around a bit. The linker test is all new. A number of commits preceding this one implement useful functionality or fix bugs that came up when writing the linker test.
0bf8617 to eaf4de6
Yep, though the compiler currently only supports this for standard library functions it knows about that are "baked in"; but if we can load metadata about those external functions from somewhere, then we could make it much more open-ended/flexible than it is now. Bottom line though, yes we can support this with what's there today. We could also allow specifying a list of modules/functions that are basically whitelisted so that any signature is considered valid for them (basically disabling the validation of those calls), but I think that would be the wrong approach in general. It might be a useful escape hatch though. Requiring some kind of metadata/signature file to be provided for a library you wish to link against would be safer/better IMO.
I agree that going with a metadata/signature file is the better approach. Let's create an issue for this, and we can discuss the file format etc. there.
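As a starting point for that discussion, here is a small sketch of what call validation against externally supplied signatures might look like once the metadata has been loaded. It deliberately says nothing about the file format, and every name in it (`ExternalSignatures`, `Signature`, `CallError`, the `Type` variants, and the example function path) is hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified value types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Type {
    Felt,
    U32,
    I1,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Signature {
    pub params: Vec<Type>,
    pub results: Vec<Type>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CallError {
    /// No signature was provided for this function.
    UnknownFunction(String),
    /// A signature exists but the call site does not match it.
    SignatureMismatch(String),
}

/// Signatures for functions whose implementation the compiler never sees.
#[derive(Default)]
pub struct ExternalSignatures {
    by_name: HashMap<String, Signature>,
}

impl ExternalSignatures {
    /// Record a signature, e.g. after parsing a metadata file.
    pub fn register(&mut self, name: &str, signature: Signature) {
        self.by_name.insert(name.to_string(), signature);
    }

    /// Check a call site against the recorded signature for `callee`.
    pub fn validate_call(
        &self,
        callee: &str,
        args: &[Type],
        results: &[Type],
    ) -> Result<(), CallError> {
        let signature = self
            .by_name
            .get(callee)
            .ok_or_else(|| CallError::UnknownFunction(callee.to_string()))?;
        if signature.params.as_slice() == args && signature.results.as_slice() == results {
            Ok(())
        } else {
            Err(CallError::SignatureMismatch(callee.to_string()))
        }
    }
}
```

The whitelist escape hatch mentioned above would amount to skipping `validate_call` for selected names, which is why a signature file seems like the safer default.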
LGTM. Just a couple of minor comments, which you can deal with as you see fit.
* Set up a separate `layout` module in the `miden-hir-type` crate which holds the `TypeRepr` implementation, as well as all of its supporting functionality.
* Provide an `Alignable` trait which can be used to calculate alignment offsets, next aligned value, and next multiple of any unsigned primitive integral type.
* Rework the layout functions to use `Alignable`.
* Rework code that duplicates functionality provided by `Alignable` to use the trait instead.
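For readers unfamiliar with this kind of helper, the following is a minimal sketch of a trait in the same spirit. The method names and signatures are guesses made for illustration and may not match the actual `Alignable` trait in `miden-hir-type`. Alignments are assumed to be powers of two, and overflow near the top of the integer range is not handled.

```rust
/// Hypothetical alignment helpers for unsigned primitive integers.
pub trait Alignable: Copy {
    /// Number of bytes to add to `self` to reach the next `align` boundary
    /// (zero if already aligned). `align` must be a power of two.
    fn align_offset(self, align: Self) -> Self;

    /// The smallest value `>= self` that is a multiple of `align`,
    /// where `align` is a power of two.
    fn align_up(self, align: Self) -> Self;

    /// The smallest value `>= self` that is a multiple of `multiple`,
    /// which may be any non-zero value.
    fn round_up_to_multiple(self, multiple: Self) -> Self;
}

macro_rules! impl_alignable {
    ($($ty:ty),* $(,)?) => {$(
        impl Alignable for $ty {
            fn align_offset(self, align: Self) -> Self {
                assert!(align.is_power_of_two(), "alignment must be a power of two");
                // (-self) mod align, computed with a mask.
                self.wrapping_neg() & (align - 1)
            }

            fn align_up(self, align: Self) -> Self {
                self + self.align_offset(align)
            }

            fn round_up_to_multiple(self, multiple: Self) -> Self {
                assert!(multiple != 0, "multiple must be non-zero");
                match self % multiple {
                    0 => self,
                    rem => self + (multiple - rem),
                }
            }
        }
    )*};
}

impl_alignable!(u8, u16, u32, u64, usize);
```

For example, `10u32.align_offset(4)` is 2 and `10u32.align_up(4)` is 12, while `10u8.round_up_to_multiple(6)` is 12 even though 6 is not a power of two.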
This is a pre-requisite for subsequent codegen PRs that use the output of the linker to emit Miden Assembly from corresponding Miden IR.
/cc @greenhat - just keeping you in the loop here, as one of the changes here affects your Wasm frontend (very trivially, just a small tweak to the API for declaring global variables).