feat: implement program linker #24

Merged 18 commits on Sep 22, 2023


bitwalker
Contributor

This commit introduces [Linker], which takes a set of [Module] and performs many of the tasks that a typical system linker would handle. In the Miden world, however, the linker gets involved at a slightly different stage of compilation, and does not have the final say in what code will ultimately get executed at runtime (with some exceptions).

Notably, [Linker] is invoked on a set of modules in IR form, rather than in Miden Assembly (MASM) form; in other words, the linker is invoked before modules are lowered to MASM by the codegen backend. This is because one of the critical tasks performed by the linker is the allocation and layout of linear memory for use by the compiled program. To do this, the linker must determine which data segments and globals will end up in the output, validate that there are no conflicts and that all referenced symbols have corresponding definitions, and then lay out the data segments and globals in memory so that the codegen backend knows at what address to find a given global, as well as where the start of the heap will fall for use by the program.
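To make the layout step a bit more concrete, here is a minimal sketch of the idea. It is not the code in this PR: the `Segment`, `Global`, and `Layout` types, the `HEAP_ALIGN` constant, and the choice to place globals directly after the last data segment are all assumptions made for illustration.

```rust
/// Hypothetical data segment: a fixed byte offset plus its initial contents.
struct Segment {
    offset: u32,
    init: Vec<u8>,
}

/// Hypothetical global: name, size, and alignment (a power of two), in bytes.
struct Global {
    name: &'static str,
    size: u32,
    align: u32,
}

/// Result of layout: the address of each global and the start of the heap.
struct Layout {
    globals: Vec<(&'static str, u32)>,
    heap_start: u32,
}

/// Assumed heap alignment; the real value is not stated in this PR.
const HEAP_ALIGN: u32 = 16;

/// Round `addr` up to the next multiple of `align` (a power of two).
fn align_up(addr: u32, align: u32) -> u32 {
    assert!(align.is_power_of_two());
    (addr + align - 1) & !(align - 1)
}

fn layout(segments: &mut [Segment], globals: &[Global]) -> Result<Layout, String> {
    // Validate that no two segments overlap, walking them in address order.
    segments.sort_by_key(|s| s.offset);
    let mut end = 0u32;
    for s in segments.iter() {
        if s.offset < end {
            return Err(format!("segment at {:#x} overlaps the previous segment", s.offset));
        }
        end = s.offset + s.init.len() as u32;
    }
    // Assumption: globals are placed after the last segment, each one aligned.
    let mut cursor = end;
    let mut placed = Vec::new();
    for g in globals {
        cursor = align_up(cursor, g.align);
        placed.push((g.name, cursor));
        cursor += g.size;
    }
    // The heap begins at the first suitably aligned address past the globals.
    Ok(Layout { globals: placed, heap_start: align_up(cursor, HEAP_ALIGN) })
}
```

The point of the sketch is only that codegen needs two outputs from this step: an address for every global, and the address at which the heap starts.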

The [Linker] is also responsible for garbage collecting unused declarations, but this is dependent on what type of artifact is being produced.

Since MASM is itself an intermediate representation for the Miden VM, the compiler does not actually know whether additional modules will be included in the set provided to the VM at runtime, or even whether all of the emitted modules will be provided to the VM. That said, we've designed the linker to make certain assumptions based on what type of output is being emitted. For example, an executable program is considered "closed", i.e. we assume that no additional code will be introduced other than what is being linked (aside from the Miden standard library). On the other hand, a library program is considered "open", i.e. we assume that the library will be included as part of a larger executable program that will contain modules we don't necessarily control. Despite that, we do assume that any additional modules in that situation will play nicely with the memory layout of the library.
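One way to read the closed/open distinction in terms of garbage collection is sketched below. This is an illustration under assumptions, not the linker's actual logic: `OutputKind`, `Decl`, and `live_symbols` are hypothetical names, and the rule used (executables keep only what is reachable from the entrypoint, libraries keep everything reachable from any exported declaration) is one plausible interpretation of the paragraph above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical artifact kinds: a "closed" executable or an "open" library.
#[derive(Clone, Copy)]
enum OutputKind {
    Executable { entry: &'static str },
    Library,
}

/// Hypothetical declaration: whether it is exported, and what it references.
struct Decl {
    exported: bool,
    refs: Vec<&'static str>,
}

/// Compute the set of declarations that must be kept in the output.
fn live_symbols(
    decls: &BTreeMap<&'static str, Decl>,
    kind: OutputKind,
) -> BTreeSet<&'static str> {
    // Roots: the entrypoint for an executable, every export for a library.
    let mut worklist: Vec<&'static str> = match kind {
        OutputKind::Executable { entry } => vec![entry],
        OutputKind::Library => decls
            .iter()
            .filter(|(_, d)| d.exported)
            .map(|(name, _)| *name)
            .collect(),
    };
    // Standard worklist traversal over the reference graph.
    let mut live = BTreeSet::new();
    while let Some(name) = worklist.pop() {
        if !live.insert(name) {
            continue; // already visited
        }
        if let Some(decl) = decls.get(name) {
            worklist.extend(decl.refs.iter().copied());
        }
    }
    live
}
```

Anything not in the returned set would be a candidate for removal; for a library, exported items survive even when nothing inside the library refers to them.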


This is a prerequisite for subsequent codegen PRs that use the output of the linker to emit Miden Assembly from the corresponding Miden IR.

/cc @greenhat - just keeping you in the loop here, as one of the changes here affects your Wasm frontend (very trivially, just a small tweak to the API for declaring global variables).

This commit makes some fixes/improvements to the `GlobalVariableTable`
data structure in order to accommodate its usage in both `Module` and
`Program` contexts correctly.

* Declaring a new global variable no longer supports conflict
  resolution, as that was intended only for use when linking modules
  into a `Program`.
* When declaring a new global variable, you can now provide an
  optional initializer at the same time. The existing `set_initializer`
  API still exists, but it is no longer necessary to use it when you
  are declaring and setting the initializer at the same time.
* There is now a separate public API for inserting previously-declared
  `GlobalVariableData` into the table while performing conflict
  resolution on symbols with internal or odr linkage. This is used by
  `Linker` to import globals from a module into the program-wide table.
* An API was added to remove a global variable from the table, which is
  used by `Linker` to perform garbage collection of unused globals.
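The split between declaring, inserting, and removing globals can be sketched roughly as follows. This is a toy model and not the `GlobalVariableTable` API from this PR: the type and method names, the `Linkage` variants shown, and the conflict rule (identical odr definitions merge, anything else is an error) are assumptions, and internal linkage is left out to keep the example short.

```rust
use std::collections::BTreeMap;

/// Hypothetical subset of linkage kinds; internal linkage is omitted here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Linkage {
    External,
    Odr,
}

/// Hypothetical stand-in for the data describing one global variable.
#[derive(Clone, Debug, PartialEq, Eq)]
struct GlobalData {
    name: String,
    linkage: Linkage,
    init: Option<Vec<u8>>,
}

#[derive(Default)]
struct GlobalTable {
    globals: BTreeMap<String, GlobalData>,
}

impl GlobalTable {
    /// Declare a new global, optionally with an initializer.
    /// No conflict resolution: redeclaring an existing name is an error.
    fn declare(&mut self, name: &str, linkage: Linkage, init: Option<Vec<u8>>) -> Result<(), String> {
        if self.globals.contains_key(name) {
            return Err(format!("global `{}` is already declared", name));
        }
        let data = GlobalData { name: name.to_string(), linkage, init };
        self.globals.insert(name.to_string(), data);
        Ok(())
    }

    /// Insert previously declared data, resolving conflicts.
    /// Assumed rule: identical odr definitions are merged, the rest conflict.
    fn insert(&mut self, data: GlobalData) -> Result<(), String> {
        match self.globals.get(&data.name) {
            None => {
                self.globals.insert(data.name.clone(), data);
                Ok(())
            }
            Some(existing) if existing.linkage == Linkage::Odr && *existing == data => Ok(()),
            Some(_) => Err(format!("conflicting definitions of global `{}`", data.name)),
        }
    }

    /// Remove a global, e.g. when it is found to be unused.
    fn remove(&mut self, name: &str) -> Option<GlobalData> {
        self.globals.remove(name)
    }
}
```

In this model a frontend would only ever call `declare`, while a linker would call `insert` when merging module tables and `remove` when collecting unused globals.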

This commit makes some adjustments to the way data segments are
represented to improve the ergonomics of working with them:

* The initializer of a data segment is now stored directly in the
  `DataSegment`, rather than in a separate `ConstantPool`, this avoids
  confusion around having multiple constant pools, and makes it easier
  to examine a given segment without having to have the constant pool on
  hand. Since data segments are very unlikely to have the same
  initializers, there is little benefit to using a constant pool for
  them.
* Provide a pair of `declare` and `insert` APIs for the
  `DataSegmentTable`, similar to those provided by `GlobalVariableTable`.
* Extend `DataSegmentTable` with APIs for accessing the last segment of
  the table, and draining the table front-to-back.
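A rough sketch of what a table with inline initializers, a "last segment" accessor, and front-to-back draining could look like is shown below. It is not the `DataSegmentTable` from this PR: the field names, the overlap check, and the method names are assumptions, and only a `declare`-style entry point is modeled.

```rust
use std::collections::VecDeque;

/// Hypothetical data segment that owns its initializer bytes directly,
/// rather than referring to an entry in a separate constant pool.
#[derive(Clone, Debug, PartialEq, Eq)]
struct DataSegment {
    offset: u32,
    init: Vec<u8>,
    readonly: bool,
}

impl DataSegment {
    /// One past the last byte covered by this segment.
    fn end(&self) -> u32 {
        self.offset + self.init.len() as u32
    }
}

/// Hypothetical table that keeps its segments sorted by offset.
#[derive(Default)]
struct SegmentTable {
    segments: VecDeque<DataSegment>,
}

impl SegmentTable {
    /// Declare a new segment, rejecting any overlap with an existing one.
    fn declare(&mut self, offset: u32, init: Vec<u8>, readonly: bool) -> Result<(), String> {
        let seg = DataSegment { offset, init, readonly };
        if self.segments.iter().any(|s| seg.offset < s.end() && s.offset < seg.end()) {
            return Err(format!("segment at {:#x} overlaps an existing segment", offset));
        }
        // Insert at the position that keeps the table sorted by offset.
        let idx = self.segments.partition_point(|s| s.offset < seg.offset);
        self.segments.insert(idx, seg);
        Ok(())
    }

    /// The segment with the highest offset, if any.
    fn last(&self) -> Option<&DataSegment> {
        self.segments.back()
    }

    /// Remove and return the segment with the lowest offset, if any.
    fn pop_front(&mut self) -> Option<DataSegment> {
        self.segments.pop_front()
    }
}
```

Calling `pop_front` in a loop drains the table in address order, and `last` gives the end of the highest segment, which is the kind of information a layout pass needs to decide where later data can start.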
@bitwalker bitwalker self-assigned this Sep 21, 2023
@bobbinth
Contributor

bobbinth commented Sep 22, 2023

Without having looked at the code, one question I wanted to ask: do we have a way to define "external functions" (besides the ones in the Miden standard library)? By external functions I mean functions for which the compiler knows the signature but not the implementation.

Basically, I wonder if the compiler will be able to output something like:

use.my_library::my_module

begin
    ...
    exec.my_module::some_procedure
    ...
end

Where my_library is written in MASM but the compiler knows all procedure signatures in that library.

Types, immediates, and immediate instructions are now uniform, with the
exception that we still have the isize/usize types, but the
corresponding immediates/instructions have been removed in favor of
using more explicit types. In the future we may remove the isize/usize
types as well, but for now they may still be useful for things like
array indices, where the precise type is not important and the compiler
may choose the most efficient type.

There are no longer separate signed and unsigned immediates and
instructions, matching recent type system changes.

This commit adds a rather comprehensive test of the linker, as well as a
variety of interesting operations in the IR in general (data segments,
global variables, memory management, etc.)

In the process, I've cleaned up the integration test module a bit, to
make it easier to reuse some of the basic module setup across multiple
tests. No changes to the pre-existing tests aside from moving stuff
around a bit. The linker test is all new.

A number of commits preceding this one implement useful functionality
or fix bugs that came up when writing the linker test.
@bitwalker
Contributor Author

> Without having looked at the code, one question I wanted to ask: do we have a way to define "external functions" (besides the ones in the Miden standard library)? By external functions I mean functions for which the compiler knows the signature but not the implementation.

Yep, though the compiler currently only supports this for standard library functions it knows about that are "baked in"; but if we can load metadata about those external functions from somewhere, then we could make it much more open-ended/flexible than it is now. Bottom line though, yes we can support this with what's there today.

We could also allow specifying a list of modules/functions that are basically whitelisted so that any signature is considered valid for them (basically disabling the validation of those calls), but I think that would be the wrong approach in general. It might be a useful escape hatch though. Requiring some kind of metadata/signature file to be provided for a library you wish to link against would be safer/better IMO.
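To illustrate the metadata-based option, here is a minimal sketch of validating a call against signatures loaded from some external source. Everything in it is hypothetical: the `Ty`, `Signature`, and `ExternTable` types, the `check_call` function, and the idea of keying signatures by a fully qualified path are assumptions for the example, not something that exists in the compiler today.

```rust
use std::collections::HashMap;

/// Hypothetical value types that can appear in an external signature.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty {
    Felt,
    U32,
    U64,
}

/// Hypothetical signature record, as might be read from a metadata file.
#[derive(Debug, PartialEq, Eq)]
struct Signature {
    params: Vec<Ty>,
    results: Vec<Ty>,
}

/// Known external functions, keyed by fully qualified path.
struct ExternTable {
    sigs: HashMap<String, Signature>,
}

impl ExternTable {
    /// Validate a call to an external function and return its result types.
    fn check_call(&self, path: &str, args: &[Ty]) -> Result<&[Ty], String> {
        let sig = self
            .sigs
            .get(path)
            .ok_or_else(|| format!("unknown external function `{}`", path))?;
        if sig.params.as_slice() != args {
            return Err(format!(
                "call to `{}` expects {:?}, but was given {:?}",
                path, sig.params, args
            ));
        }
        Ok(sig.results.as_slice())
    }
}
```

With a table like this, a call such as `exec.my_module::some_procedure` could be type-checked without the compiler ever seeing the MASM implementation, while an unknown path or mismatched arguments would still be rejected.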

@bobbinth
Contributor

I agree that going with a metadata/signature file is the better approach. Let's create an issue for this, and we can discuss the file format etc. there.

Contributor

@jjcnn jjcnn left a comment


LGTM. Just a couple of minor comments, which you can deal with as you see fit.

Review comments (outdated, resolved) on: hir/src/builder.rs (2), hir/src/globals.rs, hir/src/program/linker.rs

* Set up a separate `layout` module in the `miden-hir-type` crate which
  holds the `TypeRepr` implementation, as well as all of its supporting
  functionality.
* Provide an `Alignable` trait which can be used to calculate alignment
  offsets, the next aligned value, and the next multiple for any
  unsigned primitive integral type
* Rework the layout functions to use `Alignable`
* Rework code that duplicates functionality provided by `Alignable` to
  use the trait instead
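For readers unfamiliar with this kind of helper, a trait along these lines might look like the sketch below. It is not the `Alignable` trait from this commit: the method names and exact semantics are assumptions, chosen only to show the three calculations the commit message mentions.

```rust
/// Hypothetical alignment helper for unsigned primitive integers.
trait Alignable: Sized {
    /// Number of bytes to add to `self` to reach the next `align` boundary.
    fn align_offset(self, align: Self) -> Self;
    /// `self` rounded up to the next `align` boundary (a power of two).
    fn align_up(self, align: Self) -> Self;
    /// The smallest multiple of `n` that is greater than or equal to `self`.
    fn next_multiple(self, n: Self) -> Self;
}

macro_rules! impl_alignable {
    ($($t:ty),*) => {$(
        impl Alignable for $t {
            fn align_offset(self, align: Self) -> Self {
                Alignable::align_up(self, align) - self
            }

            fn align_up(self, align: Self) -> Self {
                assert!(align.is_power_of_two());
                // Adding `align - 1` and masking clears the low bits.
                (self + align - 1) & !(align - 1)
            }

            fn next_multiple(self, n: Self) -> Self {
                assert!(n != 0);
                match self % n {
                    0 => self,
                    rem => self + (n - rem),
                }
            }
        }
    )*};
}

// Implement the trait for the unsigned primitive integer types.
impl_alignable!(u8, u16, u32, u64, usize);
```

Unlike `align_up`, `next_multiple` works for any non-zero `n`, not just powers of two, which is why the two are kept as separate operations in this sketch.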
@bitwalker bitwalker merged commit 0fe98ae into main Sep 22, 2023
2 checks passed
@bitwalker bitwalker deleted the bitwalker/linker branch September 22, 2023 20:54