Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Library trait #1422

Merged
merged 18 commits into from
Aug 3, 2024
Merged

Remove Library trait #1422

merged 18 commits into from
Aug 3, 2024

Conversation

plafer
Copy link
Contributor

@plafer plafer commented Jul 30, 2024

Removes the old Library trait and MaslLibrary, and renames CompiledLibrary.

@plafer plafer force-pushed the plafer-remove-old-library branch from 37c898c to 64b444b Compare July 30, 2024 19:04
plafer and others added 12 commits July 30, 2024 15:26
This commit fixes the issue where re-exported procedures (aliases) were
not being handled properly during assembly. It now explicitly represents
them in the call graph, so that topological ordering of the call graph
will ensure that we only visit the aliases once the aliased procedure
has been compiled/visited.

As part of the solution implemented here, some refinements to
`AliasTarget` were made, in order to explicitly represent whether the
target is absolute or not (just like `InvocationTarget`). Additionally,
to avoid confusion, `FullyQualifiedProcedureName` was renamed to
`QualifiedProcedureName`, so as to make it clear that just because the
path is qualified, does not mean it is absolute (and thus "fully"
qualified).

Some conveniences were added to `LibraryNamespace`, `LibraryPath`, and
`AliasTarget` to make certain operations/conversions more ergonomic and
expressive.
This commit refactors `CompiledLibrary` a bit, to remove some
unnecessary restrictions leftover from the old MASL libraries:

* A `CompiledLibrary` no longer has a name, but it has a content digest
  obtained by lexicographically ordering the exported MAST roots of the
  library, and merging the hashes in order.
* As a consequence of being unnamed/having no namespace, a
  `CompiledLibrary` can now consist of procedures from many modules
  _and_ many namespaces. Any limitation we impose on top of that can be
  done via wrapper types, like how `KernelLibrary` is implemented.
* Support for re-exported procedures in a `CompiledLibrary` is
  implemented. It is assumed that all required libraries will be
  provided to the `Host` when executing a program.
* Some ergonomic improvements to APIs which accept libraries or sets of
  modules, to allow a greater variety of ways you can pass them.
Previously, we assumed that two procedures with the same MAST root, but
differing numbers of locals, was a bug in the assembler. However, this
is not the case, as I will elaborate on below.

If you compile a program like so:

```masm
use.std::u64

begin
  exec.u64::checked_and
end
```

The resulting MAST would look something like:

```mast
begin
  external.0x....
end
```

This MAST will have the exact same MAST root as `std::u64::checked_and`,
because `external` nodes have the same digest as the node they refer to.
Now, if the exec'd procedure has the same number of locals as the
caller, this is presumed to be a "compatible" procedure, meaning it is
fine to let both procedures refer to the same MAST. However, when the
number of procedure locals _differs_, we were raising a compatible
definition error, because it was assumed that due to the instructions
added when procedure locals are present, two procedures with differing
numbers of locals _could not_ have the same root by definition. This is
incorrect, let me illustrate:

```masm

export.foo.2
  ...
end

use.foo

proc.bar.3
   exec.foo::foo
end

begin
   exec.foo::foo
end
```

Assume that `foo.masm` is first compiled to a `CompiledLibrary`, which
is then added to the assembler when assembling an executable program
from `bar.masm`.

Also, recall that `begin .. end` blocks are effectively an exported
procedure definition, with zero locals, that has a special name - but in
all other ways it is just a normal procedure.

The above program is perfectly legal, but we would raise an error during
assembly of the program due to the `begin .. end` block compiling to an
`external` node referencing the MAST root of `foo::foo`, which has a
non-zero number of procedure locals.

The `bar` procedure is there to illustrate that even though it too
simply "wraps" the `foo::foo` procedure, it has a non-zero number of
procedure locals, and thus cannot ever have the same MAST root as a
wrapped procedure with a non-zero number of locals, due to the presence
of locals changing the root of the wrapping procedure. A check has been
kept around that ensures we catch if ever there are two procedures with
non-zero counts of procedure locals, with the same MAST root.
@bitwalker bitwalker marked this pull request as ready for review August 3, 2024 06:24
@bitwalker
Copy link
Contributor

@plafer I broke up the commits I added into a few discrete chunks, definitely read the commit description (when present), because I go into more detail on what that commit was for.

@bobbinth This is ready for your final review, and can be merged whenever IMO.

Re-exported procedures now work as expected again, and are now fully supported in compiled libraries, so we're one step ahead of where we thought we'd be.

Also relevant to both of you: I made some last minute changes to the way CompiledLibrary was implemented - nothing too major, but important:

  1. No more namespace for CompiledLibrary. Leave that up to something higher level than CompiledLibrary, but it just isn't needed, and is in fact detrimental. The main consequence of this, beyond having no canonical name, is that you can add modules from any namespace to a single CompiledLibrary, which is what we want anyway.
  2. Instead of a namespace, CompiledLibrary now has a "canonical" content digest to uniquely identify it. It is computed by merging the MAST roots of all exports of the library, using the lexicographical order of the digests. I think this is useful to have as a primitive, even though we do not yet make use of it anywhere, my first thought is for packaging.
  3. CompiledLibrary can now export procedure aliases from its interface. This is done by slightly tweaking how we represent procedures internally to the CompiledLibrary, but the result is that from an outside perspective, it "just works".
  4. The error raised when two procedures with the same MAST root and differing procedure local counts are observed, has been all but removed. It is only correct in a narrow, specific edge case, see the commit description for the commit referencing "conflicting definitions" for more details.

Also, FullyQualifiedProcedureName has been renamed as QualifiedProcedureName, to remove the implication that it is somehow absolute or fully-resolved, which I think makes it confusing. Now the semantics are better reflected, i.e. it is simply a procedure name qualified with a module path, but whether or not it represents an absolute path is provided by higher-level types or context.

The remaining changes I made were to fix various test failures I encountered while cleaning up/validating. Virtually all of them are artifacts of needing to set up the test environment a bit differently when referencing libraries, inlining of blocks being disabled in earlier commits on this branch, etc. If you see anything that catches your eye though, let me know and I can let you know what prompted the change.

Copy link
Contributor

@bitwalker bitwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the parts of this that I didn't write, I'll leave it up to one of you two to approve my changes to it.

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good! Thank you! I left a few comments inline - but I'll merge as is and address them in a follow-up PR.

Another general comment: I think it would be nice to have a FullyQualifiedProcedureName type - but this may require a bigger refactoring of LibraryPath and associated types. So, also leaving it for the future.

Comment on lines +35 to 36
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CompiledLibrary {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can rename this into just Library as we don't have a concept of "uncompiled" libraries any more.

Comment on lines +55 to 62
#[derive(Debug, Clone, PartialEq, Eq)]
#[repr(u8)]
enum Export {
/// The export is contained in the [MastForest] of this library
Local(MastNodeId),
/// The export is a re-export of an externally-defined procedure from another library
External(RpoDigest),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me whether we actually need to make this distinction. I think we should be able to treat them the same way (i.e., External procedure could be represented by a single External node in the MastForest).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing I wanted to represent here was whether the export was part of the library, or re-exported from another library. If we make the change you suggest, this information would be obscured. We don't currently expose this information, but I suspect it will come in handy when building packaging infrastructure, so that our tooling can use information like this for analysis purposes.

That said, it's not critical, so I could go either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should still be able to get this info. For example, we could provide something CompiledLibrary::get_reexports() -> Vec<(QualifiedProcedureName, RpoDigest)>. This method would just iterate over roots of the MAST forest and filter out roots which are not direct references to External nodes.

Comment on lines +157 to +158
/// Serialize to `target` using `options`
pub fn write_into_with_options<W: ByteWriter>(&self, target: &mut W, options: AstSerdeOptions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably rename AstSerdeOptions into something else as these no longer refer to abstract syntax tree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, definitely feels more general now.

Comment on lines 140 to +141
/// Adds the compiled library to provide modules for the compilation.
pub fn add_compiled_library(&mut self, library: CompiledLibrary) -> Result<(), Report> {
pub fn add_compiled_library(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to one of the previous comments, I think this and similar methods could be called just add_library() since we don't have uncompiled libraries any more.

Comment on lines 10 to +11
/// TODO: add docs
pub struct StdLibrary(MaslLibrary);
pub struct StdLibrary(CompiledLibrary);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we actually need to define a new struct. An alternative could to have this crate expose a single method - something like:

pub fn get_library() -> CompiledLibrary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is nice to use wrapper structs that contain a CompiledLibrary, since that lets them be used in places where a CompiledLibrary is expected, without needing to extract the actual CompiledLibrary explicitly. In fact, I would suggest re-introducing some library traits, that looks like so:

/// Exposes `CompiledLibrary` functionality on any wrapper struct that impls AsRef,
/// but does not permit overriding the core `CompiledLibrary` APIs
///
/// Extra library-related functionality that allows customization, would be added here
pub trait Library: sealed::LibraryImpl {}

mod sealed {
    /// Ensures that only types that contain a `CompiledLibrary` can implement `Library`
    pub(super) trait LibraryImpl: AsRef<CompiledLibrary> {
        fn digest(&self) -> RpoDigest {
            self.as_ref().digest()
        }
        fn mast_forest(&self) -> &MastForest {
            self.as_ref().mast_forest()
       }
    }
    impl<T: AsRef<CompiledLibrary>> LibraryImpl for T {}
}

/// Marker trait for `Library` impls which are valid kernels
pub trait Kernel: Library + AsRef<KernelLibrary> {}

In virtually every place where we use CompiledLibrary now, we actually expect something that impls AsRef<CompiledLibrary>, but I would argue it makes sense to have a dedicated trait for this, so that we can allow custom library types to customize certain behaviors.

It would also make it possible to instantiate a static Library impl, as opposed to constructing it every time someone calls get_library().

Anyway, food for thought.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the things I'm hoping to do is define the assembler in such a way that we construct it with a set of libraries upfront. This way, we don't need to deal with situations when libraries can be added dynamically during the assembly process (which, I think, in practice should never happen).

With Library being a struct, we could have the following constructor for the assembler:

impl Assembler {

    pub fn new(
        source_manager: Arc<dyn SourceManager>,
        kernel: KernelLibrary,
        libraries: impl IntoIterator<Item = Library>,
    ) -> Result<Self, Report> {
        ...
    }

}

We could have other, simpler constructors (e.g., with_kernel()) which would internally call this constructor - but the main idea is that once we've instantiated the assembler, we can only add modules and assembler programs/libraries etc.

If Library is a trait - this becomes a bit more complicated, I think.

Comment on lines +30 to +31
// TODO: Add version to `Library`
let _version = env!("CARGO_PKG_VERSION").parse::<Version>().expect("invalid cargo version");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably do this before the release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should probably define the semantics of library versioning first, since it isn't super clear to me just yet what one can do with a version that makes sense at the moment. The only thing useful at this point, is the digest computed for a CompiledLibrary, which tells you when the contents of the library has changed.

@bobbinth bobbinth merged commit 06fe116 into next Aug 3, 2024
9 checks passed
@bobbinth bobbinth deleted the plafer-remove-old-library branch August 3, 2024 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants