Design the handling of function pointers #125

Merged 1 commit on Jan 14, 2025
74 changes: 72 additions & 2 deletions docs/Memory Model.md
@@ -148,7 +148,76 @@ follows:

### Function Pointers

This section is TBD, and will be filled in as part of the work on function pointers.
Function pointers are commonly used in LLVM IR, so Hieratika must be able to support them. To that
end, the following section outlines how they integrate with the memory model.

This design rests on one premise: at this stage, we do not consider function pointers to be _true
pointers_. LLVM insists that they _are_, and offsets from them can be used to access adjacent data
such as [prefix](https://llvm.org/docs/LangRef.html#prefix-data) or
[prologue](https://llvm.org/docs/LangRef.html#prologue-data) data, but these features do not seem
to be used by Rust and hence can be safely ignored for now.

As a result, Hieratika treats function pointers _specially_, as follows:

- In LLVM IR, function pointer values are _always_ derived from function pointer constants,
referring directly to the encoded name of some function.
- The Hieratika compiler, at the point of generating function stubs, will generate global constants
for these function pointer constants, derived from the correct `blockaddress` expression,
containing a _relocatable_ reference to a block (`block_ptr`), and information about which module
the block was defined in (`module_id`).
- The compiler then generates _dispatch functions_ (named according to the
[mangling scheme](./Name%20Mangling.md)). Each dispatch function takes the arguments `A...` for its
function type, together with the function pointer `ptr`, and matches on the `block_ptr` portion of
that pointer.
- If the `block_ptr` matches, the local dispatch function calls the corresponding target function
(in the current module), passing along the provided arguments.
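
As a rough illustration of the two-part function pointer constants described above, the following
sketch models one as a plain struct. The field names `block_ptr` and `module_id` come from the text;
the struct name `FunctionPointer`, the `u64` field types, the constant names, and the helper
`same_module` are all assumptions made for illustration, not Hieratika's actual representation.

```rust
/// Hypothetical layout of a function pointer constant (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FunctionPointer {
    /// Identifies the module in which the target block was defined.
    pub module_id: u64,
    /// Relocatable reference to the target block within that module.
    pub block_ptr: u64,
}

/// Constants the compiler might emit for two functions defined in a
/// hypothetical module 0. The identifier values are arbitrary.
pub const FUNCTION_ZERO_PTR: FunctionPointer = FunctionPointer { module_id: 0, block_ptr: 0 };
pub const FUNCTION_ONE_PTR: FunctionPointer = FunctionPointer { module_id: 0, block_ptr: 1 };

/// Returns `true` when both pointers refer to targets defined in the same module.
pub fn same_module(a: FunctionPointer, b: FunctionPointer) -> bool {
    a.module_id == b.module_id
}
```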

Consider, by way of example, a dispatch function `__some_mangled_name`. It would have an
implementation similar to the following pseudocode.

```rust
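// `ptr` stands for the function pointer value with its `block_ptr` and `module_id` portions.
fn __some_mangled_name(function_pointer: ptr, arg1: i8, arg2: i8) -> i16 {
    match function_pointer.block_ptr {
        // Each arm forwards the arguments to the target registered for that `block_ptr`.
        0 => function_zero(arg1, arg2),
        1 => function_one(arg1, arg2),
        // ...
        // No function of type (i8, i8) -> i16 in this module has this `block_ptr`.
        _ => panic!("Found function pointer {function_pointer} for function (i8, i8) -> i16 but no such target exists"),
    }
}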
```

This results in a raft of dispatch functions for all possible function types in a module, which can
then be used to _locally_ discover the correct function to call through a function pointer.

This, however, is insufficient for _general-case_ function pointer dispatch. Whereas LLVM's
traditional [`blockaddress`](https://llvm.org/docs/LangRef.html#addresses-of-basic-blocks)
expressions cannot escape the local module, function pointers can _easily_ be passed between
modules and then called.

In order to solve this problem, the design for function pointers in Hieratika has a _second_ part,
known as the _meta_-dispatch table.

- These tables are generated by the _linker_ rather than by the single-module Hieratika compiler.
They are similar to the local dispatch tables above, but instead of dispatching on the `block_ptr`
portion of the function pointer, they dispatch on the `module_id`.
- If the `module_id` matches, the meta-dispatch function calls the _local_ dispatch function for
that module, passing the provided arguments and the function pointer, and allowing the local
dispatch function to select the correct implementation.
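
The two layers can be sketched as follows. This is a minimal illustration under several
assumptions: the names `FunctionPointer`, `module_0_dispatch`, `module_1_dispatch`, and
`meta_dispatch` are hypothetical, the targets `add`, `sub`, and `mul` are stand-ins, the identifier
values are arbitrary, and the sketch returns `Option` instead of panicking so that a missing target
is easy to observe.

```rust
/// Hypothetical two-part function pointer; the field names follow the text,
/// the integer widths are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FunctionPointer {
    pub module_id: u64,
    pub block_ptr: u64,
}

// Stand-in target functions of type (i8, i8) -> i16.
fn add(a: i8, b: i8) -> i16 { i16::from(a) + i16::from(b) }
fn sub(a: i8, b: i8) -> i16 { i16::from(a) - i16::from(b) }
fn mul(a: i8, b: i8) -> i16 { i16::from(a) * i16::from(b) }

/// Local dispatch for a hypothetical module 0: selects a target by `block_ptr`.
fn module_0_dispatch(ptr: FunctionPointer, a: i8, b: i8) -> Option<i16> {
    match ptr.block_ptr {
        0 => Some(add(a, b)),
        1 => Some(sub(a, b)),
        _ => None, // no such target in this module
    }
}

/// Local dispatch for a hypothetical module 1.
fn module_1_dispatch(ptr: FunctionPointer, a: i8, b: i8) -> Option<i16> {
    match ptr.block_ptr {
        2 => Some(mul(a, b)),
        _ => None, // no such target in this module
    }
}

/// Meta-dispatch: selects the module's local dispatch function by `module_id`
/// and forwards both the arguments and the function pointer to it.
pub fn meta_dispatch(ptr: FunctionPointer, a: i8, b: i8) -> Option<i16> {
    match ptr.module_id {
        0 => module_0_dispatch(ptr, a, b),
        1 => module_1_dispatch(ptr, a, b),
        _ => None, // no such module in the linked program
    }
}
```

Calling `meta_dispatch` with `module_id: 1` and `block_ptr: 2` reaches `mul` through
`module_1_dispatch`, while an unknown `module_id` yields `None` in this sketch.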

At first glance, this dual-layer dispatch seems quite expensive, essentially amounting to performing
comparisons of the function pointer to all possible targets in a program. However, Hieratika uses a
slightly more sensible search mechanism.

- Both `block_ptr` and `module_id` are (re-)allocated (at link time) such that they are each in a
contiguous block of integer identifiers.
- The dispatch function can then perform _binary search_ on the `module_id` to resolve to the
correct local dispatch function in $\mathcal{O}(\log_2 n)$ time, instead of $\mathcal{O}(n)$ time,
where $n$ is the number of modules.
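
One way to picture the binary search is as a lookup in a table sorted by `module_id`. The sketch
below is only an illustration: the table `DISPATCH_TABLE`, the type alias `LocalDispatch`, and the
per-module functions are hypothetical, and the code Hieratika actually generates may take a
different form, such as nested comparisons.

```rust
/// Hypothetical two-part function pointer; the field names follow the text.
#[derive(Clone, Copy, Debug)]
pub struct FunctionPointer {
    pub module_id: u64,
    pub block_ptr: u64,
}

/// Signature shared by the local dispatch functions for type (i8, i8) -> i16.
type LocalDispatch = fn(FunctionPointer, i8, i8) -> Option<i16>;

// Stand-in local dispatch functions, each with a single target at `block_ptr` 0.
fn module_0_dispatch(ptr: FunctionPointer, a: i8, b: i8) -> Option<i16> {
    if ptr.block_ptr == 0 { Some(i16::from(a) + i16::from(b)) } else { None }
}
fn module_1_dispatch(ptr: FunctionPointer, a: i8, b: i8) -> Option<i16> {
    if ptr.block_ptr == 0 { Some(i16::from(a) - i16::from(b)) } else { None }
}
fn module_2_dispatch(ptr: FunctionPointer, a: i8, b: i8) -> Option<i16> {
    if ptr.block_ptr == 0 { Some(i16::from(a) * i16::from(b)) } else { None }
}

/// Entries sorted by `module_id`, which binary search requires. The
/// identifiers form a contiguous range, as the link-time allocation ensures.
static DISPATCH_TABLE: [(u64, LocalDispatch); 3] = [
    (0, module_0_dispatch as LocalDispatch),
    (1, module_1_dispatch as LocalDispatch),
    (2, module_2_dispatch as LocalDispatch),
];

/// Resolves the local dispatch function in O(log n) comparisons, then calls it.
pub fn meta_dispatch(ptr: FunctionPointer, a: i8, b: i8) -> Option<i16> {
    let index = DISPATCH_TABLE
        .binary_search_by_key(&ptr.module_id, |&(id, _)| id)
        .ok()?; // unknown module: no local dispatch function exists
    let (_, local) = DISPATCH_TABLE[index];
    local(ptr, a, b)
}
```

The standard library's `binary_search_by_key` halves the candidate range at each step, so the
number of `module_id` comparisons grows logarithmically with the number of modules.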

If, in the future, we work with languages that do make use of prefix or prologue data, this approach
will be entirely insufficient. To make _that_ work, we would need to re-work our handling of
function pointers to be _true_ pointers.

## Felt-Aligned Addressing - An Alternative Model

@@ -172,7 +241,8 @@ as a value of another type under byte-addressing and alignment rules—was rampa

As an example, it proved common to see IR that allocated `[4 x i8]` and then wrote an `i16` to the
first two bytes and read an `i16` from the other two. As, in the felt-aligned model, the first two
`i8`s would be written to individual felts, reading them back as an `i16` is significantly complex.
`i8`s would be written to individual felts, reading them back as an `i16` is significantly more
complex.

To that end, the project was forced to abandon this model in favor of a more-traditional
byte-aligned addressing model.
8 changes: 8 additions & 0 deletions docs/Name Mangling.md
@@ -0,0 +1,8 @@
# Hieratika Name Mangling

We need to design a name mangling scheme for Hieratika to use. It should run on FLO's `Type` and
account for:

- Embedding type info (params and return type) for uniqueness.
- Embedding the function name.
- Embedding the module name.