wasm linker: aggressive rewrite towards Data-Oriented Design #22220
Open: andrewrk wants to merge 61 commits into master from wasm-linker
+11,480
−11,958
andrewrk force-pushed the wasm-linker branch from c9bf6eb to 4154612 on December 14, 2024 22:04
andrewrk force-pushed the wasm-linker branch 2 times, most recently from 327a795 to ede3604 on December 19, 2024 04:18
The goals of this branch are to:

* compile faster when using the wasm linker and backend
* enable saving compiler state by directly copying in-memory linker state to disk
* more efficient compiler memory utilization
* introduce integer type safety to wasm linker code
* generate better WebAssembly code
* fully participate in incremental compilation
* do as much work as possible outside of flush(), while continuing to do linker garbage collection
* avoid unnecessary heap allocations
* avoid unnecessary indirect function calls

In order to accomplish these goals, this removes the ZigObject abstraction, as well as Symbol and Atom. These abstractions resulted in overly generic code, doing unnecessary work, and needless complications that simply go away by creating a better in-memory data model and emitting more things lazily.

For example, this makes wasm codegen emit MIR which is then lowered to wasm code during linking, with optimal function indexes etc, or relocations are emitted if outputting an object. Previously, this would always emit relocations, which are fully unnecessary when emitting an executable, and required all function calls to use the maximum size LEB encoding.

This branch introduces the concept of the "prelink" phase which occurs after all object files have been parsed, but before any Zcu updates are sent to the linker. This allows the linker to fully parse all objects into a compact memory model, which is guaranteed to be complete when Zcu code is generated.

This commit is not a complete implementation of all these goals; it is not even passing semantic analysis.
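To illustrate the cost of the "maximum size LEB encoding" mentioned above, here is a minimal Python sketch (not code from this branch) comparing a minimal unsigned LEB128 encoding with a fixed 5-byte padded one, which is the form a relocatable 32-bit index needs so that it can be patched in place later. The function names `uleb128`, `uleb128_padded`, and `uleb128_decode` are made up for this illustration.

```python
def uleb128(value: int) -> bytes:
    """Minimal unsigned LEB128: 7 payload bits per byte, high bit = 'more follows'."""
    if value < 0:
        raise ValueError("unsigned LEB128 requires a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte, continuation bit clear
            return bytes(out)


def uleb128_padded(value: int, width: int = 5) -> bytes:
    """Fixed-width LEB128: always `width` bytes, so the value can be patched later."""
    if value < 0 or value >= 1 << (7 * width):
        raise ValueError("value does not fit in the requested width")
    out = bytearray()
    for i in range(width):
        byte = (value >> (7 * i)) & 0x7F
        if i < width - 1:
            byte |= 0x80  # every byte except the last keeps the continuation bit
        out.append(byte)
    return bytes(out)


def uleb128_decode(data: bytes) -> int:
    """Decode either form; padding bytes contribute zero payload bits."""
    result = 0
    for shift, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * shift)
    return result


if __name__ == "__main__":
    index = 3  # hypothetical function index
    print(len(uleb128(index)), len(uleb128_padded(index)))
```

Both forms decode to the same index, but a call to a low-numbered function costs one index byte in the minimal form and five in the padded form. That is the saving available when the final indexes are already known at the time the code is lowered.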
Makes linker functions have small error sets, required to report diagnostics properly rather than having a massive error set that has a lot of codes. Other linker implementations are not ported yet. Also the branch is not passing semantic analysis yet.
See #363. Please file issues rather than making TODO comments.
mainly, rework how relocations work. This is the point at which symbol indexes are known - not before. And don't emit unnecessary relocations! They're only needed when emitting an object file.

Changes wasm linker to keep MIR around long-lived so that fixups can be reapplied after linker garbage collection.

use labeled switch while we're at it
Still, the branch is not yet passing semantic analysis.
This branch is passing type checking now.
with this I get 5s compilations
fix some compilation errors for reworked Emit now that it's actually referenced

introduce DataSegment.Id for sorting data both from object files and from the Zcu.

introduce optimization: data segment sorting includes a descending sort on reference count so that references to data can be smaller integers leading to better LEB encodings. this optimization is skipped for object files.

implement uav address access function which is based on only 1 hash table lookup to find out the offset after sorting.
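The reference-count sort described above can be sketched as follows. This is an illustrative Python model, not the linker's code: segments are plain `(name, size, ref_count)` tuples, the helper names are hypothetical, and an unsigned LEB128 size is used as a simplified stand-in for the cost of encoding one reference.

```python
def uleb_size(value):
    """Number of bytes an unsigned LEB128 encoding of `value` takes."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def layout(segments):
    """Assign consecutive offsets in the given order; returns {name: offset}."""
    offsets = {}
    cursor = 0
    for name, size, _ref_count in segments:
        offsets[name] = cursor
        cursor += size
    return offsets


def sort_by_ref_count(segments):
    """Descending sort on reference count; Python's sort is stable for ties."""
    return sorted(segments, key=lambda seg: seg[2], reverse=True)


def reference_cost(segments, offsets):
    """Total bytes spent encoding every reference to every segment."""
    return sum(refs * uleb_size(offsets[name]) for name, _size, refs in segments)


if __name__ == "__main__":
    # Hypothetical data: a large, rarely referenced blob and a small, hot one.
    segments = [("big_rare", 200, 1), ("hot", 8, 50)]
    unsorted_offsets = layout(segments)
    sorted_offsets = layout(sort_by_ref_count(segments))
    print(reference_cost(segments, unsorted_offsets))
    print(reference_cost(segments, sorted_offsets))
```

After sorting, the offset of any segment is a single dictionary lookup, which loosely mirrors the "only 1 hash table lookup" property the commit message describes.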
and more disciplined type safety for output function indexes
in which case the values array is set to undefined
Recognize three distinct phases:

* before prelink ("object phase")
* after prelink, before flush ("zcu phase")
* during flush ("flush phase")

With this setup, we create data structures during the object phase, then mutate them during the zcu phase, and then further mutate them during the flush phase. In order to make the flush phase repeatable, the data structures are copied just before starting the flush phase. Further Zcu updates occur against the non-copied data structures.

What's not implemented is frontend garbage collection, in which case some more changes will be needed in this linker logic to achieve a valid state with data invariants intact.
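A rough Python model of the three phases, assuming a toy `Linker` class whose only state is a list of function names (all names here are hypothetical, not taken from the branch). The point it tries to show is that flush works on a copy, so it can be repeated while Zcu updates keep mutating the original.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class LinkerState:
    functions: list = field(default_factory=list)
    prelinked: bool = False


class Linker:
    def __init__(self):
        self.state = LinkerState()

    def add_object(self, functions):
        """Object phase: only allowed before prelink."""
        if self.state.prelinked:
            raise RuntimeError("objects must be added before prelink")
        self.state.functions.extend(functions)

    def prelink(self):
        """Marks the end of the object phase; object data is now complete."""
        self.state.prelinked = True

    def update_zcu(self, function):
        """Zcu phase: mutates the long-lived state, never a flush copy."""
        if not self.state.prelinked:
            raise RuntimeError("Zcu updates arrive only after prelink")
        self.state.functions.append(function)

    def flush(self):
        """Flush phase: mutate a copy so that flush can be repeated."""
        working = copy.deepcopy(self.state)
        working.functions.sort()  # stand-in for flush-only mutations
        return working.functions


if __name__ == "__main__":
    linker = Linker()
    linker.add_object(["memcpy", "abort"])
    linker.prelink()
    linker.update_zcu("main")
    print(linker.flush())
    print(linker.state.functions)  # original order is untouched by flush
```

Copying the whole state is the simplest way to model repeatability here; the sketch says nothing about how the real linker keeps that copy cheap.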
and expose object_host_name as an option for setting the lib name for object files, since the wasm linking standards don't specify a way to do it.
one hash table lookup per fixup
instead of recursion, callers of the function are responsible for checking the respective tables that might have new entries in them and then calling lowerZcuData again.
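The non-recursive pattern described above might look like the following Python sketch. It is only an analogy: `lower_one` stands in for a single lowering call, `deps` is a hypothetical dependency map, and the "table" is an append-only list that the caller re-checks for new entries.

```python
def lower_one(item, deps, table, seen):
    """Lower a single item; record newly discovered dependencies, never recurse."""
    for dep in deps.get(item, []):
        if dep not in seen:
            seen.add(dep)
            table.append(dep)  # the caller will notice this new entry
    return item


def lower_all(roots, deps):
    """Caller-driven loop: keep lowering until the table stops growing."""
    table = []
    seen = set()
    for root in roots:
        if root not in seen:
            seen.add(root)
            table.append(root)
    lowered = []
    index = 0
    while index < len(table):  # len(table) may grow while we iterate
        lowered.append(lower_one(table[index], deps, table, seen))
        index += 1
    return lowered


if __name__ == "__main__":
    deps = {"a": ["b", "c"], "b": ["c"], "c": []}
    print(lower_all(["a"], deps))
```

Because the loop lives in the caller, the depth of a dependency chain does not translate into call-stack depth, and each item is lowered once.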
codegen can generate zcu data dependencies that need to be populated
it cannot be done earlier since ids are not stable yet
andrewrk force-pushed the wasm-linker branch from ede3604 to 3c4b45b on December 20, 2024 07:25
this strategy uses a "postponed" queue to handle codegen tasks that spawn too early. there's probably a better way.
Merge Checklist
Demo: Incremental Compilation
After this branch is ready to merge, I'll put a demo here.
Demo: Serializing and Deserializing Linker State
After this branch is ready to merge, I'll put a demo here.
Followup
After landing this branch I plan to set a firm release date for the 0.14.0 tag.
ELF, COFF, and MachO need the same treatment. I started with Wasm because it is significantly fewer lines of code. Some strategies can be shared there; however, I don't expect to keep as much in memory with those linkers, since the total object file size could be enormous.
Post-Merge Roadmap: