Draft
152 commits
c3a2905
Add AST2 for now
rtfeldman Aug 16, 2025
5c64d1e
wip
rtfeldman Aug 16, 2025
9d56b33
wip
rtfeldman Aug 17, 2025
478f2e9
Delete NodeStore2 and Node2
rtfeldman Aug 17, 2025
def71cd
Get AST2 compiling
rtfeldman Aug 17, 2025
2409301
Make isBinOp exhaustive, delete Node.Start
rtfeldman Aug 17, 2025
a9ce312
Move some stuff into tokenize
rtfeldman Aug 17, 2025
2294db0
Get Parser2 going
rtfeldman Aug 17, 2025
c88d1a2
Add module headers and initCapacity/deinit to AST2
rtfeldman Aug 17, 2025
2e270fe
More Parser2 improvements
rtfeldman Aug 17, 2025
2a25d68
Make a bunch of Parser2 stuff real.
rtfeldman Aug 17, 2025
fd839e6
Move BytesSlice out of AST2
rtfeldman Aug 17, 2025
4be837d
wip
rtfeldman Aug 17, 2025
75f4485
wip 2
rtfeldman Aug 17, 2025
727e11e
some basic snapshots
rtfeldman Aug 17, 2025
b7bf851
revise s-expr rendering style a bit
rtfeldman Aug 17, 2025
c66cddf
repro some Parser2 bugs
rtfeldman Aug 17, 2025
918da87
various bug fixes
rtfeldman Aug 17, 2025
3648d41
Fix a union size issue
rtfeldman Aug 18, 2025
7ad83d3
Add .import and .underscore
rtfeldman Aug 18, 2025
73b39a0
Fix strings and add .ret and .crash
rtfeldman Aug 18, 2025
70ee601
Fix small strings in the parser
rtfeldman Aug 18, 2025
bf73fd5
Use variable-length encoding in ByteSlices
rtfeldman Aug 18, 2025
c42382c
Inline ByteSlice tests
rtfeldman Aug 18, 2025
5132e2f
Remove a branching conditional
rtfeldman Aug 18, 2025
5e08240
Remove an unnecessary newline
rtfeldman Aug 18, 2025
1e9589d
Regenerate snapshots
rtfeldman Aug 18, 2025
0173e52
Drop unused ModuleEnv import
rtfeldman Aug 18, 2025
379ea55
Add CIR2.zig
rtfeldman Aug 18, 2025
4c6e3e8
Empty tuples, don't store length in NodeSlices
rtfeldman Aug 19, 2025
db477da
Use Idx.NIL over empty variants
rtfeldman Aug 19, 2025
de10d25
Region fixes
rtfeldman Aug 19, 2025
41b32f8
Parse `for` and `while`
rtfeldman Aug 19, 2025
9c1035b
revise tokenize a bunch
rtfeldman Aug 19, 2025
50f737b
Add some tests of fn parsing behavior
rtfeldman Aug 19, 2025
a932fae
Update parser2 to use token iterator
rtfeldman Aug 19, 2025
47e380f
Fix module header parsing
rtfeldman Aug 20, 2025
58ece35
Fix some identifier parsing regressions
rtfeldman Aug 20, 2025
3da42a5
Fix string parsing implementation
rtfeldman Aug 20, 2025
bf37d24
Parser cleanup
rtfeldman Aug 20, 2025
c664e72
Fix some more parse issues
rtfeldman Aug 20, 2025
1f9eee5
Handle more parser cases
rtfeldman Aug 20, 2025
758ea7c
Fix remaining parser issues
rtfeldman Aug 20, 2025
7ce4d18
Replace recursion/nesting limits w/ labeled switch
rtfeldman Aug 20, 2025
0677f07
Fix record vs block parsing
rtfeldman Aug 21, 2025
c8aa8fa
Fix some missing language features
rtfeldman Aug 21, 2025
db36e9a
Fix handling of (disallowed) whitespace-applied types
rtfeldman Aug 21, 2025
a37cb89
Finish converting to tokenize_iter
rtfeldman Aug 21, 2025
4a54555
Fix remaining recursive functions in the parser
rtfeldman Aug 21, 2025
0936dea
wip
rtfeldman Aug 21, 2025
41c84d8
More parser cleanups
rtfeldman Aug 21, 2025
c273f28
more parser unification
rtfeldman Aug 21, 2025
9aaaf04
fix remaining parser stuff
rtfeldman Aug 21, 2025
767cba9
renames etc
rtfeldman Aug 21, 2025
7f6b85a
wip
rtfeldman Aug 22, 2025
db859d5
fix tests
rtfeldman Aug 22, 2025
4ca45ab
Parameterize NodeSlices
rtfeldman Aug 23, 2025
fca45b1
Remove debug stuff from tests for now
rtfeldman Aug 23, 2025
3692e76
Diagnostics
rtfeldman Aug 23, 2025
eb0048b
Expand CIR2
rtfeldman Aug 23, 2025
a2baf77
Add TypeCheck2.zig
rtfeldman Aug 23, 2025
58aa967
fix tests
rtfeldman Aug 23, 2025
6fc029e
Revise CIR tag calculations
rtfeldman Aug 23, 2025
7688362
Use AST2/CIR2 for snapshots
rtfeldman Aug 23, 2025
cd7db1a
Fix some missing stuff in tokenize_iter
rtfeldman Aug 23, 2025
cb302e2
fix more snapshot stuff
rtfeldman Aug 23, 2025
8efacf0
Fix a snapshot_tool bug
rtfeldman Aug 23, 2025
40d6d80
more snapshot fixes
rtfeldman Aug 23, 2025
9e65bda
cleanups
rtfeldman Aug 23, 2025
43cc4a2
many more fixes
rtfeldman Aug 23, 2025
2cdfd3f
Wire up type-checking to CIR2
rtfeldman Aug 23, 2025
81e1f2e
various fixes
rtfeldman Aug 24, 2025
93a48c5
More fixes
rtfeldman Aug 24, 2025
b162187
Additional fixes
rtfeldman Aug 24, 2025
fd0436c
More progress on snapshots
rtfeldman Aug 24, 2025
91a79c5
Update snapshots
rtfeldman Aug 24, 2025
a992029
Fix for malformed nodes
rtfeldman Aug 24, 2025
6c8066a
Canonicalize function application
rtfeldman Aug 24, 2025
68365f2
some cleanups
rtfeldman Aug 24, 2025
d08cfde
Implement a bunch of missing stuff
rtfeldman Aug 24, 2025
458c916
Update snapshots
rtfeldman Aug 24, 2025
da17f9d
Add infer_cir2.zig
rtfeldman Aug 24, 2025
c0626ff
Update formatter
rtfeldman Aug 24, 2025
903c29f
More formatter improvements
rtfeldman Aug 25, 2025
3636eab
Revise app header syntax: make `platform` a binop
rtfeldman Aug 25, 2025
e1bf729
Fix more formatter stuff
rtfeldman Aug 25, 2025
3f2aa79
More formatter and parser fixes
rtfeldman Aug 25, 2025
9d2383d
More formatter fixes
rtfeldman Aug 26, 2025
aac0f97
Fix some `match` formatting
rtfeldman Aug 26, 2025
ab8f26e
Store region end too
rtfeldman Aug 27, 2025
7258ee8
Fix some comment formatting etc.
rtfeldman Aug 28, 2025
746aaec
Fix more formatter stuff
rtfeldman Aug 28, 2025
b13017a
Make non-overlapping CIR tags
rtfeldman Aug 29, 2025
afebe4b
wip
rtfeldman Aug 29, 2025
cdff95e
wip resumable
rtfeldman Aug 29, 2025
e64491a
More fixes for resumable parser
rtfeldman Aug 30, 2025
14e5899
Fix blank line and comment handling
rtfeldman Aug 31, 2025
90c1bc1
Fix parsing bug
rtfeldman Aug 31, 2025
5e07f8d
Fix more canonical/formatting stuff
rtfeldman Aug 31, 2025
89e00e4
More parser/canonicalization fixes
rtfeldman Aug 31, 2025
13e638b
Add some more missing canonicalization
rtfeldman Aug 31, 2025
d2407af
Fix some formatter perf issues
rtfeldman Sep 1, 2025
1c84977
Fix lambda handling
rtfeldman Sep 1, 2025
9ace63d
Wire up record type-checking
rtfeldman Sep 1, 2025
683d95b
Fix record field formatting
rtfeldman Sep 1, 2025
61b6370
Type-check record field updates
rtfeldman Sep 1, 2025
b0e2a19
wip
rtfeldman Sep 2, 2025
d4f8e5a
minimal crash repro
rtfeldman Sep 2, 2025
90429dd
delete failing tests
rtfeldman Sep 2, 2025
1d6bf22
wip 2
rtfeldman Sep 2, 2025
c542779
remove minimal crash for now
rtfeldman Sep 2, 2025
68c0945
Replace old implementations with new ones
rtfeldman Sep 2, 2025
9ca8c0b
Fix lambda syntax
rtfeldman Sep 3, 2025
e159959
More fixes
rtfeldman Sep 3, 2025
c99c2e4
Fix some canonicalization and interpreter issues
rtfeldman Sep 3, 2025
c16bae6
Lots of eval fixes
rtfeldman Sep 4, 2025
56e3e38
Fix more tests
rtfeldman Sep 4, 2025
91e7b74
Reset snapshots
rtfeldman Sep 4, 2025
d873657
Revert crates/
rtfeldman Sep 4, 2025
d7bac2d
Remove snapshot2
rtfeldman Sep 4, 2025
877d58b
Merge origin/main - keep our AST/parser/tokenizer/canonicalizer changes
rtfeldman Sep 4, 2025
83a936a
Remove unused MultilineStringEnd token - multiline strings now contin…
rtfeldman Sep 4, 2025
bc1bc58
Delete modulo operator from benchmarks
rtfeldman Sep 4, 2025
26de472
Drop unused Region.empty
rtfeldman Sep 4, 2025
1f0f8d3
Reset some more stuff to origin/main
rtfeldman Sep 4, 2025
787884b
Merge origin/main - already have operand unification for comparison o…
rtfeldman Sep 5, 2025
ae64ab0
Revise CIR, check, and eval based on review
rtfeldman Sep 6, 2025
51d6144
Fix snapshots
rtfeldman Sep 6, 2025
9e7320e
Clean up some canonicalization conversions
rtfeldman Sep 6, 2025
502b681
Fix some unconverted CIR nodes
rtfeldman Sep 6, 2025
354f961
Fix some orphan AST nodes not being canonicalized
rtfeldman Sep 6, 2025
76ac5b2
Fix snapshots
rtfeldman Sep 6, 2025
3f57a41
Simplify orphan check
rtfeldman Sep 6, 2025
3316d62
Fix some tests
rtfeldman Sep 6, 2025
3a339ce
Add COMPILER_ARCHITECTURE.md
rtfeldman Sep 6, 2025
c427cc0
Fix shadowing reporting
rtfeldman Sep 6, 2025
18cf9f0
Fix some eval bugs
rtfeldman Sep 7, 2025
41abfa4
Fix more eval bugs
rtfeldman Sep 7, 2025
3a612fb
More fixes
rtfeldman Sep 7, 2025
35e00e5
Delete obsolete infer_cir.zig
rtfeldman Sep 7, 2025
5431078
Fix more tests
rtfeldman Sep 7, 2025
4fb4413
Delete non-iterating Tokenizer
rtfeldman Sep 7, 2025
9defcf6
Move tokens into its own module
rtfeldman Sep 7, 2025
44ce772
Clean up tokenization
rtfeldman Sep 8, 2025
bd5bc6c
wip
rtfeldman Sep 8, 2025
1cac8d7
Delete some obsolete Parser code
rtfeldman Sep 9, 2025
1bfbf94
Fix fmt
rtfeldman Sep 9, 2025
7abb3df
More fixes
rtfeldman Sep 9, 2025
1ab1ebe
Don't expect Payload to be a union(enum)
rtfeldman Sep 9, 2025
823aba6
Tokenizer improvements
rtfeldman Sep 9, 2025
520631a
Use SrcBytes in more places
rtfeldman Sep 9, 2025
d385ecc
Make SmallDec.parse return nullable
rtfeldman Sep 9, 2025
92 changes: 92 additions & 0 deletions src/COMPILER_ARCHITECTURE.md
@@ -0,0 +1,92 @@
# Roc's Compiler Architecture

## Front-End

The _front-end_ of the compiler consists of the steps that happen when you run `roc check`,
with the exception of error reporting and compile-time evaluation of constants.
(Errors are _gathered_ by the front-end; they just aren't _reported_ by it.)

### Modules

Modules are the fundamental unit of compilation in Roc. We process and cache things
at a module level, and when a module needs to refer to something from another module,
an orchestrator outside the front-end will take care of copying all the relevant
information from that module (names, inferred types, etc.) into this module's local
data before proceeding to work on this module.

This architecture allows for module-level caching, and for incremental compilation in which
a change may require rebuilding as little as one module (plus, possibly, the modules that depend on it).
It also means that we can use 32-bit identifiers for everything without worrying about
scaling issues, since even huge modules won't have billions of AST nodes (or types),
and even huge packages and applications won't have billions of modules inside them.
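To make the 32-bit ID argument concrete, here is a minimal Python sketch (the compiler itself is not written this way). It assumes a hypothetical per-module `Store` arena, in which every node or type is named by its index, and a hypothetical `copy_into` helper that stands in for the orchestrator copying information from a dependency into this module's local data. Copied items receive fresh local indices, which is why IDs only have to be unique within a single module.

```python
from dataclasses import dataclass, field
from typing import Dict, Generic, List, TypeVar

T = TypeVar("T")
U32_MAX = 2**32 - 1  # largest index a 32-bit ID can hold


@dataclass
class Store(Generic[T]):
    """Hypothetical per-module arena: items are named by a 32-bit index, not a pointer."""

    items: List[T] = field(default_factory=list)

    def append(self, item: T) -> int:
        idx = len(self.items)
        if idx > U32_MAX:
            # Fail loudly instead of silently wrapping around.
            raise OverflowError("module exceeds its 32-bit ID space")
        self.items.append(item)
        return idx

    def get(self, idx: int) -> T:
        return self.items[idx]


def copy_into(dst: Store[T], src: Store[T], src_ids: List[int]) -> Dict[int, int]:
    """Copy selected items from a dependency's store into this module's store.

    Returns a map from the dependency's IDs to the freshly assigned local IDs.
    """
    return {old: dst.append(src.get(old)) for old in src_ids}
```

After the copy, the dependency's IDs are never used again inside this module. Only the local IDs returned in the mapping are, so the ID space never has to span more than one module.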

If cyclic module dependencies were allowed in this architecture:
* 32-bit identifiers might be too small on huge codebases, because module cycles would need to be combined into "mega-modules" behind the scenes and compiled as one unit, all sharing one giant 32-bit type ID space.
* It would be easy to accidentally organize large Roc programs in a way that made module-level caching impossible (as large cycles would all have to be cached as one unit), potentially leading to bad compile times despite Roc's caching capabilities.

Disallowing module cycles does have an obvious convenience downside. At the same time, cyclic module errors sometimes reveal unintended coupling between two modules, and untangling it results in a positive architectural change even when build times aren't considered.
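The orchestrator has to finish a module's dependencies before it starts on the module itself. That ordering is a topological sort, and ruling out cycles guarantees one always exists. Below is a minimal Python sketch, assuming a hypothetical `compile_order` function that receives a map from each module name to the modules it imports. It is an illustration only, not Roc's orchestrator. A depth-first search either returns a valid order or reports the first cycle it finds.

```python
from typing import Dict, List


def compile_order(deps: Dict[str, List[str]]) -> List[str]:
    """Order modules so that each one comes after every module it imports.

    Raises ValueError naming the cycle if a module (transitively) imports itself.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {m: WHITE for m in deps}
    order: List[str] = []
    path: List[str] = []

    def visit(m: str) -> None:
        color[m] = GRAY
        path.append(m)
        for dep in deps.get(m, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # `dep` is already on the current path, so we've found a cycle.
                cycle = path[path.index(dep):] + [dep]
                raise ValueError("module cycle: " + " -> ".join(cycle))
            if state == WHITE:
                visit(dep)
        path.pop()
        color[m] = BLACK
        order.append(m)  # all of m's dependencies are already in `order`

    for m in deps:
        if color[m] == WHITE:
            visit(m)
    return order
```

Because every module's imports land in `order` before the module itself, each module can be compiled, and its results cached, as soon as it is reached.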

### Front-End Compilation Steps

The compilation steps in the front-end are:

1. Hashing: Source bytes -> BLAKE3 hash
   * Use BLAKE3 to get a 256-bit hash of the raw source bytes (we use this later)
   * Use SIMD to validate that the source file is valid UTF-8 (emit errors for invalid UTF-8)
2. Lexing: Source bytes -> Flat stream of tokens (with associated source regions)
   * Intern strings
   * Compact numbers
   * Emit errors for invalid tokens (e.g. `0xZZ`)
3. Parsing: Tokens -> Tree (specifically, an AST)
   * Resolve operator precedence (using Pratt parsing)
   * Emit errors for invalid token sequences (e.g. `a + / ! b`)
4. Canonicalization: AST -> CIR
   * Recategorize AST nodes as expressions, statements, patterns, or types
   * Resolve each lookup to the correct pattern using scoping rules (or else emit an error)
5. Inference: CIR -> CIR + Types
   * Populate a database of types, where each CIR node has a corresponding entry
   * Set initial types based on CIR nodes, as well as "symlinking" some types to others
   * Use type unification to resolve all types based on symlinks, instantiation, etc.
   * Run "occurs" checks to emit errors for cyclic types instead of causing infinite loops
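Step 3 uses Pratt parsing to resolve operator precedence. Below is a minimal Python sketch of that technique. It assumes pre-split string tokens and a made-up table of binding powers (these are not Roc's real precedence rules), and it shows how an invalid sequence such as `a + / b` turns into an error. For brevity the sketch recurses once per operator.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical binding powers (left, right). Making right = left + 1 yields
# left-associativity, so `a - b - c` parses as `(a - b) - c`.
BINDING_POWER = {"+": (10, 11), "-": (10, 11), "*": (20, 21), "/": (20, 21)}

Expr = Union[str, Tuple[str, "Expr", "Expr"]]


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.peek()
        if tok is None:
            raise ParseError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self, min_bp: int = 0) -> Expr:
        tok = self.advance()
        if tok in BINDING_POWER:
            # An operator where an operand belongs, e.g. the `/` in `a + / b`.
            raise ParseError(f"expected an operand, found {tok!r} at token {self.pos - 1}")
        lhs: Expr = tok
        while (op := self.peek()) is not None:
            if op not in BINDING_POWER:
                raise ParseError(f"expected an operator, found {op!r} at token {self.pos}")
            left_bp, right_bp = BINDING_POWER[op]
            if left_bp < min_bp:
                break  # binds more loosely than the operator to our left; let the caller take it
            self.advance()
            rhs = self.expr(right_bp)
            lhs = (op, lhs, rhs)
        return lhs


def parse(tokens: List[str]) -> Expr:
    return Parser(tokens).expr()
```

For example, `parse(["a", "+", "b", "*", "c"])` yields `("+", "a", ("*", "b", "c"))`, because `*` has a higher left binding power than the `11` that `+` passes down.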

Once this process is complete, we have:
* BLAKE3 hash of the source bytes
* CIR tree representing the structure of the module
* Types database representing the types of each CIR node
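Unification together with the occurs check (step 5) is what produces that types database. The Python sketch below is a minimal illustration. It assumes a toy type representation (`Var`, `Con`) and models the "symlinks" as links from a type variable to another type, in union-find style. That modeling is our assumption, and the sketch is not the compiler's actual types database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Var:
    """A type variable, identified by its index in the types database."""
    id: int


@dataclass
class Con:
    """A concrete type constructor such as Str or List(a)."""
    name: str
    args: List["Type"] = field(default_factory=list)


Type = Union[Var, Con]


class UnifyError(Exception):
    pass


class Types:
    """Toy types database in which variables are resolved by linking them to other types."""

    def __init__(self) -> None:
        self.links: Dict[int, Type] = {}
        self.next_id = 0

    def fresh(self) -> Var:
        var = Var(self.next_id)
        self.next_id += 1
        return var

    def resolve(self, t: Type) -> Type:
        # Follow links until we reach an unlinked variable or a constructor.
        while isinstance(t, Var) and t.id in self.links:
            t = self.links[t.id]
        return t

    def occurs(self, var: Var, t: Type) -> bool:
        # True if `var` appears inside `t`; linking var -> t would create an infinite type.
        t = self.resolve(t)
        if isinstance(t, Var):
            return t.id == var.id
        return any(self.occurs(var, arg) for arg in t.args)

    def unify(self, a: Type, b: Type) -> None:
        a, b = self.resolve(a), self.resolve(b)
        if isinstance(a, Var) and isinstance(b, Var) and a.id == b.id:
            return
        if isinstance(b, Var):
            a, b = b, a  # normalize so that `a` is the variable when there is one
        if isinstance(a, Var):
            if self.occurs(a, b):
                raise UnifyError(f"cyclic type: a{a.id} occurs in {self.show(b)}")
            self.links[a.id] = b
            return
        if a.name != b.name or len(a.args) != len(b.args):
            raise UnifyError(f"type mismatch: {self.show(a)} vs {self.show(b)}")
        for x, y in zip(a.args, b.args):
            self.unify(x, y)

    def show(self, t: Type) -> str:
        t = self.resolve(t)
        if isinstance(t, Var):
            return f"a{t.id}"
        if not t.args:
            return t.name
        return f"{t.name}({', '.join(self.show(x) for x in t.args)})"
```

Without the `occurs` check, unifying `a` with `List(a)` would create a link that `show` (or any later traversal) would follow forever. With the check, that unification becomes an error instead.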

### Caching Front-End Work on Disk

TODO: describe how caching works

## Interpreter

Once we've finished the front-end work, we can run an interpreter on the CIR + types,
either to run the entire program (e.g. in an unoptimized debug build) or just when
doing compile-time evaluation of constants within the Roc program itself, as part of
`roc check`.

TODO: describe how the interpreter works, especially with polymorphic function calls

### Running the interpreter in a host

TODO: describe the elaborate dance we do to get the shared memory in without polluting env vars, while dodging macOS security countermeasures.

## Optimizing Back-End

IMPORTANT NOTE: The following is the *plan* for the optimizing back-end, but it is not
yet implemented!

### Compilation Steps for Optimizing Back-End

The planned compilation steps for the optimizing back-end (once we implement it) are:

1. Monomorphization: CIR -> MIR
   * Convert polymorphic function calls into calls to monomorphic specializations of functions
   * Insert reference counting instructions where appropriate
2. Lambda Set Inference: MIR -> MIR + Lambda Set Types
3. Lambda Set Monomorphization: MIR -> MIR + Monomorphic Lambda Set Types
4. LLVM IR Generation: MIR + Monomorphic Lambda Set Types -> LLVM IR
5. Machine Code Generation: LLVM IR -> Machine Code
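Step 1 of the planned back-end turns each call to a polymorphic function into a call to a copy of that function specialized for concrete types. The Python sketch below covers only the bookkeeping, and it makes several assumptions: call sites already carry their inferred concrete type arguments (a simplification), the `Monomorphizer` class and its name-mangling scheme are invented for illustration, and both rewriting the function bodies and inserting reference counting are left out. Because this back-end is not implemented yet, treat the sketch as illustrative only.

```python
from typing import Dict, List, Tuple

SpecKey = Tuple[str, Tuple[str, ...]]  # (function name, concrete type arguments)


class Monomorphizer:
    """Hypothetical specializer: one monomorphic copy per (function, concrete types)."""

    def __init__(self) -> None:
        self.specializations: Dict[SpecKey, str] = {}
        self.pending: List[SpecKey] = []  # specializations whose bodies still need generating

    def specialize(self, fn: str, type_args: Tuple[str, ...]) -> str:
        key = (fn, type_args)
        if key not in self.specializations:
            # Made-up mangling scheme, purely for this sketch.
            self.specializations[key] = f"{fn}__{'_'.join(type_args)}"
            self.pending.append(key)
        return self.specializations[key]


def rewrite_calls(calls: List[SpecKey], mono: Monomorphizer) -> List[str]:
    """Replace each polymorphic call (callee, inferred type args) with its specialization."""
    return [mono.specialize(fn, args) for fn, args in calls]
```

Storing specializations by key means that repeated calls with the same concrete types share a single copy. Each distinct combination of types, however, produces a new one.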

At some point we plan to introduce our own optimization passes here, on top of the ones LLVM
does. However, we're not yet sure where in these steps they will go, so I haven't
included them.