Halcyon is a work-in-progress compiler for a large subset of C, written in Haskell. It targets the x86_64 instruction set architecture. This project focuses on implementing the core compiler functionality while leveraging existing system tools for preprocessing, assembly, and linking.
The compiler currently handles C programs with unary operators and integer constants. For example:
int main(void) {
return ~(-42);
}
The compiler processes source code through the following stages:
- Lexical Analysis: Breaks source code into a sequence of tokens
- Parsing: Converts tokens into an Abstract Syntax Tree (AST)
- TACKY Generation: Transforms AST into TACKY intermediate representation
- Code Generation: Transforms AST into x86_64 assembly
- Code Emission: Outputs the assembly code to an executable
Programs are represented internally using a series of increasingly lower-level data structures:
- Abstract Syntax Tree (AST):
data Program = Program Function
data Function = Function
{ name :: Text
, body :: Statement
}
data Statement = Return Expr
data Expr
= Constant { value :: Int }
| Unary { operator :: UnaryOp, operand :: Expr }
data UnaryOp = Complement | Negate
- TACKY IR:
data Program = Program Function
data Function = Function
{ name :: Text
, body :: [Instruction]
}
data Instruction
= Return { value :: Val }
| Unary { operator :: UnaryOp, src :: Val, dst :: Val }
data Val = Constant Int | Var Text
data UnaryOp = Complement | Negate
- Assembly AST:
data Program = Program Function
data Function = Function
{ name :: Text
, instructions :: [Instruction]
}
data Instruction
= Mov { src :: Operand, dst :: Operand }
| Unary { operator :: UnaryOp, operand :: Operand }
| AllocateStack { bytes :: Int }
| Ret
data Operand
= Imm Int
| Register Reg
| Pseudo Text
| Stack Int
data UnaryOp = Neg | Not
data Reg = Ax | R10
.
├── app/ # Application entry point
│ └── Main.hs
├── bin/ # Binary outputs
├── lib/ # Main library code
│ ├── Halcyon.hs # Library entry point
│ └── Halcyon/ # Core modules
│ ├── Backend/ # Code generation and emission
│ │ ├── Codegen.hs # TACKY to Assembly conversion
│ │ ├── Emit.hs # Assembly to text output
│ │ └── ReplacePseudos.hs # Register/stack allocation
│ ├── Core/ # Core data types and utilities
│ │ ├── Assembly.hs # Assembly representation
│ │ ├── Ast.hs # C language AST
│ │ ├── Monad.hs # Compiler monad stack
│ │ ├── Settings.hs # Compiler settings and types
│ │ ├── Tacky.hs # TACKY IR definition
│ │ └── TackyGen.hs # AST to TACKY transformation
│ ├── Driver/ # Compiler driver
│ │ ├── Cli.hs # Command line interface
│ │ └── Pipeline.hs # Compilation pipeline
│ └── Frontend/ # Parsing and analysis
│ ├── Lexer.hs # Lexical analysis
│ ├── Parse.hs # Parsing
│ └── Tokens.hs # Token definitions
├── test/ # Test suite
│ ├── Main.hs
│ └── Test/
│ ├── Lexer.hs
│ ├── Parser.hs
│ ├── Tacky.hs
│ ├── Assembly.hs
│ ├── Pipeline.hs
│ └── Common.hs
├── CHANGELOG.md # Version history
├── LICENSE # Project license
├── README.md # Project documentation
├── flake.nix # Nix build configuration
└── halcyon.cabal # Cabal build configuration
The compiler uses a monad transformer stack to handle IO operations and error management:
newtype CompilerT m a = CompilerT
{ unCompilerT :: ExceptT CompilerError m a }
type Compiler = CompilerT IO
This provides:
- Error handling through
ExceptT
- IO capabilities through the underlying monad
- Clean separation of pure and effectful code
- Structured error reporting and recovery
halcyon [OPTIONS] FILE
Options:
--lex Run lexical analysis only
--parse Run parsing only
--codegen Run through code generation
--tacky Run through TACKY generation
-S Stop after assembly generation
-h,--help Show help text
# Build the project
cabal build
# Run the compiler
cabal run halcyon -- [OPTIONS] input.c
# Example: Compile a file
cabal run halcyon -- input.c
# Example: Run only the lexer
cabal run halcyon -- --lex input.c
Halcyon uses Hspec and Tasty for its test suite. The tests cover all stages of compilation:
# Run all tests
cabal test
# Run tests with output
cabal test --test-show-details=direct
# Run a specific test module
cabal test --test-pattern "Lexer"
The test suite includes:
- Unit tests for each compiler stage
- Integration tests for the full pipeline
- Helper utilities for building test cases
Tests are organized by compiler stage in test/Test/
:
Lexer.hs
: Token generationParser.hs
: AST constructionTacky.hs
: TACKY IR generationAssembly.hs
: Assembly generationPipeline.hs
: Full compilation pipelineCommon.hs
: Shared test utilities
Halcyon relies on the following system tools:
- GCC: For preprocessing C source files (
gcc -E
) - Assembler: For converting assembly to object files
- Linker: For producing final executables
Make sure these tools are installed and available in your system path.
The compiler provides detailed error reporting for:
- Lexical errors (invalid characters, malformed numbers)
- Syntax errors (invalid program structure)
- Semantic errors (coming soon)
- System errors (file I/O, external tool failures)
- A minimal compiler
- Unary operators
- Binary operators
- Logical and relational operators
- Local variables
- if statements and conditional expressions
- Compound statements
- Loops
- Functions
- File scope variable declarations and storage-class specifiers
- Long integers
- Unsigned integers
- Floating-point numbers
- Pointers
- Arrays and pointer arithmetic
- Characters and strings
- Supporting dynamic memory
- Structures
- Optimizing TACKY programs
- Register Allocations
This is a personal learning project following the book "Writing a C Compiler" by Nora Sandler. While it's not currently open for contributions, feel free to use it as a reference for your own compiler projects.