langcc
can serve as a replacement for
lex+yacc, but is much more powerful. langcc
enables a completely new paradigm for developing programming languages based on iteratively improving the intuitive declarative specification of your language instead of laboriously updating your hand-written compiler frontend for every grammar change.
langcc
generates efficient, linear-time parsers for an extremely broad class of grammars (a strict superset of canonical LR). In particular, the project includes complete specifications of Python 3.9.12 (grammars/py.lang) and Golang 1.17.8 (grammars/go.lang), and generates parsers for each language that are faster than their respective standard parsers (resp., 1.2x and 4.3x faster).langcc
requires only a single input file in its declarative.lang
format to generate a full compiler frontend, including AST struct definitions and associated traversals, hashing, pretty-printing, and automatic integration with the generated parser (no need to sprinkle C++ code into your grammar as in lex+yacc).langcc
provides a built-in "conflict tracing" algorithm which traces LR conflicts back to "confusing input pairs", which provide explicit example strings instead of opaque shift/reduce conflicts.langcc
extends the existing LR paradigm to include both recursive-descent parsing actions (which result in much smaller and more intuitive automata) and also per-symbol attribute constraints (which provide support for many industrial language constructs).langcc
implements a novel transformation for LR grammars we call "continuation-passing style", which significantly expands the class of grammars the tool supports.- Finally,
langcc
ships with a standalone datatype compiler calleddatacc
, which generates full C++ implementations of algebraic datatypes (including sum types) from a simple declarative language whose spec is provided in data.lang.
langcc
is self-hosting. The file
grammars/meta.lang contains the
specification for the "language of languages" in which langcc
expects .lang
files to be written.
In this canonical implementation, the file
bootstrap.sh uses langcc
to
generate its own compiler front-end.
langcc
is stable and feature-complete. It is ready to be used as a research prototype to facilitate
rapid exploration of new compilers and programming languages. It is free and open-source and available under
the Apache 2.0 License.
langcc
is also described in the following companion technical reports, which provide the theoretical basis for
its functionality.
- Zimmerman, Joe. Practical LR Parser Generation. arXiv, 2022.
- Zimmerman, Joe. langcc: A Next-Generation Compiler Compiler. arXiv, 2022.
This build has been verified to work on Ubuntu 22.04 and MacOS 12.5.
For Ubuntu 22.04:
./deps_ubuntu.sh
make -j8
sudo make install
For macOS 12.5 (requires Homebrew):
./deps_macos.sh
make -j8
sudo make install
And, in order to bootstrap the langcc
front-end itself, subsequently run:
./bootstrap.sh
Once langcc
(and its companion datacc
) have been built, you can run the various provided examples:
- The most fully featured example is in
examples/lox
, which includes a complete interpreter for a full scripting language called Lox designed by Robert Nystrom at Crafting Interpreters. The Lox example is explored in detail in the user manual and is the recommended way to learn aboutlangcc
's feature set. examples/basic
illustrates the basic syntax of a declarative language spec, whileexamples/calc
implements a calculator for arithmetic expressions, including user-friendly error reporting.grammars/go.lang
andgrammars/py.lang
provide complete specifications for the syntax of Golang 1.17.8 and Python 3.9.12, respectively. They showcase the full feature set oflangcc
and are compiled during the main build process itself, which produces testsbuild/go_standalone_test
andbuild/py_standalone_test
. (Note: These binaries require repositories for Golang 1.17.8 and Python 3.9.12 located in the directories../go
and../cpython
.)- The language of datatypes,
grammars/data.lang
, describes the input format expected by the standalone datatype compilerdatacc
for specifying algebraic datatypes. - Finally, the language of languages itself,
grammars/meta.lang
, describes the input format expected bylangcc
. This language also serves as basic documentation, as it enumerates all of its own features.
For full documentation, please see the user manual.