New lexed tokens must be added to token_kind.def. `CARBON_SYMBOL_TOKEN` and `CARBON_KEYWORD_TOKEN` both provide some built-in lexing logic, while `CARBON_TOKEN` requires custom lexing support.

`TokenizedBuffer::Lex` is the main dispatch for lexing, and any custom lexing is dispatched from there.
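The split between spelled tokens and custom tokens can be illustrated with the .def file idiom. The sketch below is a minimal, self-contained approximation, not the toolchain's code: the token list is an inline macro rather than token_kind.def, and `SKETCH_TOKEN_KINDS`, `LexSymbol`, and `ClassifyWord` are hypothetical names. Entries with a spelling feed generated tables that generic code can match, while `CUSTOM` entries such as identifiers get no table entry and would need hand-written lexing.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for token_kind.def. The real list lives in a separate
// .def file; here it is a macro so the sketch fits in one file. SYMBOL and
// KEYWORD entries carry a spelling that generic code can match; CUSTOM
// entries have no spelling and need hand-written lexing code.
#define SKETCH_TOKEN_KINDS(SYMBOL, KEYWORD, CUSTOM) \
  SYMBOL(Plus, "+")                                 \
  SYMBOL(PlusEqual, "+=")                           \
  SYMBOL(Semi, ";")                                 \
  KEYWORD(Fn, "fn")                                 \
  KEYWORD(Var, "var")                               \
  CUSTOM(Identifier)                                \
  CUSTOM(IntLiteral)

// Helper macros, each expanding one list entry in a different way.
#define SKETCH_SPELLED_ENUMERATOR(Name, Text) Name,
#define SKETCH_CUSTOM_ENUMERATOR(Name) Name,
#define SKETCH_SPELLED_ENTRY(Name, Text) {TokenKind::Name, Text},
#define SKETCH_SKIP_SPELLED(Name, Text)
#define SKETCH_SKIP_CUSTOM(Name)

// Every token kind becomes an enumerator.
enum class TokenKind {
  SKETCH_TOKEN_KINDS(SKETCH_SPELLED_ENUMERATOR, SKETCH_SPELLED_ENUMERATOR,
                     SKETCH_CUSTOM_ENUMERATOR)
};

struct SpelledKind {
  TokenKind kind;
  std::string_view text;
};

// Two tables generated from the same list: symbols and keywords.
constexpr SpelledKind kSymbols[] = {SKETCH_TOKEN_KINDS(
    SKETCH_SPELLED_ENTRY, SKETCH_SKIP_SPELLED, SKETCH_SKIP_CUSTOM)};
constexpr SpelledKind kKeywords[] = {SKETCH_TOKEN_KINDS(
    SKETCH_SKIP_SPELLED, SKETCH_SPELLED_ENTRY, SKETCH_SKIP_CUSTOM)};

// Generic symbol lexing: the longest spelling that prefixes `source` wins.
// Returns the matched length (setting `kind`), or 0 if nothing matches.
std::size_t LexSymbol(std::string_view source, TokenKind& kind) {
  std::size_t best = 0;
  for (const SpelledKind& symbol : kSymbols) {
    if (symbol.text.size() > best &&
        source.substr(0, symbol.text.size()) == symbol.text) {
      best = symbol.text.size();
      kind = symbol.kind;
    }
  }
  return best;
}

// Generic keyword lexing: an exact match on a whole word. Anything else
// falls through to a CUSTOM kind, which real code would lex by hand.
TokenKind ClassifyWord(std::string_view word) {
  for (const SpelledKind& keyword : kKeywords) {
    if (keyword.text == word) {
      return keyword.kind;
    }
  }
  return TokenKind::Identifier;
}
```

For example, `LexSymbol("+= 1", kind)` returns 2 and selects `PlusEqual` rather than `Plus`, because the longest matching spelling wins.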
A parser feature will have state transitions that produce new parse nodes.

The resulting parse node kinds are listed in parse/node_kind.def, and their typed structs are in parse/typed_nodes.h. When choosing a node structure, consider how semantics will process it in post-order; this will rule out some designs. Adding a parse node kind will also require a handler in the `Check` step.
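To see why post-order matters, consider a checker that walks nodes in storage order and keeps a stack of results. This is a minimal sketch assuming a toy tree of integer literals and binary operators; `Kind`, `Node`, and `Evaluate` are hypothetical, and the sketch evaluates values rather than producing SemIR. Every handler can rely on its children having already been processed, but a node cannot influence how its children are processed, because they come before it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node kinds for a toy expression language.
enum class Kind { IntLiteral, Add, Mul };

struct Node {
  Kind kind;
  int value;  // Only meaningful for IntLiteral.
};

// Processes a post-order node list with a result stack. Each operator's two
// children were handled earlier, so their results are on top of the stack.
int Evaluate(const std::vector<Node>& postorder) {
  std::vector<int> results;
  for (const Node& node : postorder) {
    if (node.kind == Kind::IntLiteral) {
      results.push_back(node.value);
      continue;
    }
    int rhs = results.back();
    results.pop_back();
    int lhs = results.back();
    results.pop_back();
    results.push_back(node.kind == Kind::Add ? lhs + rhs : lhs * rhs);
  }
  return results.back();
}
```

Here `1 + 2 * 3` is stored as `1, 2, 3, Mul, Add`. One consequence of this order is that a construct which must establish context before its children are handled would need an extra node that appears before those children.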
The state transitions are in parse/state.def. Each `CARBON_PARSER_STATE` defines a distinct state and has comments for state transitions. If several states should share handling, name them `FeatureAsVariant`.
Adding a state requires adding a `Handle<name>` function in an appropriate `parse/handle_*.cpp` file, possibly a new file. The macros are used to generate declarations in the header, so only extra helper functions should be added there. Every state handler pops the state from the stack before any other processing.
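The state-stack pattern can be sketched as follows, assuming a toy grammar of `var <name> ;` declarations. The state list is an inline macro standing in for parse/state.def, and `Context`, `Parse`, and the three states are hypothetical. As in the description above, handler declarations are generated from the list, and each handler pops its own state before doing anything else; a handler that needs to continue later pushes states back.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for parse/state.def: one entry per parser state.
#define SKETCH_PARSER_STATES(X) \
  X(File)                       \
  X(Decl)                       \
  X(DeclFinish)

enum class State {
#define SKETCH_STATE_ENUMERATOR(Name) Name,
  SKETCH_PARSER_STATES(SKETCH_STATE_ENUMERATOR)
#undef SKETCH_STATE_ENUMERATOR
};

// Parser context: an explicit state stack instead of recursive calls.
struct Context {
  std::vector<State> states;
  std::vector<std::string> tokens;
  std::size_t position = 0;
  std::vector<std::string> nodes;  // Emitted parse nodes, in post-order.

  void PushState(State state) { states.push_back(state); }
  void PopState() { states.pop_back(); }
};

// Handler declarations generated from the same list.
#define SKETCH_DECLARE_HANDLER(Name) void Handle##Name(Context& context);
SKETCH_PARSER_STATES(SKETCH_DECLARE_HANDLER)
#undef SKETCH_DECLARE_HANDLER

// `File`: parse declarations until the tokens run out.
void HandleFile(Context& context) {
  context.PopState();  // Always pop first.
  if (context.position < context.tokens.size()) {
    context.PushState(State::File);  // Come back here afterwards...
    context.PushState(State::Decl);  // ...but parse one declaration first.
  }
}

// `Decl`: consume `var <name>` and schedule the rest of the declaration.
// This sketch assumes well-formed input and does no error handling.
void HandleDecl(Context& context) {
  context.PopState();
  context.nodes.push_back("VarIntroducer");
  ++context.position;  // Skip `var`.
  context.nodes.push_back("Name:" + context.tokens[context.position]);
  ++context.position;
  context.PushState(State::DeclFinish);
}

// `DeclFinish`: consume `;` and emit the parent node after its children.
void HandleDeclFinish(Context& context) {
  context.PopState();
  ++context.position;  // Skip `;`.
  context.nodes.push_back("VarDecl");
}

// Main loop: dispatch on whatever state is on top of the stack.
void Parse(Context& context) {
  context.PushState(State::File);
  while (!context.states.empty()) {
    switch (context.states.back()) {
#define SKETCH_DISPATCH(Name) \
  case State::Name:           \
    Handle##Name(context);    \
    break;
      SKETCH_PARSER_STATES(SKETCH_DISPATCH)
#undef SKETCH_DISPATCH
    }
  }
}
```

Because every transition goes through the stack rather than the C++ call stack, deeply nested input does not grow recursion depth.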
As of #3534:
TODO: Convert this chart to Mermaid.
- common/enum_base.h defines the `EnumBase` CRTP class extending `Printable` from common/ostream.h, along with `CARBON_ENUM` macros for making enumerations.
- parse/node_kind.h includes common/enum_base.h and defines an enumeration `NodeKind`, along with the bitmask enum `NodeCategory`.
  - The `NodeKind` enumeration is populated with the list of all parse node kinds from parse/node_kind.def (using the .def file idiom), declared in this file using a macro from common/enum_base.h.
  - `NodeKind` has a member type `NodeKind::Definition` that extends `NodeKind` and adds a `NodeCategory` field (and others in the future).
  - `NodeKind` has a method `Define` for creating a `NodeKind::Definition` with the same enumerant value, plus values for the other fields.
  - `HasKindMember<T>` at the bottom of parse/node_kind.h uses field detection to determine whether the type `T` has a `NodeKind::Definition Kind` static constant member.
    - Note: both the type and the name of this member must match exactly.
  - Note that additional information is needed to define the `category()` method (and other methods in the future) of `NodeKind`. This information comes from the typed parse node definitions in parse/typed_nodes.h (described below).
- parse/node_ids.h defines a number of types that store a node id identifying a node in the parse tree.
  - `NodeId` stores a node id with no restrictions.
  - `NodeIdForKind<Kind>` inherits from `NodeId` and stores the id of a node that must have the specified `NodeKind` "`Kind`". Note that this is not used directly; instead, aliases `FooId` for `NodeIdForKind<NodeKind::Foo>` are defined for every node kind using parse/node_kind.def (using the .def file idiom).
  - `NodeIdInCategory<Category>` inherits from `NodeId` and stores the id of a node that must overlap the specified `NodeCategory` "`Category`". Note that this is not typically used directly; instead, this file defines aliases `AnyDeclId`, `AnyExprId`, ..., `AnyStatementId`.
  - Similarly, `NodeIdOneOf<T, U>` and `NodeIdNot<V>` inherit from `NodeId` and store the id of a node restricted to either matching `T::Kind` or `U::Kind`, or not matching `V::Kind`, respectively.
  - In addition to the node id type definitions above, the struct `NodeForId<T>` is declared but not defined.
- parse/typed_nodes.h defines a typed parse node struct type for each kind of parse node.
  - Each one defines a static constant named `Kind` that is set using a call to `Define()` on the corresponding enumerant member of `NodeKind` from parse/node_kind.h (which is included by this file).
  - The fields of these types specify the children of the parse node using the types from parse/node_ids.h.
  - The struct `NodeForId<T>` that is declared in parse/node_ids.h is defined in this file such that `NodeForId<FooId>::TypedNode` is the `Foo` typed parse node struct type.
  - This file will fail to compile unless every parse node kind defined in parse/node_kind.def has a corresponding struct type in this file.
- parse/node_kind.cpp includes both parse/node_kind.h and parse/typed_nodes.h.
  - Using the macro from common/enum_base.h, the enumerants of `NodeKind` are defined using the list of parse node kinds from parse/node_kind.def (using the .def file idiom).
  - `NodeKind::definition()` is defined. It has a static table of `const NodeKind::Definition*` indexed by the enum value, populated by taking the address of the `Kind` member of each typed parse node struct type, using the list from parse/node_kind.def.
  - `NodeKind::category()` is defined using `NodeKind::definition()`.
  - Tested assumption: the tables built in this file are indexed by the enum values. We rely on the fact that we get the parse node kinds in the same order by consistently using parse/node_kind.def.
- parse/tree.h includes parse/node_ids.h. It does not depend on parse/typed_nodes.h, to reduce compilation time in files that don't use the typed parse node struct types.
  - Defines `Tree::Extract...` functions that take a node id and return a typed parse node struct type from parse/typed_nodes.h.
  - Uses `HasKindMember<T>` to restrict calling `ExtractAs` to the typed nodes defined in parse/typed_nodes.h.
  - `Tree::Extract` uses `NodeForId<T>` to get the corresponding typed parse node struct type for a `FooId` type defined in parse/node_ids.h.
    - Note that this is done without a dependency on the typed parse node struct types by using the forward declaration of `NodeForId<T>` from parse/node_ids.h.
  - The `Tree::Extract...` functions ultimately call `Tree::TryExtractNodeFromChildren<T>`, which is a templated function only declared in this file. Its definition is in parse/extract.cpp.
- parse/extract.cpp includes parse/tree.h and parse/typed_nodes.h.
  - Defines the struct `Extractable<T>`, which defines how to extract a field of type `T` from a `Tree::SiblingIterator` pointing at the corresponding child node.
  - `Extractable<T>` is defined for the node id types defined in parse/node_ids.h.
  - In addition, `Extractable<T>` is defined for the types `std::optional<U>` and `llvm::SmallVector<V>`, to support optional and repeated children.
  - Uses struct reflection to support aggregate struct types containing extractable fields. This is used to support typed parse node struct types as well as struct fields that they contain.
  - Uses `HasKindMember<Foo>` to detect accidental uses of a parse node type directly as a field of a typed parse node struct type; in those places, `FooId` should be used instead.
  - Defines `Tree::TryExtractNodeFromChildren<T>` and explicitly instantiates it for every typed parse node struct type defined in parse/typed_nodes.h using parse/node_kind.def (using the .def file idiom). By explicitly instantiating this function only in this file, we avoid redundant compilation work, which reduces build times, and keep all the extraction machinery as a private implementation detail of this file.
- parse/typed_nodes_test.cpp validates that each typed parse node struct type has a static `Kind` member that defines the correct corresponding `NodeKind`, and that the `category()` function agrees between the `NodeKind` and `NodeKind::Definition`.
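The core of the parse node metadata scheme above (a `Kind` constant per typed struct, a table of definition pointers indexed by enum value, and field detection) can be approximated in a few lines. This is a simplified sketch, not the code in parse/node_kind.h: `NodeKind` here is a plain `enum class` rather than an `EnumBase` subclass, and `NodeKindDefinition`, `Define`, `GetDefinition`, and `GetCategory` are hypothetical stand-ins for `NodeKind::Definition`, `NodeKind::Define`, `NodeKind::definition()`, and `NodeKind::category()`.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <type_traits>

// Hypothetical stand-in for parse/node_kind.def.
#define SKETCH_NODE_KINDS(X) \
  X(IntLiteral)              \
  X(InfixOperator)           \
  X(VarDecl)

enum class NodeKind : std::uint8_t {
#define SKETCH_NODE_ENUMERATOR(Name) Name,
  SKETCH_NODE_KINDS(SKETCH_NODE_ENUMERATOR)
#undef SKETCH_NODE_ENUMERATOR
};

// Simplified categories; the real NodeCategory is a bitmask enum.
enum class NodeCategory : std::uint8_t { None, Expr, Decl };

// Plays the role of NodeKind::Definition: the kind plus extra metadata.
struct NodeKindDefinition {
  NodeKind kind;
  NodeCategory category;
};

// Plays the role of NodeKind::Define().
constexpr auto Define(NodeKind kind, NodeCategory category)
    -> NodeKindDefinition {
  return {kind, category};
}

// Typed node structs, as in parse/typed_nodes.h (child fields omitted).
struct IntLiteral {
  static constexpr auto Kind = Define(NodeKind::IntLiteral, NodeCategory::Expr);
};
struct InfixOperator {
  static constexpr auto Kind =
      Define(NodeKind::InfixOperator, NodeCategory::Expr);
};
struct VarDecl {
  static constexpr auto Kind = Define(NodeKind::VarDecl, NodeCategory::Decl);
};

// Field detection: true only if `T::Kind` exists and is declared with exactly
// the type `const NodeKindDefinition`.
template <typename T, typename = void>
struct HasKindMember : std::false_type {};
template <typename T>
struct HasKindMember<
    T, std::enable_if_t<
           std::is_same_v<decltype(T::Kind), const NodeKindDefinition>>>
    : std::true_type {};

// Plays the role of NodeKind::definition(): a table indexed by enum value,
// filled with the address of each struct's `Kind`. Expanding the same list
// keeps the table in enum order, and a kind without a struct fails to compile.
auto GetDefinition(NodeKind kind) -> const NodeKindDefinition& {
  static constexpr const NodeKindDefinition* Table[] = {
#define SKETCH_TABLE_ENTRY(Name) &Name::Kind,
      SKETCH_NODE_KINDS(SKETCH_TABLE_ENTRY)
#undef SKETCH_TABLE_ENTRY
  };
  return *Table[static_cast<int>(kind)];
}

// Plays the role of NodeKind::category().
auto GetCategory(NodeKind kind) -> NodeCategory {
  return GetDefinition(kind).category;
}
```

A check in the style of parse/typed_nodes_test.cpp would iterate over every kind and confirm that `GetDefinition(kind).kind == kind`, which catches a table built in a different order from the enum.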
Note: this is broadly similar to the SemIR typed instruction metadata implementation.
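The `Extractable<T>` rules from parse/extract.cpp, described above, compose per-field extractors to fill a typed node struct. The sketch below shows the idea on a simplified child list and is not the toolchain's code: `Child`, `Binding`, `ExtractBinding`, and the node kinds are hypothetical, children are read front to back from a vector instead of through `Tree::SiblingIterator`, `std::vector` stands in for `llvm::SmallVector`, and the per-field calls in `ExtractBinding` are written by hand where the real implementation uses struct reflection.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical node kinds for the sketch.
enum class NodeKind { Name, TypeExpr };

// An untyped node id, as in parse/node_ids.h.
struct NodeId {
  int index;
};

// A node id that may only refer to a node of kind `K`.
template <NodeKind K>
struct NodeIdForKind : NodeId {};

using NameId = NodeIdForKind<NodeKind::Name>;
using TypeExprId = NodeIdForKind<NodeKind::TypeExpr>;

// Simplified view of a node's children: kind and index, in order.
struct Child {
  NodeKind kind;
  int index;
};

// How to extract a field of type T starting at `children[pos]`. On success,
// fills `out`, advances `pos`, and returns true; on failure, leaves `pos`.
template <typename T>
struct Extractable;

template <NodeKind K>
struct Extractable<NodeIdForKind<K>> {
  static auto Extract(const std::vector<Child>& children, std::size_t& pos,
                      NodeIdForKind<K>& out) -> bool {
    if (pos >= children.size() || children[pos].kind != K) {
      return false;
    }
    out = NodeIdForKind<K>{{children[pos].index}};
    ++pos;
    return true;
  }
};

// Optional child: absence is fine, so this never fails.
template <typename T>
struct Extractable<std::optional<T>> {
  static auto Extract(const std::vector<Child>& children, std::size_t& pos,
                      std::optional<T>& out) -> bool {
    T value{};
    if (Extractable<T>::Extract(children, pos, value)) {
      out = value;
    } else {
      out = std::nullopt;
    }
    return true;
  }
};

// Repeated child: takes as many matches as possible. T must be a type whose
// extraction can fail (not std::optional), or this loop would never stop.
template <typename T>
struct Extractable<std::vector<T>> {
  static auto Extract(const std::vector<Child>& children, std::size_t& pos,
                      std::vector<T>& out) -> bool {
    out.clear();
    T value{};
    while (Extractable<T>::Extract(children, pos, value)) {
      out.push_back(value);
    }
    return true;
  }
};

// A typed node whose fields describe its children.
struct Binding {
  NameId name;
  std::optional<TypeExprId> type;
};

// Written by hand here; every child must be consumed for success.
auto ExtractBinding(const std::vector<Child>& children)
    -> std::optional<Binding> {
  std::size_t pos = 0;
  Binding result{};
  if (!Extractable<NameId>::Extract(children, pos, result.name) ||
      !Extractable<std::optional<TypeExprId>>::Extract(children, pos,
                                                       result.type) ||
      pos != children.size()) {
    return std::nullopt;
  }
  return result;
}
```

The typed id in each field is what rejects a child of the wrong kind; no per-node validation code is needed beyond the field types.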
Each parse node kind requires adding a `Handle<kind>` function in a `check/handle_*.cpp` file.
If the resulting SemIR needs a new instruction:

- Add a new kind to sem_ir/inst_kind.def.
  - Add a `CARBON_SEM_IR_INST_KIND(NewInstKindName)` line in alphabetical order.
- Add a new struct definition to sem_ir/typed_insts.h, such as:

  ```cpp
  struct NewInstKindName {
    static constexpr auto Kind = InstKind::NewInstKindName.Define(
        // the name used in textual IR
        "new_inst_kind_name"
        // Optional: , TerminatorKind::KindOfTerminator
    );

    // Optional: omit if not associated with a parse node.
    Parse::Node parse_node;
    // Optional: omit if this sem_ir instruction does not produce a value.
    TypeId type_id;

    // 0-2 id fields, with types from sem_ir/ids.h or sem_ir/builtin_kind.h
    // For example, fields would look like:
    StringId name_id;
    InstId value_id;
  };
  ```
Adding an instruction will also require a handler in the `Lower` step.
Most new instructions will automatically be formatted reasonably by the SemIR formatter.
If the resulting SemIR needs a new built-in, add it to builtin_inst_kind.def.
How does this work? As of #3310:
TODO: Convert this chart to Mermaid.
- common/enum_base.h defines the `EnumBase` CRTP class extending `Printable` from common/ostream.h, along with `CARBON_ENUM` macros for making enumerations.
- sem_ir/inst_kind.h includes common/enum_base.h and defines an enumeration `InstKind`, along with `InstValueKind` and `TerminatorKind`.
  - The `InstKind` enumeration is populated with the list of all instruction kinds from sem_ir/inst_kind.def (using the .def file idiom), declared in this file using a macro from common/enum_base.h.
  - `InstKind` has a member type `InstKind::Definition` that extends `InstKind` and adds the `ir_name` string field and a `TerminatorKind` field.
  - `InstKind` has a method `Define` for creating an `InstKind::Definition` with the same enumerant value, plus values for the other fields.
  - Note that additional information is needed to define the `ir_name()`, `value_kind()`, and `terminator_kind()` methods of `InstKind`. This information comes from the typed instruction definitions in sem_ir/typed_insts.h.
- sem_ir/typed_insts.h defines a typed instruction struct type for each kind of SemIR instruction, as described above.
  - Each one defines a static constant named `Kind` that is set using a call to `Define()` on the corresponding enumerant member of `InstKind` from sem_ir/inst_kind.h (which is included by this file).
  - `HasParseNodeMember<TypedInst>` and `HasTypeIdMember<TypedInst>` at the bottom of sem_ir/typed_insts.h use field detection to determine whether `TypedInst` has a `Parse::Node parse_node` or a `TypeId type_id` field, respectively.
    - Note: both the type and name of these fields must match exactly.
- sem_ir/inst_kind.cpp includes both sem_ir/inst_kind.h and sem_ir/typed_insts.h.
  - Using the macro from common/enum_base.h, the enumerants of `InstKind` are defined using the list of instruction kinds from sem_ir/inst_kind.def (using the .def file idiom).
  - `InstKind::value_kind()` is defined. It has a static table of `InstValueKind` values indexed by the enum value, populated by applying `HasTypeIdMember` from sem_ir/typed_insts.h to every instruction kind using the list from sem_ir/inst_kind.def.
  - `InstKind::definition()` is defined. It has a static table of `const InstKind::Definition*` indexed by the enum value, populated by taking the address of the `Kind` member of each `TypedInst`, using the list from sem_ir/inst_kind.def.
  - `InstKind::ir_name()` and `InstKind::terminator_kind()` are defined using `InstKind::definition()`.
  - Tested assumption: the tables built in this file are indexed by the enum values. We rely on the fact that we get the instruction kinds in the same order by consistently using sem_ir/inst_kind.def.
  - This file will fail to compile unless every kind of SemIR instruction defined in sem_ir/inst_kind.def has a corresponding struct type in sem_ir/typed_insts.h.
- `TypedInstArgsInfo<TypedInst>`, defined in sem_ir/inst.h, uses struct reflection to determine the other fields from `TypedInst`. It skips the `parse_node` and `type_id` fields using `HasParseNodeMember<TypedInst>` and `HasTypeIdMember<TypedInst>`.
  - Tested assumption: the `parse_node` and `type_id` fields are the first fields in `TypedInst`, and there are at most two more fields.
- sem_ir/inst.h defines templated conversions between `Inst` and each of the typed instruction structs:
  - Uses `TypedInstArgsInfo<TypedInst>`, `HasParseNodeMember<TypedInst>`, `HasTypeIdMember<TypedInst>`, and a local lambda.
  - Defines a templated `ToRaw` function that converts the various id field types to an `int32_t`.
  - Defines a templated `FromRaw<T>` function that converts an `int32_t` to `T`, performing the opposite conversion.
  - Tested assumption: the `parse_node` field is first, when present, and the `type_id` field is next, when present, in each `TypedInst` struct type.
- The "tested assumptions" above are all tested by sem_ir/typed_insts_test.cpp.
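The conversions between `Inst` and the typed instruction structs rest on two techniques described above: field detection (`HasTypeIdMember`) and raw `int32_t` round-trips (`ToRaw`/`FromRaw`). Below is a simplified, self-contained sketch of both. It is not the code in sem_ir/inst.h: `Inst` here holds only a kind, a type, and two raw argument slots, `parse_node` is omitted, `Add` and `Return` are hypothetical instructions, and each struct lists its id fields in a hypothetical `Args()` function where the real code uses struct reflection.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <type_traits>

// Hypothetical id types; each wraps an index.
struct TypeId {
  std::int32_t index;
};
struct InstId {
  std::int32_t index;
};

enum class InstKind : std::uint8_t { Add, Return };

// Simplified untyped instruction. -1 marks a slot that is not used.
struct Inst {
  InstKind kind;
  TypeId type_id;
  std::int32_t arg0;
  std::int32_t arg1;
};

// Typed instructions. `Args()` stands in for struct reflection.
struct Add {
  static constexpr InstKind Kind = InstKind::Add;
  TypeId type_id;
  InstId lhs_id;
  InstId rhs_id;
  static auto Args() { return std::tuple{&Add::lhs_id, &Add::rhs_id}; }
};
struct Return {
  static constexpr InstKind Kind = InstKind::Return;
  InstId value_id;  // No type_id: this instruction produces no value.
  static auto Args() { return std::tuple{&Return::value_id}; }
};

// Field detection: does T have a field `TypeId type_id`?
template <typename T, typename = void>
struct HasTypeIdMember : std::false_type {};
template <typename T>
struct HasTypeIdMember<
    T, std::enable_if_t<std::is_same_v<decltype(T::type_id), TypeId>>>
    : std::true_type {};

template <typename IdT>
constexpr auto ToRaw(IdT id) -> std::int32_t {
  return id.index;
}
template <typename IdT>
constexpr auto FromRaw(std::int32_t raw) -> IdT {
  return IdT{raw};
}

template <typename TypedInst>
auto ToInst(const TypedInst& typed) -> Inst {
  Inst inst{TypedInst::Kind, TypeId{-1}, -1, -1};
  if constexpr (HasTypeIdMember<TypedInst>::value) {
    inst.type_id = typed.type_id;
  }
  std::int32_t* slots[] = {&inst.arg0, &inst.arg1};
  std::size_t i = 0;
  // Store each id field, in order, into the next raw slot.
  std::apply([&](auto... field) { ((*slots[i++] = ToRaw(typed.*field)), ...); },
             TypedInst::Args());
  return inst;
}

template <typename TypedInst>
auto FromInst(const Inst& inst) -> TypedInst {
  assert(inst.kind == TypedInst::Kind);
  TypedInst typed{};
  if constexpr (HasTypeIdMember<TypedInst>::value) {
    typed.type_id = inst.type_id;
  }
  const std::int32_t slots[] = {inst.arg0, inst.arg1};
  std::size_t i = 0;
  // Read each raw slot back into the id field's own type.
  std::apply(
      [&](auto... field) {
        ((typed.*field =
              FromRaw<std::remove_reference_t<decltype(typed.*field)>>(
                  slots[i++])),
         ...);
      },
      TypedInst::Args());
  return typed;
}

// Plays the role of InstKind::value_kind(): a table indexed by enum value.
auto ProducesValue(InstKind kind) -> bool {
  static constexpr bool Table[] = {HasTypeIdMember<Add>::value,
                                   HasTypeIdMember<Return>::value};
  return Table[static_cast<int>(kind)];
}
```

Keeping the storage untyped means every instruction fits in one fixed-size record, while the typed structs give handlers named, type-checked fields.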
Each SemIR instruction requires adding a `Handle<kind>` function in a `lower/handle_*.cpp` file.
Tests are run in bulk as `bazel test //toolchain/...`. Many tests use the file_test infrastructure; see testing/file_test/README.md for information.
There are several supported ways to run Carbon on a given test file. For example, with `toolchain/parse/testdata/basics/empty.carbon`:
- `bazel test //toolchain/testing:file_test --test_arg=--file_tests=toolchain/parse/testdata/basics/empty.carbon`
  - Executes an individual test.
- `bazel run //toolchain/parse:testdata/basics/empty.carbon.run`
  - Runs `carbon` on the file with standard arguments, printing output to the console.
  - This form will often be most useful when iterating over a specific test.
- `bazel run //toolchain/parse:testdata/basics/empty.carbon.verbose`
  - Similar to the previous command, but with the `-v` flag implied.
- `bazel run //toolchain/driver:carbon -- compile --phase=parse --dump-parse-tree toolchain/parse/testdata/basics/empty.carbon`
  - Explicitly runs `carbon` with the provided arguments.
- `bazel-bin/toolchain/driver/carbon compile --phase=parse --dump-parse-tree toolchain/parse/testdata/basics/empty.carbon`
  - Similar to the previous command, but without using `bazel`.
The toolchain/autoupdate_testdata.py script can be used to update output. It invokes the file_test autoupdate support. See testing/file_test/README.md for file syntax.

Using `autoupdate_testdata.py` can be useful to produce deltas during the development process because it allows `git status` and `git diff` to be used to examine what changed.
The `-v` flag can be passed to trace state, and should be specified before the subcommand name: `carbon -v compile ...`. `CARBON_VLOG` is used to print output in this mode. There is currently no control over the degree of verbosity.
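A verbose-logging macro of this kind is usually built so that the streamed operands cost nothing when logging is off. The sketch below shows that common pattern with a hypothetical `SKETCH_VLOG` macro and `Lexer` class; it is an assumption about the general technique, not the definition of `CARBON_VLOG`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical verbose-logging macro, NOT the real CARBON_VLOG. It takes a
// stream pointer that is null unless verbose mode is on. The operands after
// `<<` are only evaluated when logging is enabled, and the if/else shape keeps
// a following `else` from binding to the macro's `if`.
#define SKETCH_VLOG(vlog_stream)  \
  if ((vlog_stream) == nullptr) { \
  } else                          \
    *(vlog_stream)

// A component that logs only when constructed with a stream.
class Lexer {
 public:
  explicit Lexer(std::ostream* vlog_stream) : vlog_stream_(vlog_stream) {}

  auto Lex(const std::string& text) -> int {
    SKETCH_VLOG(vlog_stream_) << "lexing " << text.size() << " bytes\n";
    return static_cast<int>(text.size());
  }

 private:
  std::ostream* vlog_stream_;
};
```

Passing `nullptr` when `-v` is absent makes every log statement a single pointer comparison.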
While the iterative processing pattern means function stack traces will have minimal context for how the current function is reached, we use LLVM's `PrettyStackTrace` to include details about the state stack. The state stack will be above the function stack in crash output.
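LLVM's `PrettyStackTrace` lets code register entries that are printed if the process crashes. The sketch below imitates that idea without depending on LLVM: `StackTraceEntry`, `StateStackEntry`, and `PrintAll` are hypothetical names, not LLVM's API, and no signal handler is installed; a crash handler would call `PrintAll` where the usage below calls it directly.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for LLVM's PrettyStackTraceEntry (not its API).
// Each live entry links itself into a per-thread list on construction and
// unlinks on destruction, so a crash handler can print the active entries.
class StackTraceEntry {
 public:
  StackTraceEntry() : next_(head_) { head_ = this; }
  virtual ~StackTraceEntry() { head_ = next_; }
  StackTraceEntry(const StackTraceEntry&) = delete;
  auto operator=(const StackTraceEntry&) -> StackTraceEntry& = delete;

  virtual void Print(std::ostream& out) const = 0;

  // What a crash handler would call; prints the innermost entry first.
  static void PrintAll(std::ostream& out) {
    for (const StackTraceEntry* entry = head_; entry != nullptr;
         entry = entry->next_) {
      entry->Print(out);
    }
  }

 private:
  static inline thread_local const StackTraceEntry* head_ = nullptr;
  const StackTraceEntry* next_;
};

// Prints the parser's state stack as it is at the moment of printing.
class StateStackEntry : public StackTraceEntry {
 public:
  explicit StateStackEntry(const std::vector<std::string>& states)
      : states_(&states) {}

  void Print(std::ostream& out) const override {
    out << "state stack:";
    for (const std::string& state : *states_) {
      out << ' ' << state;
    }
    out << '\n';
  }

 private:
  const std::vector<std::string>* states_;
};
```

Because the entry holds a pointer to the live stack rather than a copy, the printed output reflects whatever states were pushed after the entry was created.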