gocgo (go-c-go), implemented in Go, includes a lexer, a parser, and visitors for the C programming language.
This small project started as a demo of ANTLR and the visitor design pattern. Having a C parser available turned out to be valuable for a few other cases on our side, so we decided to make it public, as it might be of interest to others as well.
gocgo is (currently) suitable for analysis of C code. Some transformation is possible, and a flexible transformation API could be added later.
Below is an example of a visitor that collects C code into a buffer and along the way removes function bodies.
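// FuncDeleteVisitor collects C code into Buff and skips function bodies.
type FuncDeleteVisitor struct {
	*BaseVisitorImpl
	in   bool // true while visiting a function definition
	Buff bytes.Buffer
}
func NewFuncDeleteVisitor() *FuncDeleteVisitor {
	return &FuncDeleteVisitor{in: false}
}
// VisitTerminal appends every visited token to Buff, followed by a space.
func (v *FuncDeleteVisitor) VisitTerminal(n antlr.TerminalNode) error {
	v.Buff.WriteString(n.GetSymbol().GetText())
	v.Buff.WriteString(" ")
	return nil
}
// VisitFunctionDefinition marks the start of a function definition.
func (v *FuncDeleteVisitor) VisitFunctionDefinition(ctx *parsing.FunctionDefinitionContext) (bool, error) {
	v.in = true
	return true, nil
}
// VisitFunctionDefinitionEnd marks the end of a function definition.
func (v *FuncDeleteVisitor) VisitFunctionDefinitionEnd(ctx *parsing.FunctionDefinitionContext) error {
	v.in = false
	return nil
}
// VisitBlockItemList skips its children (the body) while inside a function.
func (v *FuncDeleteVisitor) VisitBlockItemList(ctx *parsing.BlockItemListContext) (bool, error) {
	return !v.in, nil
}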
We provide a complete example of running the aforementioned visitor as a command in this repo. You can run the following:
go run ./cmd/gocgo/gocgo.go cprogs/for.c
See the command's source for details; the code is rather simple.
Each visitor should embed a pointer to BaseVisitorImpl, which provides the default implementation of Visit methods.
There is a VisitX method for each type of AST node (e.g., VisitFunctionDefinition for a function definition). Consider this example:
func (v *FuncDeleteVisitor) VisitFunctionDefinition(ctx *parsing.FunctionDefinitionContext) (bool, error) {
	v.in = true
	return true, nil
}
This Visit method is invoked for every function definition in the parsed C code. There are two return values. The first value (bool) says whether the node's children should be visited. The second value (error) is Go's standard error interface.
Each VisitX method has a corresponding VisitXEnd method, invoked once all children of the node are visited. From our earlier example:
func (v *FuncDeleteVisitor) VisitFunctionDefinitionEnd(ctx *parsing.FunctionDefinitionContext) error {
	v.in = false
	return nil
}
VisitXEnd methods have a single return value (error).
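The VisitX / VisitXEnd protocol described above can be illustrated with a minimal, self-contained sketch. It does not use gocgo or ANTLR: Node, Visitor, Walk, and bodySkipper are hypothetical names chosen for illustration, and the sketch assumes that the end callback still runs when children are skipped, which may differ from what gocgo does.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, minimal tree node (not a gocgo or ANTLR type).
type Node struct {
	Kind     string
	Text     string
	Children []*Node
}

// Visitor mirrors the VisitX / VisitXEnd shape: Enter reports whether
// to descend into children, and Exit runs after the children.
type Visitor interface {
	Enter(n *Node) (bool, error)
	Exit(n *Node) error
}

// Walk drives a Visitor over the tree in depth-first order.
// Assumption: Exit is called even when Enter asks to skip the children.
func Walk(n *Node, v Visitor) error {
	descend, err := v.Enter(n)
	if err != nil {
		return err
	}
	if descend {
		for _, c := range n.Children {
			if err := Walk(c, v); err != nil {
				return err
			}
		}
	}
	return v.Exit(n)
}

// bodySkipper collects token text and skips "body" nodes inside functions.
type bodySkipper struct {
	inFunc bool
	out    []string
}

func (b *bodySkipper) Enter(n *Node) (bool, error) {
	switch n.Kind {
	case "func":
		b.inFunc = true
	case "body":
		if b.inFunc {
			return false, nil // do not visit the function body
		}
	case "token":
		b.out = append(b.out, n.Text)
	}
	return true, nil
}

func (b *bodySkipper) Exit(n *Node) error {
	if n.Kind == "func" {
		b.inFunc = false
	}
	return nil
}

// render walks the tree and joins the collected tokens with spaces.
func render(root *Node) (string, error) {
	v := &bodySkipper{}
	if err := Walk(root, v); err != nil {
		return "", err
	}
	return strings.Join(v.out, " "), nil
}

// tok builds a leaf node holding a single token.
func tok(s string) *Node { return &Node{Kind: "token", Text: s} }

func main() {
	body := &Node{Kind: "body", Children: []*Node{tok("return"), tok("1"), tok(";")}}
	root := &Node{Kind: "func", Children: []*Node{
		tok("int"), tok("f"), tok("("), tok(")"), tok("{"), body, tok("}"),
	}}
	s, err := render(root)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // prints "int f ( ) { }"
}
```

Returning a bool from the enter callback keeps the traversal logic in one place (Walk), so a visitor only decides what to skip and never has to iterate over children itself.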
This architecture is common in many tools, e.g., visitors in the Eclipse implementation.
gocgo saves you a few steps if you were planning to use ANTLR to obtain a parser for the C programming language.
We used the grammar from the ANTLR repo.
If you wish to reproduce the steps to generate the lexer, parser, and original visitors, you can run the following command in the root of this repo:
go generate ./...
We made a few changes to simplify the usage of visitors and address some issues (e.g., Issue 4398). Below is a summary of the key changes and additions:
- Addressed issue 4398 by introducing (locally in this repo) VisitChildren.
- Removed the generated c_base_visitor.go, because it limits changing the behavior of any Visit method.
- Introduced our own visitor interface (BaseVisitor) and its implementation (BaseVisitorImpl), which enable proper struct embedding. Your visitor can now change the behavior of (a subset of) Visit methods.
- Updated the API design so that each Visit method returns bool and error (rather than interface{}). We find the new approach more idiomatic in Go.
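As a rough illustration of the embedding point in the list above, the sketch below shows how a visitor can override one method and inherit defaults for the rest. It is not gocgo code: Visitor, Defaults, DeclCounter, and countDecls are hypothetical names, and the two-method interface is an assumption made to keep the example small.

```go
package main

import "fmt"

// Visitor is a hypothetical two-method interface (not gocgo's BaseVisitor).
type Visitor interface {
	VisitDecl(name string) (bool, error)
	VisitStmt(text string) (bool, error)
}

// Defaults supplies do-nothing implementations that ask to visit children.
type Defaults struct{}

func (*Defaults) VisitDecl(string) (bool, error) { return true, nil }
func (*Defaults) VisitStmt(string) (bool, error) { return true, nil }

// DeclCounter embeds *Defaults and overrides only VisitDecl.
type DeclCounter struct {
	*Defaults
	Count int
}

func (d *DeclCounter) VisitDecl(name string) (bool, error) {
	d.Count++
	return true, nil
}

// countDecls calls through the interface, so the override is picked up
// for declarations while statements fall back to the promoted default.
func countDecls(v Visitor, decls, stmts []string) error {
	for _, d := range decls {
		if _, err := v.VisitDecl(d); err != nil {
			return err
		}
	}
	for _, s := range stmts {
		if _, err := v.VisitStmt(s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	c := &DeclCounter{Defaults: &Defaults{}}
	if err := countDecls(c, []string{"x", "y"}, []string{"x = 1;"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.Count) // prints 2
}
```

The override only takes effect because the driver dispatches through the interface value; Go embedding has no virtual dispatch, so a base type calling its own methods directly would never reach the outer type's versions.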
Feel free to get in touch if you have any comments: Milos Gligoric <[email protected]>.