Polish English Tutorial
Polish the English text of the following tutorial chapters:

* `tutorial/en/0-Preface.md`
* `tutorial/en/1-Skeleton.md`
* `tutorial/en/2-Virtual-Machine.md`

Improve markdown docs:

* Add Level 1 title to docs ("Preface", "1. Skeleton", etc.).
* Replace smileys with markdown emojis.
* Fix titles casing using Chicago Manual of Style capitalization rules:
  https://capitalizemytitle.com/style/chicago#
* Convert inline links to reference-style links (DRY!).
* Add a few links to external references.
tajmone committed Mar 14, 2021
1 parent f9f633e commit a77ef6d
Showing 3 changed files with 325 additions and 264 deletions.
166 changes: 94 additions & 72 deletions tutorial/en/0-Preface.md
@@ -1,104 +1,126 @@
This series of articles is a tutorial for building a C compiler from scratch.
# Preface

I lied a little in the above sentence: it is actually an _interpreter_ instead
of _compiler_. I lied because what the hell is a "C interpreter"? You will
however, understand compilers better by building an interpreter.
This is a multi-part tutorial on how to build a C compiler from scratch.

Yeah, I wish you can get a basic understanding of how a compiler is
constructed, and realize it is not that hard to build one. Good Luck!
Well, I lied a little in the previous sentence: it's actually an _interpreter_,
not a _compiler_. I had to lie, because what on earth is a "C interpreter"?
You will however gain a better understanding of compilers by building an
interpreter.

Finally, this series is written in Chinese in the first place, feel free to
correct me if you are confused by my English. And I would like it very much if
you could teach me some "native" English :)
Yeah, I want to provide you with a basic understanding of how a compiler is
constructed, and to help you realize that it's not that hard to build one,
after all. Good luck!

We won't write any code in this chapter, feel free to skip it if you are
desperate to see some code...
This tutorial was originally written in Chinese, so feel free to correct me if
you're confused by my English. Also, I would really appreciate it if you could
teach me some "native" English. :smile:

## Why you should care about compiler theory?
We won't be writing any code in this chapter; so if you're eager to see some code, feel free to skip it.

Because it is **COOL**!

And it is very useful. Programs are built to do something for us, when they
are used to translate some forms of data into another form, we can call them
a compiler. Thus by learning some compiler theory we are trying to master a very
powerful technique of solving problems. Isn't that cool enough to you?
## Why Should I Care about Compiler Theory?

Because it's **COOL**!

And it's also very useful. Programs are designed to do something for us; when
they are used to translate some form of data into another form, we can call
them compilers. Thus, by learning some compiler theory, we are trying to
master a very powerful problem solving technique. Doesn't this sound cool
enough to you?

People used to say that understanding how a compiler works would help you to
write better code. Some would argue that modern compilers are so good at
optimizing that you shouldn't care any more. Well, that's true, most people
don't need to learn compiler theory to improve code performance — and by "most
people" I mean _you_!

People used to say understanding how a compiler works would help you to write
better code. Some would argue that modern compilers are so good at
optimization that you should not care any more. Well, that's true, most people
don't need to learn compiler theory only to improve the efficency of the code.
And by most people, I mean you!

## We Don't Like Theory Either

I have always been in awe of compiler theory because that's what makes
programing easy. Anyway can you imaging building a web browser in only
assembly language? So when I got a chance to learn compiler theory in college,
I was so excited! And then... I quit, not understanding what that it.
I've always been in awe of compiler theory because that's what makes programming
easy. Anyway, can you imagine building a web browser entirely in assembly
language? So when I got a chance to learn compiler theory in college, I was so
excited! And then ... I quit! And left without understanding what it's all
about.

Normally a course of compiler will cover:
Normally, a compiler course covers the following topics:

1. How to represent syntax (such as BNF, etc.)
2. Lexer, with somewhat NFA(Nondeterministic Finite Automata),
DFA(Deterministic Finite Automata).
3. Parser, such as recursive descent, LL(k), LALR, etc.
1. How to represent syntax (e.g. BNF).
2. Lexers, using NFA (Nondeterministic Finite Automata) and
DFA (Deterministic Finite Automata).
3. Parsers, such as recursive descent, LL(k), LALR, etc.
4. Intermediate Languages.
5. Code generation.
6. Code optimization.

Perhaps more than 90% students will not care anything beyond the parser, and
what's more, we still don't know how to build a compiler! Even after all the
effort learning the theories. Well the main reason is that what "Compiler
Thoery" trys to teach is "How to build a parser generator", namely a tool that
consumes syntax gramer and generates a compiler for you. lex/yacc or
flex/bison or things like that.
Perhaps more than 90% of the students won't really care about any of that,
except for the parser, and what's more, we still wouldn't know how to actually
build a compiler, even after all the effort of learning the theory. Well, the
main reason is that what "Compiler Theory" tries to teach is "how to build a
parser generator", i.e. a tool that consumes a syntax grammar and generates a
compiler for you, like lex/yacc or flex/bison, or similar tools.

These theories try to teach us how to solve the general challenges of
generating compilers automatically. Once you've mastered them, you're able to
deal with all kinds of grammars. They are indeed useful in the industry.
Nevertheless, they are too powerful and too complicated for students and most
programmers. If you try to read lex/yacc's source code you'll understand what
I mean.

These theories try to teach us how to solve the general problems of generating
compilers automatically. That means once you've mastered them, you are able to
deal with all kinds of grammars. They are indeed useful in industry.
Nevertheless they are too powerful and too complicated for students and most
programmers. You will understand that if you try to read lex/yacc's source
code.
The good news is that building a compiler can be much simpler than you ever
imagined. I won't lie, it's not easy, but definitely not hard.

Good news is building a compiler can be much simpler than you ever imagined.
I won't lie, not easy, but definitely not hard.

## Birth of this project
## How This Project Began

One day I came across the project [c4](https://github.com/rswier/c4) on
Github. It is a small C interpreter which is claimed to be implemented by only
4 functions. The most amazing part is that it is bootstrapping (that interpret
itself). Also it is done with about 500 lines!
One day I came across the project [c4] on GitHub, a small C interpreter
claiming to be implemented with only 4 functions. The most amazing part is
that it's [bootstrapping] (i.e. it can interpret itself). Furthermore, it's
done in around 500 lines of code!

Meanwhile I've read a lot of tutorials about compiler, they are either too
simple(such as implementing a simple calculator) or using automation
tools(such as flex/bison). c4 is however implemented all from scratch. The
sad thing is that it try to be minimal, that makes the code quite a mess, hard
to understand. So I started a new project to:
Meanwhile, I've read many tutorials on compiler design, and found them to be
either too simple (such as implementing a simple calculator) or using
automation tools (such as flex/bison). [C4], however, is implemented entirely
from scratch. The sad thing is that it aims to be "an exercise in minimalism,"
which makes the code quite messy and hard to understand. So I started a new
project, in order to:

1. Implement a working C compiler(interpreter actually)
2. Write a tutorial of how it is built.
1. Implement a working C compiler (an interpreter, actually).
2. Write a step-by-step tutorial on how it was built.

It took me 1 week to re-write it, resulting 1400 lines including comments. The
project is hosted on Github: [Write a C Interpreter](https://github.com/lotabout/write-a-C-interpreter).
It took me one week to rewrite it, resulting in 1400 lines of code (including
comments). The project is hosted on GitHub: [Write a C Interpreter].

Thanks rswier for bringing us a wonderful project!
Thanks to [@rswier] for sharing [c4] with us; it's such a wonderful project!

## Before you go

Implementing a compiler could be boring and it is hard to debug. So I hope you
can spare enough time studying, as well as type the code. I am sure that you
will feel a great sense of accomplishment just like I do.
## Before You Begin

Implementing a compiler can be boring and hard to debug. So I hope you can
spare enough time studying, and typing code. I'm sure that you will feel a
great sense of accomplishment, just like I do.


## Good Resources

1. [Let’s Build a Compiler](http://compilers.iecc.com/crenshaw/): a very good
tutorial of building a compiler for fresh starters.
2. [Lemon Parser Generator](http://www.hwaci.com/sw/lemon/): the parser
generator that is used in SQLite. Good to read if you want to understand
compiler theory with code.
1. _[Let’s Build a Compiler]_: a very good tutorial on building a compiler,
written for beginners.
2. [Lemon Parser Generator]: the parser generator used by SQLite.
Good to read if you want to understand compiler theory with code.

In the end, I am just a person with a general level of expertise, so there
will inevitably be some mistakes in my articles and code (and also in my
English). Feel free to correct me!

I hope you'll enjoy it.

In the end, I am human with a general level, there will be inevitably wrong
with the articles and codes(also my English). Feel free to correct me!
<!-----------------------------------------------------------------------------
REFERENCE LINKS
------------------------------------------------------------------------------>

Hope you enjoy it.
[@rswier]: https://github.com/rswier "Visit @rswier's GitHub profile"
[bootstrapping]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers) "Wikipedia » Bootstrapping (compilers)"
[c4]: https://github.com/rswier/c4 "Visit the c4 repository on GitHub"
[Lemon Parser Generator]: http://www.hwaci.com/sw/lemon/ "Visit Lemon homepage"
[Let’s Build a Compiler]: http://compilers.iecc.com/crenshaw/ "15-part tutorial series, by Jack Crenshaw"
[Write a C Interpreter]: https://github.com/lotabout/write-a-C-interpreter "Visit the 'Write a C Interpreter' repository on GitHub"
129 changes: 72 additions & 57 deletions tutorial/en/1-Skeleton.md
@@ -1,66 +1,69 @@
In this chapter we will have an overview of the compiler's structure.
# 1. Skeleton

Before we start, I'd like to restress that it is **interperter** that we want
to build. That means we can run a C source file just like a script. It is
chosen mainly for two reasons:
In this chapter we'll present an overview of the compiler's structure.

1. Interpreter differs from Compiler only in code generation phase, thus we'll
still learn all the core techniques of building a compiler(such as lexical
analyzing and parsing).
2. We will build our own virtual machine and assembly instructions, that would
help us to understand how computers work.
Before we start, let me stress again that we will be building an **interpreter**.
This means we'll be able to run a C source file as if it were a script. The main
reasons behind this choice are twofold:

## Three Phases
1. An interpreter differs from a compiler only in the code generation phase,
thus we'll still learn all the core techniques of building a compiler
(such as lexical analysis and parsing).
2. We will build our own virtual machine and [assembly instruction set];
this will help us understand how computers work.

Given a source file, normally the compiler will cast three phases of
processing:

1. Lexical Analysis: converts source strings into internal token stream.
2. Parsing: consumes token stream and constructs syntax tree.
3. Code Generation: walk through the syntax tree and generate code for target
platform.
## The Three Phases of Compiling

Compiler Construction had been so mature that part 1 & 2 can be done by
automation tools. For example, flex can be used for lexical analysis, bison
for parsing. They are powerful but do thousands of things behind the scene. In
order to fully understand how to build a compiler, we are going to build them
all from scratch.
Given a source file, the compiler usually carries out three processing phases:

Thus we will build our interpreter in the following steps:
1. **Lexical Analysis**:
converts source strings into an internal stream of tokens.
2. **Parsing**: consumes the token stream and constructs a syntax tree.
3. **Code Generation**:
walks through the syntax tree and generates code for the target platform.
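
To make the three phases concrete, here is a minimal sketch in C for a toy
language of integers joined by `+`. All names (`Token`, `Node`, `lex`,
`parse`, `gen`) and the `PUSH`/`ADD` instructions are made up for
illustration; they are not the tutorial's actual lexer, parser, or
instruction set.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Phase 1 output: kind is 'n' (number), '+' (operator) or 0 (end marker). */
typedef struct { char kind; int value; } Token;

/* Phase 2 output: a binary syntax tree; leaves are numbers. */
typedef struct Node { char kind; int value; struct Node *lhs, *rhs; } Node;

/* Phase 1: lexical analysis, source string to token array.
   Returns the number of tokens, not counting the end marker. */
size_t lex(const char *src, Token *out, size_t cap) {
    size_t n = 0;
    while (*src && n + 1 < cap) {
        if (isdigit((unsigned char)*src)) {
            int v = 0;
            while (isdigit((unsigned char)*src)) v = v * 10 + (*src++ - '0');
            out[n++] = (Token){'n', v};
        } else if (*src == '+') {
            out[n++] = (Token){'+', 0};
            src++;
        } else {
            src++; /* skip spaces and anything unrecognized */
        }
    }
    out[n] = (Token){0, 0};
    return n;
}

/* Allocate one tree node (nodes are never freed in this sketch). */
Node *new_node(char kind, int value, Node *lhs, Node *rhs) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

/* Phase 2: parsing, for the grammar  sum := number ('+' number)*  */
Node *parse(const Token *t) {
    assert(t->kind == 'n');
    Node *tree = new_node('n', t->value, NULL, NULL);
    t++;
    while (t[0].kind == '+' && t[1].kind == 'n') {
        tree = new_node('+', 0, tree, new_node('n', t[1].value, NULL, NULL));
        t += 2;
    }
    return tree;
}

/* Phase 3: code generation, a post-order walk that appends
   stack-machine instructions to the NUL-terminated buffer `out`. */
void gen(const Node *n, char *out, size_t cap) {
    if (n->kind == 'n') {
        size_t len = strlen(out);
        snprintf(out + len, cap - len, "PUSH %d\n", n->value);
    } else {
        gen(n->lhs, out, cap);
        gen(n->rhs, out, cap);
        size_t len = strlen(out);
        snprintf(out + len, cap - len, "ADD\n");
    }
}
```

Running `lex` on `"1 + 20+3"`, then `parse`, then `gen` on an empty buffer
should leave the instructions `PUSH 1`, `PUSH 20`, `ADD`, `PUSH 3`, `ADD` in
the buffer, one per line. Each phase only sees the previous phase's output,
which is why the phases can be built and tested separately.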

1. Build our own virtual machine and instruction set. This is the target
platform that will be using in our code generation phase.
2. Build our own lexer for C compiler.
3. Write a recusion descent parser on our own.
Compiler Construction is so mature that phases one and two can be done by
automation tools. For example, flex can be used for lexical analysis, bison for
parsing. These are powerful tools, which do thousands of things behind the
scenes. In order to fully understand how to build a compiler, we're going to
handcraft all three phases, from scratch.

## Skeleton of our compiler
Therefore, we'll build our interpreter in the following steps:

1. Build our own virtual machine and instruction set.
This will be our target platform in the code generation phase.
2. Build our own lexer for the C language.
3. Write a [recursive descent parser] on our own.

Modeling after c4, our compiler includes 4 main functions:

1. `next()` for lexical analysis; get the next token; will ignore spaces tabs
etc.
2. `program()` main entrance for parser.
3. `expression(level)`: parser expression; level will be explained in later
chapter.
4. `eval()`: the entrance for virtual machine; used to interpret target
instructions.
## The Skeleton of Our Compiler

Why would `expression` exist when we have `program` for parser? That's because
the parser for expressions is relatively independent and complex, so we put it
into a single module(function).
Modeled after [c4], our compiler includes four main functions:

The code is as following:
1. `next()`: lexical analysis; fetches the next token, ignoring spaces, tabs,
etc.
2. `program()`: the parser's main entry point.
3. `expression(level)`: the expression parser; `level` will be explained in a
later chapter.
4. `eval()`: the virtual machine's entry point; used to interpret target
instructions.

Why do we need `expression()` when we already have `program()` for the parser?
That's because the expression parser is relatively independent and complex,
so we put it into a single module (function).

The code is as follows:

```c
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#include <string.h>
#define int long long // work with 64bit target
#define int long long // work with 64-bit target

int token; // current token
char *src, *old_src; // pointer to source code string;
char *src, *old_src; // pointer to source code string
int poolsize; // default size of text/data/stack
int line; // line number

@@ -119,34 +122,46 @@ int main(int argc, char **argv)
}
```
That's quite some code for the first chapter of the article. Nevertheless it
is actually simple enough. The code tries to reads in a source file, character
by character and print them out.
That's quite some code for the first chapter of the tutorial. Nevertheless, it's
actually quite simple. The code reads a source file, character by character,
and prints each character out.
Currently the lexer `next()` does nothing but returning the characters as they
are in the source file. The parser `program()` doesn't take care of its job
either, no syntax trees are generated, no target codes are generated.
Currently, the lexer function `next()` does nothing except return the
characters as they are encountered in the source file. The parser's `program()`
function doesn't do its job either: it generates neither syntax trees nor
target code.
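
As a rough illustration of that pass-through behavior, the two stubs could
look like the sketch below. It is written from the description above, not
copied from the repository: the `count` return value of `program()` and the
plain `int`/`const char *` types are assumptions added so the snippet stands
on its own.

```c
#include <assert.h>
#include <stdio.h>

int token;       /* current token: for now, just the current character */
const char *src; /* read position inside the source string */

/* Lexer stub: hand back the next character unchanged. */
void next(void) {
    token = *src++;
}

/* Parser stub: pull tokens until the terminating NUL and echo each one.
   Returns how many tokens were seen (an addition for illustration). */
int program(void) {
    int count = 0;
    next();
    while (token > 0) {
        printf("token is: %c\n", token);
        count++;
        next();
    }
    return count;
}
```

Pointing `src` at a string such as `"int x;"` and calling `program()` echoes
every character, spaces included, which shows that no real tokenizing happens
yet.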
The important thing here is to understand the meaning of these functions and
how they are hooked together as they are the skeleton of our interpreter.
We'll fill them out step by step in later chapters.
how they are hooked together, since they constitute the skeleton of our
interpreter. We'll fill them out step by step, in the upcoming chapters.
## Code
## Source Code
The code for this chapter can be downloaded from
[Github](https://github.com/lotabout/write-a-C-interpreter/tree/step-0), or
clone by:
[GitHub](https://github.com/lotabout/write-a-C-interpreter/tree/step-0),
or cloned via:
```
git clone -b step-0 https://github.com/lotabout/write-a-C-interpreter
```
Note that I might fix bugs later, and if there is any incosistance between the
artical and the code branches, follow the article. I would only update code in
the master branch.
> **NOTE**: I might fix bugs later; if you notice any inconsistencies between
the tutorial and the code branches, follow the tutorial. I will only update
code in the master branch.
## Summary
After some boring typing, we have the simplest compiler: a do-nothing
compiler. In next chapter, we will implement the `eval` function, i.e. our own
After some boring typing, we now have the simplest compiler: a do-nothing
compiler. In the next chapter, we'll implement the `eval` function, i.e. our own
virtual machine. See you then.
<!-----------------------------------------------------------------------------
REFERENCE LINKS
------------------------------------------------------------------------------>
[assembly instruction set]: https://en.wikipedia.org/wiki/Instruction_set_architecture "Wikipedia » Instruction set architecture"
[c4]: https://github.com/rswier/c4 "Visit the c4 repository on GitHub"
[recursive descent parser]: https://en.wikipedia.org/wiki/Recursive_descent_parser "Wikipedia » Recursive descent parser"