Skip to content

Latest commit

 

History

History
129 lines (80 loc) · 8.79 KB

README.md

File metadata and controls

129 lines (80 loc) · 8.79 KB

Qamar

A while ago I went through the first part of Crafting Interpreters and implemented Lox in Java as the book did and shared it on GitHub in MyLox Repository, which inspired me to create a similar programming language.

Note: I changed the name of this repository multiple times, I wanted to call it Onyx but the name was taken so I went with Qamar.

About This Repository

In this repository, I will be going through the second part of the book, and try to implement my own programming language and document my journey.

My Current Progress

Chunks of Bytecode

So far, I've implemented the Bytecode along with a couple of instructions. Additionally, I created a simple disassembler that takes those chunks and disassembles them for debugging purposes. This photo shows a hardcoded example in main for the disassembly of the instructions:

image

A Virtual Machine

I have also developed the Virtual Machine upon which the interpreter operates. Additionally, I have introduced new Bytecode instructions supported by a disassembler and integrated them into the interpreter loop. As a result, I am now able to evaluate complex arithmetic expressions seamlessly. This screenshot shows a disassembled chunk of bytecode calculating the expression -((1.2 + 3.4) / 5.6)

image

Scanning (Lexing)

After completing a significant portion of the backend for the compiler, I have now begun working on the frontend. Today, I focused on the initial component essential for a compiler's frontend: the Scanner (also known as the Lexer). I successfully implemented all the lexical grammars, resulting in an artifact capable of processing source code and generating corresponding tokens.

This screenshot shows the output of the lexer where:

  • 1 is the line number
  • | means that this token is also at line 1
  • 36 being TOKEN_VAR defined in the TokenType Enum in scanner.h
  • 19, 13, 21, 10, 8, and 39 being TOKEN_IDENTIFIER, TOKEN_EQUAL, TOKEN_NUMBER, TOKEN_STAR, TOKEN_SEMICOLON, and TOKEN_EOF respectively

image

Compiling Expressions

Today, I completed the final segment of the VM's execution pipeline and developed a compiler capable of parsing high-level source code using Vaughan Pratt’s “top-down operator precedence parsing” technique. The compiler outputs a series of low-level instructions (Bytecode instructions) that can be executed by the VM. The types of instructions I can parse and execute include:

  • Number literals
  • Parentheses for grouping
  • Unary Expressions
  • The Four Horsemen of the Arithmetic +, -, *, /

So basically now I've got myself an over-engineered calculator

image

New Datatypes and Operators

So far, our language has only had one data type (double), represented as the typedef Value. However, in the past few days, I've been working on implementing new data types. I successfully added three new data types: NIL, BOOL, and STRING. Additionally, I've included some helper macros to facilitate type-checking and conversion between the language's representation of these data types and their C counterparts.

image

Additionally, I've introduced several new operators to handle operations associated with the new data types. These include logical operators such as <, <=, >, and ==, among others. Furthermore, I've implemented string concatenation using the + operator.

image

Global Variables

After working on the hash table implementation for the language, I utilized it to implement and support global variables. This allows us to define variables using the keyword var, assign them values, and subsequently modify their values. The retrieval of variables is entirely facilitated through the hash table mechanism.

image

Local Variables & Lexical Scoping

So far, all variables supported by the language have been global. Recently, I focused on extending the language to incorporate blocks, block scopes, and local variables. Additionally, I introduced a completely new runtime representation for variables using the new struct Compiler that keeps track of the scope depth, and the number of the local variables in that scope.

So given this example test program provided in block.qmr

image

If we run the code it will produce the following output as expected

image

Our programming language is finally starting to look like a legitimate programming language.

if & if else Statements

I don't feel like typing a lot now, but I implemented if statements and if else, so given this program in if_statement.qmr:

image

It produces the following output:

image

while loops

There's not much to explain here except for the fact that we've added a new feature: looping, specifically while loops, along with a new test file that contains a code snippet to verify its functionality while_loop.qmr that counts down from 10 to 1

image

It produces the following output:

image

for loops

This is an example of a for loop that prints even numbers from 1 through 10 provided in for_loop.qmr:

image

It produces the following output:

image

With all of these control structures in place, Onyx is finally considered to be a Turing-complete programming language.

Functions & Recursion

I have implemented support for defining and calling functions. Additionally, I have incorporated Native Functions, which are functions defined in C that can be accessed by users (similar to standard library functions). As an example, I have implemented a clock function that provides the elapsed time since the program started. The following example is provided in fib.qmr:

image

Running it produces the following:

image

More Native Functions

Given this program in native.qmr:

image

It produces the following output:

image

Closures & Upvalues

In Qamar, closures are functions that can capture and retain references to variables from their surrounding scope, enabling them to operate on these variables even after their outer functions have completed execution. Upvalues are the captured variables themselves, which are bound to a closure. Each closure maintains its own set of upvalues, allowing it to preserve the environment in which it was defined. Given this example from closures.qmr:

image

It produces the following output:

image