Skip to content

An experimental, purely functional language built in Rust using QBE as its backend

Notifications You must be signed in to change notification settings

acquitelol/elle

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

59 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

₊˚ Elle ♡︎

An experimental, purely functional language built in Rust using QBE as its backend

‎ ‎ ╱|、
(˚ˎ 。7
|、˜〵
じしˍ,)ノ

If you like this project, consider giving it a star!

Hello, World!

Writing a hello world program in Elle is super easy.
Consider this basic example of a Hello World program in Elle:

pub fn main() {
    puts("Hello world!");
}

Let's dissect the code:

  • The pub keyword declares the main function as public/exported

  • The fn keyword declares the identifier as a function

  • The word main defines the function as the entry point of our program.

  • The function call puts is part of the C ABI. It takes the 0th argument and writes it to the standard output.

  • That's it! ✩

Elle uses the QBE compiler backend. This means that files compile into QBE's intermediate language before being executed.

Let's also take a look at the QBE IR source:

export function w $main() {
@start
    %tmp_2 =w call $puts(l $main_1)
    ret
}
data $main_1 = { b "Hello world!", b 0 }
  • The main_1 data segment is used to store the literal string later used in puts

  • The function is exported, as denoted with the export keyword

  • The function returns a w (word), is called main, and uses the $ sigil to denote it is global.

  • The @start directive describes the beginning of the function

  • We then use the call operation and the global puts function with the l (long) data section we stored earlier.

  • The compiler falls back to returning the literal 0 if no specific return value is specified. Therefore, we ret at the end.

  • Simple enough! ♡


If statements

  • An if statement is an expression that evaluates a block if the condition is non-zero, with an optional else block which is evaluated if the condition is zero.

  • You can define an if statement and then an optional else statement

  • If statement expressions can be wrapped in () but this is not mandatory

  • There is currently no else if or similar. A workaround is to just define another if statement with your new condition.

  • Example:

int a = 0; // Variables must be initialized.

if expression {
    a++;
} else {
    a--;
}

While loops

  • A while loop is an expression that evaluates the block specified only if the condition is non-zero, otherwise breaks and continues execution on the primary branch.

  • Even though you can loop via recursion, the while loop primitive may be simpler to understand and use in many cases, therefore it is provided in Elle.

  • While loop expressions can be wrapped in () but this is not mandatory

  • There is no do while or finally functionality at the time of writing this.

  • Example

while expression {
    // do code
}
  • You also have access to block scoped variables inside of this loop. This means you can create a pseudo for loop with the following code:
int i = 0;

while (i < 10) {
    printf("%d\n", i);
    i++;
}

Please keep in mind that you also have access to the break and continue keywords while inside of a loop, which break execution early or continue to the next iteration respectively.


For loops

  • A for loop is an expression that has 3 main parts:
  1. Variable declaration - Declaring an iterator to be used in the loop
  2. Condition - The condition to break out of the loop
  3. Variable step - The amount that the variable should increase on each iteration.

Essentially, the loop creates the variable defined in (1), and evaluates the block specified, aswell as the step in (3), until the condition defined in (2) is zero, when it returns to the main branch and continues execution.

  • For loop expressions can be wrapped in () but this is not mandatory
  • Basic example of a for loop that prints the digits 0-9 to the stdout:
for (int i = 0; i < 10; i++) {
    printf("%d\n", i);
}
  • More advanced example:
fn fact(long n) -> long {
    if n <= 1 {
        return 1;
    }

    return n * fact(n - 1);
}

fn get_e() {
    double res = 0.0;

    for long i = 0; i < 50; i++ {
        res += 1.0 / fact(i);
    }

    return res;
}

pub fn main() {
    double e = get_e();
    printf("e = %.50f\n", e);
    return 0;
}

Please keep in mind that you also have access to the break and continue keywords while inside of a loop, which break exeuction early or continue to the next iteration respectively.


Standalone blocks

  • A standalone block is somewhat equivalent to an if true statement, although they are not implemented exactly the same internally. It creates a block of code that is executed on a seperate "branch" to the main code in the function. This means that if you run something like defer inside of a standalone block it would call that when the standalone block leaves scope, not the function itself.

Here's a simple example:

pub fn main() {
    int a = 0;

    {
        a++;
        // If we do *something* here like calling defer then
        // the defer would run when this block leaves its scope
    }

    return 0;
}

And it is relatively clear how this code is essentially equal to:

pub fn main() {
    int a = 0;

    if true {
        a++;
        // If we do *something* here like calling defer then
        // the defer would run when this block leaves its scope
    }

    return 0;
}

Variadic Functions

  • A variadic function is a function that can take in a variable amount of arguments. This works similar to C except that there are macros which allow you to get the argument size.

Here's a basic example of a variadic function which takes in any amount of arguments and returns their sum:

fn add(int size, ...) {
    variadic args[size * #size(int)];
    int res = 0;

    for int i = 0; i < size; i++ {
        res += args yield int;
    }

    return res;
}

Let's go through an explanation for how this works:

  • L1: Declare the function signature. We declare an int size as a static positional argument, then the ... denotes that the function is variaidic.
  • L2: Declare the args variable as a pointer to the start of the variadic arguments. This is denoted by variadic name[size * #size(ty)]. Ensure that whatever type you pass into #size is the type you're going to yield later. This call internally stack allocates memory of the size specified and then calls vastart on the returned pointer.
  • L3: Initialize the result at 0.
  • L5: Declare a for loop with an unused iterator from 0 to the size - 1. This will allow you to loop through all of the arguments that will be provided by the user. This is necessary because you can yield arguments forever, however if you don't know how many there are then you will enter uninitialized memory. A method of passing the argument length will be shown later at the call site.
  • L6: Yield the next argument from the args pointer as an Int type, and add it to the result value
  • L9: Return the summed value. Right before this point, the free call that we deferred earlier would be called.

At the call-site, using this function is easy due to the syntactic sugar provided by Elle. It can be done like this:

pub fn main() {
    int res = add.(1, 2, 3, 4);
    printf("%d\n", res);
    return 0;
}

Notice the add.(a, b) syntax. This is a compile time macro which automatically adds the argument length as the 0th argument of the function, substituting it for the size of the variadic function. This means that calling add.(a, b, c) is actually identical to calling add(3, a, b, c), you simply no longer need to pass the argument length manually, like in C.

Examples that contain variadic functions include concat.elle and variadic.elle.


Exact literals

  • An exact literal is Elle's way of implementing inline IR into the language. This basically means that you can write intermediate language code directly in Elle which compiles without any type, size, scope, or name context.

You can create an "exact literal" by wrapping the inline IR with $$ on both sides of the expression, and ensuring you include a semicolon at the end.

You can also use the manual return directive, which states that Elle should NOT include an automatic return if the function does not return anything by default. You can do this by writing $$__MANUAL_RETURN__$$ at any point in the top level of your function declaration (not in an inner block like an if statement or loop).

Here is a basic example that dereferences an int * to the underlying int:

fn deref(int *ptr) -> int {
    $$__MANUAL_RETURN__$$;
    $$%deref.val =w loadsb %ptr.1$$;
    $$ret %deref.val$$;
}

pub fn main() {
    int some_buffer[1];
    some_buffer[0] = 123;

    // Print the value at the 0th index (pointer start + 0)
    // This is identical to `some_buffer[0]`
    printf("%d\n", deref(some_buffer + 0));
    return 0;
}
  • These expressions will expand into the exact characters you type into the intermediate language code.

  • Typing $$storeb 0, %tmp_12$$ will write exactly storeb 0, %tmp_12 into the intermediate language, completely ignoring types, sigils, etc.

  • Only use this for basic operations, it is not intended as a replacement for writing Elle code as block-scoped variables are written with a temporary counter and cannot be referenced directly from exact literals.

  • The function syntax func.(a, b, c) works as follows:

    • func(a, b, c) expands to func(a, b, c)
    • func.(a, b, c, d) expands to func(4, a, b, c, d)
    • func.(a, b, c) expands to func(3, a, b, c)
    • func.(a, b) expands to func(2, a, b)

The number placed at the 0th argument is the number of arguments. This is a simple way to allow to get the number of arguments that were passed into the function without needing to manually specify them.


Static buffers

  • A static buffer is a basic allocation of stack memory with a specified, static size (you cannot put variables here, only literals like numbers).
  • You can allocate a buffer with the type buf[size] syntax.
  • This expands into the following IR:
data $buf_5 = { z (size * type_size) }
  • Assuming you wrote the above code, you would be able to reference the buf variable which is a pointer to the data section that holds the buffer. The buffer may be called anything though, of course.
  • Example:
char out[128];
gets(out); // Keep in mind that this is unsafe
puts(out);

Defer statements

  • A defer statement is commonly used to group together memory allocation and deallocation. A simple explanation is that it consumes whatever operation is defined inside and only runs this code when the function is about to go out of scope, ie during a return in a block/if statement/while loop, or an implicit return due to the function scope being left.

A very simple example of this is declaring a variable and deferring printing its value, like this:

fn print_int(int num) {
    printf("%d\n", num);
}

pub fn main() {
    int i = 0;

    // If this were not in a defer statement, then this would print 0
    // However, it will print 25 instead.
    defer print_int(i);

    i += 5;
    i *= i;

    return 0;
}

You can see how this only calls print_int right before it returns 0, which is indeed after the i variable has had changes made to it. This also works if you return in other scopes, such as if statements, while loops, standalone blocks, etc, as stated above. Any defer statements in inner blocks will not be called on any return, rather will only be called when the inner block is about to leave scope.

This also means that if you, hypothetically, design a program like this

fn print_int(int num) {
    printf("%d\n", num);
}

pub fn main() {
    int i = 0;
    defer print_int(i);

    {
        defer print_int(i);
        i += 2;
    }

    i *= i;
    return 0;
}

The expected output is 2, then 4. This is because it will call print_int once when the standalone block will leave scope, at which point i is 2, then it will call print_int again when the function itself will leave scope, at which it will be 4 because i was squared (i *= i).

You can also write something like this:

pub fn main() {
    int i = 0;
    defer print_int(i);

    {
        defer print_int(i);
        i += 2;

        {
            return 0;
        }
    }

    i *= i;
    return 0;
}

Here we expect i (2) to be printed to the console twice. Why? When the function returns, the scope created by the standalone block is also inherently about to be left. Hence, we also need to call all non-root deferrers here.

The most useful application of deferring is for memory management, however.

Consider this code:

pub fn main() {
    long size = 10;
    long *numbers = malloc(size * 8); // 8 = size of a Long
    defer free(numbers);

    for (long i = 0; i < size - 1; i++) {
        numbers[i] = i * 2;
        long res = numbers[i];
        printf("numbers[%ld] = %ld\n", i, res);
    }

    // (Ignore that this never runs)
    if numbers[2] + 1 * 5 == 10 {
        // Calls `free` here
        return 1;
    }

    // Calls `free` here
    return 0;
}

Without deferring, you would have to call free at every single place where you return. Not only is this inefficient, but also very easy to forget.

Of course for a function like the above, you are able to determine what path the code will take at compile time, however if you use something like rand() you no longer have the ability to do this, so you need to call free manually at all points where the function leaves its scope. This is an elegant method to prevent that.


Type definitions

  • A type definition is used to differentiate between the scope and size of different variables. You must define a type when declaring variables, taking variables as arguments in a function, and yielding the next value from a variadic argument pointer.

Elle's types are quite similar to C in terms of their definition. They can be a recursive pointer type too such as char *. Although C has a limit on the number of pointers that a type can have (it is 2 in the C spec), Elle does not because a type is an Option internally and as such, the concept of a void * or nil * does not exist.

These are the mappings of types in Elle:

  • nil - No type. Can be used to give a function that doesn't return anything an explicit return type. Keep in mind that this is purely semantic and cannot be used as a standalone type for variables.
  • bool - A mapping to int, and works purely as a semantic for boolean literals like true or false that expand to 1 or 0 respectively.
  • char - A mapping to byte representing a character in ASCII.
  • int - A "word", also knows as a 32 bit signed integer.
  • long - A signed integer of the size specified by your computer's architecture. On x64 computers this is a 64-bit signed integer, while on x86 computers this is a 32-bit signed integer.
  • single - A 32-bit signed floating point number.
  • float - A mapping to a single.
  • double - A 64-bit signed floating point number, providing double the precision of a single.
  • function - A type that maps to a byte. This is intended to be used as a pointer to the first byte of a function definition.
  • pointer - Denoted by <type> * -> As pointers are just a number, an address in memory, a pointer in Elle is just a long that holds extra context by holding another type so that it can use its size to calculate an offset when indexing an array.
  • string - A mapping to a char *, which is a pointer to the start of the array of characters.

Type Conversion / Casting

  • A type conversion consists of converting a variable from one type to another, usually compromising precision if converting to a type with a lower size (double -> single) or having more precision if promoting a type (int -> long).

You can cast a type in a similar manner to C.

Here is an example that casts a float to an integer to add it to another integer:

pub fn main() {
    float a = 1.5;
    int b = (int)a + 2;
}

Casting is not necessary here, because the Elle compiler is smart enough to automatically cast the float to an int when compiling the arithmetic operation, based on a weight that each type is assigned.


You can also cast to pointer types, however note that, unlike C, casting to a pointer type when using malloc is not necessary because malloc is not interfaced in the Elle internals.

This means you can write:

pub fn main() {
    double *a = malloc(1024 * 8); // Where 8 = size of a double
}

and Elle will not complain. You can, if you wish, cast it, however it will have no effect at the moment.


Unary operators

  • A unary operator is a token used as a prefix to a literal or identifer to apply some operation to it, like negating it.

There are 4 unary operators in Elle:
!, &, -, and +.

Any identifier or literal can be prefixed by one of these operators.

Example of using bitwise NOT:

pub fn main() {
    bool myBool = false;

    if !myBool {
        puts("Hello world!");
    }
}

This can also be used for negative or positive values:

const long MAX_SIGNED_LONG = 9_223_372_036_854_775_807;
const long MIN_SIGNED_LONG = -MAX_SIGNED_LONG - 1;

Using unary - will multiply the expression by -1 while unary + will multiply the expression by 1.

The unary & operator is used to get the memory address of a local variable in a function. Here is an example:

fn other(int *something) {
    printf("%d\n", *something);
}

pub fn main() {
    int a = 39;
    other(&a);
    return 0;
}

Here we declare a as 39, then we pass the "address" of a to other as a pointer to an int, then this pointer is dereferenced. Keep in mind that you can only take the address of an identifier. A & operator must ALWAYS have an identifier following it.


The unary * operator is used to dereference a pointer to a value:

fn other(int *a, string *str) {
    printf("(fn other)\n\ta = %d\n\tstr = %s\n", *a, *str);
    *a = 542;
}

fn main() {
    int a = 39;
    string str = "Hello world!";

    other(&a, &str);
    printf("(fn main)\n\ta = %d\n", a);
}

The example also implies that you can store values at those dereferenced addresses. You can put as many tokens as you want after the operator. It will yield until:

  • it matches a semicolon (;)
  • it matches an arithmetic operator
  • it reaches the end of the token vector

This means that if you want to manipulate the address before it is dereferenced, you can wrap it in ().

This code:

printf("%d\n", *a + 1);

will dereference a and then add 1 to the result.

This code, however:

printf("%d\n", *(a + 1));

will first add 1 to the address of a, and then will dereference that address.


Arithmetic operations

  • All arithmetic operations are declared with 2 expressions on the left and right of an operator. This means you can call functions, do other arithmetic operations inside of operations, etc.

This is the mapping defined by Elle:

  • ^ - Xor
  • * - Multiply
  • / - Divide
  • + - Add
  • - - Subtract
  • % - Modulus

Keep in mind that you can also assign these to other values. This means the following code is valid:

pub fn main() {
    int a = 1;
    a ^= 1; // a is now 0;
    printf("%d", a);

    return 0;
}

And of course, this works for every arithmetic operator, not just xor ^.

Elle follows the standard order of operations described by mathematics (typically defined as BIDMAS or PEMDAS), which means you can also wrap expressions in () to evaluate them before other expressions that may have a higher precedence.

Example of a program that calculates the xor ^ and add + of some values:

pub fn main() {
    int a = 1 + (5 ^ 2); // Xor has a lower precedence than addition

    // We're expecting this to be 8 because
    //  5 ^ 2 = 7 and 7 + 1 = 8, however
    // without the brackets it would be 4
    // because it would evaluate to 6 ^ 2 = 4
    printf("%d", a);

    return 0;
}

Array literals

  • An array literal is a simple and intuitive syntax to automatically allocate stack memory for a buffer and assign values at each offset based on the literal definition. Essentially, an expression like this:
int some_arr[] = {512, 1, -3};

would first allocate memory to a buffer and store that in a variable called some_arr with the size of 3 * 4 = 12 (because there are 3 items and the size of an int is 4 bytes) and then it would offset the pointer returned and store each value specified at that address.

So it would first store 512 at some_arr + 0, then it would store 1 at some_arr + 4 (offset accounting the size of the array type), then finally would store -3 at some_arr + 8.


Here is an example of an array that holds 3 longs:

const long MAX_SIGNED_LONG = 9_223_372_036_854_775_807;
const long MIN_SIGNED_LONG = -MAX_SIGNED_LONG - 1;

pub fn main() {
    long test[] = {MAX_SIGNED_LONG, MIN_SIGNED_LONG, -39};

    for int i = 0; i < #arrlen(test); i++ {
        printf("test[%d] = %ld\n", i, test[i]);
    }

    return 0;
}

You can also specify an enforced size in between the square brackets, if you prefer not to fill the whole array at declaration-time:

  • long test[3] = {MAX_SIGNED_LONG, MIN_SIGNED_LONG, -39};

Size directives

  • A "size directive" is similar to a sizeof builtin in C. It returns the size of various definitions verbatim.

There are currently 2 size directives in Elle: #size() and #arrlen()

You can put both types and expressions inside of the #size() directive and it returns the size of the statement verbatim.

You can only place expressions inside of the #arrlen() directive as it returns the size of the buffer divided by the size of each type. This is exactly equivalent to #size(arr) / #size(arr_type). It will crash if you try to use it on a buffer that wasn't defined in the current function.

For example, take this snippet:

fn other(int *buf) {
    printf(
        "(fn other)\n\t#size(buf) = %d\n\t#arrlen(buf) = %d\n",
        #size(buf),
        #arrlen(buf)
    );
}

pub fn main() {
    int buf[100];
    buf[0] = 123;

    printf(
        "(fn main)\n\t#size(buf) = %d\n\t#arrlen(buf) = %d\n",
        #size(buf),
        #arrlen(buf)
    );

    other(buf);
    return 0;
}

At this part:

printf(
    "(fn other)\n\t#size(buf) = %d\n\t#arrlen(buf) = %d\n",
    #size(buf),
    #arrlen(buf)
);

Elle will throw a compilation error. It only has a pointer to the buffer in the other function, which means the #size call will return the size of a pointer (8 bytes on 64-bit architectures and 4 bytes on 32-bit architectures). On the other hand, trying to get the length of this array will fail because that makes no sense conceptually. In this function, the size of the array is not available, so returning anything for the #arrlen function when you have no size makes no sense.

In this example:

fn other(int *buf) {
    printf("(fn other)\n\t#size(buf) = %d\n", #size(buf));
}

pub fn main() {
    int buf[100];
    buf[0] = 123;

    printf(
        "(fn main)\n\t#size(buf) = %d\n\t#arrlen(buf) = %d\n",
        #size(buf),
        #arrlen(buf)
    );

    other(buf);
    return 0;
}

The code will compile successfully, because #arrlen is no longer used on a buffer that isn't defined in the current function.


Finally, here is a basic example of using #arrlen to loop through an array of strings and print its values:

pub fn main() {
    char *some_array[] = {"abc", "meow", "test"};

    for int i = 0; i < #arrlen(some_array); i++ {
        // Here typeof(some_array[i]) = char * = string
        printf("some_array[%d] = %s\n", i, some_array[i]);
    }

    return 0;
}

Constants

  • A constant is a value that cannot be redeclared. In Elle, constants can only be defined at the top level of files, and the reverse is also true, where the top level of files can only be constants and functions.
  • Constants can be public, declared using the pub keyword.
  • A constant internally creates a function that is automatically called when the constant is referenced.
  • Constants that create pointers (such as string literals) are referenced as the first statement of each function to bring them in scope.

Consider this example that uses constants:

const int WIDTH = 100;
const int HEIGHT = 24;
const int SIZE = WIDTH * HEIGHT;

pub fn main() {
    printf("%d\n", SIZE);
    return 0;
}

In the above code, all of the constants are technically function definitions that return the value after the = sign. However, when they're referenced, the function is automatically called. Therefore, you dont need to type SIZE() or similar, you can just directly reference SIZE as if it was a constant.

It is labelled as a "constant", because although it can return a different value (it can call any function), it cannot be redeclared.


Argc and argv

  • These are variables that can be taken as arguments from the main function that allow you to pass extra data to the executable. Conventionally, the first argument, argc, is the number of arguments, and the second argument, argv, is an array of the arguments themselves, or rather a pointer to them.

Due to Elle's compilation to QBE-IR which implements the C ABI, getting input from argc and argv is actually exactly the same as C. There is practically no difference.

Consider this function which accepts argv and prints them all to the console:

pub fn main(int argc, char **argv) {
    for int i = 0; i < argc; i++ {
        printf("argv[%d] = %s\n", i, argv[i]);
    }

    return 0;
}

It accepts argc as a signed 32-bit integer and argv as an array of char * (denoted by char **, basically a pointer to the start of the array holding char * which is a pointer to the start of each string). These arguments are optional, as you may have noticed from code examples above.

You can also accept char **envp (and char **apple on MacOS/Darwin platforms, which provides arbitrary OS information, such as the path to the executing binary).


External symbols

  • An external symbol is a definition for a function or constant that was defined elsewhere (such as in C) and is implicitly defined in Elle. This is mostly used for inferring the index of the start of variadic arguments in functions.

As Elle has no modules currently, to effectively use printf and similar functions you must declare their interface at the top level.

You can do this with the following example:

external fn printf(char *formatter, ...);

It essentially tells Elle where it should put the variadic argument starter. You could exclude this, if you like, but you will have to explicitly declare where the variadic arguments begin, because Elle no longer has this context.

You can also make these statements public:

pub external fn fprintf(long fd, char *formatter, ...);

In fact the order of prefixes before fn is not enforced, you can write external pub fn and achieve the same result.

If you do not want to declare the function's interface, you can still use the function as long as you do not pass any arguments that are variadic. If you attempt to do so, and do not declare the interface, all of the data printed will be garbage. You also have this alternative to use the function with variadic arguments but without an interface definition:

printf("%d, %d\n", $$...$$, a, b);

Even though this achieves the same behavior, manually specifying this index each time you are calling the function is not very efficient.

Keep in mind that all other examples in this README.md have the printf interface implicitly defined, however any examples in the /examples directory will have this interface (and any other necessary ones) defined explicitly.

Technical note: This declaration does not emit any IR code. It simply exists to provide more context to functions called in Elle that may not have been originally declared in Elle.


♡ If you have any questions, please raise an issue :3

All contributions to this project are welcome and I love talking about this stuff!


How to run

  • Ensure you have Rust and the QBE compiler backend.

  • Elle has no third-party dependencies, so you can simply do the following:

      $ git clone https://github.com/acquitelol/elle
    
      $ cd elle
    
      $ make compile
    • You're done!

You can now make compile to compile the compiler into an executable, then run make run <name> to run a file from the /examples directory. For example, you can run make run donut and it will run /examples/donut.elle.


Licensing


⇡ Back to top️!

About

An experimental, purely functional language built in Rust using QBE as its backend

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages