Further semantic analyses #12

Merged: 16 commits, Feb 21, 2022
93 changes: 93 additions & 0 deletions docs/design/tracker.md
@@ -0,0 +1,93 @@
# The symbol tracker
Like most compilers, the Chirp compiler needs to track the names declared by the user program.
It needs to resolve identifiers to variables, parameters, namespaces, functions, types (TODO), etc.
It also needs to track attributes and properties of symbols, such as the full access path (if available) and whether the symbol is global.
All of that is provided and managed by the symbol tracker.

## Symbols
Symbols keep a record of various program entities. The complete list is as follows:
- Top scope - The root scope of the whole program
- Global variables - Variables that live in the top scope or in namespace scope
- Global functions
- Namespaces
- Nested scopes - Usually unnamed entities that live within some other entity's scope, like compound block scopes.

Every symbol has a parent symbol (except the root symbol, which has none), an optional local name (name that is accessible in the symbol's scope), and an optional global name (which is the path used to access the symbol from global scope).
A symbol can also be tied to the AST node that introduced it, so that further information (such as its type or the kind of symbol) can be obtained.
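
The record described above could be sketched as follows. This is only an illustration: `symbol_record`, `decl_node`, and `make_global_name` are hypothetical names that are not part of the tracker, and the `.` separator is assumed from access paths such as `io.print` in the samples.

```c++
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the AST node that introduced a symbol.
struct decl_node {
    std::string kind;  // e.g. "function", "variable"
};

// Illustrative symbol record; field names are assumptions, not the real layout.
struct symbol_record {
    symbol_record const* parent = nullptr;   // null only for the root symbol
    std::optional<std::string> local_name;   // name visible in the enclosing scope
    std::optional<std::string> global_name;  // path from global scope, if one exists
    decl_node const* node = nullptr;         // AST node that introduced the symbol
};

// Derives a global access path such as "io.print" by joining local names from
// the root downward. Yields no path when the symbol or one of its non-root
// ancestors is unnamed (for example a compound block scope).
inline std::optional<std::string> make_global_name(symbol_record const& sym) {
    if (!sym.local_name || !sym.parent) return std::nullopt;
    if (!sym.parent->parent) return sym.local_name;  // direct child of the root
    std::optional<std::string> prefix = make_global_name(*sym.parent);
    if (!prefix) return std::nullopt;
    return *prefix + "." + *sym.local_name;
}
```

Under these assumptions, a variable declared inside an unnamed block gets no global name, which matches the "optional global name" wording above.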

## Scope
A scope defines a subset of visible symbols that are referencable at a particular point in a program. No two symbols of the same name (unless they're describing the same symbol) can exist within the same scope's local set.

A scope's local set specifies symbols and hides (temporary exclusions of symbols from the scope), which are unique to that scope. A name defined in this set is not visible outside of it, but it's visible in subscopes defined within this scope (unless a subscope hides it).

Name shadowing occurs when a scope defines a symbol (or the lack thereof, i.e. a hide) with the same name as a name in an enclosing scope.

Scopes are tied to symbols (which can be unnamed and local). This allows for lookup within a scope when a name is encountered.
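
A scope's local set, with its uniqueness rule and hides, might be modelled like this. The sketch assumes a hide can be stored as a null entry; `symbol_stub` and `scope_local_set` are hypothetical names used only for illustration.

```c++
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical placeholder for a tracked symbol.
struct symbol_stub {
    std::string name;
};

// Local set of one scope. A null mapped value records a hide, i.e. the name is
// explicitly excluded from this scope (assumed representation).
struct scope_local_set {
    std::unordered_map<std::string, symbol_stub*> entries;

    // Adds a symbol under `name`. Fails when the name already refers to a
    // different entry; re-adding the very same symbol is accepted.
    bool add(std::string const& name, symbol_stub* sym) {
        auto [it, inserted] = entries.emplace(name, sym);
        return inserted || it->second == sym;
    }

    // Records a hide for `name`; subject to the same uniqueness rule.
    bool hide(std::string const& name) { return add(name, nullptr); }
};
```

Shadowing then needs no extra machinery: an inner scope simply holds its own entry (symbol or hide) for a name that an enclosing scope also defines.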

## Name lookup
There are two kinds of lookup: unqualified and qualified. An unqualified lookup usually occurs when an identifier is to be resolved.

Review comment (Contributor): Typo

Qualified lookup occurs when a name is to be resolved in the scope of another symbol.

When a qualified identifier is looked up, the first part of the identifier is resolved by unqualified lookup, and each following part is then resolved by qualified lookup in the scope of the previously resolved symbol.

### Unqualified lookup
When an unqualified lookup occurs, the following steps are taken. All scopes are examined for the searched name, starting at the current (most nested) scope and moving outward through the enclosing scopes. When a symbol is found (or a scope explicitly defines the lack of one, i.e. a hide), the lookup stops and the result (whether a symbol was found or not) is returned. When the search reaches the end of the list of scopes, the lookup fails with no symbol returned.
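
The steps above can be sketched as a walk over a stack of local sets, innermost first. The types here are hypothetical: a hide is assumed to be a null entry, and `lookup_result` distinguishes "stopped at a hide" from "ran out of scopes".

```c++
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical placeholder for a tracked symbol.
struct symbol_stub {
    std::string name;
};

// Local set of one scope; a null mapped value is a hide (assumed representation).
using local_set = std::unordered_map<std::string, symbol_stub*>;

struct lookup_result {
    bool stopped_at_entry;  // true if some scope defined the name (symbol or hide)
    symbol_stub* symbol;    // null for a hide or when no scope defined the name
};

// scopes.front() is the top scope, scopes.back() the current (most nested) one.
inline lookup_result lookup_unqualified(std::vector<local_set> const& scopes,
                                        std::string const& name) {
    for (auto scope = scopes.rbegin(); scope != scopes.rend(); ++scope) {
        auto entry = scope->find(name);
        if (entry != scope->end()) {
            return {true, entry->second};  // first match wins, even if it is a hide
        }
    }
    return {false, nullptr};  // reached the end of the scope list
}
```

Both failure modes return a null symbol; the flag only matters to callers that want to tell a hidden name from an undeclared one.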

### Qualified lookup
Qualified lookup considers only the scope within which it occurs. When the name is not defined in that scope, the lookup fails with no symbol. When the name is found, the result (a symbol or not) becomes the result of the lookup.
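
As a rough sketch, qualified lookup is a single-scope search, and a qualified identifier chains it after one unqualified step. The `scoped_symbol` type and both functions are hypothetical; each symbol is assumed to own the local set of its scope, and a null entry again stands for a hide.

```c++
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical symbol that owns the local set of its own scope.
struct scoped_symbol {
    std::string name;
    std::unordered_map<std::string, scoped_symbol*> members;  // null = hide
};

// Qualified lookup: only the scope of `owner` is searched, never its parents.
inline scoped_symbol* lookup_qualified(scoped_symbol const& owner,
                                       std::string const& name) {
    auto entry = owner.members.find(name);
    return entry == owner.members.end() ? nullptr : entry->second;
}

// Resolves a path such as {"io", "print"}. `scope_chain` lists the symbols
// whose scopes are currently open, outermost first.
inline scoped_symbol* resolve_path(std::vector<scoped_symbol const*> const& scope_chain,
                                   std::vector<std::string> const& parts) {
    if (parts.empty()) return nullptr;
    scoped_symbol* current = nullptr;
    // First part: unqualified lookup, innermost scope first.
    for (auto scope = scope_chain.rbegin(); scope != scope_chain.rend(); ++scope) {
        auto entry = (*scope)->members.find(parts.front());
        if (entry != (*scope)->members.end()) {
            current = entry->second;
            break;
        }
    }
    // Remaining parts: qualified lookup inside the previously resolved symbol.
    for (std::size_t i = 1; i < parts.size() && current != nullptr; ++i) {
        current = lookup_qualified(*current, parts[i]);
    }
    return current;
}
```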

# Tracker API
The tracker tracks all symbols used in the file currently processed. The first symbol corresponds to the root of the program syntax tree.

Creating and binding symbols is done with these instance methods:

```c++
symbol* decl_sym();
symbol* decl_sym(identifier const& name, decl& target);
```
These overloads create a new symbol and optionally assign it a name (making it named: see `has_name`) and a target node.

```c++
bool bind_sym(symbol* sym);
```
This method binds a symbol to the current scope. It returns true on success. On failure, it returns false and reports the proper diagnostics where appropriate.
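
To show how the two calls fit together, here is a toy model with the same method names. It is not the real tracker: `mock_tracker` and the `identifier`, `decl`, and `symbol` definitions are simplified stand-ins, only one scope is modelled, and diagnostics are left out.

```c++
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-ins for the compiler's own types (assumptions).
using identifier = std::string;
struct decl {
    std::string kind;
};
struct symbol {
    bool has_name = false;
    identifier name;
    decl* target = nullptr;
};

class mock_tracker {
public:
    // Creates an unnamed symbol with no target node.
    symbol* decl_sym() {
        storage_.push_back(std::make_unique<symbol>());
        return storage_.back().get();
    }

    // Creates a named symbol tied to the node that introduced it.
    symbol* decl_sym(identifier const& name, decl& target) {
        symbol* sym = decl_sym();
        sym->has_name = true;
        sym->name = name;
        sym->target = &target;
        return sym;
    }

    // Binds to the single scope modelled here; false on a name clash.
    bool bind_sym(symbol* sym) {
        if (!sym->has_name) return true;  // unnamed symbols cannot clash
        auto [it, inserted] = current_scope_.emplace(sym->name, sym);
        return inserted || it->second == sym;
    }

private:
    std::vector<std::unique_ptr<symbol>> storage_;  // owns every symbol
    std::unordered_map<identifier, symbol*> current_scope_;
};
```

Declaring and binding stay separate steps in this model, so a caller can create a symbol first and decide later which scope it belongs to.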

Looking up symbols is done through the following instance methods:

```c++
symbol* find_sym_cur(identifier const& name);
```
This is a low-level function that only searches the current scope. If the symbol is not found within the current scope, it returns null.

```c++
symbol* lookup_sym(identifier const& name);
symbol* lookup_sym_qual(qual_identifier const& name);
```
These two methods perform unqualified and qualified lookup, respectively, on identifiers. The first one doesn't report diagnostics on failure, but the second one does.

```c++
symbol* lookup_decl_sym(decl const& decl_scope, identifier const& name);
```
This low-level function performs a qualified lookup inside the given symbol to find a name. It considers only the scope of the provided symbol. It reports diagnostics on failure.

These functions deal with scopes:

```c++
void push_scope(symbol* sym);
```
This method creates and enters a new scope, described by the provided symbol.

```c++
void pop_scope();
```
This method exits the current nested scope and returns to the one directly enclosing it. Exiting the main program (global) scope is undefined.
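
Since every `push_scope` needs a matching `pop_scope`, one option is to pair them with a small RAII guard. The sketch below uses a hypothetical `mock_tracker` that only counts scopes and a hypothetical `scope_guard`; neither is part of this change.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placeholder for a tracked symbol.
struct symbol {};

// Toy tracker that only models the scope stack (assumption, not the real class).
class mock_tracker {
public:
    mock_tracker() { scopes_.push_back(&root_); }  // the global scope is always open

    void push_scope(symbol* sym) { scopes_.push_back(sym); }

    void pop_scope() {
        // The design leaves popping the global scope undefined; the mock asserts.
        assert(scopes_.size() > 1 && "cannot pop the global scope");
        scopes_.pop_back();
    }

    std::size_t depth() const { return scopes_.size(); }

private:
    symbol root_;
    std::vector<symbol*> scopes_;
};

// Enters a scope on construction and leaves it on destruction.
class scope_guard {
public:
    scope_guard(mock_tracker& tracker, symbol* sym) : tracker_(tracker) {
        tracker_.push_scope(sym);
    }
    ~scope_guard() { tracker_.pop_scope(); }
    scope_guard(scope_guard const&) = delete;
    scope_guard& operator=(scope_guard const&) = delete;

private:
    mock_tracker& tracker_;
};
```

With a guard, early returns from an AST visitor cannot leave a scope open by accident.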

## Symbol attributes
Each symbol has the following set of attributes:
| Name | Type | Default value | Description |
| --- | --- | --- | :-- |
| `has_name` | `bool` | `false` | Has a `name` |
| `is_global` | `bool` | `true` | Lives in global scope (can be potentially exported) |
| `has_storage` | `bool` | `false` | Defines a concrete entity that exists in produced object code (variables, functions) |
| `is_entry` | `bool` | `false` | Is an entry declaration |
| `is_scope` | `bool` | `false` | Defines a scope (see: [Scope](#scope)) |
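
As a sketch, the table maps directly onto a struct with default member initializers. The struct name and the `block_scope_attributes` helper are hypothetical, and the settings chosen for a compound block are an assumption based on the "Nested scopes" description rather than on the implementation.

```c++
#include <cassert>

// Mirrors the attribute table; each default matches the "Default value" column.
struct symbol_attributes {
    bool has_name = false;
    bool is_global = true;
    bool has_storage = false;
    bool is_entry = false;
    bool is_scope = false;
};

// Assumed settings for a compound block scope: unnamed, local, and defining a
// scope without storage of its own.
inline symbol_attributes block_scope_attributes() {
    symbol_attributes attrs;
    attrs.is_global = false;
    attrs.is_scope = true;
    return attrs;
}
```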
22 changes: 22 additions & 0 deletions lib/io.chp
@@ -0,0 +1,22 @@
# Include this or not?
namespace io
{
extern "fputs"
func int __libc_fputs(ptr const char: string, ptr: fileio);
extern "fputc"
func int __libc_fputc(int: ch, ptr: fileio);
extern "stdout"
ptr: __libc_stdout;

func none write(ptr const char: string)
{
__libc_fputs(string, __libc_stdout);
}

func none print(ptr const char: string)
{
__libc_fputs(string, __libc_stdout);
# Newline
__libc_fputc(10, __libc_stdout);
}
}
5 changes: 5 additions & 0 deletions lib/mem.chp
@@ -0,0 +1,5 @@
namespace mem
{
extern "malloc"
func ptr alloc(unsigned long: size);
}
26 changes: 16 additions & 10 deletions samples/features.chp
@@ -9,7 +9,7 @@ import "math.trig";
namespace mem
{
extern "malloc"
- func ptr alloc(long: size);
+ func ptr alloc(unsigned long: size);
}

namespace io
@@ -28,8 +28,8 @@ namespace oslib

# Forward function declarations
func ptr const char type_name(int: id);
- func ptr const char find_cstr_end(ptr const char); # Unnamed parameters (TODO)
- func none memcpy(ptr: dest, ptr const: src, int: count);
+ func ptr const char find_cstr_end(ptr const char); # Unnamed parameters
+ func ptr memcpy(ptr: dest, ptr const: src, unsigned long: count);

# Functions
func ptr const char type_name(int: id)
@@ -64,18 +64,21 @@ func ptr const char find_cstr_end(ptr const char: s)
ret s;
}

- func none memcpy(ptr: dest, ptr const: src, int: count)
+ func ptr memcpy(ptr: dest, ptr const: src, unsigned long: count)
{
# Pointer convertions (TODO: type-check)
- ptr byte: _dest = dest as ptr char;
- ptr const byte: _src = src as ptr const char;
- # Convert to bool (TODO)
+ ptr byte: _dest = dest as ptr byte;
+ ptr const byte: _src = src as ptr const byte;
+ # Convert to bool
while count
{
# lvalue assignment (TODO)
deref _dest = deref _src;
count = count - 1;
+ _dest = _dest + 1;
+ _src = _src + 1;
}
+ ret dest;
}

# Program entry point
@@ -84,12 +87,15 @@ entry
# Function calls
ptr const char: my_string = "hello";
ptr const char: my_string_end = find_cstr_end(my_string);
- const int: my_string_size = my_string_end - my_string + 2;
+ const int: my_string_size = my_string_end - my_string + 1;
ptr char: my_heap_str = mem.alloc(my_string_size);
memcpy(my_heap_str, my_string, my_string_size);
- const char: a = 0; # Empty string
+ ptr char: a = alloca(char) 3;
+ deref a = 'h';
+ deref (a + 1) = 'i';
+ deref (a + 2) = 0;
io.print(my_heap_str);
io.print(" ");
io.print(type_name(1));
- io.print(ref a);
+ io.print(a);
}
65 changes: 60 additions & 5 deletions samples/fib.chp
@@ -1,7 +1,7 @@
- func int fib(int: n)
+ func unsigned long fib(unsigned int: n)
{
- int: a = 0;
- int: b = 1;
+ unsigned long: a = 0;
+ unsigned long: b = 1;
while n != 0
{
const int: tmp = b;
@@ -15,10 +15,65 @@ func int fib(int: n)
import "io"
namespace io
{
- func none print(ptr const char: msg, int: param);
+ func none write(ptr const char: str);
+ func none print(ptr const char: str);
+
+ # Returns number of chars needed to store the string, including zero-terminator
+ func unsigned long ulong_to_string(unsigned long: val, ptr char: str, unsigned long: buflen)
+ {
+ unsigned long: size = 1;
+ unsigned long: _val = val;
+ unsigned long: ct = 1;
+ unsigned long: idx = 0;
+ while _val / ct >= 10
+ {
+ ct = ct * 10;
+ size = size + 1;
+ }
+ while ct != 1
+ {
+ if idx == buflen
+ {
+ ct = 1;
+ }
+ else
+ {
+ ct = ct / 10;
+ deref(str + idx) = _val / ct + '0';
+ _val = _val - _val / ct * ct;
+ idx = idx + 1;
+ }
+ }
+ if idx != buflen
+ {
+ deref(str + idx) = 0;
+ }
+ ret size + 1;
+ }
}

entry
{
- io.print("Result of fib(5): ", fib(5));
+ ptr char: small_buf = alloca(char) 5;
+ unsigned long: buf_size = 5;
+ unsigned long: value = fib(5);
+ unsigned long: size = io.ulong_to_string(value, small_buf, buf_size);
+ if size > buf_size
+ {
+ buf_size = size;
+ small_buf = alloca(char) buf_size;
+ io.ulong_to_string(value, small_buf, buf_size);
+ }
+ io.write("Result of fib(5): ");
+ io.print(small_buf);
+ value = fib(50);
+ size = io.ulong_to_string(value, small_buf, buf_size);
+ if size > buf_size
+ {
+ buf_size = size;
+ small_buf = alloca(char) buf_size;
+ io.ulong_to_string(value, small_buf, buf_size);
+ }
+ io.write("Result of fib(50): ");
+ io.print(small_buf);
}