Tolk Language: next-generation FunC #1345

Merged: 12 commits, Nov 2, 2024

tolk-vm (Contributor) commented on Nov 2, 2024

Tolk is a new language for writing smart contracts in TON. Think of Tolk as the "next‑generation FunC". Tolk compiler is literally a fork of FunC compiler, introducing familiar syntax similar to TypeScript, but leaving all low-level optimizations untouched.

Motivation behind Tolk

FunC is awesome. It is really low-level and encourages a programmer to think about compiler internals. It gives full control over TVM assembler, allowing a programmer to make a contract as efficient as possible. If you get used to it, you love it.

But there is a problem. FunC is "functional C", and it's for ninja. If you are keen on Lisp and Haskell, you'll be happy. But if you are a JavaScript / Go / Kotlin developer, its syntax is peculiar for you, leading to occasional mistakes. A struggle with syntax may decrease your motivation for digging into TON.

Imagine, what if there was a language, also smart, also low-level, but not functional and not like C? Leaving all beauty and complexity inside, what if it would be more similar to popular languages at first glance?

That's what Tolk is about.

Meaning of the name "Tolk"

"Tolk" is a very beautiful word.

In English, it's consonant with talk. Because, generally, what do we need a language for? We need it to talk to computers.

In all Slavic languages, the root tolk and the phrase "to have tolk" mean "to make sense"; "to have deep internals".

But actually, TOLK is an abbreviation.
You know, that TON is The Open Network.
By analogy, TOLK is The Open Language K.

What is K, will you ask? Probably, "kot" — the nick of Nikolay Durov? Or Kolya? Kitten? Kernel? Kit? Knowledge?
The right answer — none of this. This letter does not mean anything. It's open.
The Open Letter K

History of Tolk origin

In June 2024, I created a pull request FunC v0.5.0. Besides this PR, I've written a roadmap — what can be enhanced in FunC, syntactically and semantically.

All in all, instead of merging v0.5.0 and continuing developing FunC, we decided to fork it. To leave FunC untouched, as it is. As it always was. And to develop a new language, driven by a fresh and new name.

For several months, I have worked on Tolk privately. I have implemented a giant list of changes. And it's not only about the syntax. For instance, Tolk has an internal AST representation, which is completely missing from FunC.

At TON Gateway, on 1-2 November in Dubai, I gave a talk presenting Tolk to the public, and we released it the same day. The video is available on YouTube.

The first version of the Tolk Language is v0.6, a metaphor of FunC v0.5 that missed a chance to occur.

Tolk vs FunC: in short

Tolk is much more similar to TypeScript and Kotlin than to C and Lisp. But it still gives you full control over TVM assembler, since it has a FunC kernel inside.

  1. Functions are declared via fun, get methods via get, variables via var (and val for immutable), putting types on the right; parameter types are mandatory; return type can be omitted (auto inferred), as well as for locals; specifiers inline and others are @ attributes
global storedV: int;

fun parseData(cs: slice): cell {
    var flags: int = cs.loadMessageFlags();
    ...
}

@inline
fun sum(a: int, b: int) {   // auto inferred int
    val both = a + b;       // same
    return both;
}

get currentCounter(): int { ... }
  2. No impure, it's by default, compiler won't drop user function calls
  3. Not recv_internal and recv_external, but onInternalMessage and onExternalMessage
  4. 2+2 is 4, not an identifier; identifiers are alpha-numeric; use naming const OP_INCREASE instead of const op::increase
  5. Logical operators AND &&, OR ||, NOT ! are supported
  6. Syntax improvements:
    • ;; comment → // comment
    • {- comment -} → /* comment */
    • #include → import, with a strict rule "import what you use"
    • ~ found → !found (for true/false only, obviously; true is -1, like in FunC)
    • v = null() → v = null
    • null?(v) → v == null, same for builder_null? and others
    • ~ null?(v) → v != null
    • throw(excNo) → throw excNo
    • catch(_, _) → catch
    • catch(_, excNo) → catch(excNo)
    • throw_unless(excNo, cond) → assert(cond, excNo)
    • throw_if(excNo, cond) → assert(!cond, excNo)
    • return () → return
    • do ... until (cond) → do ... while (!cond)
    • elseif → else if
    • ifnot (cond) → if (!cond)
  7. A function can be called even if declared below; forward declarations are not needed; the compiler first does parsing, and then it does symbol resolving; there is now an AST representation of source code
  8. stdlib functions renamed to verbose clear names, camelCase style; it's now embedded, not downloaded from GitHub; it's split into several files; common functions are always available, more specific ones are available with import "@stdlib/tvm-dicts", and the IDE will suggest them
  9. No ~ tilde methods; cs.loadInt(32) modifies a slice and returns an integer; b.storeInt(x, 32) modifies a builder; b = b.storeInt() also works, since it not only modifies, but returns; chained methods work identically to JS, they return self; everything works exactly as expected, similar to JS; no runtime overhead, exactly same Fift instructions; custom methods are created with ease; tilde ~ does not exist in Tolk at all

Tooling around:

  • JetBrains plugin exists
  • VS Code extension exists
  • WASM wrapper for blueprint exists
  • Documentation and migration guide exists
  • And even a converter from FunC to Tolk exists

Tolk vs FunC: in detail

A very huge list below. Will anyone have enough patience to read it up to the end?..

✅ Traditional comments :)

FunC | Tolk
;; comment | // comment
{- multiline comment -} | /* multiline comment */

✅ 2+2 is 4, not an identifier. Identifiers can only be alpha-numeric

In FunC, almost any character can be a part of identifier. For example, 2+2 (without a space) is an identifier. You can even declare a variable with such a name.

In Tolk, spaces are not mandatory. 2+2 is 4, as expected. 3+~x is 3 + (~ x), and so on.

FunC | Tolk
return 2+2; ;; undefined function `2+2` | return 2+2; // 4

More precisely, an identifier can start with [a-zA-Z$_] and continue with [a-zA-Z0-9$_]. Note that ?, :, and others are not valid symbols, so found? and op::increase are not valid identifiers.

You can use backticks to surround an identifier, and then it can contain any symbols (similar to Kotlin and some other languages). Its potential usage is to allow keywords to be used as identifiers, in case of code generation by a scheme, for example.

FunC | Tolk
const op::increase = 0x1234; | const OP_INCREASE = 0x1234;

FunC:
;; even 2%&!2 is valid
int 2+2 = 5;
Tolk:
// don't do like this :)
var `2+2` = 5;
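
The identifier rule above can be checked mechanically. The following is a small Python sketch, not part of the Tolk compiler: the helper names are hypothetical, the regular expression is copied from the character classes quoted above, and the handling of backtick-quoted names (anything non-empty between two backticks) is an assumption.

```python
import re

# Character classes taken from the rule above:
# first character [a-zA-Z$_], following characters [a-zA-Z0-9$_].
PLAIN_IDENTIFIER = re.compile(r"[a-zA-Z$_][a-zA-Z0-9$_]*")


def is_plain_identifier(name: str) -> bool:
    """True if `name` is a valid identifier without backticks (hypothetical helper)."""
    return PLAIN_IDENTIFIER.fullmatch(name) is not None


def is_identifier(name: str) -> bool:
    """Also accept a backtick-quoted name.

    Assumption: any non-empty text between two backticks is accepted.
    """
    if len(name) > 2 and name.startswith("`") and name.endswith("`"):
        return True
    return is_plain_identifier(name)


for candidate in ["OP_INCREASE", "op::increase", "found?", "2+2", "`2+2`"]:
    print(candidate, is_identifier(candidate))
```

Only OP_INCREASE and the backtick-quoted name are accepted, which matches the renaming advice in the table above.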

✅ Impure by default, compiler won't drop user function calls

FunC has an impure function specifier. When absent, a function is treated as pure. If its result is unused, its call is deleted by the compiler.

Though this behavior is documented, it is very unexpected to newcomers. For instance, various functions that don't return anything (throw an exception on mismatch, for example) are silently deleted. The situation is made worse by the fact that FunC doesn't check and validate the function body, allowing impure operations inside pure functions.

In Tolk, all functions are impure by default. You can mark a function pure with annotation, and then impure operations are forbidden in its body (exceptions, globals modification, calling non-pure functions, etc.).

✅ New functions syntax: fun keyword, @ attributes, types on the right (like in TypeScript, Kotlin, Python, etc.)

FunC | Tolk
cell parse_data(slice cs) { } | fun parse_data(cs: slice): cell { }
(cell, int) load_storage() { } | fun load_storage(): (cell, int) { }
() main() { ... } | fun main() { ... }

Types of variables — also to the right:

FunC | Tolk
slice cs = ...; | var cs: slice = ...;
(cell c, int n) = parse_data(cs); | var (c: cell, n: int) = parse_data(cs);
global int stake_at; | global stake_at: int;

Modifiers inline and others — with annotations:

FunC:
int f(cell s) inline {
Tolk:
@inline
fun f(s: cell): int {

FunC:
() load_data() impure inline_ref {
Tolk:
@inline_ref
fun load_data() {

forall — this way:

FunC | Tolk
forall X -> tuple cons(X head, tuple tail) | fun cons<X>(head: X, tail: tuple): tuple

asm implementation — like in FunC, but being properly aligned, it looks nicer:

@pure
fun third<X>(t: tuple): X
    asm "THIRD";

@pure
fun iDictDeleteGet(dict: cell, keyLen: int, index: int): (cell, slice, int)
    asm(index dict keyLen) "DICTIDELGET NULLSWAPIFNOT";

@pure
fun mulDivFloor(x: int, y: int, z: int): int
    builtin;

There is also a @deprecated attribute, not affecting compilation, but for a human and IDE.

✅ get instead of method_id

In FunC, method_id (without arguments) actually declared a get method. In Tolk, you use a straightforward syntax:

FunC | Tolk
int seqno() method_id { ... } | get seqno(): int { ... }

Both get methodName() and get fun methodName() are acceptable.

For method_id(xxx) (uncommon in practice, but valid), there is an attribute:

FunC:
() after_code_upgrade(cont old_code)
              impure method_id(1666)
Tolk:
@method_id(1666)
fun afterCodeUpgrade(oldCode: continuation)

✅ It's essential to declare types of parameters (though optional for locals)

// not allowed
fun do_smth(c, n)
// types are mandatory
fun do_smth(c: cell, n: int)

There is an auto type, so fun f(a: auto) is valid, though not recommended.

While parameter types are mandatory, the return type is not (it's often obvious or verbose). If omitted, it means auto:

fun x() { ... }  // auto infer return

For local variables, types are also optional:

var i = 10;                      // ok, int
var b = beginCell();             // ok, builder
var (i, b) = (10, beginCell());  // ok, two variables, int and builder

// types can be specified manually, of course:
var b: builder = beginCell();
var (i: int, b: builder) = (10, beginCell());

✅ Variables are not allowed to be redeclared in the same scope

var a = 10;
...
var a = 20;  // error, correct is just `a = 20`
if (1) {
    var a = 30;  // it's okay, it's another scope
}

As a consequence, partial reassignment is not allowed:

var a = 10;
...
var (a, b) = (20, 30);  // error, redeclaration of a

Note, that it's not a problem for loadUint() and other methods. In FunC, they returned a modified object, so a pattern var (cs, int value) = cs.load_int(32) was quite common. In Tolk, such methods mutate an object: var value = cs.loadInt(32), so redeclaration is unlikely to be needed.

fun send(msg: cell) {
    var msg = ...;  // error, redeclaration of msg

    // solution 1: introduce a new variable
    var msgWrapped = ...;
    // solution 2: use `redef`, though not recommended
    var msg redef = ...;
}

✅ Changes in the type system

Type system in the first Tolk release is the same as in FunC, with the following modifications:

  • void is effectively an empty tensor (more canonical to be named unit, but void is more reliable); btw, return (without expression) is actually return (), a convenient way to return from void functions
fun setContractData(c: cell): void
    asm "c4 POP";
  • auto means "auto infer"; in FunC, _ was used for that purpose; note that if a function doesn't specify a return type, it's auto, not void
  • self, to make chainable methods, described below; actually it's not a type, it can only occur instead of return type of a function
  • cont renamed to continuation

✅ Another naming for recv_internal / recv_external

fun onInternalMessage
fun onExternalMessage
fun onTickTock
fun onSplitPrepare
fun onSplitInstall

All parameter types and their order remain the same, only naming is changed. fun main is also available.

✅ #include → import. Strict imports

FunC | Tolk
#include "another.fc"; | import "another.tolk"

In Tolk, you cannot use a symbol from a.tolk without importing this file. In other words, "import what you use".

All stdlib functions are available out of the box, downloading stdlib and #include "stdlib.fc" is not needed. See below about embedded stdlib.

There is still a global scope of naming. If f is declared in two different files, it's an error. We "import" a whole file; per-file visibility and an export keyword are not supported yet, but probably will be in the future.

✅ #pragma → compiler options

In FunC, "experimental" features like allow-post-modification were turned on by a pragma in .fc files (leading to problems when some files contain it and some don't). In fact, it's not a pragma for a file, it's a compilation option.

In Tolk, all pragmas were removed. allow-post-modification and compute-asm-ltr were merged into Tolk sources (as if they were always on in FunC). Instead of pragmas, there is now an ability to pass experimental options.

For now, there is one experimental option: remove-unused-functions, which excludes unused symbols from the Fift output.

#pragma version xxx was replaced by tolk xxx (no >=, just a strict version). It's good practice to annotate compiler version you are using. If it doesn't match, Tolk will show a warning.

tolk 0.6

✅ Late symbols resolving. AST representation

In FunC (like in C) you cannot access a function declared below:

int b() { a(); }   ;; error
int a() { ... }    ;; since it's declared below

To avoid an error, a programmer should create a forward declaration at first. The reason is that symbols resolving is performed right at the time of parsing.

Tolk compiler separates these two steps. At first it does parsing, and then it does symbol resolving. Hence, a snippet above would not be erroneous.

Sounds simple, but internally, it's a very huge job. To make this available, I've introduced an intermediate AST representation, which is completely missing from FunC. That's an essential point for future modifications and for performing semantic code analysis.
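
The difference between resolving symbols while parsing and resolving them afterwards can be sketched in a few lines of Python. This is only a toy model, not the compiler's algorithm: a program is reduced to a list of (function name, called names) pairs in source order, and both function names are hypothetical.

```python
from typing import List, Tuple

# Each declaration is (function_name, names_it_calls), listed in source order.
Program = List[Tuple[str, List[str]]]


def resolve_while_parsing(program: Program) -> List[str]:
    """Single pass: a call resolves only against functions seen so far."""
    errors: List[str] = []
    declared = set()
    for name, calls in program:
        declared.add(name)
        for callee in calls:
            if callee not in declared:
                errors.append(f"{name}: undefined function {callee}")
    return errors


def resolve_after_parsing(program: Program) -> List[str]:
    """Two steps: collect every declaration first, then resolve all calls."""
    declared = {name for name, _ in program}
    errors: List[str] = []
    for name, calls in program:
        for callee in calls:
            if callee not in declared:
                errors.append(f"{name}: undefined function {callee}")
    return errors


# b() calls a(), which is declared below it, as in the snippet above.
program = [("b", ["a"]), ("a", [])]
print(resolve_while_parsing(program))  # ['b: undefined function a']
print(resolve_after_parsing(program))  # []
```

The single-pass version needs a forward declaration of a to succeed, while the two-step version does not.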

✅ null keyword

Creating null values and checking variables on null looks very pretty now.

FunC | Tolk
a = null() | a = null
if (null?(a)) | if (a == null)
if (~ null?(b)) | if (b != null)
if (~ cell_null?(c)) | if (c != null)

Note, that it does NOT mean that Tolk language has nullability. No, you can still assign null to an integer variable — like in FunC, just syntactically pleasant. A true nullability will be available someday, after hard work on the type system.

✅ throw and assert keywords

Tolk greatly simplifies working with exceptions.

If FunC has throw(), throw_if(), throw_arg_if(), and the same for unless, Tolk has only two primitives: throw and assert.

FunC | Tolk
throw(excNo) | throw excNo
throw_arg(arg, excNo) | throw (excNo, arg)
throw_unless(excNo, condition) | assert(condition, excNo)
throw_if(excNo, condition) | assert(!condition, excNo)

Note, that !condition is possible since logical NOT is available, see below.

There is a long (verbose) syntax of assert(condition, excNo):

assert(condition) throw excNo;
// with possibility to include arg to throw
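
To make the mapping concrete, here is a hedged Python model of the primitives. It is not TVM or Tolk code: TvmException, tolk_assert and thrown_code are hypothetical names, and conditions are plain Python booleans. It only demonstrates that assert(condition, excNo) throws in the same cases as throw_unless(excNo, condition), and assert(!condition, excNo) in the same cases as throw_if(excNo, condition).

```python
class TvmException(Exception):
    """Toy exception carrying an exception number and an optional argument."""

    def __init__(self, exc_no, arg=None):
        super().__init__(exc_no, arg)
        self.exc_no = exc_no
        self.arg = arg


def throw_unless(exc_no, condition):
    """FunC style: throw_unless(excNo, condition)."""
    if not condition:
        raise TvmException(exc_no)


def throw_if(exc_no, condition):
    """FunC style: throw_if(excNo, condition)."""
    if condition:
        raise TvmException(exc_no)


def tolk_assert(condition, exc_no):
    """Tolk style: assert(condition, excNo)."""
    if not condition:
        raise TvmException(exc_no)


def thrown_code(action):
    """Run `action` and return the exception number it throws, or None."""
    try:
        action()
    except TvmException as exc:
        return exc.exc_no
    return None


for condition in (True, False):
    same_as_unless = thrown_code(lambda: throw_unless(100, condition)) == \
        thrown_code(lambda: tolk_assert(condition, 100))
    same_as_if = thrown_code(lambda: throw_if(100, condition)) == \
        thrown_code(lambda: tolk_assert(not condition, 100))
    print(condition, same_as_unless, same_as_if)
```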

Also, Tolk swaps catch arguments: it's catch (excNo, arg), both optional (since arg is most likely empty).

FunC | Tolk
try { } catch (_, _) { } | try { } catch { }
try { } catch (_, excNo) { } | try { } catch(excNo) { }
try { } catch (arg, excNo) { } | try { } catch(excNo, arg) { }

✅ do ... until → do ... while

FunC | Tolk
do { ... } until (~ condition); | do { ... } while (condition);
do { ... } until (condition); | do { ... } while (!condition);

Note, that !condition is possible since logical NOT is available, see below.

✅ Operator precedence became identical to C++ / JavaScript

In FunC, code like if (slices_equal() & status == 1) is parsed as if( (slices_equal()&status) == 1 ). This is a cause of various errors in real-world contracts.

In Tolk, & has lower priority, identical to C++ and JavaScript.

Moreover, Tolk fires errors on potentially wrong operators usage to completely eliminate such errors:

if (flags & 0xFF != 0)

will lead to a compilation error (similar to gcc/clang):

& has lower precedence than ==, probably this code won't work as you expected.  Use parenthesis: either (... & ...) to evaluate it first, or (... == ...) to suppress this error.

Hence, the code should be rewritten:

// either to evaluate it first (our case)
if ((flags & 0xFF) != 0)
// or to emphasize the behavior (not our case here)
if (flags & (0xFF != 0))

I've also added a diagnostic for a common mistake in bitshift operators: a << 8 + 1 is equivalent to a << 9, probably unexpected.

int result = a << 8 + low_mask;

error: << has lower precedence than +, probably this code won't work as you expected.  Use parenthesis: either (... << ...) to evaluate it first, or (... + ...) to suppress this error.
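
A short Python sketch shows why these diagnostics matter. It is only an illustration with hypothetical helper names: every grouping is written with explicit parentheses so that Python's own precedence rules play no role, and comparison results are modelled as TVM integers (-1 for true, 0 for false).

```python
def tvm_bool(condition: bool) -> int:
    """TVM represents true as -1 and false as 0."""
    return -1 if condition else 0


def c_like_grouping(flags: int) -> int:
    # `flags & 0xFF != 0` grouped the C++ / JavaScript way: flags & (0xFF != 0)
    return flags & tvm_bool(0xFF != 0)


def intended_grouping(flags: int) -> int:
    # What the author most likely meant: (flags & 0xFF) != 0
    return tvm_bool((flags & 0xFF) != 0)


flags = 0x100  # the low byte is zero
print(c_like_grouping(flags))    # 256, non-zero, so the `if` branch is taken
print(intended_grouping(flags))  # 0, the `if` branch is skipped

# The bitshift case: a << 8 + 1 means a << (8 + 1), not (a << 8) + 1.
a = 1
print(a << (8 + 1), (a << 8) + 1)  # 512 257
```

The two groupings disagree for this input, which is exactly the situation the compile-time error is meant to catch.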

Operators ~% ^% /% ~/= ^/= ~%= ^%= ~>>= ^>>= no longer exist.

✅ Immutable variables, declared via val

Like in Kotlin: var for mutable, val for immutable, optionally followed by a type. FunC has no analogue of val.

val flags = msgBody.loadMessageFlags();
flags &= 1;         // error, modifying an immutable variable

val cs: slice = c.beginParse();
cs.loadInt(32);     // error, since loadInt() mutates an object
cs.preloadInt(32);  // ok, it's a read-only method

Parameters of a function are mutable, but since they are copied by value, called arguments aren't changed. Exactly like in FunC, just to clarify.

fun some(x: int) {
    x += 1;
}

val origX = 0;
some(origX);      // origX remains 0

fun processOpIncrease(msgBody: slice) {
    val flags = msgBody.loadInt(32);
    ...
}

processOpIncrease(msgBody);  // by value, not modified

In Tolk, a function can declare mutate parameters. It's a generalization of FunC ~ tilde functions, read below.

✅ Deprecated command-line options removed

Command-line flags -A, -P, and others, were removed. Default behavior

/path/to/tolk {inputFile}

is more than enough. Use -v to print version and exit. Use -h for all available command-line flags.

Only one input file can be passed, others should be import'ed.

✅ stdlib functions renamed to verbose clear names, camelCase style

All naming in standard library was reconsidered. Now, functions are named using longer, but clear names.

FunC | Tolk
cur_lt() | getLogicalTime()
car(l) | listGetHead(l)
get_balance().pair_first() | getMyOriginalBalance()
raw_reserve(count) | reserveToncoinsOnBalance(count)
dict~idict_add?(...) | dict.iDictSetIfNotExists(...)
dict~udict::delete_get_max() | dict.uDictDeleteLastAndGet()
t~tpush(triple(x, y, z)) | t.tuplePush([x, y, z])
s.slice_bits() | s.getRemainingBitsCount()
~dump(x) | debugPrint(x)
... | ...

A former "stdlib.fc" was split into multiple files: common.tolk, tvm-dicts.tolk, and others.

✅ stdlib is now embedded, not downloaded from GitHub

FunC:
  1. Download stdlib.fc from GitHub
  2. Save into your project
  3. #include "stdlib.fc";
  4. Use standard functions
Tolk:
  1. Use standard functions

In Tolk, stdlib is a part of the distribution. The standard library is inseparable, since keeping the triple "language, compiler, stdlib" together is the only correct way to maintain a release cycle.

It works as follows. The Tolk compiler knows how to locate the standard library. If a user has installed an apt package, stdlib sources were also downloaded and exist on a hard disk, so the compiler locates them by system paths. If a user uses a WASM wrapper, they are provided by tolk-js. And so on.

Standard library is split into multiple files: common.tolk (most common functions), gas-payments.tolk (calculating gas fees), tvm-dicts.tolk, and others. Functions from common.tolk are available always (a compiler implicitly imports it). Other files are needed to be explicitly imported:

import "@stdlib/tvm-dicts"   // ".tolk" optional

...
var dict = createEmptyDict();
dict.iDictSet(...);

Mind the rule "import what you use", it's applied to @stdlib/... files also (with the only exception of "common.tolk").

JetBrains IDE plugin automatically discovers stdlib folder and inserts necessary imports as you type.

✅ Logical operators && ||, logical not !

In FunC, there are only bitwise operators ~ & | ^. Developers making their first steps, thinking "okay, no logical, I'll use bitwise in the same manner", often make errors, since operator behavior is completely different:

Bitwise a & b vs logical a && b
sometimes, identical:
  0 & X = 0 and 0 && X = 0
  -1 & X = X and -1 && X = X (when X is 0 or -1)
but generally, not:
  1 & 2 = 0, but 1 && 2 = -1 (true)

Bitwise ~ found vs logical !found
sometimes, identical:
  true (-1) → false (0) in both cases
  false (0) → true (-1) in both cases
but generally, not:
  ~ 1 = -2, but !1 = 0 (false)

Bitwise condition & f() vs logical condition && f()
  with &, f() is always called; with &&, f() is called only if condition is true

Bitwise condition | f() vs logical condition || f()
  with |, f() is always called; with ||, f() is called only if condition is false

Tolk supports logical operators. They behave exactly as you are used to in other languages. For now, && and || sometimes produce suboptimal Fift code, but in the future, the Tolk compiler will become smarter in this case. The difference is negligible, so just use them like in other languages.
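
The contrast between bitwise and logical operators can be reproduced with a small Python model. The helpers below are hypothetical and only mimic the behavior described above (true is -1, false is 0, the right operand is evaluated lazily); that || normalizes its result to -1 is an assumption made by analogy with 1 && 2 = -1.

```python
from typing import Callable, List


def tvm_bool(condition: bool) -> int:
    """TVM represents true as -1 and false as 0."""
    return -1 if condition else 0


def logical_not(x: int) -> int:
    """Model of !x: any non-zero value becomes 0, zero becomes -1."""
    return tvm_bool(x == 0)


def logical_and(a: int, b: Callable[[], int]) -> int:
    """Model of a && b; b is a thunk so that it is evaluated only if needed."""
    if a == 0:
        return 0
    return tvm_bool(b() != 0)


def logical_or(a: int, b: Callable[[], int]) -> int:
    """Model of a || b; returning -1 for any non-zero a is an assumption."""
    if a != 0:
        return -1
    return tvm_bool(b() != 0)


calls: List[str] = []


def f() -> int:
    """A function with a visible side effect, to observe short-circuiting."""
    calls.append("f")
    return -1


print(1 & 2, logical_and(1, lambda: 2))  # 0 -1
print(~1, logical_not(1))                # -2 0
logical_and(0, f)                        # f is not called
logical_or(-1, f)                        # f is not called
logical_or(0, f)                         # f is called once
print(calls)                             # ['f']
```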

FunC | Tolk
if (~ found?) | if (!found)
ifnot (cell_null?(signatures)) | if (signatures != null)
elseifnot (eq_checksum) | else if (!eqChecksum)

FunC:
if (~ found?) {
    if (cs~load_int(32) == 0) {
        ...
    }
}
Tolk:
if (!found && cs.loadInt(32) == 0) {
    ...
}

Keywords ifnot and elseifnot were removed, since now we have logical not (for optimization, Tolk compiler generates IFNOTJMP, btw). Keyword elseif was replaced by traditional else if.

Note, that it does NOT mean that Tolk language has bool type. No, comparison operators still return an integer. A bool type support will be available someday, after hard work on the type system.

Remember, that true is -1, not 1. Both in FunC and Tolk. It's a TVM representation.

✅ No tilde ~ methods, mutate keyword instead

This change is so huge that it's described in a separate section:

Tolk mutate vs FunC ~ tilde functions

TLDR:

  • no ~ tilde methods
  • cs.loadInt(32) modifies a slice and returns an integer
  • b.storeInt(x, 32) modifies a builder
  • b = b.storeInt() also works, since it not only modifies, but returns
  • chained methods work identically to JS, they return self
  • everything works exactly as expected, similar to JS
  • no runtime overhead, exactly same Fift instructions
  • custom methods are created with ease
  • tilde ~ does not exist in Tolk at all

This is a drastic change. If FunC has .methods() and ~methods(), Tolk has only dot, one and only way to call a .method(). A method may mutate an object, or may not. Unlike the list "in short", it's a behavioral and semantic difference from FunC.

The goal is to have calls identical to JS and other languages:

FunC:
int flags = cs~load_uint(32);
Tolk:
var flags = cs.loadUint(32);

FunC:
(cs, int flags) = cs.load_uint(32);
Tolk:
var flags = cs.loadUint(32);

FunC:
(slice cs2, int flags) = cs.load_uint(32);
Tolk:
var cs2 = cs;
var flags = cs2.loadUint(32);

FunC:
slice data = get_data()
             .begin_parse();
int flag = data~load_uint(32);
Tolk:
val flag = getContractData()
           .beginParse()
           .loadUint(32);

FunC:
dict~udict_set(...);
Tolk:
dict.uDictSet(...);

FunC:
b~store_uint(x, 32);
Tolk:
b.storeUint(x, 32);

FunC:
b = b.store_int(x, 32);
Tolk:
b.storeInt(x, 32);
// also works
b = b.storeInt(x, 32);

FunC:
b = b.store_int(x, 32)
     .store_int(y, 32);
Tolk:
b.storeInt(x, 32)
 .storeInt(y, 32);
// b = ...; also works

In order to make this available, Tolk offers a mutability concept, which is a generalization of what a tilde means in FunC.

By default, all arguments are copied by value (identical to FunC)

fun someFn(x: int) {
    x += 1;
}

var origX = 0;
someFn(origX);  // origX remains 0
someFn(10);     // ok, just int
origX.someFn(); // still allowed (but not recommended), origX remains 0

Same goes for cells, slices, whatever:

fun readFlags(cs: slice) {
    return cs.loadInt(32);
}

var flags = readFlags(msgBody);  // msgBody is not modified
// msgBody.loadInt(32) will read the same flags

It means, that when you call a function, you are sure that original data is not modified.

mutate keyword and mutating functions

But if you add mutate keyword to a parameter, a passed argument will be mutated. To avoid unexpected mutations, you must specify mutate when calling it, also:

fun increment(mutate x: int) {
    x += 1;
}

// it's correct, simple and straightforward
var origX = 0;
increment(mutate origX);  // origX becomes 1

// these are compiler errors
increment(origX);         // error, unexpected mutation
increment(10);            // error, not lvalue
origX.increment();        // error, not a method, unexpected mutation
val constX = getSome();
increment(mutate constX); // error, it's immutable, since `val`

Same for slices and any other types:

fun readFlags(mutate cs: slice) {
    return cs.loadInt(32);
}

val flags = readFlags(mutate msgBody);
// msgBody.loadInt(32) will read the next integer

It's a generalization. A function may have several mutate parameters:

fun incrementXY(mutate x: int, mutate y: int, byValue: int) {
    x += byValue;
    y += byValue;
}

incrementXY(mutate origX, mutate origY, 10);   // both += 10

You may ask — is it just passing by reference? It effectively is, but since "ref" is an overloaded term in TON (cells and slices have refs), a keyword mutate was chosen.

self parameter turning a function into a method

When a first parameter is named self, it emphasizes that a function (still a global one) is a method and should be called via dot.

fun assertNotEq(self: int, throwIfEq: int) {  
    if (self == throwIfEq) {  
        throw 100;
    }
}

someN.assertNotEq(10);
10.assertNotEq(10);      // also ok, since self is not mutating
assertNotEq(someN, 10);  // still allowed (but not recommended)

self, without mutate, is immutable (unlike all other parameters). Think of it like "read-only method".

fun readFlags(self: slice) {
    return self.loadInt(32);  // error, modifying immutable variable
}

fun preloadInt32(self: slice) {
    return self.preloadInt(32);  // ok, it's a read-only method
}

Combining mutate and self, we get mutating methods.

mutate self is a method, called via dot, mutating an object

As follows:

fun readFlags(mutate self: slice) {
    return self.loadInt(32);
}

val flags = msgBody.readFlags(); // pretty obvious

fun increment(mutate self: int) {
    self += 1;
}

var origX = 10;
origX.increment();    // 11
10.increment();       // error, not lvalue

// even this is possible
fun incrementWithY(mutate self: int, mutate y: int, byValue: int) {  
    self += byValue;
    y += byValue;  
}

origX.incrementWithY(mutate origY, 10);   // both += 10

If you take a look into stdlib, you'll notice that lots of functions are actually mutate self, meaning they are methods, modifying an object. Tuples, dictionaries, and so on. In FunC, they were usually called via tilde.

@pure
fun tuplePush<X>(mutate self: tuple, value: X): void  
    asm "TPUSH";

t.tuplePush(1);

return self makes a method chainable

Exactly like return self in Python or return this in JavaScript. That's what makes methods like storeInt() and others chainable.

fun storeInt32(mutate self: builder, x: int): self {
    self.storeInt(x, 32);
    return self;

    // this would also work as expected (the same Fift code)
    // return self.storeInt(x, 32);
}

var b = beginCell().storeInt(1, 32).storeInt32(2).storeInt(3, 32);
b.storeInt32(4);     // works without assignment, since mutates b
b = b.storeInt32(5); // and works with assignment, since also returns

Pay attention to the return type, it's self. Currently, you must specify it: if it is left empty, compilation will fail. This may become allowed in the future.
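
Since the section compares this to return self in Python, here is the Python counterpart as a minimal sketch. Builder is a toy stand-in that only records (value, bit length) pairs, not a TVM builder, and the method names are hypothetical. Note one difference: Python objects are shared by reference, whereas in Tolk the mutation is made visible through mutate self.

```python
from typing import List, Tuple


class Builder:
    """Toy stand-in for a builder: it only records (value, bit_length) pairs."""

    def __init__(self) -> None:
        self.fields: List[Tuple[int, int]] = []

    def store_int(self, x: int, bits: int) -> "Builder":
        self.fields.append((x, bits))
        return self  # returning self is what makes the call chainable

    def store_int32(self, x: int) -> "Builder":
        # A custom wrapper is chainable too, because it returns self as well.
        return self.store_int(x, 32)


b = Builder().store_int(1, 32).store_int32(2).store_int(3, 32)
b.store_int32(4)      # works without assignment, since it mutates b
b = b.store_int32(5)  # and works with assignment, since it also returns b
print(b.fields)       # [(1, 32), (2, 32), (3, 32), (4, 32), (5, 32)]
```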

mutate self and asm functions

While it's obvious for user-defined functions, one could be interested, how to make an asm function with such behavior? To answer this question, we should look under the hood, how mutation works inside the compiler.

When a function has mutate parameters, it actually implicitly returns them, and they are implicitly assigned to arguments. It's better by example:

// actually returns (int, void)
fun increment(mutate x: int): void { ... }

// actually does: (x', _) = increment(x); x = x'
increment(mutate x);  

// actually returns (int, int, (slice, cell))
fun f2(mutate x: int, mutate y: int): (slice, cell) { ... }

// actually does: (x', y', r) = f2(x, y); x = x'; y = y'; someF(r)
someF(f2(mutate x, mutate y));

// when `self`, it's exactly the same
// actually does: (cs', r) = loadInt(cs, 32); cs = cs'; flags = r
flags = cs.loadInt(32);
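
The implicit "return and reassign" scheme can be mimicked in Python, where nothing is mutated in place. This is only a model of the desugaring described above, under stated assumptions: a slice is represented as a string of bits, and increment and load_uint are hypothetical stand-ins that return the updated first argument followed by the result.

```python
from typing import Tuple


def increment(x: int) -> Tuple[int, None]:
    """Models `fun increment(mutate x: int): void`: returns (x', result)."""
    return x + 1, None


def load_uint(bits: str, length: int) -> Tuple[str, int]:
    """Models a `mutate self: slice` method: returns (slice', value)."""
    return bits[length:], int(bits[:length], 2)


# increment(mutate x)  is modelled as  (x', _) = increment(x); x = x'
x = 0
x, _ = increment(x)

# flags = cs.loadUint(4)  is modelled as  (cs', r) = load_uint(cs, 4); cs = cs'; flags = r
cs = "10100011"
cs, flags = load_uint(cs, 4)
print(x, flags, cs)  # 1 10 0011

# Without the reassignment (a by-value call), the original stays untouched.
_, peeked = load_uint(cs, 4)
print(peeked, cs)    # 3 0011
```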

So, an asm function should place self' onto a stack before its return value:

// "TPUSH" pops (tuple) and pushes (tuple')
// so, self' = tuple', and return an empty tensor
// `void` is a synonym for an empty tensor
fun tuplePush<X>(mutate self: tuple, value: X): void  
    asm "TPUSH";

// "LDU" pops (slice) and pushes (int, slice')
// with asm(-> 1 0), we make it (slice', int)
// so, self' = slice', and return int
fun loadMessageFlags(mutate self: slice): int  
    asm(-> 1 0) "4 LDU";

Note, that to return self, you don't have to do anything special, just specify a return type. Compiler will do the rest.

// "STU" pops (int, builder) and pushes (builder')
// with asm(op self), we put arguments to correct order
// so, self' = builder', and return an empty tensor
// but to make it chainable, `self` instead of `void`
fun storeMessageOp(mutate self: builder, op: int): self  
    asm(op self) "32 STU";

It's very unlikely you'll have to do such tricks. Most likely, you'll just write wrappers around existing functions:

// just do like this, without asm, it's equally efficient

@inline
fun myLoadMessageFlags(mutate self: slice): int {
    return self.loadUint(4);
}

@inline
fun myStoreMessageOp(mutate self: builder, flags: int): self {
    return self.storeUint(flags, 32);
}

Do I need @inline for simple functions/methods?

For now, better do it, yes. In most examples above, @inline was omitted for clarity. Currently, without @inline, a function will be a separate TVM continuation with jumps in/out. With @inline, a function will be generated, but inlined by Fift (like the inline specifier in FunC).

In the future, Tolk will automatically detect simple functions and perform a true inlining by itself, on AST level. Such functions won't be even codegenerated to Fift. The compiler would decide, better than a human, whether to inline, to make a ref, etc. But it will take some time for Tolk to become so smart :) For now, please specify the @inline attribute.

But self is not a method, it's still a function! I feel like I've been cheated

Absolutely. Like FunC, Tolk has only global functions (as of v0.6). There are no classes / structures with methods. There are no methods hash() for slice and hash() for cell. Instead, there are functions sliceHash() and cellHash(), which can be called either like functions or by dot (preferred):

fun f(s: slice, c: cell) {
    // not like this
    s.hash();  
    c.hash();
    // but like this
    s.sliceHash();
    c.cellHash();
    // since it's the same as
    sliceHash(s);
    cellHash(c);
}

In the future, after a giant work on the type system, having fully refactored FunC kernel inside, Tolk might have an ability of declaring structures with real methods, generalized enough for covering built-in types. But it will take a long journey to follow.

Tolk vs FunC gas consumption

TLDR: Tolk gas consumption could be a bit higher, because it fixes unexpected arguments shuffling in FunC. It's negligible in practice. In the future, Tolk compiler will become smart enough to reorder arguments targeting less stack manipulations, but still avoiding a shuffling problem.

FunC compiler could unexpectedly shuffle arguments when calling an assembly function:

some_asm_function(f1(), f2());

Sometimes, f2() could be called before f1(), and it's unexpected. To fix this behavior, one could specify #pragma compute-asm-ltr, forcing arguments to be always evaluated in ltr-order. This was experimental, and therefore turned off by default.

This pragma reorders arguments on a stack, often leading to more stack manipulations than without it. In other words, it fixes unexpected behavior, but increases gas consumption.

Tolk puts arguments onto a stack exactly as if this pragma were turned on. So, its gas consumption is sometimes higher than in FunC if you didn't use this pragma. Of course, there is no shuffling problem in Tolk.

In the future, Tolk compiler will become smart enough to reorder arguments targeting less stack manipulations, but still avoiding a shuffling problem.

Some technical details

Here I keep a list of points not seen by a user's eye, but related to implementation.

  • Tolk compiler is a fork of FunC compiler; literally: the first commit is copying all FunC sources renaming "FunC" to "Tolk".
  • It means, that all FunC intelligence and complexity (and probable bugs, huh) are also a part of Tolk.
  • Tolk still outputs Fift code, Fift compiler is assumed to be invoked afterward.
  • All compiler sources are in {repo}/tolk. All tests are in {repo}/tolk-tester.
  • I have fully rewritten everything about lexing (see lexer.cpp), it's not unified with TL/B. Spaces in .tolk files are not mandatory (2+2 is 4, identifiers are alpha-numeric), lexing works based on Trie. A new lexer is faster than an old one (though lexing part is negligible in all the process, of course).
  • Tolk has an AST representation of source code. In FunC, there is no AST: while lexing, symbols are registered, types are inferred, and so on. There is no way to perform any more or less semantic analysis. In Tolk, I've implemented parsing .tolk files into AST at first, and then converting this AST into legacy representation (Expr/Op). Consider ast.h for comments.
  • Lots of sources after transforming AST to Expr/Op are unchanged (CodeBlob, StackTransform, etc.), I name it "legacy". In the future, more and more code analysis will be moved out of legacy to AST-level.
  • Mutating functions are a generalization of FunC tilda functions, but they are successfully converted to Expr/Op representation.
  • All C++ global variables spread over FunC sources are combined into CompilerState G, the only C++ global variable in Tolk compiler. See compiler-state.h.
  • Type system remains unchanged, even te_ForAll (just having new <T> syntax), but it will be reconsidered some day.
  • Asm.fif was not modified. Tolk entrypoint is onInternalMessage (not recv_internal), but whereas method_id for recv_internal is generated by Fift, method_id for onInternalMessage is generated by Tolk compiler itself with DECLMETHOD in fif output.
  • Logical operators && || are expressed as ternary expressions: a && b → a ? !!b : 0, a || b → a ? -1 : !!b, later generated as IFJMP to Fift. For simple cases, codegeneration could avoid jumps, but I had no time for optimizing it. So, logical operators exist and work, but not gas-optimal in simple cases. To be improved in future releases.
  • Tolk stdlib is split into multiple files, see crypto/smartcont/tolk-stdlib/ folder. It's placed there, because smartcont is copied as-is into apt packages.
  • The first thing Tolk compiler does on start is locating stdlib folder (the goal is to make stdlib a part of distribution, not be downloaded from GitHub). It works by searching in predefined paths relative to an executable binary. For example, if the user launches Tolk compiler from a package installed (e.g. /usr/bin/tolk), locate stdlib in /usr/share/ton/smartcont. When it's built from sources (e.g. ~/ton/cmake-build-debug/tolk/tolk), check the ~/ton/crypto/smartcont folder. If a user has non-standard installation, he may pass TOLK_STDLIB env variable. It's standard practice for compilers, though it could be a bit simplified if we used CPack. See tolk-main.cpp.
  • WASM wrapper also exists, see tolk-wasm.cpp. It's similar to funcfiftlib, but supports more options. A GitHub repo tolk-js is a npm package with wasm distribution. It also contains stdlib. So, when a user takes tolk-js or blueprint, all stdlib functions are still available out of the box.
  • Tolk has no dependency on ton_block and ton_crypto CMake targets.
  • Tolk binary is more lightweight than a FunC one, same for wasm.
  • Tolk compiler has a rich testing framework and contains more than 100 tests for now.
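
The lowering of && and || mentioned in the list above can be modeled outside the compiler. The following is only a Python sketch of the semantics, not compiler code: the names to_bool, logical_and and logical_or are made up for illustration, and the right operand is passed as a callable to show that it is evaluated only when needed.

```python
TRUE, FALSE = -1, 0   # TVM-style booleans, as in FunC


def to_bool(x):
    """Model of `!!x`: any non-zero integer becomes -1, zero stays 0."""
    return TRUE if x != 0 else FALSE


def logical_and(a, b):
    """a && b  ->  a ? !!b : 0  (b is a callable, run only if a is true)."""
    return to_bool(b()) if a != 0 else FALSE


def logical_or(a, b):
    """a || b  ->  a ? -1 : !!b  (b is a callable, run only if a is false)."""
    return TRUE if a != 0 else to_bool(b())


calls = []


def f():
    calls.append("f")
    return 5


print(logical_and(0, f), calls)   # f is not evaluated here
print(logical_and(7, f), calls)   # f is evaluated once here
```

The branch on a is what ends up as a conditional jump in Fift; for primitive operands a jump-free encoding is possible, which is the optimization left for future releases.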

A framework for testing Tolk compiler

In FunC, there is an auto-tests folder with several .fc files, specifying provided input and expected output. For example:

... some FunC code

{-    
    method_id | in            | out
TESTCASE | 0  | 1 1 1 -1 10 6 | 8 2
-}

There is a run_tests.py which traverses each file in a folder, detects such lines from comments, compiles to fif, and executes every testcase, comparing output.
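
To make the format concrete, here is a minimal Python sketch of how one such comment line could be split into a method id, inputs and expected outputs. It is not the actual run_tests.py; the function name parse_testcase and the shape of the returned tuple are assumptions for illustration.

```python
def parse_testcase(line: str):
    """Parse 'TESTCASE | method_id | in | out' into (method_id, inputs, outputs).

    Returns None for lines that are not test cases (e.g. the header row).
    """
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 4 or parts[0] != "TESTCASE":
        return None
    method_id = int(parts[1])
    inputs = parts[2].split()    # stack values passed to the method
    outputs = parts[3].split()   # stack values expected after the call
    return method_id, inputs, outputs


print(parse_testcase("TESTCASE | 0  | 1 1 1 -1 10 6 | 8 2"))
```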

It is okay, it works, but... This framework is very-very poor. I am speaking not about the amount of tests, but what exactly we can test using such possibilities.

For example, as a compiler developer, I want to implement functions inlining:

fun myCell() { return beginCell(); }  // want to test it's inlined

...
myCell()...  // that this call is replaced by beginCell()

But even without inlining, all tests for input-output will work :) Because what we really want to test is that

  1. myCell() is not codegenerated (no DECLPROC and similar)
  2. usages of myCell() are replaced with NEWC (not CALLDICT)

None of these cases could be explained in terms of input-output.

I have fully rewritten an internal testing framework and added lots of capabilities to it. Let's look through them.

@compilation_should_fail — checks that compilation fails, and it's expected (this is called "negative tests").
@stderr — checks, when compilation fails, that stderr (compilation error) is expected.
Example:

fun main(s: auto) {  
  var (z, t) = ;  
  
/**  
@compilation_should_fail  
@stderr expected <expression>, got `;`  
@stderr var (z, t) = ;  
*/

@fif_codegen — checks that contents of compiled.fif matches the expected pattern.
@fif_codegen_avoid — checks that it does not match the pattern.
The pattern is a multiline piece of fift code, optionally with "..." meaning "any lines here". It may contain //stack_comments, they will also be checked.
Example:

... some Tolk code

/**
@fif_codegen
"""
test1 PROC:<{  
  //  
  NEWC        //  _5  
  ENDC        //  varName  
  NEWC        //  varName _8
  ...
  TRIPLE      //  _27
}>
"""

@fif_codegen_avoid DECLPROC myCell
*/
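
How a pattern with "..." could be matched against compiled.fif can be sketched in a few lines of Python. This is an illustration under assumptions, not the code of tolk-tester.py: the function names are hypothetical, whitespace inside a line is collapsed before comparing, and a pattern line must equal a whole output line.

```python
def matches_at(lines, pattern, i=0, j=0):
    """True if pattern[j:] matches the output starting exactly at lines[i]."""
    if j == len(pattern):
        return True
    if pattern[j] == "...":
        # "..." may swallow zero or more output lines
        return any(matches_at(lines, pattern, k, j + 1)
                   for k in range(i, len(lines) + 1))
    if i < len(lines) and lines[i] == pattern[j]:
        return matches_at(lines, pattern, i + 1, j + 1)
    return False


def normalize(text):
    """Drop empty lines and collapse runs of whitespace inside each line."""
    return [" ".join(ln.split()) for ln in text.splitlines() if ln.strip()]


def fif_codegen_matches(fif_text, pattern_text):
    """Model of @fif_codegen: the pattern occurs somewhere in the output."""
    lines, pattern = normalize(fif_text), normalize(pattern_text)
    return any(matches_at(lines, pattern, start)
               for start in range(len(lines) + 1))


def fif_codegen_avoids(fif_text, pattern_text):
    """Model of @fif_codegen_avoid: the pattern occurs nowhere."""
    return not fif_codegen_matches(fif_text, pattern_text)


# Made-up Fift output, only to exercise the matcher
output = """
test1 PROC:<{
  NEWC        //  _5
  ENDC        //  varName
  TRIPLE      //  _27
}>
"""
print(fif_codegen_matches(output, "test1 PROC:<{\n  ...\n  TRIPLE  //  _27\n}>"))
print(fif_codegen_avoids(output, "DECLPROC myCell"))
```

Because stack comments are part of the compared lines, a change in variable naming or stack layout fails the test as well, which is the point of checking //stack_comments.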

@code_hash — checks that hash of compiled output.fif matches the provided value. It's used to "record" code boc hash and to check that it remains the same on compiler modifications. Being much less flexible than @fif_codegen, it nevertheless gives a guarantee of bytecode stability.
Example:

... some Tolk code

/**
@code_hash 13830542019509784148027107880226447201604257839069192762244575629978154217223
*/

Of course, different tags can be mixed up in a single file: multiple @testcase, multiple @fif_codegen, etc.

Also, I've implemented tolk-tester.js, fully in sync with tolk-tester.py. It means, that now we can test fif codegen, compilation errors and so on for WASM also.

Consider tolk-tester/ folder for an implementation and coverage.

Moreover, I've downloaded sources of 300 verified FunC contracts from verifier.ton.org, converted them to Tolk, and written a tool to launch Tolk compiler on a whole database after every commit. That makes me sure that all future changes in the compiler won't break compilation of "flagship" codebase, and when Fift output is changed, I look through to ensure that changes are expected. That codebase lives outside of ton-blockchain repository.

Tolk roadmap

The first released version of Tolk will be v0.6, emphasizing missing FunC v0.5.

Here are some (yet not all and not ordered in any way) points to be investigated:

  • type system improvements: boolean type, nullability, dictionaries
  • structures, with auto-packing to/from cells, probably integrated with message handlers
  • structures with methods, probably generalized to cover built-in types
  • some integrations with TL scheme, either syntactical or via code generation
  • human-readable compiler errors
  • easier messages sending
  • better experience for common use-cases (jettons, nft, etc.)
  • gas and stack optimizations, AST inlining
  • extending and maintaining stdlib
  • think about some kind of ABI (how explorers "see" bytecode)
  • think about gas and fee management in general

Note, that most of the points above are a challenge to implement. At first, FunC kernel must be fully refactored to "interbreed" with abilities it was not designed for.

Also, I see Tolk evolution partially guided by community needs. It would be nice to talk to developers who have created interconnected FunC contracts, to absorb their pain points and discuss how things could be done differently.

What fate awaits FunC?

We decided to leave FunC untouched. Carved in stone, exactly the same as envisioned by Dr. Nikolay Durov. If critical bugs are found, they will be fixed, of course. But active development is not planned. All contracts written in FunC will continue working, obviously. And FunC will forever be available for use.

Since Tolk allows doing literally the same as FunC, all newcomers will be onboarded to Tolk.

In 2025, FunC will be officially deprecated to avoid confusion.

Tooling around Tolk Language

Sources of the Tolk compiler are a part of the ton-blockchain repo. Besides the compiler, we have:

  1. Documentation in a separate repo.
  2. tolk-js — a WASM wrapper for Tolk compiler.
  3. JetBrains IDE plugin supports Tolk besides FunC, Fift, TL/B, and Tact.
  4. VS Code Extension enabling Tolk Language support.
  5. Converter from FunC to Tolk — convert a .fc file to a .tolk file with a single npx command.
  6. Tolk Language is available in blueprint

The Tolk Language will be positioned as "next-generation FunC".
It's literally a fork of a FunC compiler,
introducing familiar syntax similar to TypeScript,
but leaving all low-level optimizations untouched.

Note, that FunC sources are partially stored
in the parser/ folder (shared with TL/B).
In Tolk, nothing is shared.
Everything from parser/ is copied into tolk/ folder.
All changes from PR "FunC v0.5.0":
#1026

Instead of developing FunC, we decided to fork it.
BTW, the first Tolk release will be v0.6,
a metaphor of FunC v0.5 that missed a chance to occur.

As it turned out, PSTRING() created a buffer of 128K.
If asm_code exceeded this buffer, it was truncated.
I've just dropped PSTRING() from there in favor of std::string.

A new lexer is noticeably faster and memory efficient
(although splitting a file to tokens is negligible in a whole pipeline).

But the purpose of rewriting lexer was not just to speed up,
but to allow writing code without spaces:
`2+2` is now 4, not a valid identifier as earlier.

The variety of symbols allowed in identifier has greatly reduced
and is now similar to other languages.

SrcLocation became 8 bytes on stack everywhere.

Command-line flags were also reworked:
- the input for Tolk compiler is only a single file now, it's parsed, and parsing continues while new #include are resolved
- flags like -A -P and so on are no longer needed, actually
Several related changes:
- stdlib.tolk is embedded into a distribution (deb package or tolk-js),
  the user won't have to download it and store as a project file;
  it's an important step to maintain correct language versioning
- stdlib.tolk is auto-included, that's why all its functions are
  available out of the box
- strict includes: you can't use symbol `f` from another file
  unless you've #include'd this file
- drop all C++ global variables holding compilation state,
  merge them into a single struct CompilerState located at
  compiler-state.h; for instance, stdlib filename is also there

Now, the whole .tolk file can be loaded as AST tree and
then converted to Expr/Op.
This gives a great ability to implement AST transformations.
In the future, more and more code analysis will be moved out of legacy to AST-level.

Since I've implemented AST, now I can drop forward declarations.
Instead, I traverse AST of all files and register global symbols
(functions, constants, global vars) as a separate step, in advance.

That's why, while converting AST to Expr/Op, all available symbols are
already registered.
This greatly simplifies "intermediate state" of yet unknown functions
and checking them afterward.

Redeclaration of local variables (inside the same scope)
is now also prohibited.

Lots of changes, actually. Most noticeable are:
- traditional //comments
- #include -> import
- a rule "import what you use"
- ~ found -> !found (for -1/0)
- null() -> null
- is_null?(v) -> v == null
- throw is a keyword
- catch with swapped arguments
- throw_if, throw_unless -> assert
- do until -> do while
- elseif -> else if
- drop ifnot, elseifnot
- drop rarely used operators

A testing framework also appears here. All tests existed earlier,
but due to significant syntax changes, their history is useless.

- split stdlib.tolk into multiple files (tolk-stdlib/ folder)
  (the "core" common.tolk is auto-imported, the rest are
  needed to be explicitly imported like "@stdlib/tvm-dicts.tolk")
- all functions were renamed to long and clear names
- new naming is camelCase

This is a very big change.
If FunC has `.methods()` and `~methods()`, Tolk has only dot,
one and only way to call a `.method()`.
A method may mutate an object, or may not.
It's a behavioral and semantic difference from FunC.

- `cs.loadInt(32)` modifies a slice and returns an integer
- `b.storeInt(x, 32)` modifies a builder
- `b = b.storeInt()` also works, since it not only modifies, but returns
- chained methods also work, they return `self`
- everything works exactly as expected, similar to JS
- no runtime overhead, exactly same Fift instructions
- custom methods are created with ease
- tilda `~` does not exist in Tolk at all

Instead of 'ton_crypto', Tolk now depends on 'ton_crypto_core'.
The only purpose of ton_crypto (in FunC also, btw) is address parsing:
"EQCRDM9...", "0:52b3..." and so on.
Such parsing has been implemented manually exactly the same way.

Unary logical NOT was already implemented earlier.
Logical AND OR are expressed via conditional expression:
* a && b  ->  a ? (b != 0) : 0
* a || b  ->  a ? 1 : (b != 0)
They work as expected in any expressions. For instance, having
`cond && f()`, f is called only if cond is true.
For primitive cases, like `a > 0 && b > 0`, Fift code is not optimal,
it could potentially be without IFs.
These are moments of future optimizations. For now, it's more than enough.
@tolk-vm tolk-vm changed the title Tolk v0.6.0 Tolk Language: next-generation FunC Nov 2, 2024
@EmelyanenkoK EmelyanenkoK merged commit 7151ff2 into master Nov 2, 2024
14 checks passed
@Cabdulaahi6649 Cabdulaahi6649 left a comment

Solana

@OGROYALRAY

Tolk is a new language for writing smart contracts in TON. Think of Tolk as the "next‑generation FunC". Tolk compiler is literally a fork of FunC compiler, introducing familiar syntax similar to TypeScript, but leaving all low-level optimizations untouched.

Motivation behind Tolk

FunC is awesome. It is really low-level and encourages a programmer to think about compiler internals. It gives full control over TVM assembler, allowing a programmer to make his contract as effective as possible. If you get used to it, you love it.

But there is a problem. FunC is "functional C", and it's for ninja. If you are keen on Lisp and Haskell, you'll be happy. But if you are a JavaScript / Go / Kotlin developer, its syntax is peculiar for you, leading to occasional mistakes. A struggle with syntax may decrease your motivation for digging into TON.

Imagine, what if there was a language, also smart, also low-level, but not functional and not like C? Leaving all beauty and complexity inside, what if it would be more similar to popular languages at first glance?

That's what Tolk is about.

Meaning of the name "Tolk"

I'll update this section after announcing Tolk on TON Gateway.

History of Tolk origin

In June 2024, I created a pull request FunC v0.5.0. Besides this PR, I've written a roadmap — what can be enhanced in FunC, syntactically and semantically.

All in all, instead of merging v0.5.0 and continuing developing FunC, we decided to fork it. To leave FunC untouched, as it is. As it always was. And to develop a new language, driven by a fresh and new name.

For several months, I have worked on Tolk privately. I have implemented a giant list of changes. And it's not only about the syntax. For instance, Tolk has an internal AST representation, completely missed in FunC.

On TON Gateway, on 1-2 November in Dubai, I had a speech presenting Tolk to the public, and we released it the same day. Once the video is available, I'll attach it here.

The first version of the Tolk Language is v0.6, a metaphor of FunC v0.5 that missed a chance to occur.

Tolk vs FunC: in short

Tolk is much more similar to TypeScript and Kotlin than to C and Lisp. But it still gives you full control over TVM assembler, since it has a FunC kernel inside.

  1. Functions are declared via fun, get methods via get, variables via var (and val for immutable), putting types on the right; parameter types are mandatory; return type can be omitted (auto inferred), as well as for locals; specifiers inline and others are @ attributes
global storedV: int;

fun parseData(cs: slice): cell {
    var flags: int = cs.loadMessageFlags();
    ...
}

@inline
fun sum(a: int, b: int) {   // auto inferred int
    val both = a + b;       // same
    return both;
}

get currentCounter(): int { ... }
  2. No impure: it is the default, and the compiler won't drop user function calls

  3. Not recv_internal and recv_external, but onInternalMessage and onExternalMessage

  4. 2+2 is 4, not an identifier; identifiers are alpha-numeric; use naming const OP_INCREASE instead of const op::increase

  5. Logical operators AND &&, OR ||, NOT ! are supported

  6. Syntax improvements:

    • ;; comment → // comment
    • {- comment -} → /* comment */
    • #include → import, with a strict rule "import what you use"
    • ~ found → !found (for true/false only, obviously) (true is -1, like in FunC)
    • v = null() → v = null
    • null?(v) → v == null, same for builder_null? and others
    • ~ null?(v) → v != null
    • throw(excNo) → throw excNo
    • catch(_, _) → catch
    • catch(_, excNo) → catch(excNo)
    • throw_unless(excNo, cond) → assert(cond, excNo)
    • throw_if(excNo, cond) → assert(!cond, excNo)
    • return () → return
    • do ... until (cond) → do ... while (!cond)
    • elseif → else if
    • ifnot (cond) → if (!cond)

  7. A function can be called even if declared below; forward declarations are not needed; the compiler at first does parsing, and then it does symbol resolving; there is now an AST representation of source code

  8. stdlib functions renamed to verbose clear names, camelCase style; it's now embedded, not downloaded from GitHub; it's split into several files; common functions are always available, more specific ones are available with import "@stdlib/tvm-dicts", IDE will suggest you

  9. No ~ tilda methods; cs.loadInt(32) modifies a slice and returns an integer; b.storeInt(x, 32) modifies a builder; b = b.storeInt() also works, since it not only modifies, but returns; chained methods work identically to JS, they return self; everything works exactly as expected, similar to JS; no runtime overhead, exactly same Fift instructions; custom methods are created with ease; tilda ~ does not exist in Tolk at all

Tooling around:

  • JetBrains plugin exists
  • VS Code extension exists
  • WASM wrapper for blueprint exists
  • Documentation and migration guide exists
  • And even a converter from FunC to Tolk exists

Tolk vs FunC: in detail

A very huge list below. Will anyone have enough patience to read it up to the end?..

✅ Traditional comments :)

FunC | Tolk
;; comment | // comment
{- multiline comment -} | /* multiline comment */

✅ 2+2 is 4, not an identifier. Identifiers can only be alpha-numeric

In FunC, almost any character can be a part of identifier. For example, 2+2 (without a space) is an identifier. You can even declare a variable with such a name.

In Tolk, spaces are not mandatory. 2+2 is 4, as expected. 3+~x is 3 + (~ x), and so on.

FunC | Tolk
return 2+2; ;; undefined function `2+2` | return 2+2; // 4

More precisely, an identifier can start with [a-zA-Z$_] and continue with [a-zA-Z0-9$_]. Note that ?, :, and others are not valid symbols, so found? and op::increase are not valid identifiers.

You can use backticks to surround an identifier, and then it can contain any symbols (similar to Kotlin and some other langs). Its potential usage is to allow keywords be used as identifiers, in case of code generation by a scheme, for example.
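
The identifier rule above translates directly into a regular expression. The Python sketch below is only an illustration, not the Trie-based lexer from lexer.cpp; the helper name is_identifier is made up, and treating "anything except a backtick" as the content of a quoted identifier is an assumption.

```python
import re

# Plain identifiers: start with [a-zA-Z$_], continue with [a-zA-Z0-9$_]
PLAIN_IDENT = re.compile(r"[a-zA-Z$_][a-zA-Z0-9$_]*")
# Backtick-quoted identifiers: assumed to allow any symbols except a backtick
QUOTED_IDENT = re.compile(r"`[^`]+`")


def is_identifier(text: str) -> bool:
    return bool(PLAIN_IDENT.fullmatch(text) or QUOTED_IDENT.fullmatch(text))


for candidate in ["OP_INCREASE", "found?", "op::increase", "2+2", "`2+2`"]:
    print(candidate, is_identifier(candidate))
```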

FunC | Tolk
const op::increase = 0x1234; | const OP_INCREASE = 0x1234;
int 2+2 = 5; ;; even 2%&!2 is valid | var `2+2` = 5; // don't do like this :)

✅ Impure by default, compiler won't drop user function calls

FunC has an impure function specifier. When absent, a function is treated as pure. If its result is unused, its call is deleted by the compiler.

Though this behavior is documented, it is very unexpected to newcomers. For instance, various functions that don't return anything (throw an exception on mismatch, for example) are silently deleted. The situation is made worse by the fact that FunC doesn't check and validate the function body, allowing impure operations inside pure functions.

In Tolk, all functions are impure by default. You can mark a function pure with annotation, and then impure operations are forbidden in its body (exceptions, globals modification, calling non-pure functions, etc.).

✅ New functions syntax: fun keyword, @ attributes, types on the right (like in TypeScript, Kotlin, Python, etc.)

FunC | Tolk
cell parse_data(slice cs) { } | fun parse_data(cs: slice): cell { }
(cell, int) load_storage() { } | fun load_storage(): (cell, int) { }
() main() { ... } | fun main() { ... }

Types of variables also go on the right:

FunC | Tolk
slice cs = ...; | var cs: slice = ...;
(cell c, int n) = parse_data(cs); | var (c: cell, n: int) = parse_data(cs);
global int stake_at; | global stake_at: int;

Modifiers such as inline are expressed with annotations:

FunC | Tolk
int f(cell s) inline { | @inline fun f(s: cell): int {
() load_data() impure inline_ref { | @inline_ref fun load_data() {
global int stake_at; | global stake_at: int;

forall is written this way:

FunC | Tolk
forall X -> tuple cons(X head, tuple tail) | fun cons<X>(head: X, tail: tuple): tuple

asm implementation works like in FunC, but, being properly aligned, it looks nicer:

@pure
fun third<X>(t: tuple): X
    asm "THIRD";

@pure
fun iDictDeleteGet(dict: cell, keyLen: int, index: int): (cell, slice, int)
    asm(index dict keyLen) "DICTIDELGET NULLSWAPIFNOT";

@pure
fun mulDivFloor(x: int, y: int, z: int): int
    builtin;

There is also a @deprecated attribute, not affecting compilation, but for a human and IDE.

✅ get instead of method_id

In FunC, method_id (without arguments) actually declared a get method. In Tolk, you use a straightforward syntax:

FunC | Tolk
int seqno() method_id { ... } | get seqno(): int { ... }

Both get methodName() and get fun methodName() are acceptable.

For method_id(xxx) (uncommon in practice, but valid), there is an attribute:

FunC | Tolk
() after_code_upgrade(cont old_code) impure method_id(1666) | @method_id(1666) fun afterCodeUpgrade(oldCode: continuation)

✅ It's essential to declare types of parameters (though optional for locals)

// not allowed
fun do_smth(c, n)
// types are mandatory
fun do_smth(c: cell, n: int)

There is an auto type, so fun f(a: auto) is valid, though not recommended.

While parameter types are mandatory, the return type is not (it's often obvious or verbose). If omitted, it means auto:

fun x() { ... }  // auto infer return

For local variables, types are also optional:

var i = 10;                      // ok, int
var b = beginCell();             // ok, builder
var (i, b) = (10, beginCell());  // ok, two variables, int and builder

// types can be specified manually, of course:
var b: builder = beginCell();
var (i: int, b: builder) = (10, beginCell());

✅ Variables are not allowed to be redeclared in the same scope

var a = 10;
...
var a = 20;  // error, correct is just `a = 20`
if (1) {
    var a = 30;  // it's okay, it's another scope
}

As a consequence, partial reassignment is not allowed:

var a = 10;
...
var (a, b) = (20, 30);  // error, redeclaration of a

Note, that it's not a problem for loadUint() and other methods. In FunC, they returned a modified object, so a pattern var (cs, int value) = cs.load_int(32) was quite common. In Tolk, such methods mutate an object: var value = cs.loadInt(32), so redeclaration is unlikely to be needed.

fun send(msg: cell) {
    var msg = ...;  // error, redeclaration of msg

    // solution 1: introduce a new variable
    var msgWrapped = ...;
    // solution 2: use `redef`, though not recommended
    var msg redef = ...;
}

✅ Changes in the type system

Type system in the first Tolk release is the same as in FunC, with the following modifications:

  • void is effectively an empty tensor (more canonical to be named unit, but void is more reliable); btw, return (without expression) is actually return (), a convenient way to return from void functions
fun setContractData(c: cell): void
    asm "c4 POP";
  • auto means "auto infer"; in FunC, _ was used for that purpose; note, that if a function doesn't specify return type, it's auto, not void
  • self, to make chainable methods, described below; actually it's not a type, it can only occur instead of return type of a function
  • cont renamed to continuation

✅ Another naming for recv_internal / recv_external

fun onInternalMessage
fun onExternalMessage
fun onTickTock
fun onSplitPrepare
fun onSplitInstall

All parameter types and their order remain the same, only naming is changed. fun main is also available.

✅ #include → import. Strict imports

FunC | Tolk
#include "another.fc"; | import "another.tolk"

In Tolk, you cannot use a symbol from a.tolk without importing this file. In other words, "import what you use".

All stdlib functions are available out of the box, downloading stdlib and #include "stdlib.fc" is not needed. See below about embedded stdlib.

There is still a global scope of naming. If f is declared in two different files, it's an error. We "import" a whole file; per-file visibility and an export keyword are not supported yet, but probably will be in the future.

✅ #pragma → compiler options

In FunC, "experimental" features like allow-post-modification were turned on by a pragma in .fc files (leading to problems when some files contain it and some don't). In fact, it's not a pragma for a file, it's a compilation option.

In Tolk, all pragmas were removed. allow-post-modification and compute-asm-ltr were merged into Tolk sources (as if they were always on in FunC). Instead of pragmas, there is now an ability to pass experimental options.

As for now, there is one experimental option introduced: remove-unused-functions, which excludes unused symbols from Fift output.

#pragma version xxx was replaced by tolk xxx (no >=, just a strict version). It's good practice to annotate compiler version you are using. If it doesn't match, Tolk will show a warning.

tolk 0.6

✅ Late symbols resolving. AST representation

In FunC (like in C), you cannot access a function declared below:

int b() { a(); }   ;; error
int a() { ... }    ;; since it's declared below

To avoid an error, a programmer should create a forward declaration at first. The reason is that symbols resolving is performed right at the time of parsing.

Tolk compiler separates these two steps. At first it does parsing, and then it does symbol resolving. Hence, a snippet above would not be erroneous.

Sounds simple, but internally, it's a very huge job. To make this available, I've introduced an intermediate AST representation, completely missing in FunC. That's an essential point for future modifications and for performing semantic code analysis.
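
The difference between the two strategies can be shown on a toy model. The Python sketch below is not how either compiler is implemented: functions are reduced to a name plus the list of names they call, and the error text is made up. It only demonstrates why registering all global symbols before resolving calls removes the need for forward declarations.

```python
# Each function is (name, [names of functions it calls]), in file order.
SOURCE = [("b", ["a"]), ("a", [])]


def resolve_while_parsing(functions):
    """FunC-like: a call is resolved the moment it is parsed."""
    known, errors = set(), []
    for name, calls in functions:
        known.add(name)
        errors += [f"undefined function {c} in {name}"
                   for c in calls if c not in known]
    return errors


def resolve_after_parsing(functions):
    """Tolk-like: register every global symbol first, then resolve calls."""
    known = {name for name, _ in functions}
    return [f"undefined function {c} in {name}"
            for name, calls in functions
            for c in calls if c not in known]


print(resolve_while_parsing(SOURCE))   # reports the call to a() inside b()
print(resolve_after_parsing(SOURCE))   # reports nothing
```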

✅ null keyword

Creating null values and checking variables on null looks very pretty now.

FunC | Tolk
a = null() | a = null
if (null?(a)) | if (a == null)
if (~ null?(b)) | if (b != null)
if (~ cell_null?(c)) | if (c != null)

Note that it does NOT mean that the Tolk language has nullability. You can still assign null to an integer variable, like in FunC; it is just syntactically more pleasant. True nullability will be available someday, after hard work on the type system.

✅ throw and assert keywords

Tolk greatly simplifies working with exceptions.

If FunC has throw(), throw_if(), throw_arg_if(), and the same for unless, Tolk has only two primitives: throw and assert.

FunC | Tolk
throw(excNo) | throw excNo
throw_arg(arg, excNo) | throw (excNo, arg)
throw_unless(excNo, condition) | assert(condition, excNo)
throw_if(excNo, condition) | assert(!condition, excNo)

Note, that !condition is possible since logical NOT is available, see below.

There is a long (verbose) syntax of assert(condition, excNo):

assert(condition) throw excNo;
// with possibility to include arg to throw

Also, Tolk swaps catch arguments: it's catch (excNo, arg), both optional (since arg is most likely empty).

FunC | Tolk
try { } catch (_, _) { } | try { } catch { }
try { } catch (_, excNo) { } | try { } catch(excNo) { }
try { } catch (arg, excNo) { } | try { } catch(excNo, arg) { }

✅ do ... until → do ... while

FunC | Tolk
do { ... } until (~ condition); | do { ... } while (condition);
do { ... } until (condition); | do { ... } while (!condition);

Note, that !condition is possible since logical NOT is available, see below.

✅ Operator precedence became identical to C++ / JavaScript

In FunC, code like if (slices_equal() & status == 1) is parsed as if( (slices_equal()&status) == 1 ). This has been a cause of various errors in real-world contracts.

In Tolk, & has lower priority, identical to C++ and JavaScript.

Moreover, Tolk fires errors on potentially wrong operators usage to completely eliminate such errors:

if (flags & 0xFF != 0)

will lead to a compilation error (similar to gcc/clang):

& has lower precedence than ==, probably this code won't work as you expected.  Use parenthesis: either (... & ...) to evaluate it first, or (... == ...) to suppress this error.

Hence, the code should be rewritten:

// either to evaluate it first (our case)
if ((flags & 0xFF) != 0)
// or to emphasize the behavior (not our case here)
if (flags & (0xFF != 0))

I've also added a diagnostic for a common mistake in bitshift operators: a << 8 + 1 is equivalent to a << 9, probably unexpected.

int result = a << 8 + low_mask;

error: << has lower precedence than +, probably this code won't work as you expected.  Use parenthesis: either (... << ...) to evaluate it first, or (... + ...) to suppress this error.
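
Why the compiler insists on parentheses is easy to see by evaluating both groupings by hand. The Python sketch below is an illustration only: ne models a TVM-style != that yields -1 for true, and the function names are made up. It also checks the shift example, since Python gives << a lower precedence than + as well.

```python
def ne(a, b):
    """TVM-style `!=`: -1 for true, 0 for false."""
    return -1 if a != b else 0


def low_byte_is_set(flags):
    """(flags & 0xFF) != 0, the grouping the author of the check meant."""
    return ne(flags & 0xFF, 0)


def without_parentheses(flags):
    """flags & (0xFF != 0), the grouping chosen by C++-like precedence."""
    return flags & ne(0xFF, 0)


flags = 0x100                      # the low byte is zero
print(low_byte_is_set(flags))      # false, as intended
print(without_parentheses(flags))  # non-zero, so the `if` would pass

a = 3
print(a << 8 + 1 == a << 9)        # the shift amount is 8 + 1
```

For flags = 0x100 the two groupings disagree, which is exactly the kind of silent bug the diagnostic is meant to catch.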

Operators ~% ^% /% ~/= ^/= ~%= ^%= ~>>= ^>>= no longer exist.

✅ Immutable variables, declared via val

Like in Kotlin: var for mutable, val for immutable, optionally followed by a type. FunC has no analogue of val.

val flags = msgBody.loadMessageFlags();
flags &= 1;         // error, modifying an immutable variable

val cs: slice = c.beginParse();
cs.loadInt(32);     // error, since loadInt() mutates an object
cs.preloadInt(32);  // ok, it's a read-only method

Parameters of a function are mutable, but since they are copied by value, called arguments aren't changed. Exactly like in FunC, just to clarify.

fun some(x: int) {
    x += 1;
}

val origX = 0;
some(origX);      // origX remains 0

fun processOpIncrease(msgBody: slice) {
    val flags = msgBody.loadInt(32);
    ...
}

processOpIncrease(msgBody);  // by value, not modified

In Tolk, a function can declare mutate parameters. It's a generalization of FunC ~ tilda functions, read below.

✅ Deprecated command-line options removed

Command-line flags -A, -P, and others, were removed. Default behavior

/path/to/tolk {inputFile}

is more than enough. Use -v to print version and exit. Use -h for all available command-line flags.

Only one input file can be passed, others should be import'ed.

✅ stdlib functions renamed to verbose clear names, camelCase style

All naming in standard library was reconsidered. Now, functions are named using longer, but clear names.

| FunC | Tolk |
|---|---|
| `cur_lt()` | `getLogicalTime()` |
| `car(l)` | `listGetHead(l)` |
| `get_balance().pair_first()` | `getMyOriginalBalance()` |
| `raw_reserve(count)` | `reserveToncoinsOnBalance(count)` |
| `dict~idict_add?(...)` | `dict.iDictSetIfNotExists(...)` |
| `dict~udict::delete_get_max()` | `dict.uDictDeleteLastAndGet()` |
| `t~tpush(triple(x, y, z))` | `t.tuplePush([x, y, z])` |
| `s.slice_bits()` | `s.getRemainingBitsCount()` |
| `~dump(x)` | `debugPrint(x)` |
| ... | ... |

A former "stdlib.fc" was split into multiple files: common.tolk, tvm-dicts.tolk, and others.

✅ stdlib is now embedded, not downloaded from GitHub

| FunC | Tolk |
|---|---|
| 1. Download stdlib.fc from GitHub | 1. Use standard functions |
| 2. Save into your project | |
| 3. #include "stdlib.fc"; | |
| 4. Use standard functions | |

In Tolk, stdlib is a part of the distribution. The standard library is inseparable, since keeping the triple "language, compiler, stdlib" together is the only correct way to maintain a release cycle.

It works as follows. The Tolk compiler knows how to locate the standard library. If a user has installed an apt package, stdlib sources were also downloaded and exist on the hard disk, so the compiler locates them by system paths. If a user uses a WASM wrapper, they are provided by tolk-js. And so on.

The standard library is split into multiple files: common.tolk (most common functions), gas-payments.tolk (calculating gas fees), tvm-dicts.tolk, and others. Functions from common.tolk are always available (the compiler imports it implicitly). Other files must be imported explicitly:

import "@stdlib/tvm-dicts"   // ".tolk" optional

...
var dict = createEmptyDict();
dict.iDictSet(...);

Mind the rule "import what you use", it's applied to @stdlib/... files also (with the only exception of "common.tolk").
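
As a rough illustration of the import rule, the Python sketch below maps an import string to a file path. It is not the compiler's actual algorithm: the function name resolve_import, the treatment of non-@stdlib imports as relative to the importing file, and the example directories are all assumptions made for this sketch. Only the @stdlib/ prefix and the optional ".tolk" suffix come from the text.

```python
from pathlib import Path

STDLIB_PREFIX = "@stdlib/"


def resolve_import(spec: str, stdlib_dir: Path, importing_file: Path) -> Path:
    """Map an import string to a file path; ".tolk" is appended when omitted."""
    if not spec.endswith(".tolk"):
        spec += ".tolk"
    if spec.startswith(STDLIB_PREFIX):
        return stdlib_dir / spec[len(STDLIB_PREFIX):]
    # Assumption: other imports are resolved next to the importing file.
    return importing_file.parent / spec


print(resolve_import("@stdlib/tvm-dicts", Path("stdlib"), Path("proj/main.tolk")))
```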

JetBrains IDE plugin automatically discovers stdlib folder and inserts necessary imports as you type.

✅ Logical operators && ||, logical not !

In FunC, there are only bitwise operators ~ & | ^. Developers taking their first steps, thinking "okay, no logical operators, I'll use bitwise ones in the same manner", often make mistakes, since the operators behave completely differently:

| `a & b` | `a && b` |
|---|---|
| sometimes, identical: | |
| `0 & X = 0` | `0 && X = 0` |
| `-1 & -1 = -1` | `-1 && -1 = -1` |
| but generally, not: | |
| `1 & 2 = 0` | `1 && 2 = -1` (true) |

| `~ found` | `!found` |
|---|---|
| sometimes, identical: | |
| true (-1) → false (0) | -1 → 0 |
| false (0) → true (-1) | 0 → -1 |
| but generally, not: | |
| 1 → -2 | 1 → 0 (false) |

| `condition & f()` | `condition && f()` |
|---|---|
| f() is called always | f() is called only if condition |

| `condition \| f()` | `condition \|\| f()` |
|---|---|
| f() is called always | f() is called only if condition is false |

Tolk supports logical operators. They behave exactly as you are used to (right column). For now, && and || sometimes produce suboptimal Fift code, but in the future the Tolk compiler will become smarter in this case. The overhead is negligible, so just use them like in other languages.
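
The rows of the comparison above can be reproduced with a small Python model. This is a sketch under stated assumptions: integers stand in for TVM values with true = -1, the right-hand operand is passed as a callable to model short-circuiting, and the function names are invented here. The ternary forms follow the ones given in the technical details of this description (a ? !!b : 0 and a ? -1 : !!b).

```python
TRUE, FALSE = -1, 0  # TVM booleans


def logical_not(x):
    return TRUE if x == 0 else FALSE


def logical_and(a, b_thunk):
    # a ? !!b : 0 ; b is evaluated only when a is truthy
    return logical_not(logical_not(b_thunk())) if a != 0 else FALSE


def logical_or(a, b_thunk):
    # a ? -1 : !!b ; b is evaluated only when a is falsy
    return TRUE if a != 0 else logical_not(logical_not(b_thunk()))


calls = []


def f():
    calls.append("f")
    return TRUE


print(1 & 2, logical_and(1, lambda: 2))  # 0 -1
print(~1, logical_not(1))                # -2 0
logical_and(FALSE, f)                    # f is not called
logical_or(TRUE, f)                      # f is not called
print(calls)                             # []
```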

FunC:
    if (~ found?)
Tolk:
    if (!found)

FunC:
    if (~ found?) {
        if (cs~load_int(32) == 0) {
            ...
        }
    }
Tolk:
    if (!found && cs.loadInt(32) == 0) {
        ...
    }

FunC:
    ifnot (cell_null?(signatures))
Tolk:
    if (signatures != null)

FunC:
    elseifnot (eq_checksum)
Tolk:
    else if (!eqChecksum)

Keywords ifnot and elseifnot were removed, since now we have logical not (for optimization, Tolk compiler generates IFNOTJMP, btw). Keyword elseif was replaced by traditional else if.

Note, that it does NOT mean that Tolk language has bool type. No, comparison operators still return an integer. A bool type support will be available someday, after hard work on the type system.

Remember, that true is -1, not 1. Both in FunC and Tolk. It's a TVM representation.

✅ No tilda ~ methods, mutate keyword instead

This change is so huge that it's described in a separate section:

Tolk mutate vs FunC ~ tilda functions

TLDR:

  • no ~ tilda methods
  • cs.loadInt(32) modifies a slice and returns an integer
  • b.storeInt(x, 32) modifies a builder
  • b = b.storeInt() also works, since it not only modifies, but returns
  • chained methods work identically to JS, they return self
  • everything works exactly as expected, similar to JS
  • no runtime overhead, exactly same Fift instructions
  • custom methods are created with ease
  • tilda ~ does not exist in Tolk at all

This is a drastic change. Where FunC has both .methods() and ~methods(), Tolk has only the dot: there is one and only one way to call a .method(). A method may or may not mutate an object. Unlike the mostly syntactic changes listed earlier, this is a behavioral and semantic difference from FunC.

The goal is to have calls identical to JS and other languages:

FunC:
    int flags = cs~load_uint(32);
Tolk:
    var flags = cs.loadUint(32);

FunC:
    (cs, int flags) = cs.load_uint(32);
Tolk:
    var flags = cs.loadUint(32);

FunC:
    (slice cs2, int flags) = cs.load_uint(32);
Tolk:
    var cs2 = cs;
    var flags = cs2.loadUint(32);

FunC:
    slice data = get_data()
                 .begin_parse();
    int flag = data~load_uint(32);
Tolk:
    val flag = getContractData()
               .beginParse()
               .loadUint(32);

FunC:
    dict~udict_set(...);
Tolk:
    dict.uDictSet(...);

FunC:
    b~store_uint(x, 32);
Tolk:
    b.storeUint(x, 32);

FunC:
    b = b.store_int(x, 32);
Tolk:
    b.storeInt(x, 32);

    // also works
    b = b.storeInt(x, 32);

FunC:
    b = b.store_int(x, 32)
         .store_int(y, 32);
Tolk:
    b.storeInt(x, 32)
     .storeInt(y, 32);

    // b = ...; also works

To make this possible, Tolk offers a mutability concept, which is a generalization of what a tilda means in FunC.

By default, all arguments are copied by value (identical to FunC)

fun someFn(x: int) {
    x += 1;
}

var origX = 0;
someFn(origX);  // origX remains 0
someFn(10);     // ok, just int
origX.someFn(); // still allowed (but not recommended), origX remains 0

Same goes for cells, slices, whatever:

fun readFlags(cs: slice) {
    return cs.loadInt(32);
}

var flags = readFlags(msgBody);  // msgBody is not modified
// msgBody.loadInt(32) will read the same flags

It means, that when you call a function, you are sure that original data is not modified.

mutate keyword and mutating functions

But if you add the mutate keyword to a parameter, the passed argument will be mutated. To avoid unexpected mutations, you must also specify mutate at the call site:

fun increment(mutate x: int) {
    x += 1;
}

// it's correct, simple and straightforward
var origX = 0;
increment(mutate origX);  // origX becomes 1

// these are compiler errors
increment(origX);         // error, unexpected mutation
increment(10);            // error, not lvalue
origX.increment();        // error, not a method, unexpected mutation
val constX = getSome();
increment(mutate constX); // error, it's immutable, since `val`

Same for slices and any other types:

fun readFlags(mutate cs: slice) {
    return cs.loadInt(32);
}

val flags = readFlags(mutate msgBody);
// msgBody.loadInt(32) will read the next integer

It's a generalization. A function may have several mutate parameters:

fun incrementXY(mutate x: int, mutate y: int, byValue: int) {
    x += byValue;
    y += byValue;
}

incrementXY(mutate origX, mutate origY, 10);   // both += 10

You may ask — is it just passing by reference? It effectively is, but since "ref" is an overloaded term in TON (cells and slices have refs), a keyword mutate was chosen.

self parameter turning a function into a method

When a first parameter is named self, it emphasizes that a function (still a global one) is a method and should be called via dot.

fun assertNotEq(self: int, throwIfEq: int) {  
    if (self == throwIfEq) {  
        throw 100;
    }
}

someN.assertNotEq(10);
10.assertNotEq(10);      // also ok, since self is not mutating
assertNotEq(someN, 10);  // still allowed (but not recommended)

self, without mutate, is immutable (unlike all other parameters). Think of it like "read-only method".

fun readFlags(self: slice) {
    return self.loadInt(32);  // error, modifying immutable variable
}

fun preloadInt32(self: slice) {
    return self.preloadInt(32);  // ok, it's a read-only method
}

Combining mutate and self, we get mutating methods.

mutate self is a method, called via dot, mutating an object

As follows:

fun readFlags(mutate self: slice) {
    return self.loadInt(32);
}

val flags = msgBody.readFlags(); // pretty obvious

fun increment(mutate self: int) {
    self += 1;
}

var origX = 10;
origX.increment();    // 11
10.increment();       // error, not lvalue

// even this is possible
fun incrementWithY(mutate self: int, mutate y: int, byValue: int) {  
    self += byValue;
    y += byValue;  
}

origX.incrementWithY(mutate origY, 10);   // both += 10

If you take a look into stdlib, you'll notice, that lots of functions are actually mutate self, meaning they are methods, modifying an object. Tuples, dictionaries, and so on. In FunC, they were usually called via tilda.

@pure
fun tuplePush<X>(mutate self: tuple, value: X): void  
    asm "TPUSH";

t.tuplePush(1);

return self makes a method chainable

Exactly like return self in Python or return this in JavaScript. That's what makes methods like storeInt() and others chainable.

fun storeInt32(mutate self: builder, x: int): self {
    self.storeInt(x, 32);
    return self;

    // this would also work as expected (the same Fift code)
    // return self.storeInt(x, 32);
}

var b = beginCell().storeInt(1, 32).storeInt32(2).storeInt(3, 32);
b.storeInt32(4);     // works without assignment, since mutates b
b = b.storeInt32(5); // and works with assignment, since also returns

Pay attention to the return type: it's self. Currently, you must specify it; if it is left empty, compilation fails. This may be relaxed in the future.
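
Since the text compares this to return self in Python, here is the Python side of that analogy as a minimal sketch. The Builder class is a toy stand-in invented for illustration (a list of value/bit-length pairs), not a model of a real TVM builder. Note one difference: Python objects are shared by reference, whereas Tolk achieves the same visible effect through mutate self.

```python
class Builder:
    """Toy stand-in for a builder: a list of (value, bit_length) pairs."""

    def __init__(self):
        self.fields = []

    def store_int(self, value, bits):
        self.fields.append((value, bits))
        return self  # returning self is what makes calls chainable

    def store_int32(self, value):
        return self.store_int(value, 32)


b = Builder().store_int(1, 32).store_int32(2).store_int(3, 32)
b.store_int32(4)      # mutates b, the result is ignored
b = b.store_int32(5)  # the same object comes back
print([value for value, _ in b.fields])  # [1, 2, 3, 4, 5]
```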

mutate self and asm functions

While it's obvious for user-defined functions, one could be interested, how to make an asm function with such behavior? To answer this question, we should look under the hood, how mutation works inside the compiler.

When a function has mutate parameters, it actually implicitly returns them, and they are implicitly assigned to arguments. It's better by example:

// actually returns (int, void)
fun increment(mutate x: int): void { ... }

// actually does: (x', _) = increment(x); x = x'
increment(mutate x);  

// actually returns (int, int, (slice, cell))
fun f2(mutate x: int, mutate y: int): (slice, cell) { ... }

// actually does: (x', y', r) = f2(x, y); x = x'; y = y'; someF(r)
someF(f2(mutate x, mutate y));

// when `self`, it's exactly the same
// actually does: (cs', r) = loadInt(cs, 32); cs = cs'; flags = r
flags = cs.loadInt(32);
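
The implicit return-and-assign scheme in the comments above can be written out explicitly. The following Python sketch does by hand what the compiler is described as doing; the representation of a slice as a tuple of (value, bits) fields is an assumption of this sketch, chosen only to keep it short.

```python
def increment(x):
    """fun increment(mutate x: int): void, which really returns (x', ())."""
    return x + 1, ()


def load_int(cs, bits):
    """A mutate-self method on a slice: returns (cs', value)."""
    (value, field_bits), rest = cs[0], cs[1:]
    assert field_bits == bits, "this sketch only reads whole fields"
    return rest, value


# increment(mutate x);
x = 0
x, _ = increment(x)

# flags = cs.loadInt(32);
cs = ((7, 32), (9, 32))
cs, flags = load_int(cs, 32)
cs, nxt = load_int(cs, 32)
print(x, flags, nxt, cs)  # 1 7 9 ()
```

Because the updated slice is assigned back on every call, the second load_int reads the next field, which matches the behavior of mutating methods described earlier.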

So, an asm function should place self' onto a stack before its return value:

// "TPUSH" pops (tuple) and pushes (tuple')
// so, self' = tuple', and return an empty tensor
// `void` is a synonym for an empty tensor
fun tuplePush<X>(mutate self: tuple, value: X): void  
    asm "TPUSH";

// "LDU" pops (slice) and pushes (int, slice')
// with asm(-> 1 0), we make it (slice', int)
// so, self' = slice', and return int
fun loadMessageFlags(mutate self: slice): int  
    asm(-> 1 0) "4 LDU";

Note, that to return self, you don't have to do anything special, just specify a return type. Compiler will do the rest.

// "STU" pops (int, builder) and pushes (builder')
// with asm(op self), we put arguments to correct order
// so, self' = builder', and return an empty tensor
// but to make it chainable, `self` instead of `void`
fun storeMessageOp(mutate self: builder, op: int): self  
    asm(op self) "32 STU";
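
The stack effects in the comments above can be traced with a toy stack machine. The Python sketch below is an illustration only: ldu and stu are simplified stand-ins for the real instructions (slices and builders are plain tuples, and bit widths are not enforced on reads), and call_asm with its arg_order / ret_order parameters is a hypothetical helper modeling asm(op self) and asm(-> 1 0). It has been checked only against the two-value cases shown in the text.

```python
def ldu(stack, bits):
    # "<bits> LDU": pops (slice), pushes (int, slice')
    cs = stack.pop()
    value, rest = cs[0], cs[1:]  # toy slice: a tuple of already-decoded ints
    stack += [value, rest]


def stu(stack, bits):
    # "<bits> STU": pops (int, builder), pushes (builder')
    builder = stack.pop()
    value = stack.pop()
    stack.append(builder + ((value, bits),))


def call_asm(op, bits, args, arg_order, ret_order):
    stack = [args[i] for i in arg_order]  # push parameters in the asm(...) order
    op(stack, bits)
    return tuple(stack[i] for i in ret_order)


# fun loadMessageFlags(mutate self: slice): int  asm(-> 1 0) "4 LDU"
cs2, flags = call_asm(ldu, 4, args=[(5, 6)], arg_order=[0], ret_order=[1, 0])
print(cs2, flags)  # (6,) 5

# fun storeMessageOp(mutate self: builder, op: int): self  asm(op self) "32 STU"
(b2,) = call_asm(stu, 32, args=[(), 0x10], arg_order=[1, 0], ret_order=[0])
print(b2)  # ((16, 32),)
```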

It's very unlikely you'll have to do such tricks. Most likely, you'll just write wrappers around existing functions:

// just do it like this, without asm; it's just as efficient

@inline
fun myLoadMessageFlags(mutate self: slice): int {
    return self.loadUint(4);
}

@inline
fun myStoreMessageOp(mutate self: builder, op: int): self {
    return self.storeUint(op, 32);
}

Do I need @inline for simple functions/methods?

For now, yes, it is better to add it. In most examples above, @inline was omitted for clarity. Currently, without @inline, a function becomes a separate TVM continuation with jumps in and out. With @inline, the function is still generated, but inlined by Fift (like the inline specifier in FunC).

In the future, Tolk will automatically detect simple functions and perform a true inlining by itself, on AST level. Such functions won't be even codegenerated to Fift. The compiler would decide, better than a human, whether to inline, to make a ref, etc. But it will take some time for Tolk to become so smart :) For now, please specify the @inline attribute.

But self is not a method, it's still a function! I feel like I've been cheated

Absolutely. Like FunC, Tolk has only global functions (as of v0.6). There are no classes / structures with methods. There are no methods hash() for slice and hash() for cell. Instead, there are functions sliceHash() and cellHash(), which can be called either like functions or by dot (preferred):

fun f(s: slice, c: cell) {
    // not like this
    s.hash();  
    c.hash();
    // but like this
    s.sliceHash();
    c.cellHash();
    // since it's the same as
    sliceHash(s);
    cellHash(c);
}

In the future, after a giant work on the type system, having fully refactored FunC kernel inside, Tolk might have an ability of declaring structures with real methods, generalized enough for covering built-in types. But it will take a long journey to follow.

Tolk vs FunC gas consumption

TLDR: Tolk gas consumption could be a bit higher, because it fixes unexpected arguments shuffling in FunC. It's negligible in practice. In the future, Tolk compiler will become smart enough to reorder arguments targeting less stack manipulations, but still avoiding a shuffling problem.

FunC compiler could unexpectedly shuffle arguments when calling an assembly function:

some_asm_function(f1(), f2());

Sometimes, f2() could be called before f1(), and it's unexpected. To fix this behavior, one could specify #pragma compute-asm-ltr, forcing arguments to be always evaluated in ltr-order. This was experimental, and therefore turned off by default.

This pragma reorders arguments on a stack, often leading to more stack manipulations than without it. In other words, it fixes unexpected behavior, but increases gas consumption.

Tolk puts arguments onto a stack exactly as if this pragma were turned on. So, its gas consumption is sometimes higher than in FunC if you didn't use this pragma. Of course, there is no shuffling problem in Tolk.

In the future, Tolk compiler will become smart enough to reorder arguments targeting less stack manipulations, but still avoiding a shuffling problem.
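
Why evaluation order matters can be shown with two calls that share state. The Python sketch below is only an analogy: call_ltr and call_shuffled are hypothetical helpers that evaluate argument expressions (passed as callables) in different orders, which is the effect the shuffling problem has on functions with side effects.

```python
def make_counter():
    state = {"n": 0}

    def next_value():
        state["n"] += 1
        return state["n"]

    return next_value


def call_ltr(fn, *thunks):
    # Evaluate arguments left to right, as Tolk does.
    return fn(*[t() for t in thunks])


def call_shuffled(fn, *thunks):
    # Evaluate the last argument first, then pass values in declared order.
    values = [t() for t in reversed(thunks)]
    return fn(*reversed(values))


def sub(a, b):
    return a - b


c = make_counter()
print(call_ltr(sub, c, c))       # -1, computed as 1 - 2
c = make_counter()
print(call_shuffled(sub, c, c))  # 1, computed as 2 - 1
```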

Some technical details

Here I keep a list of points not seen by a user's eye, but related to implementation.

  • Tolk compiler is a fork of FunC compiler; literally: the first commit is copying all FunC sources renaming "FunC" to "Tolk".
  • It means, that all FunC intelligence and complexity (and probable bugs, huh) are also a part of Tolk.
  • Tolk still outputs Fift code, Fift compiler is assumed to be invoked afterward.
  • All compiler sources are in {repo}/tolk. All tests are in {repo}/tolk-tester.
  • I have fully rewritten everything about lexing (see lexer.cpp), it's not unified with TL/B. Spaces in .tolk files are not mandatory (2+2 is 4, identifiers are alpha-numeric), lexing works based on Trie. A new lexer is faster than an old one (though lexing part is negligible in all the process, of course).
  • Tolk has an AST representation of source code. In FunC, there is no AST: while lexing, symbols are registered, types are inferred, and so on. There is no way to perform any more or less semantic analysis. In Tolk, I've implemented parsing .tolk files into AST at first, and then converting this AST into legacy representation (Expr/Op). Consider ast.h for comments.
  • Lots of sources after transforming AST to Expr/Op are unchanged (CodeBlob, StackTransform, etc.), I name it "legacy". In the future, more and more code analysis will be moved out of legacy to AST-level.
  • Mutating functions are a generalization of FunC tilda functions, but they are successfully converted to Expr/Op representation.
  • All C++ global variables spread over FunC sources are combined into CompilerState G, the only C++ global variable in Tolk compiler. See compiler-state.h.
  • Type system remains unchanged, even te_ForAll (just having new <T> syntax), but it will be reconsidered some day.
  • Asm.fif was not modified. Tolk entrypoint is onInternalMessage (not recv_internal), but whereas method_id for recv_internal is generated by Fift, method_id for onInternalMessage is generated by Tolk compiler itself with DECLMETHOD in fif output.
  • Logical operators && || are expressed as ternary expressions: a && b → a ? !!b : 0, a || b → a ? -1 : !!b, later generated as IFJMP to Fift. For simple cases, code generation could avoid jumps, but I had no time to optimize it. So, logical operators exist and work, but are not gas-optimal in simple cases. To be improved in future releases.
  • Tolk stdlib is split into multiple files, see crypto/smartcont/tolk-stdlib/ folder. It's placed there, because smartcont is copied as-is into apt packages.
  • The first thing Tolk compiler does on start is locating stdlib folder (the goal is to make stdlib a part of distribution, not be downloaded from GitHub). It works by searching in predefined paths relative to an executable binary. For example, if the user launches Tolk compiler from an installed package (e.g. /usr/bin/tolk), it locates stdlib in /usr/share/ton/smartcont. When it's built from sources (e.g. ~/ton/cmake-build-debug/tolk/tolk), it checks the ~/ton/crypto/smartcont folder. If a user has a non-standard installation, they may set the TOLK_STDLIB env variable. It's standard practice for compilers, though it could be a bit simplified if we used CPack. See tolk-main.cpp.
  • WASM wrapper also exists, see tolk-wasm.cpp. It's similar to funcfiftlib, but supports more options. A GitHub repo tolk-js is an npm package with wasm distribution. It also contains stdlib. So, when a user takes tolk-js or blueprint, all stdlib functions are still available out of the box.
  • Tolk has no dependency on ton_block and ton_crypto CMake targets.
  • Tolk binary is more lightweight than a FunC one, same for wasm.
  • Tolk compiler has a rich testing framework and contains more than 100 tests for now.
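
The stdlib lookup described in the list above could look roughly like the following Python sketch. It is not a port of tolk-main.cpp: the function name locate_stdlib, the candidate list, and the choice to check TOLK_STDLIB first are assumptions. The two relative paths are derived from the two examples in the text (/usr/bin/tolk with /usr/share/ton/smartcont, and a cmake build directory with crypto/smartcont).

```python
import os
from pathlib import Path

# Candidate locations relative to the directory holding the binary.
# Both are inferred from the examples in the text, not from the real sources.
RELATIVE_CANDIDATES = [
    Path("../share/ton/smartcont"),  # /usr/bin/tolk -> /usr/share/ton/smartcont
    Path("../../crypto/smartcont"),  # <repo>/cmake-build-debug/tolk/tolk -> <repo>/crypto/smartcont
]


def locate_stdlib(executable: Path, env=os.environ):
    """Return the stdlib folder, or None when nothing is found."""
    override = env.get("TOLK_STDLIB")  # assumption: the env variable wins
    if override:
        return Path(override)
    bin_dir = executable.resolve().parent
    for rel in RELATIVE_CANDIDATES:
        candidate = (bin_dir / rel).resolve()
        if candidate.is_dir():
            return candidate
    return None
```

Searching relative to the binary rather than in fixed absolute paths is what lets the same logic work for both a system package and a local build.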

A framework for testing Tolk compiler

In FunC, there is an auto-tests folder with several .fc files, specifying provided input and expected output. For example:

... some FunC code

{-    
    method_id | in            | out
TESTCASE | 0  | 1 1 1 -1 10 6 | 8 2
-}

There is a run_tests.py which traverses each file in a folder, detects such lines from comments, compiles to fif, and executes every testcase, comparing output.
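
For illustration, extracting such lines could be done as in the Python sketch below. This is not the code of run_tests.py; the regular expression and the function name parse_testcases are assumptions based only on the TESTCASE line format shown above.

```python
import re

# TESTCASE | method_id | inputs | expected outputs
TESTCASE_RE = re.compile(r"^\s*TESTCASE\s*\|\s*(-?\d+)\s*\|(.*?)\|(.*)$")


def parse_testcases(source: str):
    """Collect (method_id, inputs, expected) triples from TESTCASE lines."""
    cases = []
    for line in source.splitlines():
        m = TESTCASE_RE.match(line)
        if m:
            method_id = int(m.group(1))
            inputs = m.group(2).split()
            expected = m.group(3).split()
            cases.append((method_id, inputs, expected))
    return cases


sample = """{-
    method_id | in            | out
TESTCASE | 0  | 1 1 1 -1 10 6 | 8 2
-}"""
print(parse_testcases(sample))
```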

It is okay, it works, but... This framework is very, very poor. I am speaking not about the number of tests, but about what exactly we can test with such capabilities.

For example, as a compiler developer, I want to implement functions inlining:

fun myCell() { return beginCell(); }  // want to test it's inlined

...
myCell()...  // that this call is replaced by beginCell()

But even without inlining, all tests for input-output will work :) Because what we really want to test is that

  1. myCell() is not codegenerated (no DECLPROC and similar)
  2. usages of myCell() are replaced with NEWC (not CALLDICT)

None of these cases could be explained in terms of input-output.

I have fully rewritten the internal testing framework and added lots of capabilities to it. Let's look through them.

@compilation_should_fail — checks that compilation fails, and it's expected (this is called "negative tests"). @stderr — checks, when compilation fails, that stderr (compilation error) is expected. Example:

fun main(s: auto) {  
  var (z, t) = ;  
  
/**  
@compilation_should_fail  
@stderr expected <expression>, got `;`  
@stderr var (z, t) = ;  
*/
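
A check for such a negative test might be sketched in Python as follows. This is not the actual tolk-tester code: parse_tags and check_negative are hypothetical names, and treating each @stderr value as a substring that must occur in the compiler's stderr is an assumption.

```python
def parse_tags(comment: str):
    """Extract @compilation_should_fail and @stderr tags from a comment block."""
    tags = {"should_fail": False, "stderr": []}
    for line in comment.splitlines():
        line = line.strip()
        if line == "@compilation_should_fail":
            tags["should_fail"] = True
        elif line.startswith("@stderr "):
            tags["stderr"].append(line[len("@stderr "):])
    return tags


def check_negative(tags, exit_code: int, stderr: str) -> bool:
    if not tags["should_fail"]:
        return exit_code == 0
    # Compilation must fail and mention every expected fragment.
    return exit_code != 0 and all(s in stderr for s in tags["stderr"])


comment = """/**
@compilation_should_fail
@stderr expected <expression>, got `;`
@stderr var (z, t) = ;
*/"""
tags = parse_tags(comment)
print(tags["should_fail"], len(tags["stderr"]))  # True 2
```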

@fif_codegen — checks that contents of compiled.fif matches the expected pattern. @fif_codegen_avoid — checks that it does not match the pattern. The pattern is a multiline piece of fift code, optionally with "..." meaning "any lines here". It may contain //stack_comments, they will also be checked. Example:

... some Tolk code

/**
@fif_codegen
"""
test1 PROC:<{  
  //  
  NEWC        //  _5  
  ENDC        //  varName  
  NEWC        //  varName _8
  ...
  TRIPLE      //  _27
}>
"""

@fif_codegen_avoid DECLPROC myCell
*/
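
Matching a pattern with "..." against generated code can be sketched as below. Again, this is an illustration rather than the framework's implementation: the function names, the whitespace normalization, and the rule that non-wildcard lines must be adjacent are assumptions of this Python sketch.

```python
def normalize(line: str) -> str:
    # Collapse whitespace so indentation differences do not matter.
    return " ".join(line.split())


def _match_from(pat, out, i):
    if not pat:
        return True
    if pat[0] == "...":
        # "..." stands for any number of lines, including none.
        return any(_match_from(pat[1:], out, j) for j in range(i, len(out) + 1))
    return i < len(out) and out[i] == pat[0] and _match_from(pat[1:], out, i + 1)


def matches(pattern: str, output: str) -> bool:
    """True if the pattern occurs somewhere in the output."""
    pat = [normalize(l) for l in pattern.strip().splitlines()]
    out = [normalize(l) for l in output.splitlines()]
    return any(_match_from(pat, out, start) for start in range(len(out) + 1))


fif = "test1 PROC:<{\n  NEWC\n  ENDC\n  SWAP\n  TRIPLE\n}>"
print(matches("test1 PROC:<{\n  NEWC\n  ...\n  TRIPLE\n}>", fif))  # True
print(matches("NEWC\nTRIPLE", fif))                                # False
```

Under the same assumptions, a @fif_codegen_avoid check is simply the negation of matches.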

@code_hash — checks that hash of compiled output.fif matches the provided value. It's used to "record" code boc hash and to check that it remains the same on compiler modifications. Being much less flexible than @fif_codegen, it nevertheless gives a guarantee of bytecode stability. Example:

... some Tolk code

/**
@code_hash 13830542019509784148027107880226447201604257839069192762244575629978154217223
*/

Of course, different tags can be mixed up in a single file: multiple @testcase, multiple @fif_codegen, etc.

Also, I've implemented tolk-tester.js, fully in sync with tolk-tester.py. It means, that now we can test fif codegen, compilation errors and so on for WASM also.

Consider tolk-tester/ folder for an implementation and coverage.

Moreover, I've downloaded sources of 300 verified FunC contracts from verifier.ton.org, converted them to Tolk, and written a tool to launch Tolk compiler on a whole database after every commit. That makes me sure that all future changes in the compiler won't break compilation of "flagship" codebase, and when Fift output is changed, I look through to ensure that changes are expected. That codebase lives outside of ton-blockchain repository.

Tolk roadmap

The first released version of Tolk will be v0.6, a nod to the FunC v0.5 that never appeared.

Here are some (yet not all and not ordered in any way) points to be investigated:

  • type system improvements: boolean type, nullability, dictionaries
  • structures, with auto-packing to/from cells, probably integrated with message handlers
  • structures with methods, probably generalized to cover built-in types
  • some integrations with TL scheme, either syntactical or via code generation
  • human-readable compiler errors
  • easier messages sending
  • better experience for common use-cases (jettons, nft, etc.)
  • gas and stack optimizations, AST inlining
  • extending and maintaining stdlib
  • think about some kind of ABI (how explorers "see" bytecode)
  • think about gas and fee management in general

Note, that most of the points above are a challenge to implement. At first, FunC kernel must be fully refactored to "interbreed" with abilities it was not designed for.

Also, I see Tolk evolution partially guided by community needs. It would be nice to talk to developers who have created interconnected FunC contracts, to absorb their pain points and discuss how things could be done differently.

What fate awaits FunC?

We decided to leave FunC untouched. Carved in stone, exactly as envisioned by Dr. Nikolay Durov. If critical bugs are found, they will be fixed, of course. But active development is not planned. All contracts written in FunC will continue working, obviously. And FunC will forever be available for use.

Since Tolk allows doing literally the same as FunC, all newcomers will be onboarded to Tolk.

In 2025, FunC will be officially deprecated to avoid confusion.

Tooling around Tolk Language

Sources of the Tolk compiler are a part of the ton-blockchain repo. Besides the compiler, we have:

  1. Documentation in a separate repo.
  2. tolk-js — a WASM wrapper for Tolk compiler.
  3. JetBrains IDE plugin supports Tolk besides FunC, Fift, TL/B, and Tact.
  4. VS Code Extension enabling Tolk Language support.
  5. Converter from FunC to Tolk — convert a .fc file to a .tolk file with a single npx command.
  6. Tolk Language is available in blueprint.
