This repository has been archived by the owner on Aug 10, 2020. It is now read-only.

draft for new data representation #217

Open
wants to merge 4 commits into
base: source
110 changes: 110 additions & 0 deletions website/blog/2020-05-13-lazy-encoding.md
@@ -0,0 +1,110 @@
---
title: Enhanced lazy encoding in BuckleScript
---



We recently made significant improvements to the encoding of lazy values, and we are excited enough to highlight the changes here. The new encoding generates idiomatic JS output that reads like hand-written code.

For people who are not familiar with lazy evaluation, it is documented here: https://ocaml.org/releases/4.10/htmlman/expr.html#sss:expr-lazy.

# What's the difference?

Take this code snippet, for example:

```reasonml
let lazy1 = lazy {
"Hello, lazy" -> Js.log;
1
}; // create a lazy value

let lazy2 = lazy 3 ; // artificial lazy value for demo purposes

Js.log2 (lazy1, lazy2); // logging the lazy values

let (lazy la, lazy lb) = (lazy1, lazy2); // pattern match to force evaluation

Js.log2 (la, lb); // logging forced values
```

With the old encoding, running this code in Node produces the output below:
```bash
lazy_demo$node src/lazy_demo.bs.js
[ [Function], tag: 246 ] 3 # logging the two lazy values
Hello, lazy # lazy1 and lazy2 are forced by the pattern match, hence the log
1 3 # logging the forced values
```

With the new encoding, the output is as below:
```bash
{ RE_LAZY_DONE: false, value: [Function: value] } { RE_LAZY_DONE: true, value: 3 } # logging the two lazy values with the new encoding
Hello, lazy
1 3
```

As you can see, with the new encoding, no magic tags like 246 appear, and the lazy status is clearly marked via `RE_LAZY_DONE: (true | false)`.

More than that, the generated code quality is also improved. In the old mode, the generated JS code was like this:

```js
var lazy1 = Caml_obj.caml_lazy_make((function (param) {
console.log("Hello, lazy");
return 1;
}));

console.log(lazy1, 3);

var la = CamlinternalLazy.force(lazy1);

var lb = CamlinternalLazy.force(3);

console.log(la, lb);

var lazy2 = 3;
```

In the new mode, it is simplified:
```js
var lazy1 = {
RE_LAZY_DONE: false,
value: (function () { // closure now is uncurried arity-0 function
console.log("Hello, lazy");
return 1;
})
};

var lazy2 = {
RE_LAZY_DONE: true,
value: 3
};

console.log(lazy1, lazy2);

var la = CamlinternalLazy.force(lazy1);

var lb = CamlinternalLazy.force(lazy2);

console.log(la, lb);
```

## What changes did we make?

In native OCaml, the encoding of lazy values is rather complicated:

- It is an array, which is not friendly for debugging in a JS context.
- It uses special tags, such as the magic number 246, which are not meaningful in a JS context.
- It tries to unbox lazy values with the help of the native GC. That complexity does not pay off in JS, since JS VMs do not expose their GC semantics.

So on master, the encoding scheme is much simpler and takes advantage of JS as much as possible:

- The encoding is uniform: it is always an object with two key-value pairs. One is `RE_LAZY_DONE`, which marks the status; the other is `value`, which holds either a closure or the evaluated value.

- Compiler optimizations still kick in at compile time: if the compiler knows a lazy value is already evaluated, or does not need to be evaluated, it promotes its status to 'done'. Unlike native, however, no unboxing happens. This makes sense, since the most interesting unboxing scenarios happen at runtime rather than at compile time, and runtime unboxing is impossible on a JS VM.
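
To make the two-field encoding concrete, here is a minimal hand-written sketch of what forcing such a value could look like. `forceSketch` is a hypothetical name for illustration only; the real runtime function is `CamlinternalLazy.force`, whose implementation is not shown in this post, and which may additionally handle exceptions and re-entrancy that this sketch ignores.

```js
// Hypothetical helper, NOT the actual CamlinternalLazy.force implementation.
function forceSketch(lazyValue) {
  if (lazyValue.RE_LAZY_DONE) {
    // Already evaluated: `value` holds the result.
    return lazyValue.value;
  }
  // Not evaluated yet: `value` holds an arity-0 closure.
  var closure = lazyValue.value;
  var result = closure();
  lazyValue.value = result; // overwrite the closure with the result
  lazyValue.RE_LAZY_DONE = true; // promote the status to 'done'
  return result;
}

var calls = 0;
var lazy1 = {
  RE_LAZY_DONE: false,
  value: function () {
    calls += 1; // count how often the body actually runs
    return 1;
  }
};

// Forcing twice runs the closure only once.
console.log(forceSketch(lazy1), forceSketch(lazy1), calls);
```

Because the status lives in a plain boolean field, a debugger or `console.log` shows at a glance whether a lazy value has been forced.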


With the new encoding, `lazy` values are much nicer to work with, and we encourage you to use them whenever it is convenient!

# Caveats:

Don't rely on the special name `RE_LAZY_DONE` for JS interop; we may change it to a symbol in the future.
217 changes: 217 additions & 0 deletions website/blog/2020-6-10-overview-data-representation.md
@@ -0,0 +1,217 @@
---
title: An overview of new data representation in BuckleScript
---

In the next version of BuckleScript, we make several major changes to the data representation of various data types, so that it is more idiomatic and debugger friendly.

Since V8 and other JavaScript engines are tuned to run idiomatic JS code fast, the new representation also results in faster code.

Another property of a compiled language like BuckleScript is that we can reason about [IC](https://en.wikipedia.org/wiki/Inline_caching) friendliness just by looking at type definitions locally. This is very helpful for advanced users who want to write JS code with reliable performance. If you are unfamiliar with how inline caching (IC) works in general, feel free to skip the sections about IC.

Note that this article is already quite dense, so we will skip the old encoding.

## Record (stable)

We have compiled records to idiomatic JS objects since [version 7](https://bucklescript.github.io/blog/2019/11/18/whats-new-in-7), which is great for performance and debugging. We also support label renaming, which shortens field names to reduce code size.

Take the code below for example:

```reasonml
type int64 = {
  [@bs.as "lo"] loBits: int,
  [@bs.as "hi"] hiBits: int,
};
let value = {hiBits: 33, loBits: 32};
let rand = ({loBits, hiBits}) => loBits + hiBits;
```

It will generate JS output as below:
```js
var value = {lo : 32, hi : 33}
function rand (param){
return param.lo + param.hi
}
```

If users want to make it even shorter, they can choose to compile the record to an array, as below:

```reasonml
type int64 = {
  [@bs.as "0"] loBits: int,
  [@bs.as "1"] hiBits: int,
};
let value = {hiBits: 33, loBits: 32};
let rand = ({loBits, hiBits}) => loBits + hiBits;
```

The JS output now becomes:

```js
var value = [32,33]
function rand(param){
return param[0] + param[1]
}
```

The label renaming technique can be applied systematically using a syntactic macro; in the future we may provide an advanced mode that applies it automatically. Another nice property is that only the type definition needs to be adapted; other parts of the code are untouched.

### IC friendliness

Records are always in a perfect IC position: the compiler can ensure that all generated records of a given type have the same shape.

## Variant (internal)
The encoding for variants may be subject to change in the future, but it is simple enough that it makes sense for users to have a basic understanding of it.

Take the following type definition as an example:

```reasonml
type t =
| Black(t, int, t)
| Red(t, int, t)
| Empty;
let empty = Empty ;
let v0 = Black (empty, 3, empty);
let v1 = Red (empty, 3, empty);
```
The generated JS code would be:

```js
var empty =/*Empty*/ 0;
var v0 = {TAG : 0/*Black*/, _0 : /*Empty*/ 0 , _1 : 3 , _2 : /*Empty */ 0};
var v1 = {TAG : 1/*Red*/, _0 : /*Empty*/ 0 , _1 : 3 , _2 : /*Empty */ 0};
```

As you can see, variants are divided into two categories. A variant without a payload is compiled to a number, starting from 0. A variant with a payload is compiled to an object whose first slot is named `TAG` and whose following slots are named `_0`, `_1`, and so on.
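
As an illustration of how code can consume this encoding, the sketch below pattern matches by hand: it first splits constants from objects with `typeof`, then switches on `TAG`. The functions `describe` and `sum` are hypothetical, hand-written helpers, not compiler output, and they rely on an encoding that is marked internal and may change.

```js
var empty = /* Empty */ 0;
var v0 = {TAG: 0 /* Black */, _0: empty, _1: 3, _2: empty};
var v1 = {TAG: 1 /* Red */, _0: v0, _1: 4, _2: empty};

// Hand-written consumer of the encoding (hypothetical, for illustration).
function describe(node) {
  if (typeof node === "number") {
    return "Empty"; // the only constructor without a payload
  }
  switch (node.TAG) {
    case 0:
      return "Black(" + node._1 + ")";
    case 1:
      return "Red(" + node._1 + ")";
    default:
      throw new Error("unknown TAG");
  }
}

// Sums the integer payloads of a tree; TAG is irrelevant here because
// Black and Red store their payloads in the same slots.
function sum(node) {
  if (typeof node === "number") {
    return 0;
  }
  return sum(node._0) + node._1 + sum(node._2);
}

console.log(describe(empty), describe(v0), describe(v1), sum(v1));
```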

### Variant with inline records

Users can give names to the payload fields, and the compiler respects them. However, we do not support user-level renaming (i.e., using `bs.as`) at this time.

```reasonml
type t =
| Black ({l:t, value: int, r: t})
| Red({l:t, value: int, r: t})
| Empty;
let empty = Empty ;
let v0 = Black ({l:empty, value: 3, r:empty});
let v1 = Red ({l:empty, value:3, r: empty});
```
The generated JS code would be:
```js
var empty =/*Empty*/ 0;
var v0 = {TAG : 0/*Black*/, l : /*Empty*/ 0 , value : 3 , r : /*Empty */ 0};
var v1 = {TAG : 1/*Red*/, l : /*Empty*/ 0 , value : 3 , r : /*Empty */ 0};
```


### Special case: only one variant has a payload

Take the types below for example:

```reasonml
type list =
| Nil
| Cons(int, list);
```

Since only one variant has a payload, the compiler does not need to add `TAG` to tell payloads apart when destructuring the data in pattern matching, so the code below:

```reasonml
let u = Cons(1,Nil)
```
will generate the following JS output:

```js
var u = {_0: 1, _1 : /*Nil*/ 0 }; // No TAG data.
```

### Specialized for immutable list

The `list` type is a built-in type; its type definition is similar to this:

```reasonml
type t ('a) =
| []
| (::) ('a * t ('a))
```

Without any customization, it would generate JS objects with indexed fields like `_0` and `_1`. Since lists are so pervasive,
we give them special treatment, so that
```reasonml
let u = [0,1,2,3]
```
generates the following JS code:
```js
var u = {hd : 0, tl : {hd : 1, tl : {hd : 2, tl : {hd : 3, tl : /*[]*/ 0}}}};
```

This is a minor change: we renamed `_0` to `hd` and `_1` to `tl`.
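
The `hd`/`tl` layout is easy to walk from plain JS. The sketch below converts between a JS array and this list encoding; `fromArray` and `toArray` are hypothetical helper names introduced here for illustration, and since the variant encoding is marked internal, code like this should be treated as a debugging aid rather than a stable interop contract.

```js
// Hypothetical helpers; they assume the hd/tl encoding described above,
// with the empty list represented as the number 0.
function fromArray(items) {
  var list = /* [] */ 0;
  // Build from the back so that the first element ends up at the head.
  for (var i = items.length - 1; i >= 0; i--) {
    list = {hd: items[i], tl: list};
  }
  return list;
}

function toArray(list) {
  var out = [];
  // Cells are objects; only the empty list is the number 0.
  while (list !== 0) {
    out.push(list.hd);
    list = list.tl;
  }
  return out;
}

var u = fromArray([0, 1, 2, 3]);
console.log(JSON.stringify(u));
console.log(toArray(u));
```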

### IC friendliness

For types in which only one variant carries a payload, pattern matching is in a perfect IC position.
The number of variants that do not carry a payload does not affect IC, since the pattern match does a split first.

For types in which all payload-carrying variants have the same number of payloads, it is also in a perfect IC position, like the red-black tree example above.

In other cases, it hits a polymorphic IC in the V8 JIT compiler, which is not the fastest case.

Note that users can always tweak the variant layout to make it IC friendly. For example, one level of indirection makes all variants share the same number of payloads:

```reasonml
type t =
  | A0(a0) // 1 payload
  | A1(a1) // 1 payload
  | A2(a2) // 1 payload
  | C0
  | C1
  | C2; // This will not affect IC
```
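
The effect of the indirection can be sketched in plain JS. The snippet below uses the ordered list of own keys as a rough stand-in for an engine's object shape; this is a simplification made for illustration (real engines track more than key order), and `shapeKey` and `distinctShapes` are hypothetical helpers.

```js
// Rough proxy for an object's shape: its own keys, in insertion order.
// Assumption: this only approximates what a JS engine really tracks.
function shapeKey(value) {
  return typeof value === "number" ? "constant" : Object.keys(value).join(",");
}

function distinctShapes(values) {
  return new Set(values.map(shapeKey)).size;
}

// Constructors with 1, 2 and 3 payloads: three key lists reach `v.TAG`.
var mixed = [
  {TAG: 0, _0: 1},
  {TAG: 1, _0: 1, _1: 2},
  {TAG: 2, _0: 1, _1: 2, _2: 3}
];

// One level of indirection: every constructor carries exactly one payload,
// so the variant objects themselves all look alike.
var uniform = [
  {TAG: 0, _0: {x: 1}},
  {TAG: 1, _0: {x: 1, y: 2}},
  {TAG: 2, _0: {x: 1, y: 2, z: 3}}
];

console.log(distinctShapes(mixed), distinctShapes(uniform));
```

The inner payload objects still differ from each other, but they are only accessed after the match on `TAG` has selected a branch.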

### Variant in debug mode

By default, constructor names are only emitted as comments. Attaching the constructor names to the data itself is more useful for debugging, so when debug mode is activated using `-bs-g`, the generated code changes from:
```js
var v0 = {TAG : 0/*Black*/, _0 : /*Empty*/ 0 , _1 : 3 , _2 : /*Empty */ 0};
```
to

```js
var v0 = {TAG : 0, _0 : /*Empty*/ 0 , _1 : 3 , _2 : /*Empty */ 0, [Symbol.for("name")]: "Black"};
```
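
A symbol-keyed property is a natural fit here because of standard JS behavior: such properties are skipped by `Object.keys` and `JSON.stringify`, yet can still be read back explicitly. The short sketch below demonstrates this on a hand-written object that mimics the debug output above.

```js
// Symbol.for uses the global symbol registry, so the same key is
// obtained everywhere it is requested.
var nameKey = Symbol.for("name");
var v0 = {TAG: 0, _0: 0, _1: 3, _2: 0, [nameKey]: "Black"};

console.log(Object.keys(v0)); // symbol-keyed properties are not listed
console.log(JSON.stringify(v0)); // nor are they serialized
console.log(v0[Symbol.for("name")]); // but they can be read back
```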

## Polymorphic-variant (internal)

Polymorphic variants allow users to use constructors without declaring a type first:

```reasonml
let u = 3 -> `hello
```

It will generate
```js
var u = {HASH : MAGIC_NUMBER, VAL: 3 }
```
The `HASH` field is the hash of the name `"hello"`, while `VAL` is the payload.
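
The post does not say how `MAGIC_NUMBER` is computed. As a sketch only, and under the assumption that BuckleScript keeps the scheme of the OCaml compiler's `hash_variant` function (multiply by 223, add the character code, keep 31 bits), the hash could be reproduced in JS as follows. `hashVariant` is a hypothetical name, and the sketch only covers ASCII constructor names.

```js
// Assumption: same scheme as OCaml's hash_variant; not confirmed by this post.
function hashVariant(name) {
  var accu = 0;
  for (var i = 0; i < name.length; i++) {
    // Math.imul keeps the low 32 bits of the product, which is enough
    // because only the low 31 bits survive the mask below.
    accu = (Math.imul(223, accu) + name.charCodeAt(i)) | 0;
  }
  accu = accu & 0x7fffffff; // reduce to 31 bits
  // Reinterpret as a signed 31-bit integer.
  return accu > 0x3fffffff ? accu - 0x80000000 : accu;
}

var u = {HASH: hashVariant("hello"), VAL: 3};
console.log(u);
```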

### IC friendliness

A polymorphic variant is always in a perfect IC position: the compiler can ensure all generated objects are of the same shape. This is because the payload is not unpacked; there is always exactly one payload field.


### Polymorphic variant in debug mode

In debug mode, similar to variants, we carry the name in the generated code for debugging.

So instead of
```js
var u = {HASH : MAGIC_NUMBER, VAL: 3 }
```

It will generate

```js
var u = {HASH : MAGIC_NUMBER, VAL: 3 , [Symbol.for("name")] : "hello"}
```