diff --git a/website/blog/2020-05-13-lazy-encoding.md b/website/blog/2020-05-13-lazy-encoding.md
new file mode 100644
index 000000000..916c677f1
--- /dev/null
+++ b/website/blog/2020-05-13-lazy-encoding.md
@@ -0,0 +1,110 @@
+---
+title: Enhanced lazy encoding in BuckleScript
+---
+
+
+
+Recently we made some significant improvements with our new encoding for lazy values, and we find it so exciting that we want to highlight the changes. The new encoding generates idiomatic JS output that reads like hand-written code.
+
+For people who are not familiar with lazy evaluation, it is documented here: https://ocaml.org/releases/4.10/htmlman/expr.html#sss:expr-lazy.
+
+# What's the difference?
+
+Take this code snippet, for example:
+
+```reasonml
+let lazy1 = lazy {
+  "Hello, lazy" -> Js.log;
+  1
+}; // create a lazy value
+
+let lazy2 = lazy 3 ; // artificial lazy value for demo purposes
+
+Js.log2 (lazy1, lazy2); // logging the lazy values
+
+let (lazy la, lazy lb) = (lazy1, lazy2); // pattern match to force evaluation
+
+Js.log2 (la, lb); // logging forced values
+```
+
+Running this code in node with the old encoding, the output is as below:
+```bash
+lazy_demo$ node src/lazy_demo.bs.js
+[ [Function], tag: 246 ] 3 # logging the two lazy values
+Hello, lazy # lazy1 and lazy2 are forced by the pattern match, hence the log
+1 3 # logging the forced values
+```
+
+With the new encoding, the output is as below:
+```bash
+{ RE_LAZY_DONE: false, value: [Function: value] } { RE_LAZY_DONE: true, value: 3 } # logging the two lazy values with the new encoding
+Hello, lazy
+1 3
+```
+
+As you can see, with the new encoding, no magic tags like 246 appear, and the lazy status is clearly marked via `RE_LAZY_DONE: (true | false)`.
+
+More than that, the generated code quality is also improved.
+In the old mode, the generated JS code was like this:
+
+```js
+var lazy1 = Caml_obj.caml_lazy_make((function (param) {
+        console.log("Hello, lazy");
+        return 1;
+      }));
+
+console.log(lazy1, 3);
+
+var la = CamlinternalLazy.force(lazy1);
+
+var lb = CamlinternalLazy.force(3);
+
+console.log(la, lb);
+
+var lazy2 = 3;
+```
+
+In the new mode, it is simplified:
+```js
+var lazy1 = {
+  RE_LAZY_DONE: false,
+  value: (function () { // the closure is now an uncurried arity-0 function
+      console.log("Hello, lazy");
+      return 1;
+    })
+};
+
+var lazy2 = {
+  RE_LAZY_DONE: true,
+  value: 3
+};
+
+console.log(lazy1, lazy2);
+
+var la = CamlinternalLazy.force(lazy1);
+
+var lb = CamlinternalLazy.force(lazy2);
+
+console.log(la, lb);
+```
+
+## What changes did we make?
+
+In native, the encoding of lazy values is rather complicated:
+
+- It is an array, which is not friendly for debugging in a JS context.
+- It has special tags, such as the magic number 246, which carry no meaning in a JS context.
+- It tries to unbox lazy values with the help of the native GC. However, such complexity does not pay off in JS since the JS VM does not expose its GC semantics.
+
+So on master, our encoding scheme is much simplified to take advantage of JS as much as possible:
+
+- The encoding is uniform; it is always an object of two key-value pairs. One is `RE_LAZY_DONE`, which marks its status;
+the other is either a closure or an evaluated value.
+
+- The compiler optimization still kicks in at compile time: if it knows a lazy value is already evaluated or does not need to be evaluated, it will promote its status to 'done'. However, unlike native, no unboxing happens. This makes sense because the most interesting unboxing scenarios happen at runtime rather than at compile time, and runtime unboxing is impossible on a JS VM.
+
+
+With the new encoding, `lazy` has a much nicer sugar, and we encourage you to use it whenever it is convenient!
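To make the uniform two-field encoding concrete, here is a minimal sketch of how forcing such a value could work. `forceSketch` is a hypothetical name used only for illustration: it is not the actual `CamlinternalLazy.force` implementation, and it ignores concerns such as exceptions raised inside the closure. Treat it as an explanation of the layout, not as something to rely on in interop code.

```js
// Illustrative sketch only: `forceSketch` is a hypothetical helper, not the
// real CamlinternalLazy.force.
function forceSketch(lz) {
  if (lz.RE_LAZY_DONE) {
    // Already evaluated: `value` holds the result.
    return lz.value;
  }
  // Not evaluated yet: `value` holds an arity-0 closure.
  var thunk = lz.value;
  var result = thunk();
  // Cache the result so the closure runs at most once.
  lz.RE_LAZY_DONE = true;
  lz.value = result;
  return result;
}

var runs = 0;
var lazy1 = {
  RE_LAZY_DONE: false,
  value: function () {
    runs += 1;
    return 1;
  }
};
var lazy2 = { RE_LAZY_DONE: true, value: 3 };

var la = forceSketch(lazy1);
var lb = forceSketch(lazy2);
var again = forceSketch(lazy1); // served from the cache, closure not re-run
```

Because both states share the same two keys, a value keeps its object shape when it moves from "pending" to "done"; only the contents of the fields change.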
+
+# Caveats:
+
+Don't rely on the special name `RE_LAZY_DONE` for JS interop; we may change it to a symbol in the future.
diff --git a/website/blog/2020-6-10-overview-data-representation.md b/website/blog/2020-6-10-overview-data-representation.md
new file mode 100644
index 000000000..0fcb5e479
--- /dev/null
+++ b/website/blog/2020-6-10-overview-data-representation.md
@@ -0,0 +1,217 @@
+---
+title: An overview of new data representation in BuckleScript
+---
+
+In the next version of BuckleScript, we made several major changes to the data representation of various data types, making it more idiomatic and debugger friendly.
+
+Since V8 and other JavaScript engines are tuned to run idiomatic JS code fast, this also results in faster code.
+
+Another benefit of a compiled language like BuckleScript is that we can reason about [IC](https://en.wikipedia.org/wiki/Inline_caching) friendliness just by looking at the type definitions locally, which helps advanced users write JS code with reliable performance. If you are unfamiliar with how IC works in general, feel free to skip the sections about IC.
+
+Note that this article is already quite dense, so we will skip the old encoding.
+
+## Record (stable)
+
+We have compiled records to idiomatic JS objects since [version 7](https://bucklescript.github.io/blog/2019/11/18/whats-new-in-7), which is great for performance and debugging. We also support label renaming to shorten field names and reduce output size.
+
+Take the code below for example:
+
+```reasonml
+type int64 = {
+  loBits : int [@bs.as "lo"],
+  hiBits : int [@bs.as "hi"]
+};
+let value = {hiBits : 33 , loBits : 32 };
+let rand = ({loBits, hiBits}) => loBits + hiBits;
+```
+
+It will generate JS output as below:
+```js
+var value = {lo : 32, hi : 33}
+function rand (param){
+  return param.lo + param.hi
+}
+```
+
+If users want to make it even shorter, they can choose to compile the record to an array as below:
+
+```reasonml
+type int64 = {
+  loBits : int [@bs.as "0"],
+  hiBits : int [@bs.as "1"]
+};
+let value = {hiBits : 33 , loBits : 32 };
+let rand = ({loBits, hiBits}) => loBits + hiBits;
+```
+
+Now the JS output is as below:
+
+```js
+var value = [32,33]
+function rand(param){
+  return param[0] + param[1]
+}
+```
+
+The label renaming technique can be applied systematically using a syntactic macro. In the future, we may provide an advanced mode to apply it automatically. Another nice property is that only the type definition needs to be adapted; the rest of the code is untouched.
+
+### IC friendliness
+
+Records are always in a perfect IC position: the compiler can ensure that all generated records of a given type have the same shape.
+
+## Variant (internal)
+This encoding for variants may be subject to change in the future, but it is so simple that it makes sense for users to have a basic understanding.
+
+Take this type definition for example:
+
+```reasonml
+type t =
+  | Black(t, int, t)
+  | Red(t, int, t)
+  | Empty;
+let empty = Empty ;
+let v0 = Black (empty, 3, empty);
+let v1 = Red (empty, 3, empty);
+```
+The generated JS code would be:
+
+```js
+var empty =/*Empty*/ 0;
+var v0 = {TAG : 0/*Black*/, _0 : /*Empty*/ 0 , _1 : 3 , _2 : /*Empty */ 0};
+var v1 = {TAG : 1/*Red*/, _0 : /*Empty*/ 0 , _1 : 3 , _2 : /*Empty */ 0};
+```
+
+As you can see, variants fall into two categories: a variant without a payload is compiled into a number starting from 0, while a variant with a payload is compiled into an object whose first slot is named `TAG` and whose following slots are named `_0`, `_1`, and so on.
+
+### Variant with inline records
+
+Users can give names to the payload fields, and the compiler respects them. However, we don't support user-level renaming, i.e. using `bs.as`, at this time.
+
+```reasonml
+type t =
+  | Black ({l:t, value: int, r: t})
+  | Red({l:t, value: int, r: t})
+  | Empty;
+let empty = Empty ;
+let v0 = Black ({l:empty, value: 3, r:empty});
+let v1 = Red ({l:empty, value:3, r: empty});
+```
+The generated JS code would be:
+```js
+var empty =/*Empty*/ 0;
+var v0 = {TAG : 0/*Black*/, l : /*Empty*/ 0 , value : 3 , r : /*Empty */ 0};
+var v1 = {TAG : 1/*Red*/, l : /*Empty*/ 0 , value : 3 , r : /*Empty */ 0};
+```
+
+
+### Special case: only one variant has a payload
+
+Take the types below for example:
+
+```reasonml
+type list =
+  | Nil
+  | Cons (int, list);
+```
+
+Since only one variant has a payload, the compiler does not need `TAG` to tell the constructors apart when destructuring the data in pattern matching, so the code below:
+
+```reasonml
+let u = Cons(1, Nil);
+```
+will generate this JS output:
+
+```js
+var u = {_0: 1, _1 : /*Nil*/ 0 }; // No TAG data.
+```
+
+### Specialized for immutable list
+
+The `list` type is a built-in type; its type definition is similar to this:
+
+```reasonml
+type t ('a) =
+  | []
+  | (::) ('a * t ('a))
+```
+
+Without any customization, it would generate JS objects with fields like `_0` and `_1`. Since lists are so pervasive,
+we provide some special treatment so that
+```reasonml
+let u = [0,1,2,3]
+```
+will generate JS code as below:
+```js
+var u = {hd : 0, tl : {hd : 1, tl : {hd : 2, tl : {hd : 3, tl : /*[]*/0 }}}}
+```
+
+This is a minor change: we renamed `_0` to `hd` and `_1` to `tl`.
+
+### IC friendliness
+
+Types with exactly one payload-carrying variant are in a perfect IC position.
+The number of variants without a payload does not affect IC, since the pattern match does a split first.
+
+Types whose payload-carrying variants all have the same number of payload fields are also in a perfect IC position, like the red-black tree example above.
+
+In other cases, the code will hit a polymorphic IC in the V8 JIT compiler, which is not the fastest case.
+
+Note that users can always tweak the variant layout to make it IC friendly. For example, they can introduce one level of indirection so that all variants share the same number of payload fields:
+
+```reasonml
+type t =
+  | A0 (a0) // 1 payload
+  | A1 (a1) // 1 payload
+  | A2 (a2) // 1 payload
+  | C0
+  | C1
+  | C2 // This will not affect IC
+```
+
+### Variant in debug mode
+
+Note that constructor names normally appear only in comments in the generated code; attaching them to the data itself is more useful for debugging.
+When debug mode is activated using `-bs-g`, the names are attached to the data.
+
+The generated code changes from
+```js
+var v0 = {TAG : 0/*Black*/, _0 : /*Empty*/ 0 , _1 : 3 , _2 : /*Empty */ 0};
+```
+to
+
+```js
+var v0 = {TAG : 0, _0 : /*Empty*/ 0 , _1 : 3 , _2 : /*Empty */ 0, [Symbol.for("name")]: "Black"};
+```
+
+## Polymorphic-variant (internal)
+
+Polymorphic variants allow users to use constructors without declaring a type first:
+
+```reasonml
+let u = 3 -> `hello
+```
+
+It will generate
+```js
+var u = {HASH : MAGIC_NUMBER, VAL: 3 }
+```
+The `HASH` field is the hash of the name `"hello"`, while `VAL` holds the payload.
+
+### IC friendliness
+
+Polymorphic variants are always in a perfect IC position: the compiler can ensure that all generated objects have the same shape, because the payload is not unpacked and always occupies the single `VAL` field.
+
+
+### Polymorphic variant in debug mode
+
+In debug mode, as with variants, we carry the name in the generated code for debugging.
+
+So instead of
+```js
+var u = {HASH : MAGIC_NUMBER, VAL: 3 }
+```
+
+it will generate
+
+```js
+var u = {HASH : MAGIC_NUMBER, VAL: 3 , [Symbol.for("name")] : "hello"}
+```
\ No newline at end of file
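As a recap of the variant layouts described in this post, the sketch below shows how hand-written JS could inspect such values. The helper names `isConstant`, `sumTree`, and `describe` are hypothetical, and the `MAGIC_NUMBER` value is a placeholder rather than a real hash. Since these encodings are marked internal, the layout may change, so this is meant as an aid to understanding rather than a recommended interop pattern.

```js
// Illustrative sketch only; helper names are hypothetical and the layouts
// are internal to the compiler, so they may change.

// Constructors without a payload compile to plain numbers.
function isConstant(v) {
  return typeof v === "number";
}

// Sum the int payloads of the red-black tree: Black and Red share a layout
// (TAG, _0, _1, _2), so one code path handles both.
function sumTree(t) {
  if (isConstant(t)) {
    return 0; // Empty
  }
  return sumTree(t._0) + t._1 + sumTree(t._2);
}

// Tell the constructors apart: split on constant vs. object first, then on TAG.
function describe(t) {
  if (isConstant(t)) {
    return "Empty";
  }
  return t.TAG === 0 ? "Black" : "Red";
}

var empty = 0;
var v0 = { TAG: 0, _0: empty, _1: 3, _2: empty };
var v1 = { TAG: 1, _0: v0, _1: 4, _2: empty };

// Placeholder hash; the real value is computed by the compiler.
var MAGIC_NUMBER = 12345;
var u = { HASH: MAGIC_NUMBER, VAL: 3 };
```

The "split first" step in `describe` mirrors the note in the IC section: the check for a payload-free constructor happens before any property access, so only the object-shaped values reach the property reads.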