-
D-Expressions:
- "Token macros become, in essence, part of an ad-hoc recursive descent parser and programmers must be aware of the implications of that. They must always take care to respect any macro syntax contained within the subexpressions manipulated because those subexpressions are not isolated within the structure."
- This quote is very important given that I started the implementation of the Pratt parser in which the handlers only have access to the token stream. This may be a major design limitation. @to_review
-
Map:
- "∘ and ∙ are aliases for map and contramap respectively. Though, I only use those aliases for Functor instance of a function ((->) a), therefore I call them compose."
-
Evaluation precedence: the precedence of function application and algebraic expression must be carefully thought to minimize the need for parentheses. For example in OCaml even simple addition has lower precedence than application, which results in the restriction shown in the following code:
# print_int 2 + 2 ^ ~~~~~~~~~ Error: This expression has type unit but an expression was expected of type int
- Macros as monads. Or code macro.
-
Agda-style mixfix operators can be used to define the core language. So for example the simple conditional operation could be defined as:
if : Bool -> then : a -> else : a -> a if true then t else f = t if false then t else f = f
- Create a dynamic poll with language features and the syntax.
- Organize the language library (lang.meta) by categories and for each one create a set of code examples with variants. Include a up-vote and down-vote button and a field for a small rationale (e.g. too long, too confusing, cryptic.), maybe even include a set of predefined options.
Prefix !
will be used to denote the scope of the pure lambda expression.
-> map !(~ / 2) [10, 20, 40, 30, 100]
= [5, 10, 20, 15, 50] :: List Int
Instead of a more verbose and redundant:
-> map (x -> x / 2) [10, 20, 40, 30, 100]
= [5, 10, 20, 15, 50] :: List Int
The same !
operator is used in postfix form to call nullary functions and to force application where ambiguity with optional arguments exists.
-> hello (name = "World") = print "Hello, ~!" % name
-> hello!
"Hello, World!"
The standard solution to this problem is to use the unit argument to denote the end of application. But this reminds me of the NULL
pointer convention used in C, which is fragile and confusing.
Another option for the prefix denotation is to use ~>
. This alternative has an advantage of being consistent with the label operator ~
and the regular lambda syntax:
-> map ~>(~ / 2) [10 20 40 30 100]
= [5 10 20 15 50] :: List[Int]
But the !
is favoured for being more compact and because of its usage as application forcing operator. Also this way the language will have less arrow operators.
Haskell programmers try to avoid parentheses because their usage in their language is inconsistent. In regular imperative programming languages you always use parentheses to invoke a function, while in Haskell they are only required to force the precedence of the expression. When you think about programming in general, precedence handling is the last think you want to worry about while solving your problem. The consistent usage of parentheses (or generally any common syntax) became almost automatic and you stop noticing their presence. And this is a fact since all Lisp programmers say they can see through all the parentheses, and when they write them they don't think they are writing yet another parentheses in their code. Since in Haskell you usually start by writing without parentheses, when you need a pair of them your writing flow is interrupted and you need to start thinking about operator precedence and explicitly delimit your expression. Which, if you think about it, magnified by the number of times you do it, is really annoying. That's why $
is so common in Haskell, because you write it to just tell the compiler that you want your expression to start exactly here and span to the end of line. This reduces your effort by half, because you don't need to inform the compiler where the expression ends.
How to handle sequences of expressions?
A program is a sequence of expressions, and a sequence is itself an expression.
type expression =
| Sequence of expression list
...
-
The sequence can be explicitly built with the binary semi-colon operator:
a; b; 2 + 2
. -
On the other hand, a sequence operator is automatically inserted on a newline, except when a newline occurs in the middle of another expression.
a = 2 + 2 print a
Which is converted to:
a = 2 + 2;
print a
But not when the newline occurs in the middle of an expression, as in the following example, where the expression is parsed without changes:
a = 2 +
2
How should be parsed a function application expression with a newline?
Regular function application:
-> sum 2 3
= (sum 2 3)
Prefix function application with a new-line in the middle:
-> sum 2
3
= (sum 2 3)
= (do (sum 2) 3)
Infix function application with a new-line in the middle:
-> 2
+ 3
= (+ 2 3)
= (do 2 (+ 3))
-> 2 +
3
= (+ 2 3)
= (do (+ 2) 3)
-> map (x -> x + 1)
[1, 2, 3]
= (map (lambda [x] (+ x 1) [1, 2, 3])
= (do (map (lambda [x] (+ x 1)) [1, 2, 3]))
-> if (a > 3) then:
print "hey"
"yes"
else:
2 + 2
"no"
-> get "/main":
print "Main view"
m = load_model "main"
render main_view model: m
In Go, there is a basic rule: “if the newline comes after a token that could end a statement, insert a semicolon”.
Parsing function application
Forms:
-> f a b c d
-> - a b c d
The prefix minus operator requires an argument:
-> - x
If f
is a prefix function, then f
, depending on the following tokens, may be treated both, as a literal and as a prefix function.
Consider the following situations, where f and g are functions, x and y are literals and (>>) and (+) are infix operators:
-> f
= (name f)
-> x
= (name x)
-> x + y
= ()
-> f >> g
= (apply (>>) [f g])
-> f x y
= (apply f [x y])
-> f g x
= (apply f [g h])
-> f (>>) x
= (apply f [(>>) x])
x y z])
The main difference is the handler used for the token after f
, where >>
is a prefix operator (led) and x is just a literal (nud).
After a symbol token is
Any expression may be used as a prefix expression.
-> (x + y) z
= (apply (+) [x y])
-> (a > 3) ? "yes" : "no"
All the function applications are parsed the same way as the following expression with multiple prefix minus optriangle a b c))))
-> f x y z
= (f (x (y (z))))
Alternatively the previous expression may be parsed as:
-> f x y z
= (f [x y z])
-> f x y z + g
= (f [x y z])
- Start parsing
a b c d
when rbp condition fails check the len and decide if its an application or an id.
-> \ x y -> x + y
-> \ 5
-> !(~ + ~)
-> map !(+ 2) [ ... ]
-
A function may have distinct polymorphic implementations, called methods.
-> print
How the following expressions should be parsed:
-> let f a b = a + b in: f 3 2 - 1
= 4
- About function application syntax:
- To introduce a syntactic the keyword arguments may receive a special block argument. sep::String) = ...
split with
- Error handling: each time an error occurs, store all parser state and pass it for analysis before firing an exception.
- Can the head of an expression be an arbitrary expression?
The regular form of a macro term is defined as a head, referencing a symbol, and a list of arguments, which are expressions.
-
The expression block definition syntax with
:
for body introduction will not be used. It is inconsistent with type definitions, and is not adequate for function definitions.type Person = { name :: String, age :: Int } function sum x y = x + y interface Comparable[a] = ... module Comparable[int] = ...
not:
type Person: { name :: String, age :: Int }
function sum x y: x + y
interface Comparable[a]: ...
module Comparable[int]: ...
- Parsing applicative style function call with prefix parser able to take all the arguments until a led is found would simplify substantially the indentation handling.
Define the Code
type, which extends Expr
with types.
`(5 + 2) :: Code Int = (`+, [`5, `2])
- Study the type inference system of OCaml.
- Study the MetaOcaml code to see how it works.
The syntactic elements must compose. The function definition, or any definition must not change the value.
Fold – Syntax as a library.
Fold syntax has a notion of sequence – a list of adjacent expressions separated by space. Consider some examples:
f x y z
1 2 3 4
f x + g y
All this expressions contain sequences. Some of them may be function applications.
On which level should the separation be made?
- In the grammar definition: every function/macro creates a specialised parser with an appropriate arity. For example the unary minus (as in "-3") will be a function parser accepting a single null denotation expression; a function
f
parser accepting 3 arguments will check during parsing the number of provided arguments. - The second approach is to build application/sequence expressions on the fly just by analysing the syntax. All expression introduced just with juxtaposition are converted to sequences. A distinction between function calls and other constructs is made during the further semantical processing.
The first method may be more intuitive for basic parsers, such as in the the unary minus example or macros (let
, open
, etc), but is not adequate for function application since the number of function arguments is not known during parsing.
Preliminar decision: use the second method, it is more general. A combination of two could be used but it would result in duplicate logic validation in separate compiler phases.
Fix the appending of expressions. The current implementation is broken.
- Should
(f x) y
be equivalent to(f x y)
? - Yes. It is a basic precedence rule, the same way
(2 + 2) + 1
has redundant parentheses, any applicative expression can have them too:f x y z
is equivalent to((f x) y) z
.
There may be some cases where the described grouping rules for application are not intuitive. Consider the following example:
-- Let's greet Alice in Portuguese!
=> (greetings_by_lang # "pt") "Alice"
:: Str = "Olá, Alice!"
-- This expression is syntactically equivalent to:
=> (# greetings_by_lang "pt" "Alice")
:: Str = "Olá, Alice!"
-- This may seem confusing but since `greetings_by_lang is
-- a list of functions it works.
Here is an equivalent code in OCaml ($
== List.nth).
# let xs = [(fun x -> x + 1); (fun x -> x + 2);
(fun x -> x + 3); (fun x -> x + 4)];;
val xs : (int -> int) list = [<fun>; <fun>; <fun>; <fun>]
# xs $ 1;;
- : int -> int = <fun>
# ($) xs;;
- : int -> int -> int = <fun>
# ($) xs 1;;
- : int -> int = <fun>
# ($) xs 1 2;;
- : int = 4
List.nth :: [a] -> a?
List.nth_exn :: [a] -> a (raises exception)
Any time the optional function is used a postfix operator !
could be applied to raise exception when no value is available, just by forcing the optional value.
=> List.head
:: [a] -> a?
=> List.head!
:: [a] -> a
=> List.head (^10)
:: Int? = Some 1
=> List.head! (^10)
:: Int = 1
- Improve the Grammar module. Create
Env.t
and generic definitions (for both macros and values). - Create the explicit REPL phases.
- Define the basic demo operators within the language.
- Define a list of core operators (macro definition, e.g.).
- Macro rewrite rules definition within the language as a DSL.
- Compile-time HTTP requests. Just joking... May be possible eventually though.
- What kind of data will be stored in the
Env
? UnevaluatedExp.t
, atoms or complex instructions
The literals that are stored in the scope only have nud
parsers.
-
Understand the difference between the typed and untyped macros. Start with untyped macros, i.e. simple AST substitutions (no guarantees that the generated code will be well-typed).
-
The separation of default NUD and LED rules may result in better logic reuse. Example: The
if
parser, a regular NUD parser, may be used in LED context resulting in more flexible code:if ok then "yes" else "no" end print ("Answer: " ++ if ok then "yes" else "no" end)
-
Discard. Does not make sense.
Try to avoid the ~
symbol for named function application with a scope variable.
default = 0
Dict.get scores "Alice" default
default = 0
Dict.get scores "Alice" ~default
Instead of allowing the definition of infix operators with custom precedence create a fixed precedence table for all the symbols that may start an operator. This will avoid the need to specify the precedence in the base macros.
- The addition of named arguments will be added after the first stable release.
- Named argument definition depends on the block definition. A simple
'function
macro introduces an implicit block.
Some of the Fold's macros may implicitly introduce code blocks.
Some examples are macro
, function
, interface
. The block is started after the EOL symbol is read on the same line where the macro begins.
function sum x y<EOL>
x + y
end
Discard. Block parsing is integrated with general groupping. Simple do-end
groups can be extended to create blocks by inspecting the first expression.
do
sum x y
x + y
end
In this example changing do
to function
and matching the first expression, i.e., sum x y
, it is possible to directly create function blocks.
function sum x y
x + y
end
This idea is implemented and tested in 80f9eeb for all kinds of blocks, with and without arguments.
The basic idea behind syntax rules is to capture the static symbols and the code arguments that will be matched by the parser during the evaluation.
A simple if condition is a good example to illustrate this idea:
if x > 3 then
print "Ok"
else
print "No"
end
In this case, the static symbols are: if
, then
, else
and end
. And the delimited expressions (x > 3
, print "Ok"
, etc.) are passed as compile-time variables to the macro:
macro `if cond `then conseq `else alt `end
In the macro declarations this two entities must be distinguished. The static symbols, as opposed to the variables, do not need any additional information, such as the matched structure or type. Let's have a look at a familiar function definition macro:
macro `function exp_list...
This macro is introduced by the static symbol function
, followed by the spliced list of expressions, namely the function name and the arguments. So the previous rule may be further expanded to:
macro `function (name & args)...
By applying pattern matching to the expression list it is easier to inspect the structure of the macro arguments and also validate and extract the required information.
As one may notice, the suggested syntax to distinguish between static symbols and expression variables is very simple: quoted names (just symbols) and in-line variables. This way the basic pattern matching on expressions can be used directly in the parser.
One alternative to this approach is to annotate the variables and use in-line symbols:
macro $list [ $n ]
In this case the variables are explicitly introduced by the interrogation mark and the static symbols are written directly.
The limitations and the advantages of each one of the encodings will be studied during the implementation but for now the first one will be used.
Use ~
for explicitly marking partially applied functions:
hello name: "Alice" lang: "en"
map (hello lang: "pt" ~) ["João" "Maria" "André"]
let
bindings are supported in Fold with more concise syntax:
let
x = 1 + 2
y = x - 8
in
print x
end
- Set grammar rules in the expr environment. Env supoorts both nud and led definitions.
- Minimize computation pqths with automata. How a computation graph can be defined as an FA?
- Protected scope for limited rules. Ex:
@
for as-patterns should be only available in the pattern.
-
One possible way of implementing a simple postfix syntax rule is to match the postfix operator during the eval on List xs. This has the advantage of having the same precedence of application lists.
-
Discard. Postfix rules are already easily implemented withe the pratt parser.
Eval environment can be defined as a state monad:
The latest commit 9178d28
has a working implementation of the function
definition. An additional Func constructor had to be added to the Expr type.
The implementation of this constructor may not be final though. As an
alternative consider using a function from expr list to expr.
Next steps: refactor the definition of builtins, improve the Env management (probably avoid returning it from eval) and implement the macro support.
The previously suggested definition of Func expression constructor as a function will avoid the need of explicit env argument.
Env definitions will include the grammar rules from the language definition.
- To store closure scope environment, a genreic metadata value can be added to
let x = 2 + 2
Term [Sym "="; Term [Sym "let"; Sym "x"]; Term [Sym "+"; Int 2; Int 2]]
Call [Sym "="; List [Sym "let"; Sym "x"]; Term [Sym "+"; Int 2; Int 2]]
- People are born at age 0, and durations start at time 0.
- I should find a way to merge Env and Grammar modules. How can the Rule definition be stored within the scope map?
- It is more important that the function definition looks similar to function call, rather than to lambda expression. Mainly because the named arguments will look strange separated from the function name.
Example:
move ~(from source) ~(to dest) =
Unix.system ("mv {} {}" % (source, dest))
move = ~(from source) ~(to dest) ->
Unix.system ("mv {} {}" % (source, dest))
Macro definition with braces is straightforward, since blocks can be seen as regular expressions.
macro `if test a `else b ...
line = Console.line ()
if (lengh (String.split by: ' ' line) == 0) {
fail "Empty input. Please write at least two words."
} else {
print "You wrote %(line)"
}
There is no need for the explicit {...}
notation, blocks are already correctly parsed as an atomic or sequential expression.
The direct translation of this notation to the do-end
syntax does not result in an acceptable solution because writing explicitly the do
keyword is too verbose and conflicts with the possibility of its usage for the monadic blocks.
Here is how would the previous example look with this syntax:
macro `if test a `else b ...
line = Console.line ()
if (lengh (String.split by: ' ' line) == 0) do
fail "Empty input. Please write at least two words."
end else do
print "You wrote %(line)"
end
...Which is ridiculous.
The question is: Is it possible to define a consistent rule for parsing blocks with the end
noation and line terminations?
macro `if test a `else b ...
line = Console.line ()
if (lengh (String.split by: ' ' line) == 0)<EOL>
fail "Empty input. Please write at least two words."
else<EOL>
print "You wrote %(line)"
end
One concrete limitation of the white-space sensitive syntax is the inability to compose blocks. The following construct is impossible with indentation-based if.
if flag 1 else 2 end + 2
(define macro (a (quote symbol +) b)
(quote
plus (unquote a)
(unquote b)))
(a + b)
infix 7 +
Infix
(fun left ->
get >>= fun { token } ->
consume token.value >>
expression 7 >>= fun right ->
return (Expr.Form [Expr.Atom token; left; right]))
if_then_else_ :: ...
if test then a else b = ...
`(if then else) :: ...
(if:then:else:) :: ...
(if..then..else..) :: ...
macro option `?` default =
quote:
case option:
Some value -> value
None -> default
- Q: How are parsers/macros stored in the environment?
- A: They are not. Instead of storing the parser directly in the environment, an abstract expression description is bound to each definition. The parser syntax is then interpreted into an actual monadic parser (see the
parse_rule
function).
From the previous answer we can extract a definition:
Syntax Rule – an expression that describes the syntax and is interpreted as a parser. Examples of syntax rules are: if
macro, +
sign.
Referencing
- Operator definition in Fold:
- Consider including symbol literals in the Fold’s lexer.
- Q: What are the advantages and tradeoffs of having fully custom syntax as compared to infix, prefix and postfix definitions?
"if" test "then" a "else" b = …
`if test `then a `else b = …
Operators:
-- Option with default value.
@infix 6
_ ? _ :: Self a -> a -> a
self ? default =
case self {
Some x -> x
None -> default
}
-- Monad signature
sig Monad {
type Self a
pure :: a -> Self a
_ >>= _ :: Self a -> (a -> Self b) -> Self b
}
-- Conditional macro
@macro
if _ then _ else _ :: Code Bool -> Code a -> Code a -> Code a
if t then a else b =
- Q: How do you represent the list macro
[a, b, c, ...]
?
Provide multiple rules with common prefix.
[ ]
[ _ ]
[ _ , rest... ]
Fold uses a lightweight mechanism to translate complex syntactic structures to regular function applications. This is achieved by the following set of rules:
-- Regular function application.
f x => f x
-- Function application with a named argument
f label: x => f label: x
-- Function application with a named block argument
f label
x
y
end
=> f label: (x; y)
case expr
Form [Symbol "=", name, value] ->
Env.add name (eval env value) env,
Expr.unit
Form (Symbol "=" & _) -> invalid ()
Atom x -> Atom x
end
r = map (count (== 'a')) data_opt |>
| Some x -> x + 1
| None -> 0
- Expressiveness
- Performance
- Composition
- Consistency
- Experience
- Simplicity
- How should the type annotations work?
- a) as in OCaml with inline type annotations and separate interfaces;
- b) as in Haskell alongside definitions.
- Decision: Use whitespace insensitive syntax with ML-style definitions and explicit sequencing operator
;
.
StandardML uses functions with clauses for definitions with pattern matching. One example is shown here:
fun factorial 0 = 1
| factorial n = n * factorial (n - 1)
This particular notation is very concise, expressive and elegant and allows the implementation of a whitespace insensitive parser with Haskell-like pattern-matching.
There are some limitations of this design if applied to Fold:
- Nested
let
definitions require thefun
keyword. - Commenting individual patterns becomes inconsistent because of the first match.
- Pattern matching for infix operators does not work well with lambda syntax.
def factorial 0 = 1
| factorial n = n * factorial (n - 1)
def 0! = 1
| n! = n * factorial (n - 1)
The second example is only possible with full function + pattern syntax.
Function application syntax in Fold follows the tradition started by the functional programming languages such as Lisp and Haskell: arguments are passed to functions by juxtaposition. This is different from the familiar mathematical notation f(x)
adopted by most programming languages.
The advantages of space separated argument application are:
- Cleaner syntax: this minimalistic approach results in code that many functional programmers consider elegant.
- Easier metaprogramming: the fundametnal part of the metaprogramming model in Lisp is the representation of lists in Lisp. Function application is just a special case of list literals in Lisp. This duality is an important design decision that ensures flexibility and consistenncy to the Lisp programms.
- Explicit currying: In Haskell the usage of juxtaposition for application is mainly motivated by the underlying function model in which all functions are only passed one argument. So a function call
f x y z
in reality is treated as((f x) y) z
.
The disadvantages of the juxtaposition for application are:
- Unfamiliar to beginers: people starting with functional languages usually find this syntax strange. It is understandable since even the regular
f(x)
notation is used in math. - Parentheses handling: the regular syntax always forces the programmer to use the parenthese. It is consistent and pepole don't think about it. With functional syntax you only need parentheses if your argument is a compound expressions. In some situations it may be inconvenient to keep track of the parentheses and match them while writing nested applications. Haskell programmers use the
$
operator to avoid writing parentheses. In OCaml it is less frequent but pepole may use@@
for the same reason.
In Fold there will be no $
operator. The equivalent of $
is the <|
operator which is consistent with |>
but verbose and less natural for regular function application.
The proposal is to use a simplified math notation: f x, y, z
. This version of the syntax will be familiar to many developers (languages like Ruby and Elixir already use it). The parentheses handling issue is also avoided by using commas.
Another benefit of this syntax is the possibility to use a simpler notation for function definition:
hello : String -> ()
hello name =
print "Hello, $name!"
Function definition with clauses in Fold will be implemented. The only disadvantage is the small inconsistency in formatting. Consider the following example:
-- Without clauses
def hello name =
print "hello, %s" name
-- With clauses
def hello "world" =
print "hello, WORLD!"
| hello name
print "hello, %s!" name
The indentation is inconsistent in two cases. But weighting all variables, clauses do add value to the language.
- Similar to Haskell and StandardML
- Easy to understand
- Compact and clear code
- Will be used in macros too.
- Do not add expessivity to the language, equivalent to
case
- Do not use
end
termination - Inconsistent indentation