-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* update readme * wip more memoize * remove some memoize * try some more memoize * wip basic TDD tests for symbolic graph * wip symbolic graph * refactor 1 * refactor 2 * wip * wip * wip * wip tutorial Leibniz equality * wip * reformat * fix * reformat * finish write-up about Leibniz equality * wip * add code for factorial * wip * wip
- Loading branch information
Showing
12 changed files
with
866 additions
and
239 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
java -Xlog:stringdedup*=debug -XX:+UseG1GC -XX:+UseStringDeduplication -jar dhall.jar --file scall-cli/src/test/resources/yaml-perftest/create_yaml.dhall yaml |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,198 @@ | ||
# Specification and implementation of "Nano-Dhall" | ||
|
||
The "Nano-Dhall" project illustrates how a (very small) functional programming language can be specified and implemented. | ||
Nano-Dhall is a subset of Dhall limited to: | ||
|
||
- Function types `∀(a : t1) → t2` and function values `λ(a : t) → b`. | ||
- Function application: `f x`, grouping to the left, so that `f x y = (f x) y`. | ||
- Built-in type `Natural` with operations `a + b`, `a * b`, and `Natural/fold` | ||
- Built-in type `Text` with the concatenation operation (`x ++ y`). | ||
- Type universes `Type` and `Kind`, so that we have typing judgments `Nat : Type` and `Type : Kind`. (The symbol `Kind` has no type.) | ||
|
||
We have omitted records, union types, imports and many other features of Dhall. | ||
But all of those features (except imports) can be encoded in Nano-Dhall using just functions and types. | ||
|
||
The syntax of Nano-Dhall is simplified so that there are fewer keywords and fewer equivalent ways of writing the same things. | ||
|
||
We kept just two built-in types (`Natural` and `Text`) with some built-in operations, in order to show how such features are implemented. | ||
|
||
For clarity, the impementation will be as simple and straightforward as possible. | ||
|
||
### Encoding various features in Nano-Dhall | ||
|
||
Here we will briefly show examples of Nano-Dhall code that reproduce some of the missing features. | ||
|
||
- Void type: `∀(a : Type) → a` (instead of Dhall's `<>`) | ||
- Unit type: `∀(a : Type) → ∀(x : a) → a` instead of Dhall's `{}` | ||
- Unit value: `λ(a : Type) → λ(x : a) → x` instead of Dhall's `{=}` | ||
- "Let"-expression: `(λ(a : t) → b) x` instead of Dhall's `let a : t = x in b`. We will replace "let"-expressions by "lambda"-expressions at parsing time. | ||
- Pair type: `λ(a : Type) → λ(b : Type) → ∀(r : Type) → ∀(k : a → b → r) → r` instead of Dhall's `λ(a : Type) → λ(b : Type) → { _1 : a, _2 : b}` | ||
- Pair value: `λ(a : Type) → λ(b : Type) → λ(r : Type) → λ(k : a → b → r) → r` | ||
|
||
### Syntax and parsing into the syntax tree | ||
|
||
The syntax of Nano-Dhall is described in the ABNF format. | ||
This is a greatly simplified version of `dhall.abnf`. | ||
|
||
The main entry point is `complete-expression`. | ||
|
||
``` | ||
complete-expression = whsp expression whsp | ||
; Optional whitespace: zero or more whitespace characters. | ||
whsp = *whitespace-chunk | ||
; Non-empty whitespace: one or more whitespace characters. | ||
whsp1 = 1*whitespace-chunk | ||
whitespace-chunk = | ||
" " | ||
/ tab | ||
/ end-of-line | ||
expression = | ||
lambda whsp "(" whsp identifier whsp ":" whsp1 expression whsp ")" whsp arrow whsp expression | ||
/ forall whsp "(" whsp identifier whsp ":" whsp1 expression whsp ")" whsp arrow whsp expression | ||
/ let whsp1 identifier whsp ":" whsp1 expression whsp "=" whsp expression whsp1 "in" whsp1 expression | ||
/ expression0 | ||
lambda = %x3BB / "\" | ||
arrow = %x2192 / "->" | ||
forall = "forall" / %x2200 | ||
keyword = forall / "let" / "in" ; We have just these keywords, for now. | ||
; Precedence tower. | ||
expression0 = expression1 [ whsp ":" whsp1 expression ] ; Type annotation. | ||
expression1 = expression2 *(whsp "+" whsp expression2) ; Nat + Nat | ||
expression2 = expression3 *(whsp "++" whsp expression3) ; Text ++ Text | ||
expression3 = expression4 *(whsp "*" whsp expression4) ; Nat * Nat | ||
expression4 = expression5 *(whsp1 expression5) ; Function application. | ||
; End of precedence tower. | ||
expression5 = primitive-expression | ||
primitive-expression = natural-literal / text-literal / builtin / identifier / "(" complete-expression ")" | ||
natural-literal = NONZERODIGIT *DIGIT / "0" | ||
builtin = "Natural/fold" / "Natural" / "Text" / "Type" / "Kind" | ||
identifier = builtin 1*label-next-char / !builtin label | ||
label = | ||
keyword 1*label-next-char | ||
/ !keyword (label-first-char *label-next-char) | ||
label-first-char = ALPHA / "_" | ||
label-next-char = ALPHANUM / "-" / "/" / "_" | ||
; Uppercase or lowercase ASCII letter. | ||
ALPHA = %x41-5A / %x61-7A | ||
; ASCII digit | ||
DIGIT = %x30-39 ; 0-9 | ||
NONZERODIGIT = %x31-39 | ||
ALPHANUM = ALPHA / DIGIT | ||
text-literal = %x22 *inside-double-quote %x22 | ||
; Printable characters except double quote and backslash. | ||
inside-double-quote = | ||
%x20-21 | ||
; %x22 = '"' | ||
/ %x23-5B | ||
; %x5C = "\" | ||
/ %x5D-7F | ||
/ valid-non-ascii | ||
valid-non-ascii = ; Let us just support some Unicode. | ||
%x80-D7FF | ||
; %xD800-DFFF = surrogate pairs | ||
/ %xE000-FFFD | ||
; %xFFFE-FFFF = non-characters | ||
/ %x10000-1FFFD | ||
``` | ||
|
||
The result of parsing is a syntax tree. The data type for the tree could look like this (Haskell) code: | ||
|
||
```haskell | ||
type Ident = String | ||
type Operator = Plus | Concat | Times | ||
data Expr = Lambda Ident Expr Expr -- λ(ident : expr1) → expr2 | ||
| Forall Ident Expr Expr -- ∀(ident : expr1) → expr2 | ||
| Annot Expr Expr -- expr1 : expr2 | ||
| Op Expr Operator Expr -- expr1 operator expr2 | ||
| App Expr Expr -- expr1 expr2 | ||
| NaturalLit Int -- 123 | ||
| TextLit String -- "abc" | ||
| Var Ident Int -- A variable with a de Bruijn index. In Nano-Dhall, `x@1`. | ||
| Builtin String -- One of the built-in symbols (`Natural`, `Type`, etc.). | ||
``` | ||
|
||
Remarks: | ||
|
||
- Types and values are both represented by an expression (`Expr`). | ||
Later, the type-checking procedure will figure out which is which. | ||
This allows us to write code like `λ(a : Type) → λ(x : a) → ...` | ||
|
||
- Multiple-arity operations are parsed into left-associative, nested binary operations. | ||
For example, `1 + 2 + 3` is parsed into the following expression tree: | ||
|
||
```haskell | ||
Op (Op (NaturalLit 1) Plus (NaturalLit 2)) Plus (NaturalLit 3) | ||
``` | ||
|
||
- All "let"-expressions are immediately rewritten as `App` terms. | ||
|
||
- All built-in operations are associative (as in Dhall). | ||
|
||
- Nano-Dhall does not support explicit de Bruijn indices in the syntax; but internally, all variables must have a de Bruijn index. | ||
Parsing `x` will produce `Var "x" 0`, with the de Bruijn index equal to `0`. | ||
|
||
Any parsing library can be used to implement parsing. | ||
The parsing procedure uses negative lookahead in some places, to prevent variable names from being keywords or built-in symbols. | ||
|
||
### Type checking | ||
|
||
Type checking in Nano-Dhall is a function from an input pair `(context, expression)` to an output . | ||
|
||
The input `context` is a set of `(variable, expression)` pairs, showing all the variables whose types are already known. | ||
The input `expression` is a Nano-Dhall expression with or without existing type annotations. | ||
|
||
The output of typechecking is either a success with an output `expression` or a failure with a message describing a type error. | ||
In case of success, the output `expression` has a type annotation indicating the type of the expression. | ||
|
||
For example, the typechecking of `(context = [y : Natural], λ(x : Natural) → x + 1) ` succeeds and outputs: | ||
|
||
`(λ(x : Natural) → x + 1) : ∀(x : Natural) → Natural` | ||
|
||
This typechecking function is different from what is described in the Dhall standard. | ||
There, typechecking does not modify the original expression but merely computes its type (or determines that there is a type error). | ||
|
||
Making the typechecker return a different expression opens further possibilities for improving the language's usability and tooling: | ||
|
||
- One could return partial type-checking results (with type annotations at valid sub-expressions) even when there are type errors in the input. | ||
- The evaluator can do more simplification if type information is embedded into the expression being evaluated. Currently, Dhall's evaluator (beta-normalization) is designed to work entirely without any type information. This limits its ability to perform certain simplifications that are possible only if types of certain sub-expressions are known. | ||
- Type inference and/or term inference can be performed at the type-checking stage. The output of type checking can be an expression not only with additional type annotations, but also with some additional terms; for instance, automatically inferred type parameters or other inferrable terms. | ||
|
||
The type-checking algorithm proceeds recursively through the structure of the given expression. | ||
When a sub-expression is a function or a function type, a new variable is bound. | ||
So, the type-checking for the function body will proceed with the context that contains the new variable. | ||
To avoid name clashes, the body needs to be modified if it already contains a variable with the same name. | ||
|
||
|
||
|
||
### Alpha-normalization | ||
|
||
### Beta-normalization | ||
|
||
### One-place type inference | ||
|
||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
130 changes: 130 additions & 0 deletions
130
nano-dhall/src/main/scala/io/chymyst/nanodhall/NanoDhallGrammar.scala
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
package io.chymyst.nanodhall | ||
|
||
import fastparse.NoWhitespace._ | ||
import fastparse._ | ||
|
||
import scala.util.{Failure, Success, Try} | ||
import io.chymyst.fastparse.Memoize.MemoizeParser | ||
|
||
object NanoDhallGrammar { | ||
|
||
def ALPHA[$: P] = P( | ||
CharIn("\u0041-\u005A", "\u0061-\u007A") // A_Z | a_z | ||
) | ||
|
||
def DIGIT[$: P] = P( | ||
CharIn("0-9") | ||
// 0_9 | ||
) | ||
|
||
def end_of_line[$: P] = P("\n" | "\r\n") | ||
|
||
def valid_non_ascii[$: P] = P( | ||
CharIn( | ||
"\u0080-\uD7FF", | ||
// %xD800_DFFF = surrogate pairs | ||
// "\uE000-\uFFFC", | ||
"\uE000-\uFFFC", // Workaround: Disallow the "replacement" character ("\uFFFD") because it will be generated for invalid utf-8 encodings. | ||
) | ||
// Encode other Unicode ranges into Java's UTF-16 using UTF-16 surrogates. | ||
// See https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=10000&mode=hex and look for "UTF-16 surrogates". | ||
// %xFFFE_FFFF = non_characters | ||
// | % x10000_1FFFD | ||
// U+10000 = "\uD800\uDC00" | ||
// U+103FF = "\uD800\uDFFF" | ||
// U+10400 = "\uD801\uDC00" | ||
// U+1FFFD = "\uD83F\uDFFD" | ||
// format: off | ||
| (CharIn("\uD800-\uD83E") ~ CharIn("\uDC00-\uDFFF")) | ||
// format: on | ||
| (CharIn("\uD83F") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %x1FFFE_1FFFF = non_characters | ||
// | % x20000_2FFFD // U+20000 = \uD840\uDC00 | ||
| (CharIn("\uD840-\uD87E") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uD87F") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %x2FFFE_2FFFF = non_characters | ||
// | % x30000_3FFFD | ||
| (CharIn("\uD880-\uD8BE") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uD8BF") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %x3FFFE_3FFFF = non_characters | ||
// | % x40000_4FFFD | ||
| (CharIn("\uD8C0-\uD8FE") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uD8FF") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %x4FFFE_4FFFF = non_characters | ||
// | % x50000_5FFFD | ||
| (CharIn("\uD900-\uD93E") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uD93F") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %x5FFFE_5FFFF = non_characters | ||
// | % x60000_6FFFD | ||
| (CharIn("\uD940-\uD97E") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uD97F") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %x6FFFE_6FFFF = non_characters | ||
// | % x70000_7FFFD | ||
| (CharIn("\uD980-\uD9BE") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uD9BF") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %x7FFFE_7FFFF = non_characters | ||
// | % x80000_8FFFD | ||
| (CharIn("\uD9C0-\uD9FE") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uD9FF") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %x8FFFE_8FFFF = non_characters | ||
// | % x90000_9FFFD | ||
| (CharIn("\uDA00-\uDA3E") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uDA3F") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %x9FFFE_9FFFF = non_characters | ||
// | % xA0000_AFFFD | ||
| (CharIn("\uDA40-\uDA7E") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uDA7F") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %xAFFFE_AFFFF = non_characters | ||
// | % xB0000_BFFFD | ||
| (CharIn("\uDA80-\uDABE") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uDABF") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %xBFFFE_BFFFF = non_characters | ||
// | % xC0000_CFFFD | ||
| (CharIn("\uDAC0-\uDAFE") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uDAFF") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %xCFFFE_CFFFF = non_characters | ||
// | % xD0000_DFFFD | ||
| (CharIn("\uDB00-\uDB3E") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uDB3F") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %xDFFFE_DFFFF = non_characters | ||
// | % xE0000_EFFFD | ||
| (CharIn("\uDB40-\uDB7E") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uDB7F") ~ CharIn("\uDC00-\uDFFD")) | ||
// // %xEFFFE_EFFFF = non_characters | ||
// | % xF0000_FFFFD | ||
| (CharIn("\uDB80-\uDBBE") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uDBBF") ~ CharIn("\uDC00-\uDFFD")) | ||
// U+F0000 = "\uDB80\uDC00" | ||
// U+FFFFD = "\uDBBF\uDFFD" | ||
// // %xFFFFE_FFFFF = non_characters | ||
// | % x100000_10FFFD | ||
| (CharIn("\uDBC0-\uDBFE") ~ CharIn("\uDC00-\uDFFF")) | ||
| (CharIn("\uDBFF") ~ CharIn("\uDC00-\uDFFD")) | ||
// U+100000 = "\uDBC0\uDC00" | ||
// U+10FFFD = "\uDBFF\uDFFD" | ||
// %x10FFFE_10FFFF = non_characters | ||
) | ||
.memoize | ||
|
||
def tab[$: P] = P("\t") | ||
|
||
def whitespace_chunk[$: P] = P( | ||
" " | ||
| tab | ||
| end_of_line | ||
) | ||
.memoize | ||
|
||
def whsp[$: P]: P[Unit] = P( | ||
NoCut(whitespace_chunk.rep) | ||
) | ||
|
||
def whsp1[$: P]: P[Unit] = P( | ||
NoCut(whitespace_chunk.rep(1)) | ||
) | ||
|
||
def ALPHANUM[$: P] = P( | ||
ALPHA | DIGIT | ||
) | ||
|
||
} |
6 changes: 3 additions & 3 deletions
6
...myst/nanodhall/unit/LRUHashDictTest.scala → ...nodhall/unit/ConcurrentHashDictTest.scala
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.