noelwelsh committed Oct 30, 2023
1 parent f68ebce commit 33926eb
Showing 3 changed files with 22 additions and 19 deletions.
4 changes: 2 additions & 2 deletions src/pages/adt-interpreters/index.md
@@ -2,6 +2,6 @@

In the previous chapter we learned about algebraic data types. In this chapter we'll learn one of their major use cases, implementing interpreters. The interpreter strategy is perhaps the most important in all of functional programming.

Before discussing the code, let's talk about why we might want to implement an interpreter. Implementing an interpreter sounds like a fairly esoteric thing to do, but it forms the core for building systems that maintain the desirable properties of functional programming---compositionality and reasoning---while allowing us to have effects in our code. The central idea is to **separate description from action**. For example, imagine we're implementing a graphics library using the interpreter strategy. A description simply says what we want to draw on the screen, but critically it does not actually draw anything. The interpreter takes this description and carries out the actions described by it. As the description doesn't do anything, we can freely compose descriptions. For example, if we have a description that describes a circle, and one for a square, we can compose them by saying we should draw the circle next to the square. This creates a new description that is the composition of the two base descriptions.
Why might we want to implement an interpreter? It sounds like a fairly esoteric thing to do. The central idea is to **separate description from action**. When we use the interpreter strategy our program consists of two parts: the instructions or program that describes what we want to do, and the interpreter that carries out the actions in the description. This is the core of many systems that maintain compositionality and reasoning, particularly while allowing effects in our code. For example, imagine we're implementing a graphics library using the interpreter strategy. A description simply says what we want to draw on the screen, but critically it does not actually draw anything. The interpreter takes this description and carries out the actions described by it. As the description doesn't do anything, we can freely compose descriptions. For example, if we have a description that describes a circle, and one for a square, we can compose them by saying we should draw the circle next to the square. This creates a new description that is the composition of the two base descriptions. Drawing on the screen is an effect, and by keeping a clear separation between description and action we can more easily reason about our code.

It may be hard to see how this works from an abstract description. We'll make it concrete in just a moment, by building an interpreter for regular expressions. We've chosen this example because regular expressions are hopefully familiar to most of you, so we can concentrate on the interpreter strategy and not on the details of regular expressions. We'll then extract the general strategy from this specific example, and finally give a few pointers to learn more.
It may be hard to see how this works from an abstract description. We'll make it concrete in just a moment, by building an interpreter for regular expressions. This example is chosen because regular expressions are familiar to many, so we can concentrate on the interpreter strategy and not on the details of regular expressions. We'll then extract the general strategy from this specific example, and finally give a few pointers to learn more.
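
As a quick taste before the case study, here is a minimal sketch of the graphics example described above. All the names (`Image`, `Circle`, `Square`, `Beside`, `render`) are hypothetical and not taken from any real graphics library, and to keep the sketch small the "actions" are just lines of text rather than real drawing.

```scala
// Hypothetical names throughout; this is an illustration, not a real library.
sealed trait Image {
  // Composing descriptions builds a bigger description. Nothing is drawn here.
  def beside(that: Image): Image = Image.Beside(this, that)
}
object Image {
  final case class Circle(radius: Int) extends Image
  final case class Square(side: Int) extends Image
  final case class Beside(left: Image, right: Image) extends Image

  // The interpreter: the only place where a description becomes actions.
  // Here an "action" is a line of text standing in for a drawing command.
  def render(image: Image): List[String] =
    image match {
      case Circle(radius)      => List(s"draw circle with radius $radius")
      case Square(side)        => List(s"draw square with side $side")
      case Beside(left, right) => render(left) ++ render(right)
    }
}

// A description: the circle next to the square. No effect has happened yet.
val picture = Image.Circle(10).beside(Image.Square(20))

// Only running the interpreter carries out the actions.
val actions = Image.render(picture)
```

Because `picture` is only data, we can build it, pass it around, and compose it further without anything appearing on screen.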
21 changes: 12 additions & 9 deletions src/pages/adt-interpreters/regexp.md
@@ -2,8 +2,8 @@

We'll start this case study by briefly describing the usual task for regular expressions---matching text---and then take a more theoretical view. We'll then move on to implementation.

Programmers most commonly use regular expressions to determine if a string matches a particular pattern.
The simplest regular expression is one that matches only one fixed string.
We most commonly use regular expressions to determine if a string matches a particular pattern.
The simplest regular expression is one that matches exactly one string.
In Scala we can create a regular expression by calling the `r` method on `String`.
Here's a regular expression that matches exactly the string `"Scala"`.

@@ -19,7 +19,10 @@ regexp.matches("Sca")
regexp.matches("Scalaland")
```

When creating regular expressions, there are some characters that have a special meaning.
Notice we already have a separation between description and action.
The description is the regular expression itself, created by calling the `r` method, and the action is calling the `matches` method on the regular expression.
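
A small sketch makes the split explicit. It uses only the `r` and `matches` methods already shown; the value names (`description`, `exact`, `partial`, `results`) are ours, and `matches` on `Regex` assumes Scala 2.13.2 or later.

```scala
import scala.util.matching.Regex

// Description: constructing the regular expression inspects no input.
val description: Regex = "Scala".r

// Action: only calling `matches` runs the description against some text.
val exact = description.matches("Scala")
val partial = description.matches("Scalaland")

// The same description can be reused with as many inputs as we like.
val results =
  List("Scala", "Sca", "Scalaland").map(input => description.matches(input))
```

Here `exact` is `true`, `partial` is `false`, and `results` is `List(true, false, false)`.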

There are some characters that have a special meaning within the `String` describing a regular expression.
For example, the character `*` matches the preceding character zero or more times.

```scala mdoc:reset:silent
@@ -53,16 +56,16 @@ regexp.matches("Scalal")
regexp.matches("Scalaland")
```

That's all I'm going to say about regular expressions as they exist in Scala. If you'd like to learn more there are many resources online. The [JDK documentation](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/regex/Pattern.html) is one example, which describes all the features available in the JVM implementation of regular expressions.
That's all I'm going to say about Scala's built-in regular expressions. If you'd like to learn more there are many resources online. The [JDK documentation](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/regex/Pattern.html) is one example, which describes all the features available in the JVM implementation of regular expressions.

Let's turn to the theoretical description. A regular expression is:
Let's turn to the theoretical description, such as we might find in a textbook. A regular expression is:

1. a string, which matches exactly that string;
2. the concatenation of two regular expressions, which matches the first regular expression and then the second;
3. the union of two regular expressions, which matches if either expression matches; and
4. the repetition of a regular expression (often known as the Kleene star), which matches zero or more repetitions of the underlying expression.

This description may seem a bit abstract, but it is actually very useful because it defines a minimal API. Let's walk through the four parts of the description and see how they relate to code.
If you're not used to this kind of description it may seem a bit abstract, but it is very useful for our purposes because it defines a minimal API that we can implement. Let's walk through the four parts of the description and see how they relate to code.

The first part tells us we need a constructor with type `String => Regexp`.
In Scala we put constructors on the companion object, so this tells us we need
@@ -122,7 +125,7 @@ trait Regexp {
Now we've defined the API we can turn to implementation.
We're going to represent `Regexp` as an algebraic data type, and each method that returns a `Regexp` will return an instance of this algebraic data type.
What should be the elements that make up the algebraic data type?
They're going to exactly match the method calls, and their constructor arguments will be exactly the parameters passed to the method *including the hidden `this` parameter for methods on the trait*.
There will be one element for each method, and the constructor arguments will be exactly the parameters passed to the method *including the hidden `this` parameter for methods on the trait*.

Here's the code.

@@ -189,7 +192,7 @@ object Regexp {
}
```

Now we can apply the usual strategies to complete the implementation. We start reasoning independently by case. The case for `Apply` is very simple: we match if the `input` starts with the string we're looking for.
Now we can apply the usual strategies to complete the implementation. Let's reason independently by case, starting with the case for `Apply`. A reasonable first attempt is to match if the `input` starts with the string we're looking for. This doesn't seem completely correct, as we should only succeed if we match all the input, but it's good enough for now.

```scala
def matches(input: String): Boolean =
@@ -201,7 +204,7 @@ def matches(input: String): Boolean =
}
```

Let's choose the `Append` case next. This should match if the `left` regular expression matches the start of the `input`, and the `right` regular expression matches starting where the `left` regular expression stopped. This has uncovered a hidden requirement for us: we need to keep an index into the `input` that tells us where we should start matching from. The easiest way to implement this is with a nested method. Here I've created a nested method that returns an `Option[Int]`. The `Int` is the new index to use, and we return an `Option` to indicate if the regular expression matched or not.
Let's move on to the `Append` case. This should match if the `left` regular expression matches the start of the `input`, and the `right` regular expression matches starting where the `left` regular expression stopped. This has uncovered a hidden requirement: we need to keep an index into the `input` that tells us where we should start matching from. The easiest way to implement this is with a nested method. Here I've created a nested method that returns an `Option[Int]`. The `Int` is the new index to use, and we return an `Option` to indicate if the regular expression matched or not.

```scala
def matches(input: String): Boolean = {
16 changes: 8 additions & 8 deletions src/pages/adt-interpreters/reification.md
@@ -1,36 +1,36 @@
## Interpreters and Reification

There are two different concepts at play here:
There are two different programming strategies at play here:

1. the interpreter strategy; and
2. the interpreter's implementation strategy of reification.

The essence of the **interpreter strategy** is to separate description and action. Descriptions are programs; things that we want to happen. The interpreter runs the programs, carrying out the actions described within them.
Remember the essence of the **interpreter strategy** is to separate description and action. Therefore whenever we use the interpreter strategy we need at least two things: descriptions and an interpreter that carries out actions. Descriptions are programs; things that we want to happen. The interpreter runs the programs, carrying out the actions described within them.

In the example we just saw, a `Regexp` value is a program. It is a description of a pattern we are looking for within a `String`.
The `matches` method is an interpreter. It carries out the instructions in the description, looking for the pattern within the input.
The `matches` method is an interpreter. It carries out the instructions in the description, looking for the pattern within the input. We could have other interpreters, such as one that matches if only the start of the input matches.


### The Structure of Interpreters

All uses of the interpreter strategy have a particular structure to their methods.
There are three different types of methods:

1. constructors, or introduction forms, have type `A => Program`, where `A` is any type and `Program` is the type of programs. Constructors conventionally live on the `Program` companion object in Scala. We see that `apply` is a constructor of `Regexp`. It has type `String => Regexp`, which matches the pattern `A => Program` for a constructor.
1. constructors, or introduction forms with type `A => Program`, where `A` is any type and `Program` is the type of programs. Constructors conventionally live on the `Program` companion object in Scala. We see that `apply` is a constructor of `Regexp`. It has type `String => Regexp`, which matches the pattern `A => Program` for a constructor.

2. combinators have a program input and output, so the type is similar to `Program => Program` but there are often additional parameters. In our regular expression example, all of `++`, `orElse`, and `repeat` are combinators. They all have a `Regexp` input (the `this` parameter) and produce a `Regexp`. They sometimes have additional parameters, as is the case for `++` or `orElse`. In both these methods the parameter is a `Regexp`, but it is not the case that additional parameters to a combinator must be of the program type. Conventionally these methods live on the `Program` type.

3. destructors, interpreters, or elimination forms, have type `Program => A`. In our regular expression example we have a single interpreter, `matches`, but we could easily add more. For example, we often want to extract elements from the input.

This structure is often called an **algebra** in the functional programming world.
This structure is often called an **algebra** in the functional programming world. When we talk about constructors and destructors in an algebra we're talking at a more abstract level than when we talk about constructors and destructors on algebraic data types. A constructor of an algebra is an abstract concept, at the theory level in my taxonomy, that we can choose to concretely implement at the craft level with the constructor of an algebraic data type. There are other possible implementations. We'll see one later.


### Implementing Interpreters with Reification

Now that we understand the components of an interpreter we can talk more clearly about the implementation strategy we used.
We used a strategy called a **reification**, **deep embedding**, or **initial algebra**.
We used a strategy called **reification**, a **deep embedding**, or an **initial algebra**.

Reification, in an abstract sense, means to make concrete what is abstract. Concretely, reification in the programming sense means to turn methods into data. When using the interpreter strategy, we reify all the components of the program type. This means reifying constructors and combinators.
Reification, in an abstract sense, means to make concrete what is abstract. Concretely, reification in the programming sense means to turn methods into data. When using reification in the interpreter strategy we reify all the components that produce the `Program` type. This means reifying constructors and combinators.

Here are the rules for reification:

@@ -39,7 +39,7 @@ Here are the rules for reification:
3. All constructors and combinators become product types within the `Program` algebraic data type.
4. Each product type holds exactly the parameters to the constructor or combinator, including the `this` parameter for combinators.

If we do this, the interpreter becomes a structural recursion on the algebraic data type we have just defined.
Once we've defined the `Program` algebraic data type, the interpreter becomes a structural recursion on `Program`.
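
To see the rules working together, here is a minimal sketch of a reified `Regexp`. It is not necessarily the code from the case study: the field names and the details of the nested `loop` method are our assumptions, and the interpreter is deliberately simple (greedy, with no backtracking).

```scala
sealed trait Regexp {
  import Regexp._

  // Combinators are reified: each one just returns the product type that
  // holds its parameters, including the hidden `this`.
  def ++(that: Regexp): Regexp = Append(this, that)
  def orElse(that: Regexp): Regexp = OrElse(this, that)
  def repeat: Regexp = Repeat(this)

  // The interpreter is a structural recursion over the algebraic data type.
  def matches(input: String): Boolean = {
    // Returns the index just past a match that starts at `idx`, if any.
    def loop(regexp: Regexp, idx: Int): Option[Int] =
      regexp match {
        case Append(left, right) =>
          loop(left, idx).flatMap(i => loop(right, i))
        case OrElse(first, second) =>
          loop(first, idx).orElse(loop(second, idx))
        case Repeat(source) =>
          loop(source, idx) match {
            // Only repeat when input was consumed, so we cannot loop forever.
            case Some(i) if i > idx => loop(regexp, i)
            case _                  => Some(idx)
          }
        case Apply(string) =>
          if (input.startsWith(string, idx)) Some(idx + string.length)
          else None
      }

    // Assumption: a match must consume the whole input.
    loop(this, 0).contains(input.length)
  }
}
object Regexp {
  // One product type per constructor or combinator.
  final case class Append(left: Regexp, right: Regexp) extends Regexp
  final case class OrElse(first: Regexp, second: Regexp) extends Regexp
  final case class Repeat(source: Regexp) extends Regexp
  final case class Apply(string: String) extends Regexp

  // The constructor, with type String => Regexp, is reified as well.
  def apply(string: String): Regexp = Apply(string)
}

val regexp = Regexp("Sca") ++ Regexp("la") ++ Regexp("la").repeat
val either = Regexp("a").orElse(Regexp("b"))
```

With these definitions `regexp.matches("Scala")` and `regexp.matches("Scalalala")` succeed, while `regexp.matches("Sca")` and `regexp.matches("Scalaland")` fail. Because this sketch never backtracks, a pattern such as `Regexp("a").repeat ++ Regexp("a")` will not match `"aa"`; a more careful interpreter would need to try shorter repetitions.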


### Exercise: Arithmetic {-}