Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pattern matching equality types #36

Open
eduardoleon opened this issue Jan 27, 2018 · 10 comments
Open

Pattern matching equality types #36

eduardoleon opened this issue Jan 27, 2018 · 10 comments

Comments

@eduardoleon
Copy link

Standard ML has two branching mechanisms: pattern matching and the more traditional if-then-else. The former is generally considered superior for most purposes. For example:

fun foo nil = bar
  | foo (x :: xs) = qux (x, xs)

is regarded as cleaner code than:

fun foo xs = if null xs then bar else qux (hd x, tl x)

However, SML's existing pattern matching mechanism provides no support for equality testing. One cannot write, for example:

fun foo ((p,a) :: (p,b) :: (p,c) :: xs) = SOME whatever
  | foo _ = NONE

Instead, one has to write the much clumsier:

fun foo ((p,a) :: (q,b) :: (r,c) :: xs) =
    if p = q andalso q = r then SOME whatever else NONE
  | foo _ = NONE

Note that q and r serve no purpose other than to be tested for equality with p.

I propose that repeated variables be allowed in patterns, and that they be required to have eqtypes. As an additional benefit, the equality testing function wouldn't need to be a magical primitive anymore. Instead, it can be defined as a library function:

fun op= (x, x) = true
  | op= _ = false

I would also like to propose that the equality testing function be renamed to ==, since it is confusing to use the same token for definitions and equality testing. But that's a debate for another day.

@JohnReppy
Copy link
Contributor

The Successor ML design has an alternative approach to this issue using guarded patterns. You can write something like

(p1, a) :: (p2, a) :: (p3, a) :: xs if (p1 = p2) andalso (p2 = p3)

(I may be getting the exact syntax slightly wrong, but it is described in the SuccessorML definition).

I've got an undergraduate working on adding this mechanism to SML/NJ.

@eduardoleon
Copy link
Author

eduardoleon commented Jan 27, 2018

What I don't like about guarded patterns is that they may require evaluating arbitrary expressions, which of course might have arbitrary effects, including nontermination.

Pattern matching should be the eliminator for a sum, i.e., the language's semantics should ensure that the distinction between the cases is decidable. Guarded patterns shift the burden of proof onto the programmer.

@RobertHarper
Copy link

RobertHarper commented Jan 27, 2018 via email

@eduardoleon
Copy link
Author

Out of curiosity: In practice, how often is equality testing for different types? Personally, I only need it with “atomic” types such as integers and reference cells. Strings are probably best manipulated by treating them as streams of characters, using a function:

val uncons : string -> (char * string) option
(* IMO preferable to String.explode *)

And I don't use equality testing with datatypes or abstract types at all. (In fact, couldn't we make datatypes not eqtypes?)

In conclusion, I would be satisfied with a limited version of my original proposal where equality testing in pattern matching is only allowed for integers, words, characters, and reference cells.

@JohnReppy
Copy link
Contributor

While equality for abstract and recursive types has its dangers (as Bob noted), equality is useful for more than just integers and reference cells. The bool datatype is an obvious example.

Going back to the original proposal, guards support the desired pattern in a much more general way than non-linear patterns. For example, say my list elements that foo is working on are hash-consed terms. They have a fast equality test, but it is not op=, so non-linear patterns won't work, but guards will.

Also, it is a bit off topic, but the substring type is what you want for efficient manipulation of strings (the Substring.getc function is your uncons.

@RobertHarper
Copy link

RobertHarper commented Jan 27, 2018 via email

@eduardoleon
Copy link
Author

eduardoleon commented Aug 5, 2019

I agree that if-then-else really needs to die, but I do not see how pattern matching on bools is any better. Effectively, it is different syntax for the same thing. You are still computing values of type bool, whose provenance you need to remember. Why not get rid of the bool altogether, and branch on what really matters - whether two given values are equal or not?

Now, regarding why it is useful to branch on whether two values are equal. This is an operation that does not arise so often in purely functional programming. (Although I have found use cases, such as the post that begins this thread.) However, it is absolutely essential in the implementation of certain imperative data structures, such as link-cut trees, in which the “object identity” of every node matters, and we need to test at runtime whether two given pointers (or reference cells) are the same.

@ratmice
Copy link
Contributor

ratmice commented Aug 5, 2019

Not a much to do with pattern matching in this comment, but the approach to guarding via some predicate that I like has been to use a functor/opaque type returning an option as below.

This has the benefit that the predicate is called on construction via mk, thus you don't have to worry about effects occurring during pattern match. That said it doesn't really work if T contains mutable state, which the predicate conditionalizes.

I'm not sure others would want to work in this style which is a somewhat detrimental to pattern matching, it probably wouldn't end up very pretty in Eduardo's example where there is equality across list elements, but perhaps worth mentioning, as I like both the predictability of not only when, but how many times the predicate is called (on construction, once), of this style of code.

shrug

signature DISCRIM = sig
        type T;
        type discrim_t;
        val mk : T -> discrim_t option;
        val get : discrim_t -> T;
end

functor Discrim(type T_; val predicate : T_ -> bool)
:> DISCRIM where type T = T_
= struct
        type T = T_;
        type discrim_t = T_;
        fun mk v = if predicate v then SOME v else NONE;
        fun get v = v;
end

@JohnReppy
Copy link
Contributor

A response to parts of the previous two comments.

While if-then-else is just a derived form, it is both notationally and conceptually convenient. Successor ML goes further and adds an if-then derived form.

As far as using an abstract interface instead of pattern matching, you lose the efficiency benefits of pattern matching, unless you have a very smart compiler. An alternative that gives you similar behavior are the Abstract Value Constructors that Bill Aitken and I proposed back in 1992.

@eduardoleon
Copy link
Author

eduardoleon commented Aug 8, 2019

Consider the type 'a point defined as follows:

datatype 'a state = Empty | Point of 'a point
withtype 'a point = ('a * 'a state * 'a state * 'a state) ref

The elements of the 4-tuple are:

  1. A user-supplied value
  2. A pointer to the left child node
  3. A pointer to the right child node
  4. A pointer to the parent node

The invariants of this type are:

  1. The left and right children of a node must point back to their parent.
  2. The left and right children of a node must be either distinct of both Empty.
  3. There are no cycles obtained by following downward paths, i.e., left and right pointers.
  4. There are no cycles obtained by following upward paths, i.e., parent pointers.

A node may have an arbitrary number of children. The left and right children are merely (zero, one or) two of them.

An auxiliary tree is a maximal subgraph with the property that, given two nodes m and n in it, there exist paths from m to n and from n to m. Auxiliary trees are manipulated as splay trees. In particular, one must be able to splay a node. In present-day Standard ML, this is done as follows:

datatype 'a parent
  = Root
  | Left of 'a point
  | Right of 'a point
  | Other of 'a point

(* Working around how dumb eqtypes are. *)
fun eq (Empty, Empty) = true
  | eq (Point p, Point q) = p = q
  | eq _ = false

fun parent p =
  case #4 (!p) of
      Empty => Root
    | Point q =>
      if eq (#2 (!q), Point p) then Left q else
      if eq (#3 (!q), Point p) then Right q else Other q

fun left (p, q) =
  case parent q of
      Left r => l3 (p, q, r)    (* implemented somewhere else *)
    | _ => l2 (p, q)            (* implemented somewhere else *)

fun right (p, q) =
  case parent q of
      Right r => r3 (r, q, p)    (* implemented somewhere else *)
    | _ => r2 (q, p)             (* implemented somewhere else *)

fun splay p =
  case parent p of
      Root => Empty
    | Left q => (left (p, q); splay p)
    | Right q => (right (p, q); splay p)
    | Other q => Point q

The helper datatype 'a parent and the helper functions parent, left and right exist solely to work around the lack of expressiveness of Standard ML's patterns. With equality patterns and pattern synonyms, one could instead write something like:

pat Left (p, q) = p as ref (_, _, _, Point (q as ref (_, Point p, _, _)))
pat Right (p, q) = p as ref (_, _, _, Point (q as ref (_, _, Point p, _)))
pat Child (p, q) = p as ref (_, _, _, q)

fun splay (Left (p, Left (q, r)) = (l3 (p, q, r); splay p)
  | splay (Right (p, Right (q, r)) = (r3 (r, q, p); splay p)
  | splay (Left (p, q)) = (l2 (p, q); splay p)
  | splay (Right (p, q)) = (r2 (q, p); splay p)
  | splay (Child (p, q)) = q

And then pattern matching imperative data structures is every single bit as sweet as pattern matching purely functional ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants