-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proper sum types and structural pattern matching #525
Comments
Is it really necessary to change the language syntax for this? Maybe it would be better to standardize macros that implement the same thing on the existing syntax? The current variant objects are really quite unconvinient, and it would be nice to have syntactic sugar over them to create algebraic types. |
🤔 |
That doesn't mean much beyond the fact that people avoid dependencies for various reasons. |
So what does this RFC accomplish that e.g. fusion/matching does not? You don't need to teach me about ML and Rust and Racket and Foozbaz, I know. You need to tell me why you found the existing solutions based on macros lacking. |
We agree. |
The
After work, I'll write up a detailed comparison between the two because I think it reveals some interesting design decisions. But a place to look for now, if interested, is the difference between this RFC's use of |
Also, a little note on other languages: despite Racket's match statement being the most powerful of the bunch and Python being the closest language to Nim syntactically, I took the least inspiration from them. I find Python's match to be grotesquely complex for relatively little benefit, and think |
Very nice! ADT's are to Nim It's true that the proposed type sort of is a special case of a case object with a specific form of enum, exactly one discriminator and no other members - but that special case is useful in many constructs which logically express "one and only one of these things" - the rest is followup on top of that concept. Rather than asking "can macros solve this?", the better question is "can we create better macros with this feature present", and there the answer is undoubtedly yes, because it's such a fundamental construct that appears over and over in many code patterns. Many things become more simple when you can these assumptions about a type: as mentioned already: resetting or overwriting during serialization, exhaustive case statements, etc. Here's a simple example which relies on the knowledge that a type takes on exactly one shape: because the type is guaranteed to be exactly one of the enumerated options, we can reason about that option without specifying the full type but still express a meaningful relationship to it. In |
I generally like the idea and I was one of the complainers about the PM situation. But as I said in Discord, if this ever gets accepted, who is going to take ownership and implement it. Especially PM?
At the very least, it will be a workaround for this :P |
How would the proposed syntax distinguish between a single field that is a tuple and multiple fields? Using tuples for the latter is not nice imo. I'd prefer something closer to the syntax for constructing/matching the variants (which is also how Rust and Swift do it): type Shape = union
Point()
Line(int)
Square(side: int)
Rectangle(x, y: int)
Triangle(TriangleKind) |
In my unwritten proposal I used this syntax: type
Tree = ref case kind # also possible without the `ref`
of StringLeaf:
str: string
of IntLeaf:
x: int
of Parent:
kids: seq[Tree]
No new keyword required and the syntax is optimized for pasting into a |
@konsumlamm Currently: type Shape = union
Rectangle: tuple[x, y: int]
type Shape = union
Rectangle: tuple[tuple[x, y: int]] I picked this syntax to be as close to Nim's existing semantics as possible while still providing the full power of ADTs. I like your syntax more aesthetically, but IMO it doesn't fit in with tuple / object types as well. Also note that the inner type of Rectangle in both of our examples is @Araq I assume the unwritten proposal was mostly for syntactic sugar over the existing variant object syntax? A benefit of the |
That's a good point but easily fixed: type
Tree = ref case # also possible without the `ref`
of StringLeaf:
string
of IntLeaf:
int
of Parent:
seq[Tree]
The advantages remain: Natural Nim syntax, no new keyword, easily turned into a |
I disagree that that's natural Nim syntax, I find it subjectively very ugly, the lack of a field following I'm not particularly attached to the |
I suppose we could also do this leaving eg. type Tree = ref nested enum # oops need indirection
StringLeaf: string
IntLeaf: int
Parent: seq[Tree] |
I care about keeping the type
Tree = ref enum # also possible without the `ref`
of StringLeaf:
string
of IntLeaf:
int
of Parent:
seq[Tree]
|
type Tree = nested enum
StringLeaf: string
IntLeaf: int
Parent: seq[Tree]
is wrong for two reasons: You need the |
That kinda goes against the existing semantics of enum being explicitly an enumeration though, and also conflicts a bit semantically (not in syntax) with the existing feature that lets you set labels of an enum to values. My original approach was to overload the Now that I think about it more, I really don't like the idea that you can copy and paste the re: |
Yeah, even though they're comparable to objects and the syntax looks similar, they're not. This is one of the few parts that differs from existing semantics. I personally think it's fine - |
Actually, this is how you ensure correctness, having thought about it, you handle every case explicitly. It's the "oh and in all other cases (else) do ..." that produces bugs. In the Nim compiler and everywhere else. |
Maybe |
If a non-macro solution for ADTs/Pattern Matching is finally being considered, then my vote goes to a rust/haxe like syntax only. A lot of the syntaxes suggested above are confusing / error prone / ugly just like the existing object variant syntax (imo). |
ML style ADTs would be a great idea and I won't quibble about the syntax but I worry that too much conditional freedom on each match arm eg. Have you looked D's sumtype? It is inspired by ADTs but leans heavily into static introspection and would be a nicer fit for Nim vs. trying to copy Haskell/OCaml. |
D's |
It's more cumbersome but when you can only destructure on one thing at a time static analysis is much easier, you can get better IDE support, possibly even exhaustiveness checking, and maybe even case splitting like Idris. |
Nim's |
Nim's
Here I get the error: |
It's not "brittle", it works as intended. I'm also not familiar with any mainstream language that offers exhaustiveness checking over nested |
Not mainstream, and not completely generally but with a ton of effort Haskell does. But the point is by making the pattern matching more "cumbersome" but also more analyzable you can get this but without most of the implementation complexity. |
@deech Ah, I'll write up a little thing on this with more detail. But basically: I'm very aware of this, and designed it so that the @metagn I don't love |
I'd generally agree. Without pattern matching sum types would only provide convenient constructors which macros can provide. I'd still think it'd be great to have but would feel lacking. Really the implementation could ship in multiple parts, but having even a basic pattern matching syntax agreed upon would be useful. Maybe skip Importantly having pattern matching syntax in the language would prevent splintering and requiring folks to learn multiple different macro based versions. I already didn't want to mix Fungus and Patty in a project as they have different semantics. Also I don't believe it's possible to do the BTW, here's another possible syntax: type
Tree1 = ref enumeration
of StringLeaf:
str: string
of IntLeaf:
x: int
y: int
of Parent:
kids: seq[Tree]
Tree2 = ref enumeration
of StringLeaf: string
of IntLeaf: tuple[int, int]
of Parent: seq[Tree] The downside is that it's slightly confusing at first why there'd be an |
It should be for Aka |
I think my point is more that pattern matching should take the whole language into consideration and not only union types (ditto union types themselves). One can easily imagine pattern-matching over other things than union types, including value matching / conditionals / etc and treating them here in the same RFC not only risks blocking union types on bikeshedding of pattern matching but indeed risks not giving due attention to each of them in the context of the rest of the language. I don't disagree that it's good to keep both things in mind during initial design, but union types are useful on their own (because of the expressivity they add to the type system) and pattern matching isn't limited to them. |
If only we had capable language designers who could have anticipated that... ;-) |
Good points. I wonder if there's a way to enable just a simple Then it'd just be settling on a union / sum type / enumeration / maybe-an-adt syntax. That should be easy. ;) Then we could get sample PRs to get the ball rolling. |
That just allows you to write macro that is syntactically overlapping with case statement ... so basically the link should point to the macro section actually |
@J-James I think aside from comparison with hmatching on features you can also ask for feedback on forum/discord like what sucks?, What you don't like, what could be added, what you didn't understand how to use, what feels off -- just some quick reactions |
Yes, that kind of feedback is very welcome (along with all other feedback)! I've posted the link to this a few times but have not explicitly asked people to go through and evaluate it yet, will do so in the morning. |
Does a language implementation of something like https://github.com/alaviss/union/ or like Crystal's runtime union types (but not implicit or merged with union typeclasses) solve the problems listed in the issue? type
A = object
x, y: int
Foo = union(int, A)
# under the hood an object variant, but structural and commutative
assert Foo is union(A, int)
proc `$`(f: Foo): string =
if f of int: # reuse object inheritance check syntax
result = $int(f)
elif f of A:
let f = A(f)
result = "A(" & $f.x & ", " & $f.y & ")"
# or as case statement
case f
of int: $f # automatic type narrowing? doesn't really make sense but not essential
of A: "A(" & $f.x & ", " & $f.y & ")"
# cases exhausted
var f: Foo
# A is convertible to Foo
f = A(x: 1, y: 2)
assert $f == "A(1, 2)"
# so is int
f = 456
assert $f == "456"
# recursion and distinct types
type
StringLeaf = distinct string
IntLeaf = distinct int
ParentTree = distinct seq[Tree]
Tree = ref union(StringLeaf, IntLeaf, ParentTree)
let tree = Tree(ParentTree("abc")) (I expect recursion here to work because the following code works, but it might not be reliable) type
Bar = int | ref Foo
Foo = float | Bar
assert int is Foo
assert (ref int) is Foo
assert (ref ref int) is Foo
assert float is Foo
assert (ref float) is Foo
assert (ref ref float) is Foo
assert string isnot Foo
assert (ref string) isnot Foo This way each branch is tied to a type, construction and destructuring are much simpler to implement (and not ambiguous with existing Is this sound, or is there maybe a problem it doesn't solve? |
I know you don't appreciate the feature but I don't know why that is. It unifies the potential different pattern matching DSLs and is as beneficial as having a unified syntax for assignment and equality for generic code. |
It only unifies a single keyword, everything else is up to the macro author to write, so it can be hardly called 'unifying'. Assignment and equality seem to have different syntax: = vs == and they don't allow the user to put on complex custom DSL in the body. |
In order for the unification claim to hold true we must ensure that every DSL implemented with case statement follows the same patter and largely has the same semantics, which more or less implies some standardized approach for this, in which case it does not make any sense to aim for several different macros that are "unified" yet different somehow, as opposed to a single standard solution. |
Let's assume I have a custom type with two fields that I want to be able to match against an array-like pattern: type
Pair = object
x, y: int
var p: Pair
macro `case`(p: Pair; u: untyped) = ...
case p
of [x, _]:
Now if I happen to use something more complex than Just like you can lift |
The RFC also provides unified interface -- I would argue that it is even more unified because it does not care about
So the multiple macro solution does not provide any win here -- unless I'm expected to write the same boilerplate for which might be unified into a single |
Cool but case statement macros predate this RFC. So ok, we don't need case statement macros leaving us with the question of whether to use |
Since it was asked earlier, here is an example of Haskell's coverage spanning nested pattern matches. https://old.reddit.com/r/haskell/comments/13tahn8/til_lower_your_guards_in_ghc_9/ |
@Araq for pattern matching, what do you think of the following action plan about finally deciding on how it should be implemented in the language and then proceeding with the implementation. Over the last years (and different implementations) discussion largely revolved around multiple different topics, going into "That doesn't mean much beyond the fact that people avoid dependencies for various reasons." and other things. I think at this point most of the existing solutions have been tried, so the questions that are left are rather yes/no kind
Note that "yes" does not mean "yes, any garbage that you write will be merged" but rather "people who are willing to design and implement this feature and receive the feedback on the implementation quality can expect their work to actually move towards the final goal of getting the pattern matching in the stdlib within a reasonable amount of time (post 2.0 or post 2.2 or what you call it, just not years and years of waiting)" |
If none of these are affirmative, then the whole discussion can pretty much end there, otherwise we can go to the
In this order of importance |
Yes, in principle pattern matching would be accepted into the language core. It can be done like so:
Notice that even though the syntax is the least important part, it's the part that most likely implies changes/additions to the core language since the rest of Nim excels at AST rewrites. |
I wasn't going to comment on the pattern matching part of this proposal (counterproposal for ADTs in #527) because it's not something I'm invested in, but since we're talking about adding it to the language I have some issues.
No it's not. You break this. proc foo(x: int) =
const abc = 1
case x
of abc: echo "was the constant abc"
else: echo "was any other number"
foo(1) # was the constant abc
foo(2) # was the constant abc A rule "any constant in scope will be resolved" here would be bad too, because then when you lose access to the constant or write it with a typo, you use a match all pattern that can cause bugs. Either use
For "field renaming" I understand, but for renaming a pattern? case x
of (a, b): c: ... There's not really a choice other than
I think some people talked about preferring just having Collection matchingI put the specific issues with the collection matching in this proposal in the following details block because I think there is a more general problem which I will describe after.
Due to how bitset iteration works, this pattern is O(n^p) where p is the number of variables you used and n is either the size of the set if we are considering the
This makes it seem like of [first, second, third, .., penultimate, last]: In this case I think it would be better to do: of [first, second, third, ..middle, penultimate, last]: Where (end of details) The more general problem is this is the exact kind of stuff we shouldn't have in the language and libraries should (and already do). In general I think we should minimize the "conditionality" of the pattern matching and focus on the destructuring syntax so we can get it consistent with the existing language (tuple unpacking matches tuple construction, so we should match named tuple construction, object construction etc). Even the ADTs in this proposal serve this purpose: we have a simpler alternative to the conditionality of object variant I'm not even particularly passionate about destructuring. There's not much ergonomic difference between Bonus:
The tag absolutely needs to exist. There is no overhead. The alternative is C unions (
??? |
Thanks for the detailed feedback! That Note that I don't intend for field renaming to extend to patterns. Right now, this RFC only allows for field renaming to work on fields in objects and named tuple: I don't want it to act as a general "bind" statement because it isn't needed and it adds syntactic complexity. eg. every time you want to bind to an expression, you can just use its inner fields instead. (this is because a lot of stuff is moved to the Re: minimizing the conditionality, I think I agree - for a first attempt I want to implement the You're very right about the tag also: I don't know what I was writing there? I think I meant to refer to the overhead of having all cases in a variant object vs. only having to worry about the size of the greatest, though I'm not sure if Nim's object variants are implemented in this fashion (it would be a bit naive if they were). It certainly needs a tag of course. |
Fwiw I personally only want For binding of values I would add case obj
of Obj(x) where x == 10: echo "match"
of Obj(x: _ as y) where y == 21: echo "different match"
else: echo "no match"
Whether |
Also, @metagn a note on exhaustiveness (because I said I'd provide that ages ago :-P). I generally designed this so that expressions that can contribute to the exhaustivity of a pattern match are done within the (lol @Araq - looks like we're on the same page) This ideal perhaps falls apart a bit with collection matching as you noted. One, collection types are very hard to be exhaustive on: if you're not working with (short) arrays you need a wildcard Anyway, that's just a bit of a brain dump. I think I agree that ADTs + destructuring are the most important part of this proposal, though I probably would have argued for |
What's the point of collection matching though? A collection typically holds a variable number of things and the pattern by its nature can only deal with a fixed amount of variables. Sounds like solving a problem for people who used the wrong types to begin with. |
After thinking it over for a while I really like @Araq's value binding syntax. It sidesteps the backwards compatibility issue @metagn brought up nicely while the sugar keeps it clean for the most common use case, inner types of objects / named tuples. However: the second most common case is single types within an ADT: ex. type JsonValue = union
Boolean: bool
Number: int
String: string
Array: seq[ref Value]
Object: Table[string, ref Value] Matching upon those would then look like this: case value
of Boolean(_ as x):
...
of Number(_ as x):
...
of String(_ as x):
...
of Array(_ as x):
...
of Object(_ as x):
... Which is a bit gross. But I don't know what kind of syntax sugar would work well here and not conflict with existing case semantics. |
For those interested, the issue is fixed now. |
Related: #548 |
Abstract
Proper sum types and structural pattern matching for Nim by the introduction of
union
and extension ofcase
with structural matching and andwhere
clauses. Heavily inspired by Rust, Swift, Racket, Python, and others, and adapted to Nim's particular type system and syntax.Motivation
I write a lot of Nim code, and I write a lot of Rust code. I find both to be excellent languages with their own unique tradeoffs and specialties. But there is one thing in particular I greatly miss when going from Rust to Nim: and that is algebraic data types and pattern matching.
Algebraic data types (ADTs), for those unfamiliar, are an extremely powerful data structure that are useful for modelling a wide variety of types. They allow, by the composition of product types and sum types (as well as ordinary types), the modelling a wide variety of different data structures.
Nim does have algebraic data types -
objects
may be composed arbitrarily. Their application to product types is straightforward: product types are directly analogous to structs (or in Nim, standard objects). Sum types complement product types and can be thought of as their dual: they act as a wrapper around different types, letting you interact with their inner structure only by safely pattern matching upon the container. They are best represented by tagged unions, however, which Nim lacks a direct implementation of.Variant objects are capable of modelling tagged unions and thus sum types. These are useful as, well, variants of objects, but have a number of serious drawbacks when used as sum types that the first portion of this RFC aims to address.
As for pattern matching: Nim has the case statement, which alone is sufficient to make use of the inner contents of an sum type (or currently, switch on a
kind
label). But structural pattern matching is the bread and butter of ADTs: aside from simply being cleaner than nestedif
statements, it provides guarantees of exhaustiveness not otherwise given by the alternative. It is perhaps best exemplified by the ML family of languages (including Rust): but other languages have picked it up offering their own takes on it, notably Racket, Swift, and Haxe. This RFC pulls from a number of different designs to provide a powerful and Nim-esque model.Description
This RFC is two-fold. It introduces a new fundamental type,
union
, with similar semantics to Rust'senum
. It also extends Nim's existing case statement with support for structural pattern matching and particularly that of unions/objects/tuples (although it supports lists as well), and introduces a new supplementalwhere
clause in order to cleanly and powerfully do so.union
is a type that functions similarly toenum
at a surface level. It takes a list of identifiers as fields/variants/values, and an instance of a union may only ever be one of the fields. However, each field of the union may additionally be of its own entirely separate type (or none). This is best exemplified with the classic "shape" example:These fields of the unions are unordered, unqualified (via the same mechanism as overloadable enums), and unique. The inner fields are each of distinct types: entries used solely for their label (ex.
Point
) are implicitly ofvoid
type. Instantiating a union, then, is done by picking a particular variant, and providing data for the construction of its associated type (or none) as needed. The syntax of this is again best shown with an example. As a special syntax rule, tuples and objects may drop their extraneous parentheses on construction.To support proper and powerful pattern matching, we introduce three new extensions to the language: an extension for existing
case
/of
statements to support (nested) structured types (array
,seq
,tuple
,object
,union
), an extension forof
expressions in regular conditionals to support the same, and a newwhere
expression that functions as a reusable "guard". Just likeunion
, these language features combine to provide power with minimal new syntax. Following existing case semantics, this pattern matching may be exhaustive (to be described later), or non-exhaustive and followed byelif
andelse
statements.Elements of a tuple may be matched in two ways: by providing a value, which will match all tuples with that value at that location, or by binding them to a new identifier, which will match all tuples with any value at that location, and will then be made available as a variable within the branch. New variables introduced by pattern matching in this fashion follow the mutability of the matched-upon statement they are a part of. As is the case elsewhere, binding an element to the
_
identifier discards it.where
guards allow for restricting the matching of a case based on an arbitrary conditional expression. They have access to any bound fields: and so can be used to restrict variables to a value while still allowing access to that value among other, arbitrary conditions.where
clauses follow the main match section of anof
line and are considered to be attached to the main match rather than theof
itself to allow for the reuse of labels to follow existing case semantics: see the example below. Of important note is the intersection between variable bindings and repeated cases: in order for a variable to be accessible within the body of the case, it must be bound (in a consistent location) in every label of that statement. We do not disallow this at a binding level because the bindings are still helpful forwhere
clauses.The exhaustiveness of a pattern match is determined on a best-effort attempt. Fields of booleans, enums, and unions consist of few possibilities and may be exhaustively matched upon. Guards do not contribute to the exhaustivity of a search as they may contain arbitrary conditional expressions: and so must typically be followed by a case for the same match without the guard, or
else
.Named tuples and objects are matched identically to tuples, with one exception: the identifier name when binding a field must be the name of the original field. If this is undesired, the syntax
oldident: _ as newident
is introduced to allow for explicit rebinding. Unions are matched by specifying their label, and then their inner type in parenthesis. Tuples and objects within a union may elide their parenthesis.Previous thoughts on matching sequences, arrays, and sets are collapsed in the below block. I no longer think they're terribly important. They are also not updated for the new backwards-compatible
_ as x
syntax.Matching sequences, arrays, sets
Matching sequences and arrays is done slightly differently. To distinguish them from tuple and object matching, brackets are used instead of parenthesis. These are not elided within a union type as the parentheses of tuples and objects are. Matching of elements is done similarly to tuples: individual elements of the sequence/array are matched positionally, the variable/value matching distinction remains the same, andwhere
guards are again available for conditional treatment of bindings.As sequences may be of unknown length, and arrays may have uncomfortably long lengths, the
..
operator is introduced/reused to denote a range of fields we do not care about matching, with the syntax[first, second, .. last]
. As a result, the position of fields may be specified either from the beginning or from the end of the list - it is not possible to match a field an arbitrary distance into a long list within the match pattern itself. Again, however,where
guards provide a more generic way to do so.Matching sets and tables mostly follows list semantics. Both are wrapped in braces instead of brackets or parentheses. As the elements of sets and tables are never named: the
:
does not function as it does in tuples and objects, instead, it takes its regular semantics within tables to denote a key-value pair. Of additional note is the..
operator: as sets and tables can be of unknown length it is helpful: however, as they are unordered, the..
operator must come last.While the exhaustivity of pattern matching is important: frequently you want to disregard it, and only worry about a single case. Rust provides the
if let
construct for this use case, which I find very useful but pretty gross syntactically. Instead, this RFC uses the syntaxif shape of Square(side: binding)
. The bindings become immediately available for use within the body of the if statement and within any parts of the if statement following that clause, as shown by the following examples:Following the semantics above, these extended
case
andif ident of match
statements should also allow unwrappingOption[T]
types. What this looks like is not yet specified.Code Examples
Algebraic Data Types
An example that perhaps more showcases the utility of ADTs is that of an abstract syntax tree for the simple lambda calculus.
Note that other structures needed for an implementation of the simple lambda calculus (including a type system) are also efficiently and aesthetically modelled with ADTs.
Pattern Matching
Backwards Compatibility
This RFC is almost fully backwards compatible with Nim. It will cause issues with anything using the new
union
orwhere
keywords, and with any templates or macros directly inspectingcase
orof
.A previous version of this RFC had an issue around pattern matching: the previous syntax was
of x
, wherex
is a newly bound identifier. This conflicted with the existing semantics to resolve constants in those expressions. This was fixed by changing pattern matching to require explicitly rebinding the catchall character_
, i.e._ as name
, with syntax sugar in selected cases (named tuples & structs).Notes
To end with some notes: the general design of
where
guards is to provide all of the parts of pattern matching that cannot be exhaustive, in one general construct. Hence the use ofwhere
for things regarding ranges, lengths, existing variables, generic types (although: type constraints can be exhaustive. this is specific enough that i thought it not worth special syntax).where
was also chosen over reusingif
because I wanted to emphasize that there is not a correspondingelif
/else
as it is, already, what amounts to anelif
clause within the broader case statement.I think I like
oldident: newident
for field renaming. It matches object constructors.oldident as newident
is an alternative, however.I only mentioned it briefly, but because I've seen it discussed in other implementations of pattern matching in Nim: the mutability of the introduced bindings while pattern matching follows from the mutability of the matched-upon object, that is, you're directly mutating the matched-upon object. Other languages follow this pattern and I see no reason why this + variable shadowing if you really want to make it mutable are good enough.
I'm considering allowing guards to be reusable on
for
blocks a la Swift: withfor binding in iterator where condition
acting as syntax sugar to the commonfor binding in iterator: if condition
. Is this desired? Is this good or bad for language consistency? Are there other placeswhere
clauses could be used?The
else
statement andcase _
are functionally the same. I believe this is currently the case.References and Acknowledgements
match
statementwhere
statementmatch
statementMany thanks to
fungus
: which implements a near-identicalunion
and quite similarmatch
already as macros.Many thanks to
patty
: particularly for their "does not yet work" section, ha.Many thanks to Beef, for having good taste and calling my stupid ideas stupid.
The text was updated successfully, but these errors were encountered: