Features of a pattern matching syntax #337

Open
zspitz opened this issue Aug 28, 2018 · 42 comments

@zspitz

zspitz commented Aug 28, 2018

Note: The latest version of the design based on these principles can be found here.

What features / attributes should a VB.NET pattern matching syntax have?

I propose the following:

  • Allow the same syntax for patterns in a boolean-returning expression (introduced with a dedicated keyword), as for patterns in a Case clause. This enables patterns to be used as the condition of If ... Then, Do While ... blocks and the like.
  • Allow the pattern to introduce variables into the child scope -- this will be either the Case block or the block following the match expression
  • Pattern-neutral, not oriented specifically to a type-checking pattern
  • Allow When clauses for additional conditions
  • Allow nested patterns -- certain pattern types can contain other patterns

I've written some examples, and a proposed spec in line with these goals here. But the purpose of this issue is to discuss whether these goals are valid and relevant.

@zspitz
Author

zspitz commented Dec 15, 2018

One other point:

@ericmutta

ericmutta commented Dec 15, 2018

@zspitz I've written some examples, and a proposed spec in line with these goals here.

Thanks for thinking about this feature in detail and going as far as writing a grammar!

I think the goals are reasonable though I would add one more: keep it really really really simple.

The simplicity goal matters a lot because every feature is a product: it needs a spec, it has to be implemented in the compiler, implemented in the IDE, tested in all scenarios, documented (with samples) and then supported forever. This is a lot of work!

So to keep it simple, we should find the minimum viable functionality for this feature, hope that gets implemented and then expand from there in the future (otherwise none of it will ever get done in our lifetime).

From my own experience with the feature in C#, the type checking pattern is really all we need (and the C# docs seem to focus on mostly that). And we could probably do it without adding a new keyword, by sticking to Is, so borrowing from some of the examples you gave:

'test the type without introducing a variable (not strictly necessary but...)
If obj Is String Then Console.WriteLine("obj is a string")

'test the type and introduce a variable.
If obj Is s As String Then Console.WriteLine(s.Length)

'can also assign the boolean result to a local variable
'(useful for extending scope of variable "s" beyond block that checks test1/test2).
Dim test1 = obj Is String
Dim test2 = obj is s As String

'we should probably use AndAlso instead of When for simplicity.
If obj Is s As String AndAlso s.Length > 5 Then Console.WriteLine(s.Length)

'can do the above with Select too:
Select Case obj
  Case Is String
    Console.WriteLine("obj is a string")
  Case Is s As String
    Console.WriteLine(s.Length)
  Case Is s As String AndAlso s.Length > 5
    Console.WriteLine(s.Length)
End Select

'...and in loops that expect a boolean condition:
Do While obj Is s As String
  Process(s)
  obj = GetNextObject()
Loop

Do Until obj Is s As String
  ProcessObjectsThatAreNotStrings(obj)
  obj = GetNextObject()
Loop

In short:

  1. We are introducing a boolean-valued type checking and (optional) conversion expression.
  2. It returns false if the type check fails and sets any introduced variable to Nothing.
  3. It returns true if the conversion succeeds and assigns the converted value to the introduced variable.
  4. The introduced variable (if any) is scoped to the block containing the expression.
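
For comparison, the four points above are close to what TryCast already does in current VB.NET. The following is a minimal sketch, not part of the proposal: the module and the DescribeLength helper are hypothetical names used only for illustration. The main difference is scope, since the variable has to be declared in the enclosing block rather than being introduced by the test itself.

```vbnet
Option Strict On
Imports System

Module TypeCheckToday
    ' Hypothetical helper, for illustration only.
    Function DescribeLength(obj As Object) As String
        ' TryCast yields Nothing when obj is not a String (compare point 2),
        ' and the converted value when it is (compare point 3).
        Dim s As String = TryCast(obj, String)
        If s IsNot Nothing Then
            Return $"string of length {s.Length}"
        End If
        Return "not a string"
    End Function

    Sub Main()
        Console.WriteLine(DescribeLength("hello"))  ' string of length 5
        Console.WriteLine(DescribeLength(42))       ' not a string
    End Sub
End Module
```

Here s stays visible for the rest of DescribeLength, which is the scoping difference that point 4 would remove.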

If for example the above functionality is all we got in order to keep things as simple as possible, does anyone feel something critical is missing? Remember: we can get really fancy with this, but if we do, it is unlikely to ever get implemented because it will be too much work!

CC: @KathleenDollard some food for thought here for the LDM as you consider this feature.

@paul1956

Generally I love it and would use this functionality constantly; it would dramatically simplify code.
Questions
Can "If obj Is s As String Then Console.WriteLine(s.Length)" have an else case? If not, then how would "sets any introduced variable to Nothing" ever be tested or be useful? The variable would not be in scope. If there is an else, what type is "s"? Certainly not String.

In the Do Until, once you fall out of the loop "s" is not in scope so the assignment seems useless, so I am not sure if the assignment feature is useful or could be tested.

@ericmutta

@paul1956 Can "If obj Is s As String Then Console.WriteLine(s.Length)" have an else case? If not, then how would "sets any introduced variable to Nothing" ever be tested or be useful? The variable would not be in scope. If there is an else, what type is "s"? Certainly not String.

I imagine it should work like this:

If obj Is s As String Then 
  'can use variable "s" here and it will be a non-null string.
Else
  'can't use variable "s" here because it is not in scope.
End If

@paul1956 In the Do Until, once you fall out of the loop "s" is not in scope so the assignment seems useless,

That's a good point! It would be allowed for completeness much like assigning a variable to itself is allowed and equally useless. Though there could be some utility in the version that doesn't introduce a variable:

Do Until obj Is String
  ProcessNonStringObject(obj)
  obj = GetNextObject()
Loop

@zspitz
Author

zspitz commented Dec 16, 2018

(An updated version is here.)

@ericmutta In thinking about what the minimum viable functionality would look like, I've reached the following conclusions (I apologize for the length).

TL;DR

  1. Extend Is and Case for pattern matching -- <expression> Is <pattern> and Case <pattern>; instead of a dedicated keyword that introduces a pattern.

  2. The same syntax should work with both <expression> Is <pattern> and Case <pattern>

    2a. Is should be extended to allow all current Case syntax -- Case <expression> To <expression>, Case <expression>, <expression>; and should use value equality where available

    2b. Case should be extended for all current Is syntax -- Case <expression> should support reference equality and comparison against Nothing (Select Case on identity (reference equality) #119)

  3. Since Case and Is already support simple expressions, anything until the end of the condition / Case clause must be considered part of the pattern, to prevent ambiguity.

    3a. The only way to allow for additional logic AFAICT is to wrap it in a single-per-pattern-expression When clause.

    3b. Trying to reuse AndAlso for this purpose will only increase complexity.

  4. Case should continue to not require Is.

(Possible grammar at the end.)


If pattern matching is just another type-checking syntax (admittedly also allowing introduced variables), then I don't think the feature can possibly justify itself. (In fact, the compiler could do the same thing without any additional syntax.)

But pattern matching is so much more than type-checking -- it's saying:

  • does the expression on the left match the pattern expressed by some syntax on the right?
  • if it does, assign parts of whatever is being matched to new identifiers

From my own experience with the feature in C#, the type checking pattern is really all we need (and the C# docs seem to focus on mostly that).

Pattern matching in C# is still in its infancy; see the F# documentation for a better idea of the range of potential patterns.

Having said that, you are quite correct -- it's going to be harder to add all the possible patterns right at the beginning. However, any first steps in pattern matching must specify at least two things:

  • What syntax introduces a pattern? Also, where does the pattern end?
  • One "killer app" pattern, e.g. the variable-as-type pattern

It also should allow for the goals outlined above, even if these goals aren't implemented immediately; and should also be compatible and consistent with existing syntax.


AFAICT there are two possible ways to insert pattern matching into the language:

  • Introduce a new dedicated keyword (e.g. Matches)
  • Extend the behavior of the existing Case and Is

I prefer extending Case/Is, because:

  • Case already has an (albeit very limited) form of pattern matching
  • Having a new keyword means users of the language have to remember yet another syntax
  • There is already an inverse for Is in the conditions of If statements and the like: IsNot. For a dedicated keyword, it would be necessary to choose between no inverse -- If Not o Matches String Then -- or something equally awkward.

although a dedicated keyword might be simpler to implement, because we wouldn't have to deal with the historical usages of Is and Case.
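
To make the "very limited form of pattern matching" concrete, here is a small sketch of the Case forms that compile today: a single value, a range, a comma-separated list and a comparison. The module and the Classify function are hypothetical names chosen for the example.

```vbnet
Option Strict On
Imports System

Module ExistingCaseForms
    ' Hypothetical function, for illustration only.
    Function Classify(n As Integer) As String
        Select Case n
            Case 0              ' single expression
                Return "zero"
            Case 1 To 5         ' range
                Return "small"
            Case 6, 8, 10       ' comma-separated list (acts as OR)
                Return "even, medium"
            Case Is > 10        ' comparison; the Is is optional
                Return "large"
            Case Else
                Return "other"
        End Select
    End Function

    Sub Main()
        Console.WriteLine(Classify(3))   ' small
        Console.WriteLine(Classify(8))   ' even, medium
        Console.WriteLine(Classify(42))  ' large
    End Sub
End Module
```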


If we extend Case/Is, we shouldn't require Is for patterns in Case clauses. Requiring Is would result in three different rules for whether Case needs to be followed by Is:

  • Case <expression> -- the Is cannot be used, per the spec
  • Case Is <operator> <operand> -- the Is is optional, also per the spec
  • Case Is <pattern> -- the Is is required

which I think would be very confusing.


Also, assuming we extend Case/Is, since VB.NET already supports both Case <expression> and Is <expression>, any other expressions until the end of the condition / Case clause must be considered part of the pattern. Otherwise the following would be ambiguous:

Dim o As Object
Dim foo As Boolean
Dim bar As Boolean
If o Is foo AndAlso bar Then

Should foo AndAlso bar be evaluated first? Or should o Is foo be evaluated first?

C# doesn't have this problem, because C# doesn't allow case with an expression, so the following:

object o = null;
bool bar = true;
if (o is bool b && bar)

is unambiguous -- first perform the pattern match, then the logical AND.

(NB. OTOH, this does simplify things a little, because we don't need an explicit literal pattern.)


If pattern expressions must be greedy (i.e. go to the end of the condition / Case clause), then extra logic at the end of the pattern cannot use a simple boolean expression; we need a special clause for this purpose. C# introduces this clause with the when keyword.

Specifically in VB.NET, we can't simply reuse an existing keyword such as AndAlso to introduce this clause, because AndAlso might be part of the expression in the pattern.

We could make up all sorts of complicated rules to disambiguate, but the simplest would be to introduce a new keyword -- When.

(NB. A similar usage of When already exists in VB.NET, when applying a condition to a Catch exception.)
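
For reference, this is how When reads in a Catch clause in current VB.NET. It is a minimal sketch with made-up values; the filter runs only after the exception type has matched, which is the same "pattern first, extra condition second" shape discussed above.

```vbnet
Option Strict On
Imports System

Module CatchWhenToday
    Sub Main()
        Try
            Throw New ArgumentException("bad value", "count")
        Catch ex As ArgumentException When ex.ParamName = "count"
            ' Reached only if the type matches AND the When filter is True.
            Console.WriteLine("handled: " & ex.ParamName)  ' handled: count
        Catch ex As ArgumentException
            Console.WriteLine("some other argument")
        End Try
    End Sub
End Module
```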


Taking into account the need to define the following from the start:

  • where a pattern expression (an expression containing a pattern match) can be used,
  • what are the parts of a pattern expression,
  • patterns that cover the existing behaviors of both Case and Is
  • the additional new variable+typecheck pattern

I think the following is the "minimum viable functionality":

Where a pattern expression can be used:

BooleanOrPatternExpression
    : BooleanExpression
    | Expression 'Is' PatternExpression
    ;


// If...Then..ElseIf blocks

BlockIfStatement
    : 'If' BooleanOrPatternExpression 'Then'? StatementTerminator
      Block?
      ElseIfStatement*
      ElseStatement?
      'End' 'If' StatementTerminator
    ;

ElseIfStatement
    : 'ElseIf' BooleanOrPatternExpression 'Then'? StatementTerminator
      Block?
    ;

LineIfThenStatement
    : 'If' BooleanOrPatternExpression 'Then' Statements ( 'Else' Statements )? StatementTerminator
    ;


// Loops

WhileStatement
    : 'While' BooleanOrPatternExpression StatementTerminator
      Block?
      'End' 'While' StatementTerminator
    ;

DoTopLoopStatement
    : 'Do' ( WhileOrUntil BooleanOrPatternExpression )? StatementTerminator
      Block?
      'Loop' StatementTerminator
    ;

// cannot introduce variables into child scope here
DoBottomLoopStatement
    : 'Do' StatementTerminator
      Block?
      'Loop' WhileOrUntil BooleanOrPatternExpression StatementTerminator
    ;

ConditionalExpression
    : 'If' OpenParenthesis BooleanOrPatternExpression Comma Expression Comma Expression CloseParenthesis
    | 'If' OpenParenthesis Expression Comma Expression CloseParenthesis
    ;


// Within a Case clause

CaseStatement
    : 'Case' PatternExpression StatementTerminator
      Block?
    ;

What are the parts of a pattern expression?

PatternExpression
    : Pattern ('When' BooleanExpression)?
    ;

What patterns should be supported from the start?

Pattern
    // patterns with subpatterns
    : Pattern ',' Pattern                    // OR pattern (already supported in Case)

    // patterns without subpatterns
    | 'Of' TypeName                          // Type check pattern -- matches when subject is of TypeName
    | Identifier 'As' TypeName               // Variable pattern -- introduces a new variable in child scope
    | 'Is'? ComparisonOperator Expression    // Comparison pattern
    | 'Like' StringExpression                // Like pattern
    | Expression 'To' Expression             // Range pattern
    | Expression                             // Equality pattern -- value/reference equality test against Expression
    ;

@zspitz
Author

zspitz commented Dec 16, 2018

Ping @bandleader

@paul1956

If the powers in charge can agree on a grammar and ultimate feature set, it would be nice to get "Is TypeName" and "Is Identifier As TypeName" out first. Of all the things being proposed, this is the one thing needed yesterday, and it would simplify lots of VB code.

@ericmutta

@zspitz Otherwise the following would be ambiguous:

Dim o As Object
Dim foo As Boolean
Dim bar As Boolean
If o Is foo AndAlso bar Then

It isn't ambiguous because it doesn't compile at all! Compiler says Is operator does not accept operands of type Boolean. Operands must be reference or nullable types.

Ultimately the choice between AndAlso and When is likely to be a matter of preference, mainly because both keywords already exist in the language so nothing new is being added, regardless of which you choose. I personally prefer AndAlso, mainly because it makes clear the short-circuiting nature of the expression (i.e. the part to the right of AndAlso will not run if the type-check and conversion on the left did not succeed).

@zspitz I've reached the following conclusions (I apologize for the length).

I believe that when it comes down to implementing this, the team will appreciate the level of detail in your comments, it is clear you have given this a lot of thought! 👍 👍

@zspitz
Author

zspitz commented Dec 17, 2018

@ericmutta

Thanks for the kind words; I hope you're right, and this will enable the team to move forward on pattern matching that much faster.

It isn't ambiguous because it doesn't compile at all!

If o Is foo currently doesn't compile, as you've noted. But if we extend Is to follow everything Case does today, then If o Is foo Then would compile just fine.

But even then, trying to push more boolean expressions onto the pattern would be ambiguous.

@bandleader

bandleader commented Dec 17, 2018

o Is foo AndAlso bar -- It isn't ambiguous because it doesn't compile at all! Compiler says Is operator does not accept operands of type Boolean. Operands must be reference or nullable types.

@ericmutta In addition to what @zspitz noted above -- it's nevertheless ambiguous in parsing. The fact that bar is Boolean is not known until the binding stage of the compiler. (Also, compilers should anyway never parse differently based on semantics.)

This distinction (between syntactic and semantic analysis -- done by the parser and binder respectively) is often overlooked in these discussions. More about that later.

@bandleader

bandleader commented Dec 17, 2018

  1. Another issue with extending Is to work with patterns and supporting an expression as a pattern: Is currently checks for reference equality, whereas expressions would presumably check for equality including IEquatable equality/.Equals (and it would have to because that's what Case does as well). So obj Is otherObjWhichEquatesToObj would suddenly have to return True. Aside from breaking existing code, there's also the fact that VB does need a reference equality operator.
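
The difference is easy to see in current VB.NET. This is a minimal sketch with two separately boxed integers (the variable names are arbitrary): Is compares references, while Equals compares values, so the two tests disagree.

```vbnet
Option Strict On
Imports System

Module ReferenceVersusValueEquality
    Sub Main()
        ' Each assignment boxes the Integer into its own object.
        Dim a As Object = 5
        Dim b As Object = 5

        Console.WriteLine(a Is b)        ' False: two different boxes
        Console.WriteLine(a.Equals(b))   ' True: same value
    End Sub
End Module
```

If Is were redefined to use value equality for expression patterns, the first line would change its result, which is the breaking change described above.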

@bandleader

bandleader commented Dec 17, 2018

  1. Problem: Case x currently does a value check. Case x As String is being proposed to do a typecheck-and-assign. This is very non-intuitive.

(For comparison, the following two lines do the same thing both lexically and in human understanding, just with specific types vs. type inference. This is critical for intuition and should be a hard requirement for any VB syntax which can take an As T or skip it.)

For Each x In collection
For Each x As String In collection

Same for:

Dim x = expr
Dim x As String = expr

I have some more thoughts against re-using Select Case and Is as-is for pattern matching, or at least for this pattern (typecheck-and-assign), even while realizing the similarity between them and the desire to mesh it all together. I'll try to post later.

@bandleader

bandleader commented Dec 18, 2018

@zspitz Most importantly -- thank you for your amazing efforts in moving this forward. Here's hoping MS will take notice and reciprocate!

@zspitz
Author

zspitz commented Dec 18, 2018

Another issue with extending Is to work with patterns and supporting an expression as a pattern: Is currently checks for reference equality, whereas expressions would presumably check for equality including IEquatable equality/.Equals (and it would have to because that's what Case does as well). So obj Is otherObjWhichEquatesToObj would suddenly have to return True. Aside from breaking existing code, there's also the fact that VB does need a reference equality operator.

Agreed. I think we'd need a dedicated keyword for boolean contexts If <expression> Matches <pattern> Then, Do While <expression> Matches <pattern> etc. (I'll update my original post accordingly.)

But I still think we should extend Select Case to patterns in general, without an additional keyword. Otherwise users will have to decide whether to use Case Matches <pattern> or Case <expression>, each with its own limitations.

@KathleenDollard
Contributor

In general, I like letting the discussion flow without interrupting it.

This has been a great discussion. I appreciate everyone's work, and wanted to be sure you all didn't think you were talking into a void. It's also the start of Christmas holidays, which means a lot of folks are out until the first of the year.

@bandleader

bandleader commented Dec 18, 2018

@KathleenDollard Appreciated, but I dunno; I would much prefer if you would indeed chime in: participation is not an interruption! It would be wonderful to see participation from other LDT members as well, even if it's one person who knows Roslyn well, and even better if he/she can ask other busy team members for their opinion.

@zspitz
Author

zspitz commented Dec 19, 2018

@KathleenDollard To add to what bandleader said, I myself have almost no knowledge of how Roslyn works, or even compilers in general; both of which would inevitably preclude some design choices, while enabling others. I am only writing from my day-to-day usage of VB.NET and C#, and a little dabbling in F#. I think that some input from someone with Roslyn experience would be helpful in not barking up the wrong tree, or to confirm that a given design choice is relevant.

@zspitz
Author

zspitz commented Dec 19, 2018

(This is a rewrite of the initial required functionality for pattern matching above. Thanks for everyone's help in clarifying this; in particular, the discussions I've had with @bandleader have been extremely illuminating.)

  1. Is cannot be extended, as it's currently used for reference equality; we'll require a dedicated keyword for boolean contexts -- e.g. If <expression> Matches <pattern> Then, Do While <expression> DoesntMatch <pattern>. However, we should extend Case to use Case <pattern> without an additional keyword.

  2. <expression> should be a valid pattern, using value equality where available, and reference equality if not. (This would have the same effect on Case as Select Case on identity (reference equality) #119).

  3. Anything valid in Case today should also be a pattern:

    • <pattern>, <pattern> -- OR pattern
    • <expression> To <expression>
    • Like <string expression>
    • [Is] <comparison operator> <expression>
  4. For newcomers to pattern matching who may not be familiar with the full range of potential available patterns, the most obvious patterns are those that relate to type-checking and variable-introduction. The initial release of pattern matching should therefore include these three patterns or pattern variants:

    • variable introduction + typecheck pattern
    • variable introduction pattern
    • typecheck pattern

    Figuring out the syntax for these is a separate issue (Roundup of proposed patterns on vblang #367). But whatever the syntax, it has to read well in nested patterns as well, even if currently the only nested pattern is the OR pattern.

  5. Since Case already supports expressions combined with operators, we must consider anything until the end of the condition / Case clause as part of the pattern.

    1. Additional logic cannot be expressed using additional logical operators, as they would be considered part of the pattern. AFAICT we'd have to wrap such logic in a single-per-pattern-expression When clause.
    2. We can't use AndAlso because it is an operator used currently in expressions and wouldn't clearly delineate the end of the pattern itself. When parallels C#'s when and cannot be an operator in an expression (it's currently valid only in Catch clauses).
  6. Since Case supports arbitrary expressions, any pattern must almost always be invalid as an expression, to allow reliable disambiguation. This is easily done by using some keyword to introduce the pattern, such as Case Of <typename> for a typecheck pattern, or If o Matches Dim x As Integer Then as a variable+typecheck pattern.

  7. Case should continue to not require Is.

  8. Some further questions:

    1. Specifying a non-match -- In a boolean context, Matches could be paralleled by DoesntMatch. Perhaps Case should not allow non-matches? There is precedent for this -- Case Is can currently be used, but not Case IsNot.
    2. How necessary is exhaustiveness at this point? Is it even possible, considering the way VB.NET makes rather fluid shifts between types? I know that currently C# doesn't implement exhaustiveness.
    3. IIUC, VB.NET requires Is because comparing two objects which have a default property would result in both being converted to their respective values via the default property, and value-equal comparing the results; it's then impossible to compare reference equality for the two objects. How would this issue be handled within pattern matching on the expression pattern?
    4. In the DoBottomLoopStatement, should we allow variables introduced with Loop While <expression> Matches <pattern> to bleed back into the body of the Do? Seems very counterintuitive.

And the current state of the grammar:

Where a pattern expression can be used:

BooleanOrPatternExpression
    : BooleanExpression
    | Expression 'Matches' PatternExpression
    ;


// If...Then..ElseIf blocks

BlockIfStatement
    : 'If' BooleanOrPatternExpression 'Then'? StatementTerminator
      Block?
      ElseIfStatement*
      ElseStatement?
      'End' 'If' StatementTerminator
    ;

ElseIfStatement
    : 'ElseIf' BooleanOrPatternExpression 'Then'? StatementTerminator
      Block?
    ;

LineIfThenStatement
    : 'If' BooleanOrPatternExpression 'Then' Statements ( 'Else' Statements )? StatementTerminator
    ;


// Loops

WhileStatement
    : 'While' BooleanOrPatternExpression StatementTerminator
      Block?
      'End' 'While' StatementTerminator
    ;

// introducing variables with Until could only be used 
// by the When clause, not within the block
DoTopLoopStatement
    : 'Do' ( WhileOrUntil BooleanOrPatternExpression )? StatementTerminator
      Block?
      'Loop' StatementTerminator
    ;

// introducing variables with either While or Until could only be used 
// by the When clause, not within the block
DoBottomLoopStatement
    : 'Do' StatementTerminator
      Block?
      'Loop' WhileOrUntil BooleanOrPatternExpression StatementTerminator
    ;

ConditionalExpression
    : 'If' OpenParenthesis BooleanOrPatternExpression Comma Expression Comma Expression CloseParenthesis
    | 'If' OpenParenthesis Expression Comma Expression CloseParenthesis
    ;


// Within a Case clause

CaseStatement
    : 'Case' PatternExpression StatementTerminator
      Block?
    ;

What are the parts of a pattern expression?

PatternExpression
    : Pattern ('When' BooleanExpression)?
    ;

What patterns should be supported from the start?

Pattern
    // patterns with subpatterns
    : Pattern ',' Pattern                    // OR pattern (already supported in Case)

    // patterns without subpatterns
    | 'As' TypeName                          // Type check pattern -- matches when subject is of TypeName
    | 'Dim' Identifier ('As' TypeName)?      // Variable pattern -- introduces a new variable in child scope; as TypeName or Object
    | 'Is'? ComparisonOperator Expression    // Comparison pattern
    | 'Like' StringExpression                // Like pattern
    | Expression 'To' Expression             // Range pattern
    | Expression                             // Expression pattern -- value/reference equality test against Expression
    ;

Pinging @KathleenDollard @ericmutta @paul1956

@KathleenDollard
Contributor

Notes from our meeting on this Wednesday

You all are awesome.

@ericmutta

@KathleenDollard thanks for the update from the LDM!

I think this kind of iteration and feedback loop between the community and the LDM is really awesome and should become the way to do things going forward, that is:

  1. LDM shares what they are considering (this is important and helps the community focus on things that have the highest probability of happening before the universe dies).

  2. community rallies around that and talks about it to flesh it out.

  3. then LDM comes back with feedback and own thoughts.

  4. rinse and repeat until something epic happens.

You are all awesome (especially having language design meetings so close to X-Mas!)

@ericmutta

@bandleader Another issue with extending Is to work with patterns and supporting an expression as a pattern...

I can see now that using Is could be more trouble than it's worth because Is already has several uses. What about Like though? The only thing this operator has done since day one is pattern matching, which is exactly what we are talking about here!

If obj Like String Then Console.WriteLine("obj is a string")

If obj Like s As String Then Console.WriteLine(s.Length)

Dim test1 = obj Like String
Dim test2 = obj Like s As String

If obj Like s As String When s.Length > 5 Then Console.WriteLine(s.Length)

Select Case obj
  Case Like String
    Console.WriteLine("obj is a string")
  Case Like s As String
    Console.WriteLine(s.Length)
  Case Like s As String When s.Length > 5
    Console.WriteLine(s.Length)
End Select

Do While obj Like s As String
  Process(s)
  obj = GetNextObject()
Loop

Do Until obj Like String
  ProcessObjectsThatAreNotStrings(obj)
  obj = GetNextObject()
Loop

...something to consider before introducing an entirely new keyword and creating a scenario where we have two keywords doing pattern matching 👍 It also seems to negate naturally and do ranges/tuples quite nicely:

If num Like 1 To 10 Then Console.WriteLine("number between 1 to 10")

If num Not Like 1 To 10 Then Console.WriteLine("number NOT between 1 to 10")

If MyTuple Like (String, Integer) Then Console.WriteLine("tuple of string and integer")

@franzalex

@ericmutta
What about Like though?

I can't believe we didn't suggest this earlier! By extending its usage to include pattern matching, Like can be the perfect keyword short of introducing a new one.

@zspitz
Author

zspitz commented Dec 22, 2018

@ericmutta @franzalex Note that @AdamSpeight2008 suggested it in #124.

@zspitz
Author

zspitz commented Dec 23, 2018

Thanks to the members of the LDT for discussing this, and all your hard work on VB.NET; and for taking into account the community's contribution.

(I'm responding here to the meeting notes; @KathleenDollard if there's a better place to put it please let me know.)


(I'm addressing this first, because it affects some subsequent points.)

It's not clear how variable introduction without typechecking would work, or how type checking without assignment differs from the available TypeOf x Is .

Without nested patterns, there is indeed no difference. But with nested patterns -- patterns composed of other patterns -- where the outer pattern itself doesn't sufficiently enforce the type of the item in question, we may want to enforce a specific shape of parts of the item without having to explicitly name those parts. For example, with the tuple pattern:

Dim o As Object
Select Case o
    Case (Integer, Integer)
        Console.WriteLine("Pair of numbers")
    Case (String, String)
        Console.WriteLine("Pair of strings")
End Select

If every type check also requires variable introduction, we have the needlessly cluttered:

Dim o As Object
Select Case o
    Case (Integer Into x, Integer Into y)
        Console.WriteLine("Pair of numbers")
    Case (String Into s1, String Into s2)
        Console.WriteLine("Pair of strings")
End Select

It's not clear how variable introduction without typechecking would work

With nested patterns, the converse is also true -- I may want to extract part of the root matched value, without assigning a new name to the root. Using the array literal pattern (#141):

Select Case o
    Case {Into firstArg, String, String} When firstArg = "help" Or firstArg = "version"
        Console.WriteLine($"First argument -- {firstArg}")
    Case Else
        Console.WriteLine("Invalid first argument")
End Select

Not all of us are happy with the reading of the If syntax. The human English wording would be more like "If o matches string, put it in x as a string."

@bandleader has also pointed out that Case x As String is rather counter-intuitive, because everywhere else in the language, omitting the As String doesn't change the basic meaning of the statement; both the following statements declare a variable, albeit of different types:

Dim x
Dim x As String

The same applies for method parameters:

Sub Foo(bar As String)
Sub Foo(bar) ' defaults to parameter of type Object

and in all other places where a variable is introduced into a child scope -- For, For Each, Using, and Catch.

However, the following two Cases mean very different things:

Case x ' is the case value equal to the already-existing expression `x`?
Case x As String ' `x` doesn't exist; if the case value can be assigned to a String, introduce a new `x`

I think Into is an excellent choice, because as noted the variable introduction (Into x) follows from the typecheck (Case String). In addition, Into already has a similar usage when inline-declaring a variable for a grouped or aggregate LINQ query.
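
As a reminder of that existing usage, here is a minimal sketch of Into in current VB.NET query syntax. The sample data and the names totalLength, firstLetter and wordCount are arbitrary; in the second query, Into wordCount = Count() declares the new name inline.

```vbnet
Option Strict On
Option Infer On
Imports System
Imports System.Linq

Module IntoInQueries
    Sub Main()
        Dim words = {"apple", "avocado", "banana"}

        ' Aggregate ... Into: the result of Sum becomes the query result.
        Dim totalLength = Aggregate w In words Into Sum(w.Length)

        ' Group By ... Into: wordCount is a name introduced by Into.
        Dim byLetter = From w In words
                       Group By firstLetter = w(0) Into wordCount = Count()

        Console.WriteLine(totalLength)  ' 18
        For Each g In byLetter
            Console.WriteLine($"{g.firstLetter}: {g.wordCount}")  ' a: 2, then b: 1
        Next
    End Sub
End Module
```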

(NB. What happens if there is an identifier String in scope? Would it be better to disambiguate with some keyword: Case Of String Into x or Case As String Into x?)

This will lead to a discussion about whether it is more important to read like English or to look like a declaration here.

I don't think "looking like a declaration" has inherent value. The only reason I can see to prefer Case Dim x As String over Case x As String is in order to visually distinguish from Case x; Case String Into x does this equally well, if not better.

#2. We are a little confused. The effect of #119 seems desirable, but not sure whether this is a pattern or evaluation (or what distinctions matter here).

This was only relevant if 1) Is were used as the pattern-match operator in boolean contexts, 2) everything supported by Case were a pattern, and 3) the <expression> pattern had the same behavior in both Is and Case. Since Is <expression> tests for reference equality, Case <expression> would also test for reference equality, and <expression> would be considered a special case of <pattern>.

Since 1) the LDT has decided on Matches for boolean contexts and, 2) some syntaxes matched by Case will not be patterns, this no longer appears relevant to pattern matching. (It's nice to have independent of pattern matching though.)

#3. We thought about commas...

Agreed.

Certainly everything that works in the context of a Case today should continue to work in a Case. But hesitate on moving syntax from Case to other places patterns can be used.

With nested patterns, if an expression is considered a special case of pattern, it becomes possible to write something like this (e.g. using the tuple pattern):

Dim o As Object
Dim x = 5
If o Matches (x, x+1, x+2) Then

#4. The linked "full range of potential patterns" is for F# and several of these are not available in other .NET languages.

I didn't mean to imply that all these patterns should be in VB.NET; only that while typecheck+variable is the big draw for those who have never used it, pattern matching is a far more generalized idea than just typecheck+variable. #367 contains a list of patterns that might be of value specifically in VB.NET.

#6. Can we get clarity on this. Is this basically saying there can't be ambiguity and we can't break existing code?

A rather specific ambiguity. For patterns which mean a literal expression in other contexts (e.g. the tuple pattern), since Case supports any expression, it is necessary to distinguish between Case <expression> and Case <pattern>. (C# doesn't have this problem, because initially only constants were supported by switch.)

(This may not be an issue when the type of the resulting literal expression is a value type. For example, this is ambiguous:

Dim t = (1, 2)
If t Matches (1,2) Then

between:

  • is it matching against the tuple pattern?
  • Or is it matching against a newly created tuple?

Either way, it doesn't matter: since ValueTuple is a value type with value-based equality, two instances of ValueTuple are equal as long as their members are equal.)
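
The value-equality point can be checked in present-day VB.NET. The sketch below compares two separately created tuples, and adds an array comparison purely as my own contrasting illustration of a reference type; the module and function names are hypothetical.

```vb
Imports System

Module TupleEquality
    ' Two separately created ValueTuples with equal members compare equal.
    Function TuplesEqual() As Boolean
        Dim t = (1, 2)
        Dim other = (1, 2)
        Return t.Equals(other)
    End Function

    ' Contrast: arrays are reference types, so Equals compares identity, not contents.
    Function ArraysEqual() As Boolean
        Dim a = {1, 2}
        Dim b = {1, 2}
        Return a.Equals(b)
    End Function

    Sub Main()
        Console.WriteLine(TuplesEqual())
        Console.WriteLine(ArraysEqual())
    End Sub
End Module
```

So for a tuple it makes no observable difference whether (1, 2) is read as a pattern or as a freshly created value, whereas for a reference type such as an array the two readings could give different answers.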

#7. Do you mean Case should not require Is where it does not require it today?

This is about using Case Is <pattern> to distinguish between matching against a pattern, and Case <expression> to check value-equality on an expression. The Is in Case would then be required (for patterns), optional (for comparison operators), or disallowed (for simple expressions), based on context. Really confusing.

#8 iii) Need clarity on what this is saying

I made a mistaken assumption here; it's irrelevant.

// introducing variables with either While or Until could only be used
// by the When clause, not within the block
// LDM thinks probably within the block as well

This was a typo -- when introduced with a While, variables should be usable within the block; but when introduced with an Until, variables should not be available within the block.

@zspitz
Author

zspitz commented Dec 24, 2018

@KathleenDollard

Emphasizing one additional point:

For background: a pattern is not an expression, but a thing that when matched results in an expression, in this context a Boolean expression.

A pattern may not be an expression, but every expression could be considered a pattern that matches on value-equality. With recursive patterns, this would enable using expressions as sub-patterns:

Dim x = 5
Select Case o
    Case (x, x+1)
End Select

@zspitz
Author

zspitz commented Mar 20, 2020

(Post-LDM spec, incorporating Into for data extraction)

Pattern matching has two goals:

  1. Matching against a pattern (in BooleanExpression or Case block)
  2. Extracting data from the matched value (using Into identifier)

The following is a possible updated grammar, using the LDM-suggested Into for data extraction. In addition, there's a description of the scope rules for Into-introduced variables; as well as when they are considered initialized or not. Some additional points at the end.

// Redefine CaseStatement as using PatternClauses instead of CaseClauses
CaseStatement
    : 'Case' PatternClauses StatementTerminator
      Block?
    ;

// Redefine BooleanExpression as possibly a match against a pattern
// All the usages of BooleanExpression -- If Then, Do While etc. -- remain the same
BooleanExpression
    : Expression 'Matches' PatternClause
    | Expression
    ;

// CaseClauses and CaseClause are no longer needed

PatternClauses
    : PatternClause ( Comma PatternClause )*
    ;

PatternClause
    : PatternOrInto ('When' BooleanExpression)?
    ;

// Into introduces a new identifier holding the value matched by the rest of the pattern clause (Pattern + When)
PatternOrInto
    : Pattern ('Into' Identifier)?
    | 'Into' Identifier
    ;

MemberPattern
    : '.' Identifier Equals Expression
    | '.' Identifier 'Matches' Pattern
    ;

Pattern
    // general nested patterns
    : 'AnyOf(' PatternClauses ')'    // OR pattern
    | 'AllOf(' PatternClauses ')'    // AND pattern
    | 'NoneOf(' PatternClauses ')'   // Multiple-pattern negation
    | 'Not(' PatternClause ')'       // Single-pattern negation

    | '{' PatternClauses '}'                         // array pattern
    | '(' PatternClauses ')'                         // tuple pattern
    | 'With {' MemberPattern ( Comma MemberPattern )* '}'  // With pattern

    // non-nested patterns
    | TypeName
    | '*'                                    // Discard pattern
    | 'Is'? ComparisonOperator Expression    // Comparison pattern
    | 'Like' StringExpression                // Like pattern
    | Expression 'To' Expression             // Range pattern
    | Expression                             // Expression pattern -- value/reference equality test against Expression
    ;

Note that @AnthonyDGreen discusses a syntax for rest of the array pattern; it could be applied to other similar patterns such as the tuple pattern.

Scope rules for Into-introduced variables

The Into-introduced variable should be in scope for the When clause.

For a pattern defined within a CaseStatement, the scope of the Into-introduced variable should (at least) be the body of the Case.

Select o
    Case String Into s
        Console.WriteLine(s)
End Select

If the pattern is defined in a BooleanExpression, and the BooleanExpression is part of the test of a block (Do While ..., If ... Then etc.), the variable should certainly be in scope for the child block immediately following the test:

If o Matches String Into s Then
    Console.WriteLine(s)
End If

Do While o Matches String Into s
    Console.WriteLine(s)
Loop

(On whether the variable should also be in scope in the Else block, see open question #2 at the end.)
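
For comparison, here is roughly what If o Matches String Into s Then has to be written as in present-day VB.NET. This is only a sketch of the intent, not a claim about how the compiler would lower the proposed syntax; the module and function names are hypothetical.

```vb
Imports System

Module MatchesSketch
    ' Rough equivalent of: If o Matches String Into s Then ...
    ' Note that `s` is declared outside the If, so it stays in scope after End If.
    Function DescribeIfString(o As Object) As String
        Dim s = TryCast(o, String)
        If s IsNot Nothing Then
            Return $"String of length {s.Length}"
        End If
        Return "Not a string"
    End Function

    ' TryCast only works for reference types, so for a value type
    ' the type check and the conversion are two separate steps.
    Function DescribeIfInteger(o As Object) As String
        If TypeOf o Is Integer Then
            Dim i = DirectCast(o, Integer)
            Return $"Integer {i}"
        End If
        Return "Not an integer"
    End Function

    Sub Main()
        Console.WriteLine(DescribeIfString("hello"))
        Console.WriteLine(DescribeIfString(42))
        Console.WriteLine(DescribeIfInteger(42))
    End Sub
End Module
```

In the TryCast version the variable outlives the block, and in the TypeOf version the type has to be written twice. The scoping rule above is meant to avoid both.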

Initialization rules for Into-introduced variables

The variable should be considered initialized when the match is successful:

Do While o Matches String Into s
    Console.WriteLine(s)
Loop

but not when the match is not successful:

Do Until o Matches String Into s
    ' Should report here "A variable has been used before it has been assigned a value."
    Console.WriteLine(s)
Loop

or when it might not have been successful:

Do
    ' Should report here "A variable has been used before it has been assigned a value."
    ' because the match hasn't succeeded until after the first iteration
    Console.WriteLine(s)
Loop While o Matches String Into s

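In summary:

| Construct | Guaranteed initialization |
| --- | --- |
| If .. Then | Within Then part / block |
| If .. Then .. Else | Within Then part / block |
| If .. Then .. End If | Within Then part / block |
| If .. Then .. Else .. End If | Within Then part / block |
| If(..) operator | Within Then part / block |
| Do While .. Loop | Within the loop |
| While .. End While | Within the loop |
| Do Until .. Loop | Never |
| Do .. Loop Until | Never |
| Do .. Loop While | No guarantee, because of the first iteration |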

Some further points relating to non-nested patterns:

  1. Is it acceptable to redefine BooleanExpression like this?
  2. LDM has indicated that they want to start with the non-nested patterns first. But I think having a spec describing these patterns allows them to be added in the future.
  3. Can nullable reference types be used in a type check? If the syntax for NRT is different from nullable value types (e.g. Dim rnd As Random Or Nothing), then I think it should be disallowed. If the syntax resembles nullable value types (Dim rnd As Random?), then perhaps it should behave as some kind of pseudo-type, and the pattern o Matches Random? should desugar to o Matches AnyOf(Random, Nothing). Any Into-introduced variables would also have to be similarly desugared: o Matches Random? Into r -> o Matches AnyOf(Random Into r, Nothing Into r); and the compiler would treat it as an exception to (3) below. TBH it feels like more trouble than it's worth.

Some points about nested patterns:

  1. Thanks to @Happypig375 for the AnyOf / AllOf suggestion.

  2. How would the single-pattern NOT and multi-pattern NOT affect the scoping rules described above? (I'll update once I have an answer.)

  3. Into should be disallowed within the parts of the OR pattern, because there's no way to use an identifier which doesn't come from all the OR's sub-patterns.

    o Matches AnyOf(String Into s, Integer Into i) ' Either s or i is uninitialized

Unless all parts of the OR define the identifier with the same type:

o Matches AnyOf(1 To 10 Into i, (Integer Into i, String))
' On either side of the OR, i refers to an Integer, and is initialized

Open questions

  1. The selling point for When is that it offers room to put additional logic to customize the pattern:

    Select o
        Case String Into s When DateTime.Now.Hour > 6
    End Select
    

    It's only natural that the Into-introduced variable be in scope and initialized within the When clause:

    Select o
        Case String Into s When s.Length > 0
    End Select
    

    But for BooleanExpression containing a pattern:

    If o Matches String Into s When s.Length >0 Then
    End If
    

    because we could extend the scope of the Into-introduced variable to the end of the BooleanExpression, and then use AndAlso:

    If o Matches String Into s AndAlso s.Length >0 Then
    End If
    

    allowing When may not be necessary. FWIW, this is what C# does, instead of allowing when as part of the pattern as described in the above spec. The following will not compile:

    if (o is string s when s.Length >0) {
    }
    

    In short, the two choices AFAICT are:

    • allow when in BooleanExpression, or
    • extend the scope and initialization of Into-introduced variables until the end of the parent BooleanExpression, allowing AndAlso clauses to make use of the variable
  2. Should the Else block also be in scope for an Into-introduced variable from the test of the starting If block? And, if the If has a NOT pattern, should the variable also be initialized?

    If Not(o Matches String Into s AndAlso s.Length >0) Then
        ' s is in scope, but uninitialized
    Else
        Console.WriteLine(s)
    End If
    

    It might be simpler just to say that the logic should be flipped around in this case. But VB has a history of not requiring the logic to be adjusted -- Do While Not can be expressed as Do Until, just as Do ... Loop While Not can be expressed as Do ... Loop Until.

    Similarly, if a NOT pattern is used in one of the Case blocks, should the variable be in scope and initialized in the subsequent Cases?

     Select o
         Case Not(String Into s)
         Case Else
             Console.WriteLine(s)
     End Select
    

    What about Do Until o Matches Not(String Into s)?

@zspitz
Author

zspitz commented Mar 22, 2020

@Happypig375 Yes, but not Not( ... ).

@zspitz
Author

zspitz commented Mar 22, 2020

@Happypig375 It's really a broader question -- which takes precedence, an expression or a pattern? I guess in order to avoid breaking changes, an expression should take precedence.

At least for Not, there is another possible resolution; allow NoneOf to apply to a single pattern as well, with the same effect.

Alternatively, even though Not(0) is a breaking change, it's a very small one.

@ericmutta

@zspitz this is a very lively and thorough discussion which would be awesome if any of it had a chance of being implemented, especially given the recent announcement that there are no plans to evolve the language anymore.

Until we as a community figure out the language evolution problem, I am afraid the vblang repo won't have much use and any discussions will be purely theoretical!

@zspitz
Author

zspitz commented Mar 22, 2020

@ericmutta Agreed. But I hope things will be resolved eventually for the better, and at that point this discussion will become very useful. Also, I need to get the details of pattern matching out of my system, so I can continue working on something else.

@ericmutta

@zspitz But I hope things will be resolved eventually for the better...

I salute your optimism 💪

I've been thinking about the way TypeScript is implemented (it compiles down to JavaScript) and perhaps we as a community could develop a VB pre-processor that compiles *.vbx files. These files would contain a superset of VB that is translated into the current VB language, which is then compiled by Roslyn.

Much of the stuff we would like to see in VB (including this pattern matching business) boils down to syntactic sugar that could be implemented "the TypeScript way" (i.e. by a pre-processor extension). If others like this concept perhaps we can create a separate issue and discuss further 👍

@tverweij

@ericmutta That will also need editor support and tooling support.

@ericmutta

@tverweij indeed! I am glad you read the comment above, I was trying to tag you yesterday but it didn't work when commenting from my phone.

I remember you have started working with a company that has experience with programming languages and tools development. Perhaps this is something they could consider?

Rather than an entirely new IDE and change in name and a $500 price tag, if they could do it "the TypeScript way" (i.e. as a Visual Studio extension that pre-processes VBX into VB) and charge say $99, I reckon most people would just buy the extension without thinking too hard.

The extension shouldn't introduce complex changes (e.g. the type system should be left alone), it should leverage Roslyn as much as possible, and the focus should be on the syntax sugar stuff like the pattern matching being discussed here.

Please talk to RemObjects and let us know whether this makes business sense for them 👍 🙏

@tverweij

@ericmutta: I read everything here. And see what we can adopt and what not.

The extensible Idea from @VBAndCs and now from you is a nice idea, but almost not implementable. At least not for us in the existing toolchain where Mercury is being added.

But three things:
First: we are going to extend the language - much. What is added and what not is not decided yet. But this one has a really big chance, although it will be implemented the same as the C# implementation.
Second: We have AOP for all languages for custom code generation. So this will work for Mercury too.
Third: With Mercury, forget about Roslyn. This is not a Microsoft language, so there is no Roslyn compiler.

But first we have to come to the point where we are on par with Visual Basic. We are working hard on that now.

@tverweij

@ericmutta: That idea would also destroy the reason of the Mercury project. We started well before Microsoft declared VB dead/zombie.
The goal of the project is 2-fold:

  1. Remove all limitations from Visual Basic
  2. Make sure that you can compile it for any target / platform

Your idea would solve 1. but not 2. as the compiler would still be the limited Roslyn VB compiler.
So adding things like unsafe, inline, lazy and reference returns would still be impossible.

@tverweij

@ericmutta - last part, I promise :-)

About 2. - We are already compiling for macOS (Cocoa), Linux (native), Windows (native), .NET, .NET Core and WebAssembly with the partially implemented version.

@bandleader

bandleader commented Mar 24, 2020

Much of the stuff we would like to see in VB (including this pattern matching business) boils down to syntactic sugar that could be implemented "the TypeScript way" (i.e by a pre-processor extension).

@ericmutta A pre-processor like TypeScript needs a lexer, parser, [binder,] lowerers, and a source emitter. In other words, many of the components of a compiler like Roslyn, with the most significant difference being a source emitter instead of an IL emitter. Thus, such a pre-processor should likely re-use Roslyn for the lexing, parsing, binding, etc. Once doing that, why not build your changes into Roslyn -- isn't that less difficult?

Of course the difficulty would be getting MS to accept those changes, when it wants to minimize the resources it spends on VB. I'm not sure they could be convinced to let the community maintain VB (including docs, etc.) just like F#.

Another option would be simply forking Roslyn and configuring VS to use that forked version. @AnthonyDGreen once described how to do that here.

@zspitz
Author

zspitz commented Mar 24, 2020

Another option would be simply forking Roslyn and configuring VS to use that forked version. @AnthonyDGreen once described how to do that here.

Or build a language server that could choose between the original and the fork, and use VS Code or some other editor.

@ericmutta

@tverweij ...last part, I promise :-)

Many thanks for the clarification :-) I see the goals for Mercury are much broader than what I was thinking about and it would be interesting to see how that works out (competing against popular, high quality free tools from a trillion dollar corporation is not going to be easy but somebody has to try :-))

@bandleader A pre-processor like Typescript needs a lexer, parser, [binder,] lowerers, and a source emitter.

That's the trouble with this whole situation: language tools need a lot of work and it's highly unlikely to be done (effectively) on a part-time basis by people who have day jobs, mortgages and no experience in the field of compilers. The barriers to entry are high even before worrying about what Microsoft will or will not accept.

Unless @AnthonyDGreen's articles (thanks @aarondglover for sharing) miraculously reach the powers that be and convince them to keep evolving VB, I think the course taken by @tverweij with project Mercury is the only potential alternative for now. So hustle hard @tverweij you may very well save the day 💪
