From f64f0692e4e84a87ab0ee74531b8db7a9a430868 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Sat, 12 Sep 2020 09:26:18 -0700 Subject: [PATCH 01/54] Create the structure for the PEP 622 split --- pep-0622.rst | 3 +- pep-0634.rst | 117 +++++++++++++++++++++++++++++++++++++++++++++++++++ pep-0635.rst | 57 +++++++++++++++++++++++++ pep-0636.rst | 44 +++++++++++++++++++ 4 files changed, 220 insertions(+), 1 deletion(-) create mode 100644 pep-0634.rst create mode 100644 pep-0635.rst create mode 100644 pep-0636.rst diff --git a/pep-0622.rst b/pep-0622.rst index 3a56daf1995..ec0c09c2f53 100644 --- a/pep-0622.rst +++ b/pep-0622.rst @@ -10,12 +10,13 @@ Author: Brandt Bucher , Talin BDFL-Delegate: Discussions-To: Python-Dev -Status: Draft +Status: Superseded Type: Standards Track Content-Type: text/x-rst Created: 23-Jun-2020 Python-Version: 3.10 Post-History: 23-Jun-2020, 8-Jul-2020 +Superseded-By: 634 Resolution: diff --git a/pep-0634.rst b/pep-0634.rst new file mode 100644 index 00000000000..f264249ea3a --- /dev/null +++ b/pep-0634.rst @@ -0,0 +1,117 @@ +PEP: 634 +Title: Structural Pattern Matching: Specification +Version: $Revision$ +Last-Modified: $Date$ +Author: Brandt Bucher , + Guido van Rossum +BDFL-Delegate: +Discussions-To: Python-Dev +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 12-Sep-2020 +Python-Version: 3.10 +Post-History: +Replaces: 622 +Resolution: + + +Abstract +======== + +This PEP provides the technical specification for the ``match`` +statement. It replaces PEP 622, which is split in three parts: + +- PEP 634: Specification +- PEP 635: Motivation and Rationale +- PEP 636: Tutorial + + +Specification +============= + +TBD. + + +.. _Appendix A: + +Appendix A -- Full Grammar +========================== + +Here is the full grammar for ``match_stmt``. This is an additional +alternative for ``compound_stmt``. It should be understood that +``match`` and ``case`` are soft keywords, i.e. 
they are not reserved +words in other grammatical contexts (including at the start of a line +if there is no colon where expected). By convention, hard keywords +use single quotes while soft keywords use double quotes. + +Other notation used beyond standard EBNF: + +- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*`` +- ``!RULE`` is a negative lookahead assertion + +:: + + match_expr: + | star_named_expression ',' star_named_expressions? + | named_expression + match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT + case_block: "case" patterns [guard] ':' block + guard: 'if' named_expression + patterns: value_pattern ',' [values_pattern] | pattern + pattern: walrus_pattern | or_pattern + walrus_pattern: NAME ':=' or_pattern + or_pattern: '|'.closed_pattern+ + closed_pattern: + | capture_pattern + | literal_pattern + | constant_pattern + | group_pattern + | sequence_pattern + | mapping_pattern + | class_pattern + capture_pattern: NAME !('.' | '(' | '=') + literal_pattern: + | signed_number !('+' | '-') + | signed_number '+' NUMBER + | signed_number '-' NUMBER + | strings + | 'None' + | 'True' + | 'False' + constant_pattern: attr !('.' | '(' | '=') + group_pattern: '(' patterns ')' + sequence_pattern: '[' [values_pattern] ']' | '(' ')' + mapping_pattern: '{' items_pattern? '}' + class_pattern: + | name_or_attr '(' ')' + | name_or_attr '(' ','.pattern+ ','? ')' + | name_or_attr '(' ','.keyword_pattern+ ','? ')' + | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')' + signed_number: NUMBER | '-' NUMBER + attr: name_or_attr '.' NAME + name_or_attr: attr | NAME + values_pattern: ','.value_pattern+ ','? + items_pattern: ','.key_value_pattern+ ','? 
+ keyword_pattern: NAME '=' or_pattern + value_pattern: '*' capture_pattern | pattern + key_value_pattern: + | (literal_pattern | constant_pattern) ':' or_pattern + | '**' capture_pattern + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: diff --git a/pep-0635.rst b/pep-0635.rst new file mode 100644 index 00000000000..8e94efdef92 --- /dev/null +++ b/pep-0635.rst @@ -0,0 +1,57 @@ +PEP: 635 +Title: Structural Pattern Matching: Motivation and Rationale +Version: $Revision$ +Last-Modified: $Date$ +Author: Tobias Kohn , + Guido van Rossum +BDFL-Delegate: +Discussions-To: Python-Dev +Status: Draft +Type: Informational +Content-Type: text/x-rst +Created: 12-Sep-2020 +Python-Version: 3.10 +Post-History: +Resolution: + + +Abstract +======== + +This PEP provides the motivation and rationale for PEP 634. + + +Motivation +========== + +TBD. + +This section should explain why we think pattern matching is a good +addition for Python. + + +Rationale +========= + +TBD. + +This section should provide the rationale for individual design decisions. +It takes the place of "Rejected ideas" in the standard PEP format. +It is organized in sections corresponding to the specification (PEP 634). + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. + + +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: diff --git a/pep-0636.rst b/pep-0636.rst new file mode 100644 index 00000000000..bf378ed84aa --- /dev/null +++ b/pep-0636.rst @@ -0,0 +1,44 @@ +PEP: 636 +Title: Structural Pattern Matching: Tutorial +Version: $Revision$ +Last-Modified: $Date$ +Author: Daniel F Moisset , + Tobias Kohn +BDFL-Delegate: +Discussions-To: Python-Dev +Status: Draft +Type: Informational +Content-Type: text/x-rst +Created: 12-Sep-2020 +Python-Version: 3.10 +Post-History: +Resolution: + + +Abstract +======== + +This PEP is a tutorial for the pattern matching introduced by PEP 634. + + +Body +==== + +TBD. + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: From b3da517633cb92926205eccd9881ee0bbc8ab8f2 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Sat, 12 Sep 2020 12:01:31 -0700 Subject: [PATCH 02/54] Checkpoint -- converted up to and including Constant Value Patterns to the new style --- pep-0634.rst | 621 +++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 600 insertions(+), 21 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index f264249ea3a..e08311a92e6 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -19,18 +19,586 @@ Resolution: Abstract ======== +TODO: Do we write ``match`` statement or just match statement? +Currently we're inconsistent. + This PEP provides the technical specification for the ``match`` -statement. It replaces PEP 622, which is split in three parts: +statement. 
It replaces PEP 622, which is hereby split in three parts: - PEP 634: Specification - PEP 635: Motivation and Rationale - PEP 636: Tutorial +This PEP is intentionally devoid of commentary; all explanations of +design choices are in PEP 635. First-time readers are encouraged to +start with PEP 636, which provides a gentler introduction to the +concepts, syntax and semantics of patterns. + + +Syntax and Semantics +==================== + +TODO: Should we show the lookaheads in the grammar? They are a bit +ugly. Maybe as a compromise show them in the appendix but not in the +main text? + +See `Appendix A`_ for the complete grammar. + + +The ``match`` statement +----------------------- + +A ``match`` statement has the following top-level grammar:: + + match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT + match_expr: + | star_named_expression ',' star_named_expressions? + | named_expression + case_block: "case" patterns [guard] ':' block + guard: 'if' named_expression + +The rules ``star_named_expression``, ``star_named_expressions``, +``named_expression`` and ``block`` are part of the `standard Python +grammar `_. + +The rule ``patterns`` is specified below. + +For context, ``match_stmt`` is a new alternative for +``compound_statement``:: + + compound_statement: + | if_stmt + ... + | match_stmt + + +The ``match`` and ``case`` keywords are soft keywords, i.e. they are +not reserved words in other grammatical contexts (including at the +start of a line if there is no colon where expected). This implies +that they are recognized as keywords when part of a ``match`` +statement or ``case`` block only, and are allowed to be used in all +other contexts as variable or argument names. + + +Match semantics +~~~~~~~~~~~~~~~ + +TODO: Make the language about choosing a block more precise. + +The overall semantics for choosing the match is to choose the first +matching pattern (including guard) and execute the corresponding +block. 
The remaining patterns are not tried. If there are no +matching patterns, execution continues at the following statement. + +Name bindings made during a successful pattern match outlive the +executed block and can be used after the match statement. + +During failed pattern matches, some sub-patterns may succeed. For +example, while matching the value ``[0, 1, 2]`` with the pattern ``(0, +x, 1)``, the sub-pattern ``x`` may succeed if the list elements are +matched from left to right. The implementation may choose to either +make persistent bindings for those partial matches or not. User code +including a ``match`` statement should not rely on the bindings being +made for a failed match, but also shouldn't assume that variables are +unchanged by a failed match. This part of the behavior is left +intentionally unspecified so different implementations can add +optimizations, and to prevent introducing semantic restrictions that +could limit the extensibility of this feature. + +The precise pattern binding rules vary per pattern type and are +specified below. + + +.. _patterns: + +Patterns +-------- + +TODO: rewrite the ``patterns`` rule to be easier to follow -- why +isn't it just ``','.value_pattern+ [',']``? Also, ``value_pattern`` +is a confusing name, since it is unrelated to "constant value +pattern". + +The top-grammar for patterns is as follows:: + + patterns: value_pattern ',' [values_pattern] | pattern + pattern: walrus_pattern | or_pattern + walrus_pattern: NAME ':=' or_pattern + or_pattern: '|'.closed_pattern+ + closed_pattern: + | literal_pattern + | capture_pattern + | wildcard_pattern + | constant_pattern + | group_pattern + | sequence_pattern + | mapping_pattern + | class_pattern + values_pattern: ','.value_pattern+ ','? + value_pattern: '*' capture_pattern | pattern + + +.. 
_literal_pattern: + +Literal Patterns +~~~~~~~~~~~~~~~~ + +Syntax:: + + literal_pattern: + | signed_number !('+' | '-') + | signed_number '+' NUMBER + | signed_number '-' NUMBER + | strings + | 'None' + | 'True' + | 'False' + signed_number: NUMBER | '-' NUMBER + +The rule ``strings`` and the token ``NUMBER`` are defined in the +standard Python grammar. + +Triple-quoted strings are supported. Raw strings and byte strings +are supported. F-strings are not supported. + +The forms ``signed_number '+' NUMBER`` and ``signed_number '-' +NUMBER`` are only permitted to express complex numbers; they require a +real number on the left and an imaginary number on the right. + +A literal pattern succeeds if the target value compares equal to the +value expressed by the literal, using the following comparisons rules: + +- Numbers and strings are compared using the ``==`` operator. + +- The singleton literals ``None``, ``True`` and ``False`` are compared + using the ``is`` operator. + + +.. _capture_pattern: + +Capture Patterns +~~~~~~~~~~~~~~~~ + +Syntax:: + + capture_pattern: !"_" NAME !('.' | '(' | '=') + +The single underscore (``_``) is not considered a ``NAME``. It is +treated specially as a `wildcard pattern`_. + +A capture pattern always succeeds. It binds the target value to the +name using the scoping rules for name binding established for the +walrus operator in PEP 572. (Summary: the name becomes a local +variable in the nearest function scope unless there's an applicable +``nonlocal`` or ``global`` statement.) + +In a given pattern, a given name may be bound only once. This +disallows e.g. ``case x, x: ...`` but allows ``case x | x: ...``. + +.. _wildcard_pattern: + +Wildcard Pattern +~~~~~~~~~~~~~~~~ + +Syntax:: + + wildcard_pattern: "_" + +A wildcard pattern always succeeds. It binds no name. + +.. _constant_value_pattern: + +Constant Value Patterns +~~~~~~~~~~~~~~~~~~~~~~~ + +TODO: Rename to Value Patterns? (But ``value_pattern[s]`` is already +a grammatical rule.) 
+ +Syntax:: + + constant_pattern: attr !('.' | '(' | '=') + attr: name_or_attr '.' NAME + name_or_attr: attr | NAME + +The dotted name in the pattern is looked up using the standard Python +name resolution rules. However, when the same constant pattern occurs +multiple times in the same match statement, the interpreter may cache +the first value found and reuse it, rather than repeat the same +lookup. (To clarify, this cache is strictly tied to a given execution +of a given ``match`` statement.) + +The pattern succeeds if the value found thus compares equal to the +target value (using the `==` operator). + + +.. _sequence_pattern: + +Sequence Patterns +~~~~~~~~~~~~~~~~~ + +TODO: This is how far I got. Need to take a break. + +Simplified syntax:: + + sequence_pattern: + | '[' [values_pattern] ']' + | '(' [value_pattern ',' [values pattern]] ')' + values_pattern: ','.value_pattern+ ','? + value_pattern: '*' capture_pattern | pattern + +A sequence pattern follows the same semantics as unpacking assignment. +Like unpacking assignment, both tuple-like and list-like syntax can be +used, with identical semantics. Each element can be an arbitrary +pattern; there may also be at most one ``*name`` pattern to catch all +remaining items:: + + match collection: + case 1, [x, *others]: + print("Got 1 and a nested sequence") + case (1, x): + print(f"Got 1 and {x}") + +To match a sequence pattern the subject must be an instance of +``collections.abc.Sequence``, and it cannot be any kind of string +(``str``, ``bytes``, ``bytearray``). It cannot be an iterator. For matching +on a specific collection class, see class pattern below. + +The ``_`` wildcard can be starred to match sequences of varying lengths. For +example: + +* ``[*_]`` matches a sequence of any length. +* ``(_, _, *_)``, matches any sequence of length two or more. +* ``["a", *_, "z"]`` matches any sequence of length two or more that starts with + ``"a"`` and ends with ``"z"``. + + +.. 
_mapping_pattern: + +Mapping Patterns +~~~~~~~~~~~~~~~~ + +Simplified syntax:: + + mapping_pattern: '{' [items_pattern] '}' + items_pattern: ','.key_value_pattern+ ','? + key_value_pattern: + | (literal_pattern | constant_pattern) ':' or_pattern + | '**' capture_pattern -Specification -============= -TBD. +Mapping pattern is a generalization of iterable unpacking to mappings. +Its syntax is similar to dictionary display but each key and value are +patterns ``"{" (pattern ":" pattern)+ "}"``. A ``**rest`` pattern is also +allowed, to extract the remaining items. Only literal and constant value +patterns are allowed in key positions:: + + import constants + + match config: + case {"route": route}: + process_route(route) + case {constants.DEFAULT_PORT: sub_config, **rest}: + process_config(sub_config, rest) + +The subject must be an instance of ``collections.abc.Mapping``. +Extra keys in the subject are ignored even if ``**rest`` is not present. +This is different from sequence pattern, where extra items will cause a +match to fail. But mappings are actually different from sequences: they +have natural structural sub-typing behavior, i.e., passing a dictionary +with extra keys somewhere will likely just work. + +For this reason, ``**_`` is invalid in mapping patterns; it would always be a +no-op that could be removed without consequence. + +Matched key-value pairs must already be present in the mapping, and not created +on-the-fly by ``__missing__`` or ``__getitem__``. For example, +``collections.defaultdict`` instances will only match patterns with keys that +were already present when the ``match`` block was entered. + + +.. _class_pattern: + +Class Patterns +~~~~~~~~~~~~~~ + +Simplified syntax:: + + class_pattern: + | name_or_attr '(' ')' + | name_or_attr '(' ','.pattern+ ','? ')' + | name_or_attr '(' ','.keyword_pattern+ ','? ')' + | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? 
')' + keyword_pattern: NAME '=' or_pattern + + +A class pattern provides support for destructuring arbitrary objects. +There are two possible ways of matching on object attributes: by position +like ``Point(1, 2)``, and by name like ``Point(x=1, y=2)``. These +two can be combined, but a positional match cannot follow a match by name. +Each item in a class pattern can be an arbitrary pattern. A simple +example:: + + match shape: + case Point(x, y): + ... + case Rectangle(x0, y0, x1, y1, painted=True): + ... + +Whether a match succeeds or not is determined by the equivalent of an +``isinstance`` call. If the subject (``shape``, in the example) is not +an instance of the named class (``Point`` or ``Rectangle``), the match +fails. Otherwise, it continues (see details in the `runtime`_ +section). + +The named class must inherit from ``type``. It may be a single name +or a dotted name (e.g. ``some_mod.SomeClass`` or ``mod.pkg.Class``). +The leading name must not be ``_``, so e.g. ``_(...)`` and +``_.C(...)`` are invalid. Use ``object(foo=_)`` to check whether the +matched object has an attribute ``foo``. + +By default, sub-patterns may only be matched by keyword for +user-defined classes. In order to support positional sub-patterns, a +custom ``__match_args__`` attribute is required. +The runtime allows matching against +arbitrarily nested patterns by chaining all of the instance checks and +attribute lookups appropriately. + + +Combining multiple patterns (OR patterns) +----------------------------------------- + +Multiple alternative patterns can be combined into one using ``|``. This means +the whole pattern matches if at least one alternative matches. +Alternatives are tried from left to right and have a short-circuit property, +subsequent patterns are not tried if one matched. 
Examples:: + + match something: + case 0 | 1 | 2: + print("Small number") + case [] | [_]: + print("A short sequence") + case str() | bytes(): + print("Something string-like") + case _: + print("Something else") + +The alternatives may bind variables, as long as each alternative binds +the same set of variables (excluding ``_``). For example:: + + match something: + case 1 | x: # Error! + ... + case x | 1: # Error! + ... + case one := [1] | two := [2]: # Error! + ... + case Foo(arg=x) | Bar(arg=x): # Valid, both arms bind 'x' + ... + case [x] | x: # Valid, both arms bind 'x' + ... + + +.. _guards: + +Guards +------ + +Each *top-level* pattern can be followed by a **guard** of the form +``if expression``. A case clause succeeds if the pattern matches and the guard +evaluates to a true value. For example:: + + match input: + case [x, y] if x > MAX_INT and y > MAX_INT: + print("Got a pair of large numbers") + case x if x > MAX_INT: + print("Got a large number") + case [x, y] if x == y: + print("Got equal items") + case _: + print("Not an outstanding input") + +If evaluating a guard raises an exception, it is propagated onwards rather +than fail the case clause. Names that appear in a pattern are bound before the +guard succeeds. So this will work:: + + values = [0] + + match values: + case [x] if x: + ... # This is not executed + case _: + ... + print(x) # This will print "0" + +Note that guards are not allowed for nested patterns, so that ``[x if x > 0]`` +is a ``SyntaxError`` and ``1 | 2 if 3 | 4`` will be parsed as +``(1 | 2) if (3 | 4)``. + + +Walrus patterns +--------------- + +It is often useful to match a sub-pattern *and* bind the corresponding +value to a name. For example, it can be useful to write more efficient +matches, or simply to avoid repetition. To simplify such cases, any pattern +(other than the walrus pattern itself) can be preceded by a name and +the walrus operator (``:=``). 
For example:: + + match get_shape(): + case Line(start := Point(x, y), end) if start == end: + print(f"Zero length line at {x}, {y}") + +The name on the left of the walrus operator can be used in a guard, in +the case block, or after the match statement. However, the name will +*only* be bound if the sub-pattern succeeds. Another example:: + + match group_shapes(): + case [], [point := Point(x, y), *other]: + print(f"Got {point} in the second group") + process_coordinates(x, y) + ... + +Technically, most such examples can be rewritten using guards and/or nested +match statements, but this will be less readable and/or will produce less +efficient code. Essentially, most of the arguments in PEP 572 apply here +equally. + +The wildcard ``_`` is not a valid name here. + + +.. _runtime: + +Runtime specification +===================== + +The Match Protocol +------------------ + +The equivalent of an ``isinstance`` call is used to decide whether an +object matches a given class pattern and to extract the corresponding +attributes. Classes requiring different matching semantics (such as +duck-typing) can do so by defining ``__instancecheck__`` (a +pre-existing metaclass hook) or by using ``typing.Protocol``. + +The procedure is as follows: + +* The class object for ``Class`` in ``Class()`` is + looked up and ``isinstance(obj, Class)`` is called, where ``obj`` is + the value being matched. If false, the match fails. + +* Otherwise, if any sub-patterns are given in the form of positional + or keyword arguments, these are matched from left to right, as + follows. The match fails as soon as a sub-pattern fails; if all + sub-patterns succeed, the overall class pattern match succeeds. + +* If there are match-by-position items and the class has a + ``__match_args__`` attribute, the item at position ``i`` + is matched against the value looked up by attribute + ``__match_args__[i]``. 
For example, a pattern ``Point2d(5, 8)``, + where ``Point2d.__match_args__ == ["x", "y"]``, is translated + (approximately) into ``obj.x == 5 and obj.y == 8``. + +* If there are more positional items than the length of + ``__match_args__``, a ``TypeError`` is raised. + +* If the ``__match_args__`` attribute is absent on the matched class, + and one or more positional item appears in a match, + ``TypeError`` is also raised. We don't fall back on + using ``__slots__`` or ``__annotations__`` -- "In the face of ambiguity, + refuse the temptation to guess." + +* If there are any match-by-keyword items the keywords are looked up + as attributes on the subject. If the lookup succeeds the value is + matched against the corresponding sub-pattern. If the lookup fails, + the match fails. + +Such a protocol favors simplicity of implementation over flexibility and +performance. For other considered alternatives, see "extended matching". + +For the most commonly-matched built-in types (``bool``, +``bytearray``, ``bytes``, ``dict``, ``float``, +``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``), a +single positional sub-pattern is allowed to be passed to +the call. Rather than being matched against any particular attribute +on the subject, it is instead matched against the subject itself. This +creates behavior that is useful and intuitive for these objects: + +* ``bool(False)`` matches ``False`` (but not ``0``). +* ``tuple((0, 1, 2))`` matches ``(0, 1, 2)`` (but not ``[0, 1, 2]``). +* ``int(i)`` matches any ``int`` and binds it to the name ``i``. + + +Overlapping sub-patterns +------------------------ + +Certain classes of overlapping matches are detected at +runtime and will raise exceptions. In addition to basic checks +described in the previous subsection: + +* The interpreter will check that two match items are not targeting the same + attribute, for example ``Point2d(1, 2, y=3)`` is an error. 
+ +* It will also check that a mapping pattern does not attempt to match + the same key more than once. + + +Special attribute ``__match_args__`` +------------------------------------ + +The ``__match_args__`` attribute is always looked up on the type +object named in the pattern. If present, it must be a list or tuple +of strings naming the allowed positional arguments. + +In deciding what names should be available for matching, the +recommended practice is that class patterns should be the mirror of +construction; that is, the set of available names and their types +should resemble the arguments to ``__init__()``. + +Only match-by-name will work by default, and classes should define +``__match_args__`` as a class attribute if they would like to support +match-by-position. Additionally, dataclasses and named tuples will +support match-by-position out of the box. See below for more details. + +Exceptions and side effects +--------------------------- + +While matching each case, the ``match`` statement may trigger execution of other +functions (for example ``__getitem__()``, ``__len__()`` or +a property). Almost every exception caused by those propagates outside of the +match statement normally. The only case where an exception is not propagated is +an ``AttributeError`` raised while trying to look up an attribute while matching +attributes of a Class Pattern; that case results in just a matching failure, +and the rest of the statement proceeds normally. + +The only side-effect carried on explicitly by the matching process is the binding of +names. However, the process relies on attribute access, +instance checks, ``len()``, equality and item access on the subject and some of +its components. It also evaluates constant value patterns and the left side of +class patterns. While none of those typically create any side-effects, some of +these objects could. This proposal intentionally leaves out any specification +of what methods are called or how many times. 
User code relying on that +behavior should be considered buggy. + +The standard library +-------------------- + +To facilitate the use of pattern matching, several changes will be made to +the standard library: + +* Namedtuples and dataclasses will have auto-generated ``__match_args__``. + +* For dataclasses the order of attributes in the generated ``__match_args__`` + will be the same as the order of corresponding arguments in the generated + ``__init__()`` method. This includes the situations where attributes are + inherited from a superclass. + +In addition, a systematic effort will be put into going through +existing standard library classes and adding ``__match_args__`` where +it looks beneficial. + + +.. _static checkers: + .. _Appendix A: @@ -39,11 +607,11 @@ Appendix A -- Full Grammar ========================== Here is the full grammar for ``match_stmt``. This is an additional -alternative for ``compound_stmt``. It should be understood that -``match`` and ``case`` are soft keywords, i.e. they are not reserved -words in other grammatical contexts (including at the start of a line -if there is no colon where expected). By convention, hard keywords -use single quotes while soft keywords use double quotes. +alternative for ``compound_stmt``. Remember that ``match`` and +``case`` are soft keywords, i.e. they are not reserved words in other +grammatical contexts (including at the start of a line if there is no +colon where expected). By convention, hard keywords use single quotes +while soft keywords use double quotes. Other notation used beyond standard EBNF: @@ -52,25 +620,29 @@ Other notation used beyond standard EBNF: :: + match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT match_expr: | star_named_expression ',' star_named_expressions? 
| named_expression - match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT case_block: "case" patterns [guard] ':' block guard: 'if' named_expression + patterns: value_pattern ',' [values_pattern] | pattern pattern: walrus_pattern | or_pattern walrus_pattern: NAME ':=' or_pattern or_pattern: '|'.closed_pattern+ closed_pattern: - | capture_pattern | literal_pattern + | capture_pattern + | wildcard_pattern | constant_pattern | group_pattern | sequence_pattern | mapping_pattern | class_pattern - capture_pattern: NAME !('.' | '(' | '=') + values_pattern: ','.value_pattern+ ','? + value_pattern: '*' capture_pattern | pattern + literal_pattern: | signed_number !('+' | '-') | signed_number '+' NUMBER @@ -79,25 +651,32 @@ Other notation used beyond standard EBNF: | 'None' | 'True' | 'False' + signed_number: NUMBER | '-' NUMBER + + capture_pattern: !"_" NAME !('.' | '(' | '=') + + wildcard_pattern: "_" + constant_pattern: attr !('.' | '(' | '=') + attr: name_or_attr '.' NAME + name_or_attr: attr | NAME + group_pattern: '(' patterns ')' + sequence_pattern: '[' [values_pattern] ']' | '(' ')' + mapping_pattern: '{' items_pattern? '}' + items_pattern: ','.key_value_pattern+ ','? + key_value_pattern: + | (literal_pattern | constant_pattern) ':' or_pattern + | '**' capture_pattern + class_pattern: | name_or_attr '(' ')' | name_or_attr '(' ','.pattern+ ','? ')' | name_or_attr '(' ','.keyword_pattern+ ','? ')' | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')' - signed_number: NUMBER | '-' NUMBER - attr: name_or_attr '.' NAME - name_or_attr: attr | NAME - values_pattern: ','.value_pattern+ ','? - items_pattern: ','.key_value_pattern+ ','? 
keyword_pattern: NAME '=' or_pattern - value_pattern: '*' capture_pattern | pattern - key_value_pattern: - | (literal_pattern | constant_pattern) ':' or_pattern - | '**' capture_pattern Copyright From 8220a48a42895b55d3e1f7ba9ec0f63d52603e13 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Sat, 12 Sep 2020 13:11:40 -0700 Subject: [PATCH 03/54] Target->subject; started on sequence and mapping --- pep-0634.rst | 112 ++++++++++++++++++++++++++++----------------------- 1 file changed, 62 insertions(+), 50 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index e08311a92e6..fcbae75d8ac 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -22,6 +22,9 @@ Abstract TODO: Do we write ``match`` statement or just match statement? Currently we're inconsistent. +TODO: Do we use "the subject matches the pattern" or "the pattern +matches the subject"? + This PEP provides the technical specification for the ``match`` statement. It replaces PEP 622, which is hereby split in three parts: @@ -135,7 +138,7 @@ The top-grammar for patterns is as follows:: | mapping_pattern | class_pattern values_pattern: ','.value_pattern+ ','? - value_pattern: '*' capture_pattern | pattern + value_pattern: '*' NAME | pattern .. _literal_pattern: @@ -165,7 +168,7 @@ The forms ``signed_number '+' NUMBER`` and ``signed_number '-' NUMBER`` are only permitted to express complex numbers; they require a real number on the left and an imaginary number on the right. -A literal pattern succeeds if the target value compares equal to the +A literal pattern succeeds if the subject value compares equal to the value expressed by the literal, using the following comparisons rules: - Numbers and strings are compared using the ``==`` operator. @@ -183,10 +186,10 @@ Syntax:: capture_pattern: !"_" NAME !('.' | '(' | '=') -The single underscore (``_``) is not considered a ``NAME``. It is -treated specially as a `wildcard pattern`_. +The single underscore (``_``) is not a capture pattern. 
It is +treated as a `wildcard pattern`_. + -A capture pattern always succeeds. It binds the target value to the +A capture pattern always succeeds. It binds the subject value to the name using the scoping rules for name binding established for the walrus operator in PEP 572. (Summary: the name becomes a local variable in the nearest function scope unless there's an applicable @@ -228,7 +231,20 @@ lookup. (To clarify, this cache is strictly tied to a given execution of a given ``match`` statement.) The pattern succeeds if the value found thus compares equal to the -target value (using the `==` operator). +subject value (using the ``==`` operator). + + +Group Patterns +~~~~~~~~~~~~~~ + +Syntax:: + + group_pattern: '(' pattern ')' + +(For the syntax of ``pattern``, see Patterns above.) + +A parenthesized pattern has no additional syntax. It allows users to +add parentheses around patterns to emphasize the intended grouping. .. _sequence_pattern: @@ -236,40 +252,49 @@ target value (using the `==` operator). Sequence Patterns ~~~~~~~~~~~~~~~~~ -TODO: This is how far I got. Need to take a break. -Simplified syntax:: +Syntax:: sequence_pattern: - | '[' [values_pattern] ']' - | '(' [value_pattern ',' [values pattern]] ')' - values_pattern: ','.value_pattern+ ','? - value_pattern: '*' capture_pattern | pattern + | '[' [values_pattern] ']' + | '(' [value_pattern ',' [values_pattern]] ')' -A sequence pattern follows the same semantics as unpacking assignment. -Like unpacking assignment, both tuple-like and list-like syntax can be -used, with identical semantics. Each element can be an arbitrary -pattern; there may also be at most one ``*name`` pattern to catch all -remaining items:: +(For ``values_pattern`` and ``value_pattern``, see Patterns above.) - match collection: - case 1, [x, *others]: - print("Got 1 and a nested sequence") - case (1, x): - print(f"Got 1 and {x}") +(There is no syntactic overlap between group patterns and sequence +patterns.) 
-To match a sequence pattern the subject must be an instance of -``collections.abc.Sequence``, and it cannot be any kind of string -(``str``, ``bytes``, ``bytearray``). It cannot be an iterator. For matching -on a specific collection class, see class pattern below. +A sequence pattern may (directly) contain at most one subpattern of +the form ``'*' NAME``; all other subpatterns must be walrus patterns, +OR patterns or closed patterns. -The ``_`` wildcard can be starred to match sequences of varying lengths. For -example: +A sequence pattern fails if the subject value is not an instance of +``collections.abc.Sequence``. It also fails if the subject value is +an instance of ``str``, ``bytes`` or ``bytearray``. -* ``[*_]`` matches a sequence of any length. -* ``(_, _, *_)``, matches any sequence of length two or more. -* ``["a", *_, "z"]`` matches any sequence of length two or more that starts with - ``"a"`` and ends with ``"z"``. +If the one of the subpatterns has the form ``'*' NAME``, this is +called a variable-length sequence pattern. A variable-length sequence +pattern fails if the length of the subject sequence is less than the +number of subpatterns not of that form. + +If no such subpattern is present, the sequence pattern is considered +fixed-length. A fixed-length sequence pattern fails if the length of +the subject sequence is not equal to the number of subpatterns. + +The length of the subject sequence is obtained using the builtin +``len()`` function (i.e., via the ``__len__`` protocol). However, the +interpreter may cache this value in a similar manner as described for +constant value patterns. + +TODO: Do we have left-to-right semantics here? + +A fixed-length sequence pattern matches the subpatterns to +corresponding items of the subject sequence, from left to right. +Matching stops (with a failure) as soon as a subpattern fails. If all +subpatterns succeed in matching their corresponding item, the sequence +pattern succeeds. 
+ +TODO: Describe variable-length sequence patterns. (Brandt?) .. _mapping_pattern: @@ -277,30 +302,17 @@ example: Mapping Patterns ~~~~~~~~~~~~~~~~ -Simplified syntax:: +Syntax:: - mapping_pattern: '{' [items_pattern] '}' + mapping_pattern: '{' items_pattern? '}' items_pattern: ','.key_value_pattern+ ','? key_value_pattern: | (literal_pattern | constant_pattern) ':' or_pattern - | '**' capture_pattern - - -Mapping pattern is a generalization of iterable unpacking to mappings. -Its syntax is similar to dictionary display but each key and value are -patterns ``"{" (pattern ":" pattern)+ "}"``. A ``**rest`` pattern is also -allowed, to extract the remaining items. Only literal and constant value -patterns are allowed in key positions:: - - import constants + | '**' (!"_" NAME) - match config: - case {"route": route}: - process_route(route) - case {constants.DEFAULT_PORT: sub_config, **rest}: - process_config(sub_config, rest) +TODO: Describe. -The subject must be an instance of ``collections.abc.Mapping``. +The subject value must be an instance of ``collections.abc.Mapping``. Extra keys in the subject are ignored even if ``**rest`` is not present. This is different from sequence pattern, where extra items will cause a match to fail. But mappings are actually different from sequences: they From aaa31802f9508d5790041daf53da002541aeca7e Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Sat, 12 Sep 2020 13:13:40 -0700 Subject: [PATCH 04/54] Always use ``match`` statement --- pep-0634.rst | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index fcbae75d8ac..a2d23a57037 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -19,9 +19,6 @@ Resolution: Abstract ======== -TODO: Do we write ``match`` statement or just match statement? -Currently we're inconsistent. - TODO: Do we use "the subject matches the pattern" or "the pattern matches the subject"? @@ -94,7 +91,7 @@ block. The remaining patterns are not tried. 
If there are no matching patterns, execution continues at the following statement. Name bindings made during a successful pattern match outlive the -executed block and can be used after the match statement. +executed block and can be used after the ``match`` statement. During failed pattern matches, some sub-patterns may succeed. For example, while matching the value ``[0, 1, 2]`` with the pattern ``(0, @@ -225,7 +222,7 @@ Syntax:: The dotted name in the pattern is looked up using the standard Python name resolution rules. However, when the same constant pattern occurs -multiple times in the same match statement, the interpreter may cache +multiple times in the same ``match`` statement, the interpreter may cache the first value found and reuse it, rather than repeat the same lookup. (To clarify, this cache is strictly tied to a given execution of a given ``match`` statement.) @@ -461,7 +458,7 @@ the walrus operator (``:=``). For example:: print(f"Zero length line at {x}, {y}") The name on the left of the walrus operator can be used in a guard, in -the case block, or after the match statement. However, the name will +the case block, or after the ``match`` statement. However, the name will *only* be bound if the sub-pattern succeeds. Another example:: match group_shapes(): @@ -471,7 +468,7 @@ the case block, or after the match statement. However, the name will ... Technically, most such examples can be rewritten using guards and/or nested -match statements, but this will be less readable and/or will produce less +``match`` statements, but this will be less readable and/or will produce less efficient code. Essentially, most of the arguments in PEP 572 apply here equally. @@ -577,7 +574,7 @@ Exceptions and side effects While matching each case, the ``match`` statement may trigger execution of other functions (for example ``__getitem__()``, ``__len__()`` or a property). Almost every exception caused by those propagates outside of the -match statement normally. 
The only case where an exception is not propagated is +``match`` statement normally. The only case where an exception is not propagated is an ``AttributeError`` raised while trying to lookup an attribute while matching attributes of a Class Pattern; that case results in just a matching failure, and the rest of the statement proceeds normally. From 9b46594a76dbbf17d6a0d32343d900650132deaa Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Sat, 12 Sep 2020 13:31:37 -0700 Subject: [PATCH 05/54] Finish mapping pattern, some tweaks --- pep-0634.rst | 39 ++++++++++++++++++++------------------- 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index a2d23a57037..34699bcb865 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -258,8 +258,8 @@ Syntax:: (For ``values_pattern`` and ``value_pattern``, see Patterns above.) -(There is no syntactic overlap between group patterns and sequence -patterns.) +(Note that a single parenthesized pattern without a trailing comma is +a group pattern, not a sequence pattern.) A sequence pattern may (directly) contain at most one subpattern of the form ``'*' NAME``; all other subpatterns must be walrus patterns, @@ -307,22 +307,27 @@ Syntax:: | (literal_pattern | constant_pattern) ':' or_pattern | '**' (!"_" NAME) -TODO: Describe. +(Note that ``'**' "_"`` is disallowed by this grammar.) -The subject value must be an instance of ``collections.abc.Mapping``. -Extra keys in the subject are ignored even if ``**rest`` is not present. -This is different from sequence pattern, where extra items will cause a -match to fail. But mappings are actually different from sequences: they -have natural structural sub-typing behavior, i.e., passing a dictionary -with extra keys somewhere will likely just work. +A mapping pattern fails if the subject value is not an instance of +``collections.abc.Mapping``. 
-For this reason, ``**_`` is invalid in mapping patterns; it would always be a -no-op that could be removed without consequence. +A mapping pattern succeeds if every key given in the mapping pattern +matches the corresponding item of the subject mapping. If a ``'**' +NAME`` form is present, that name is bound to a ``dict`` containing +remaining key-value pairs from the subject mapping. -Matched key-value pairs must already be present in the mapping, and not created -on-the-fly by ``__missing__`` or ``__getitem__``. For example, -``collections.defaultdict`` instances will only match patterns with keys that -were already present when the ``match`` block was entered. +If duplicate keys are detected in the mapping pattern, the pattern is +considered invalid, and a ``ValueError`` is raised. + +TODO: Should we be more prescriptive about using ``in``? + +Matched key-value pairs must already be present in the mapping, and +not created on-the-fly by ``__missing__`` or ``__getitem__``. For +example, ``collections.defaultdict`` instances will only match +patterns with keys that were already present when the ``match`` block +was entered. This may be implemented by using the ``in`` operator +(i.e., the ``__contains__`` protocol) before using ``__getitem__``. .. _class_pattern: @@ -606,10 +611,6 @@ existing standard library classes and adding ``__match_args__`` where it looks beneficial. -.. _static checkers: - - - .. 
_Appendix A: Appendix A -- Full Grammar From 5879ee5ee2a35c5d653c0e49ce6411d6fc55dbd5 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Sat, 12 Sep 2020 13:35:37 -0700 Subject: [PATCH 06/54] PEP 636: Sponsored by GvR --- pep-0636.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/pep-0636.rst b/pep-0636.rst index bf378ed84aa..d5a5ab73dca 100644 --- a/pep-0636.rst +++ b/pep-0636.rst @@ -4,6 +4,7 @@ Version: $Revision$ Last-Modified: $Date$ Author: Daniel F Moisset , Tobias Kohn +Sponsor: Guido van Rossum BDFL-Delegate: Discussions-To: Python-Dev Status: Draft @@ -11,7 +12,7 @@ Type: Informational Content-Type: text/x-rst Created: 12-Sep-2020 Python-Version: 3.10 -Post-History: +Post-History: Resolution: From f3b262ca25a7b189bc41994c6af7dec0be7ad5cd Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Sat, 12 Sep 2020 13:43:36 -0700 Subject: [PATCH 07/54] Remove bool from "match self" examples --- pep-0634.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 34699bcb865..7172a6a77a5 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -537,9 +537,9 @@ the call. Rather than being matched against any particular attribute on the subject, it is instead matched against the subject itself. This creates behavior that is useful and intuitive for these objects: -* ``bool(False)`` matches ``False`` (but not ``0``). +* ``int(0)`` matches ``0`` (but not ``0.0``). * ``tuple((0, 1, 2))`` matches ``(0, 1, 2)`` (but not ``[0, 1, 2]``). -* ``int(i)`` matches any ``int`` and binds it to the name ``i``. +* ``float(f)`` matches any ``float`` and binds it to the name ``f``. 
Overlapping sub-patterns From 520a68f55c35504dc480a7687cb799956c45121b Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Sat, 12 Sep 2020 15:38:27 -0700 Subject: [PATCH 08/54] Clean up the simplified grammar --- pep-0634.rst | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 7172a6a77a5..1315a7eca8e 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -38,10 +38,6 @@ concepts, syntax and semantics of patterns. Syntax and Semantics ==================== -TODO: Should we show the lookaheads in the grammar? They are a bit -ugly. Maybe as a compromise show them in the appendix but not in the -main text? - See `Appendix A`_ for the complete grammar. @@ -146,7 +142,7 @@ Literal Patterns Syntax:: literal_pattern: - | signed_number !('+' | '-') + | signed_number | signed_number '+' NUMBER | signed_number '-' NUMBER | strings @@ -181,7 +177,7 @@ Capture Patterns Syntax:: - capture_pattern: !"_" NAME !('.' | '(' | '=') + capture_pattern: NAME The single underscore (``_``) is not a capture pattern. It is treated as a `wildcard pattern`_. @@ -193,7 +189,7 @@ variable in the nearest function scope unless there's an applicable ``nonlocal`` or ``global`` statement.) In a given pattern, a given name may be bound only once. This -disallows e.g. ``case x, x: ...`` but allows ``case x | x: ...``. +disallows e.g. ``case x, x: ...`` but allows ``case [x] | x: ...``. .. _wildcard_pattern: @@ -211,12 +207,12 @@ A wildcard pattern always succeeds. It binds no name. Constant Value Patterns ~~~~~~~~~~~~~~~~~~~~~~~ -TODO: Rename to Value Patterns? (But ``value_pattern[s]`` is already +TODO: Rename to Value Patterns? (But ``value[s]_pattern`` is already a grammatical rule.) Syntax:: - constant_pattern: attr !('.' | '(' | '=') + constant_pattern: attr attr: name_or_attr '.' NAME name_or_attr: attr | NAME @@ -228,7 +224,7 @@ lookup. (To clarify, this cache is strictly tied to a given execution of a given ``match`` statement.) 
The pattern succeeds if the value found thus compares equal to the -subject value (using the `==` operator). +subject value (using the ``==`` operator). Group Patterns @@ -240,7 +236,7 @@ Syntax: (For the syntax of ``pattern``, see Patterns above.) -A parenthesized pattern has no additional syntax. It allows uses to +A parenthesized pattern has no additional syntax. It allows users to add parentheses around patterns to emphasize the intended grouping. @@ -254,7 +250,8 @@ Syntax:: sequence_pattern: | '[' [values_pattern] ']' - | '(' [value_pattern ',' [values_pattern]] ')' + | value_pattern ',' [values_pattern] + | '(' ')' (For ``values_pattern`` and ``value_pattern``, see Patterns above.) @@ -305,7 +302,7 @@ Syntax:: items_pattern: ','.key_value_pattern+ ','? key_value_pattern: | (literal_pattern | constant_pattern) ':' or_pattern - | '**' (!"_" NAME) + | '**' NAME (Note that ``'**' "_"`` is disallowed by this grammar.) From 15fdb0ff171eda75e6781fdc88cb7e7bc0c63f87 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Sat, 12 Sep 2020 15:45:18 -0700 Subject: [PATCH 09/54] Change float example to use bool --- pep-0634.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pep-0634.rst b/pep-0634.rst index 1315a7eca8e..3f153ab967c 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -536,7 +536,7 @@ creates behavior that is useful and intuitive for these objects: * ``int(0)`` matches ``0`` (but not ``0.0``). * ``tuple((0, 1, 2))`` matches ``(0, 1, 2)`` (but not ``[0, 1, 2]``). -* ``float(f)`` matches any ``float`` and binds it to the name ``f``. +* ``bool(b)`` matches any ``bool`` and binds it to the name ``b``. 
Overlapping sub-patterns From 4295a583b8112c776d5de3e6af02e34a77fbf312 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Sat, 12 Sep 2020 16:04:33 -0700 Subject: [PATCH 10/54] Patterns match subjects --- pep-0634.rst | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 3f153ab967c..b891d72e648 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -367,10 +367,10 @@ The leading name must not be ``_``, so e.g. ``_(...)`` and ``_.C(...)`` are invalid. Use ``object(foo=_)`` to check whether the matched object has an attribute ``foo``. -By default, sub-patterns may only be matched by keyword for +By default, sub-patterns may only match by keyword for user-defined classes. In order to support positional sub-patterns, a custom ``__match_args__`` attribute is required. -The runtime allows matching against +The runtime allows matching arbitrarily nested patterns by chaining all of the instance checks and attribute lookups appropriately. @@ -449,7 +449,7 @@ is a ``SyntaxError`` and ``1 | 2 if 3 | 4`` will be parsed as Walrus patterns --------------- -It is often useful to match a sub-pattern *and* bind the corresponding +It is often useful for a pattern to match *and* bind the corresponding value to a name. For example, it can be useful to write more efficient matches, or simply to avoid repetition. To simplify such cases, any pattern (other than the walrus pattern itself) can be preceded by a name and @@ -486,7 +486,7 @@ The Match Protocol ------------------ The equivalent of an ``isinstance`` call is used to decide whether an -object matches a given class pattern and to extract the corresponding +a given class pattern matches a subject and to extract the corresponding attributes. Classes requiring different matching semantics (such as duck-typing) can do so by defining ``__instancecheck__`` (a pre-existing metaclass hook) or by using ``typing.Protocol``. 
@@ -530,8 +530,8 @@ For the most commonly-matched built-in types (``bool``, ``bytearray``, ``bytes``, ``dict``, ``float``, ``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``), a single positional sub-pattern is allowed to be passed to -the call. Rather than being matched against any particular attribute -on the subject, it is instead matched against the subject itself. This +the call. Rather than matching any particular attribute +on the subject, it is instead matches the subject itself. This creates behavior that is useful and intuitive for these objects: * ``int(0)`` matches ``0`` (but not ``0.0``). @@ -546,7 +546,7 @@ Certain classes of overlapping matches are detected at runtime and will raise exceptions. In addition to basic checks described in the previous subsection: -* The interpreter will check that two match items are not targeting the same +* The interpreter will check that two sub-patterns are not targeting the same attribute, for example ``Point2d(1, 2, y=3)`` is an error. * It will also check that a mapping pattern does not attempt to match From 1518f84ed88049ed8ef7389a71fe016959787be1 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Sat, 12 Sep 2020 16:04:59 -0700 Subject: [PATCH 11/54] Mapping patterns use get() --- pep-0634.rst | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index b891d72e648..df4afe1f8a3 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -317,14 +317,13 @@ remaining key-value pairs from the subject mapping. If duplicate keys are detected in the mapping pattern, the pattern is considered invalid, and a ``ValueError`` is raised. -TODO: Should we be more prescriptive about using ``in``? - -Matched key-value pairs must already be present in the mapping, and -not created on-the-fly by ``__missing__`` or ``__getitem__``. 
For -example, ``collections.defaultdict`` instances will only match -patterns with keys that were already present when the ``match`` block -was entered. This may be implemented by using the ``in`` operator -(i.e., the ``__contains__`` protocol) before using ``__getitem__``. +Key-value pairs are matched using the two-argument form of the +subject's ``get()`` method. As a consequence, matched key-value pairs +must already be present in the mapping, and not created on-the-fly by +``__missing__`` or ``__getitem__``. For example, +``collections.defaultdict`` instances will only be matched by patterns +with keys that were already present when the ``match`` block was +entered. .. _class_pattern: From 8c63bafea28cb1b2cc4abf1cb99cefcf512c5ab9 Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Sat, 12 Sep 2020 16:06:13 -0700 Subject: [PATCH 12/54] Clean up TODOs --- pep-0634.rst | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index df4afe1f8a3..43b0191a5f8 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -19,9 +19,6 @@ Resolution: Abstract ======== -TODO: Do we use "the subject matches the pattern" or "the pattern -matches the subject"? - This PEP provides the technical specification for the ``match`` statement. It replaces PEP 622, which is hereby split in three parts: @@ -110,8 +107,7 @@ specified below. Patterns -------- -TODO: rewrite the ``patterns`` rule to be easier to follow -- why -isn't it just ``','.value_pattern+ [',']``? Also, ``value_pattern`` +TODO: ``value_pattern`` is a confusing name, since it is unrelated to "constant value pattern". From 846df5b217b154ea64bc7cdc0eb8a32c1c68331f Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Sat, 12 Sep 2020 17:18:48 -0700 Subject: [PATCH 13/54] Allow "_" in class patterns. 
--- pep-0634.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 43b0191a5f8..3f8675fda37 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -358,8 +358,7 @@ section). The named class must inherit from ``type``. It may be a single name or a dotted name (e.g. ``some_mod.SomeClass`` or ``mod.pkg.Class``). -The leading name must not be ``_``, so e.g. ``_(...)`` and -``_.C(...)`` are invalid. Use ``object(foo=_)`` to check whether the +Use ``object(foo=_)`` to check whether the matched object has an attribute ``foo``. By default, sub-patterns may only match by keyword for From cfd9e7433f869a0dc31be3b117f98d5ea3aca19d Mon Sep 17 00:00:00 2001 From: Brandt Bucher Date: Sat, 12 Sep 2020 17:26:01 -0700 Subject: [PATCH 14/54] Remove comment --- pep-0634.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 3f8675fda37..d81adb9a8fe 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -276,8 +276,6 @@ The length of the subject sequence is obtained using the builtin interpreter may cache this value in a similar manner as described for constant value patterns. -TODO: Do we have left-to-right semantics here? - A fixed-length sequence pattern matches the subpatterns to corresponding items of the subject sequence, from left to right. Matching stops (with a failure) as soon as a subpattern fails. If all From 944bda9a23bd0894b6aa149272039a0a5fbf1c89 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Sat, 12 Sep 2020 17:16:21 -0700 Subject: [PATCH 15/54] Say 'the pattern matches the subject', not vice versa --- pep-0634.rst | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index d81adb9a8fe..30b8cd6476b 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -87,8 +87,8 @@ Name bindings made during a successful pattern match outlive the executed block and can be used after the ``match`` statement. 
During failed pattern matches, some sub-patterns may succeed. For -example, while matching the value ``[0, 1, 2]`` with the pattern ``(0, -x, 1)``, the sub-pattern ``x`` may succeed if the list elements are +example, while matching the pattern ``(0, x, 1)`` with the value ``[0, +1, 2]``, the sub-pattern ``x`` may succeed if the list elements are matched from left to right. The implementation may choose to either make persistent bindings for those partial matches or not. User code including a ``match`` statement should not rely on the bindings being @@ -241,7 +241,6 @@ add parentheses around patterns to emphasize the intended grouping. Sequence Patterns ~~~~~~~~~~~~~~~~~ - Syntax:: sequence_pattern: @@ -487,7 +486,7 @@ The procedure is as following: * The class object for ``Class`` in ``Class()`` is looked up and ``isinstance(obj, Class)`` is called, where ``obj`` is - the value being matched. If false, the match fails. + the subject value. If false, the match fails. * Otherwise, if any sub-patterns are given in the form of positional or keyword arguments, these are matched from left to right, as @@ -511,7 +510,7 @@ The procedure is as following: refuse the temptation to guess." * If there are any match-by-keyword items the keywords are looked up - as attributes on the subject. If the lookup succeeds the value is + as attributes on the subject. If the lookup succeeds, the value is matched against the corresponding sub-pattern. If the lookup fails, the match fails. @@ -523,7 +522,7 @@ For the most commonly-matched built-in types (``bool``, ``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``), a single positional sub-pattern is allowed to be passed to the call. Rather than matching any particular attribute -on the subject, it is instead matches the subject itself. This +on the subject, it instead matches the subject itself. This creates behavior that is useful and intuitive for these objects: * ``int(0)`` matches ``0`` (but not ``0.0``). 
From ae2ab7d9e9ec5a8b897a1b6724fa47a6ea135c74 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Sat, 12 Sep 2020 17:24:22 -0700 Subject: [PATCH 16/54] Put back some subtleties around *_, **_ --- pep-0634.rst | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 30b8cd6476b..a8111850edc 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -41,7 +41,7 @@ See `Appendix A`_ for the complete grammar. The ``match`` statement ----------------------- -A ``match`` statement has the following top-level grammar:: +A ``match`` statement has the following top-level syntax:: match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT match_expr: @@ -111,7 +111,7 @@ TODO: ``value_pattern`` is a confusing name, since it is unrelated to "constant value pattern". -The top-grammar for patterns is as follows:: +The top-level syntax for patterns is as follows:: patterns: value_pattern ',' [values_pattern] | pattern pattern: walrus_pattern | or_pattern @@ -173,10 +173,10 @@ Capture Patterns Syntax:: - capture_pattern: NAME + capture_pattern: !"_" NAME -The single underscore (``_``) is not a capture pattern. It is -treated as a `wildcard pattern`_. +The single underscore (``_``) is not a capture pattern (this is what +``!"_"`` expresses). It is treated as a `wildcard pattern`_. A capture pattern always succeeds. It binds the subject value to the name using the scoping rules for name binding established for the @@ -295,9 +295,9 @@ Syntax:: items_pattern: ','.key_value_pattern+ ','? key_value_pattern: | (literal_pattern | constant_pattern) ':' or_pattern - | '**' NAME + | '**' !"_" NAME -(Note that ``'**' "_"`` is disallowed by this grammar.) +(Note that ``'**_`` is disallowed by this syntax.) A mapping pattern fails if the subject value is not an instance of ``collections.abc.Mapping``. 
@@ -639,7 +639,7 @@ Other notation used beyond standard EBNF: | mapping_pattern | class_pattern values_pattern: ','.value_pattern+ ','? - value_pattern: '*' capture_pattern | pattern + value_pattern: '*' NAME | pattern literal_pattern: | signed_number !('+' | '-') @@ -667,7 +667,7 @@ Other notation used beyond standard EBNF: items_pattern: ','.key_value_pattern+ ','? key_value_pattern: | (literal_pattern | constant_pattern) ':' or_pattern - | '**' capture_pattern + | '**' !"_" NAME class_pattern: | name_or_attr '(' ')' From 1a33f3c016d8d11dd0b7dbaee5be49237c59fed7 Mon Sep 17 00:00:00 2001 From: Daniel F Moisset Date: Sun, 13 Sep 2020 19:16:26 +0100 Subject: [PATCH 17/54] Add abstract and first section --- pep-0636.rst | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 94 insertions(+), 2 deletions(-) diff --git a/pep-0636.rst b/pep-0636.rst index bf378ed84aa..aa775576798 100644 --- a/pep-0636.rst +++ b/pep-0636.rst @@ -20,11 +20,103 @@ Abstract This PEP is a tutorial for the pattern matching introduced by PEP 634. +PEP 622 proposed syntax for pattern matching, which received detailed discussion +both from the community and the Steering Council. A frequent concern was +about how easy it would be to explain (and learn) this feature. This PEP +addresses that concern by providing the kind of document which learners could use +to learn about pattern matching in Python. -Body +This is considered supporting material for PEP 634 (the technical specification +for pattern matching) and PEP 635 (the motivation and rationale for having pattern +matching and design considerations). + +Meta ==== -TBD. +This section is intended to get in sync about style and language with +co-authors. It should be removed from the released PEP. + +The following are design decisions I made while writing this: + +1. Who is the target audience? +I'm considering "People with general Python experience" (i.e.
who shouldn't be surprised +at anything in the Python tutorial), but not necessarily involved with the +design/development of Python. I'm assuming someone who hasn't been exposed to pattern +matching in other languages. + +2. How detailed should this document be? +I considered a range from "very superficial" (like the detail level you might find about +statements in the Python tutorial) to "terse but complete" like +https://github.com/gvanrossum/patma/#tutorial +to "long and detailed". I chose the latter; we can always trim down from that. + +3. What kind of examples to use? +I tried to write examples of code that I might write using pattern matching. I +avoided going +for a full application (because the examples I have in mind are too large for a PEP) but +I tried to follow ideas related to a single project to thread the story-telling more +easily. This is probably the most controversial thing here, and if the rest of +the authors dislike it, we can change to a more formal explanatory style. + +Other rules I'm following (let me know if I forgot to): + +* I'm not going to reference/compare with other languages +* I'm not trying to convince the reader that this is a good idea (that's the job of + PEP 635), just explain how to use it +* I'm not trying to cover every corner case (that's the job of PEP 634), just cover + how to use the full functionality in the "normal" cases. +* I talk to the learner in second person + +Tutorial +======== + +Getting Started +--------------- + +As an example to motivate this tutorial, you will be writing a text-adventure. That is +a form of interactive fiction where the user enters text commands to interact with a +fictional world and receives text descriptions of what happens. Commands will be +simplified forms of natural language like ``get sword``, ``attack dragon``, ``go north``, +``enter shop`` or ``buy cheese``.
+ +Your main loop will need to get input from the user and split it into words, let's say +a list of strings like this:: + + command = input("What are you doing next? ") + words = command.split() + +The next step is to interpret the words. Most of your commands will be two words: an +action and an object. So you may be tempted to do the following:: + + [action, obj] = words + # interpret action, obj + +The problem with that line of code is that it's missing something: what if the user +types more or fewer than two words? To prevent this problem you can either check the length +of the list of words, or catch the ``ValueError`` that the statement above would raise. + +You can use a matching statement instead:: + + match words: + case [action, obj]: + ... # interpret action, obj + +The ``match`` statement evaluates the **subject** after the ``match`` keyword, and checks +it against the **pattern** next to ``case``. A pattern is able to do two different +things: + + * Verify that the subject has a certain structure. In your case, the ``[action, obj]`` + pattern matches any sequence of exactly two elements. This is called **matching**. + * Bind some names in the pattern to component elements of your subject. In + this case, if the list has two elements, it will bind ``action = words[0]`` and + ``obj = words[1]``. This is called **destructuring**. + +If there's a match, the statements inside the ``case`` clause will be run with the +bound variables. If there's no match, nothing happens and the next statement after +``match`` keeps running.
+ + + Copyright From 1cc25bc0407b437f72a601816ccef916f12cb431 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Sun, 13 Sep 2020 17:30:08 -0700 Subject: [PATCH 18/54] Finished sequence pattern; added some TODOs; moved walrus around; tweaks --- pep-0634.rst | 174 ++++++++++++++++++++++++++++++--------------------- 1 file changed, 104 insertions(+), 70 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index a8111850edc..ec12b23195a 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -107,13 +107,9 @@ specified below. Patterns -------- -TODO: ``value_pattern`` -is a confusing name, since it is unrelated to "constant value -pattern". - The top-level syntax for patterns is as follows:: - patterns: value_pattern ',' [values_pattern] | pattern + patterns: open_sequence_pattern | pattern pattern: walrus_pattern | or_pattern walrus_pattern: NAME ':=' or_pattern or_pattern: '|'.closed_pattern+ @@ -126,8 +122,6 @@ The top-level syntax for patterns is as follows:: | sequence_pattern | mapping_pattern | class_pattern - values_pattern: ','.value_pattern+ ','? - value_pattern: '*' NAME | pattern .. _literal_pattern: @@ -230,7 +224,9 @@ Syntax: group_pattern: '(' pattern ')' -(For the syntax of ``pattern``, see Patterns above.) +(For the syntax of ``pattern``, see Patterns above. Note that it +contains no comma -- a parenthesized series of items with at least one +comma is a sequence pattern, as is ``()``.) A parenthesized pattern has no additional syntax. It allows users to add parentheses around patterns to emphasize the intended grouping. @@ -245,30 +241,34 @@ Syntax:: sequence_pattern: | '[' [values_pattern] ']' - | value_pattern ',' [values_pattern] - | '(' ')' - -(For ``values_pattern`` and ``value_pattern``, see Patterns above.) + | '(' [open_sequence_pattern] ')' + open_sequence_pattern: value_pattern ',' [values_pattern] + values_pattern: ','.value_pattern+ ','? 
+ value_pattern: star_pattern | pattern + star_pattern: '*' (capture_pattern | wildcard_pattern) (Note that a single parenthesized pattern without a trailing comma is -a group pattern, not a sequence pattern.) +a group pattern, not a sequence pattern. However a single pattern +enclosed in ``[...]`` is still a sequence pattern.) + +There is no semantic difference between a sequence pattern using +``[...]``, a sequence pattern using ``(...)``, and an open sequence +pattern. -A sequence pattern may (directly) contain at most one subpattern of -the form ``'*' NAME``; all other subpatterns must be walrus patterns, -OR patterns or closed patterns. +A sequence pattern may contain at most one star subpattern. The star +subpattern may occur in any position. If no star subpattern is +present, the sequence pattern is a fixed-length sequence pattern; +otherwise it is a variable-length sequence pattern. A sequence pattern fails if the subject value is not an instance of ``collections.abc.Sequence``. It also fails if the subject value is an instance of ``str``, ``bytes`` or ``bytearray``. -If the one of the subpatterns has the form ``'*' NAME``, this is -called a variable-length sequence pattern. A variable-length sequence -pattern fails if the length of the subject sequence is less than the -number of subpatterns not of that form. +A fixed-length sequence pattern fails if the length of the subject +sequence is not equal to the number of subpatterns. -If no such subpattern is present, the sequence pattern is considered -fixed-length. A fixed-length sequence pattern fails if the length of -the subject sequence is not equal to the number of subpatterns. +A variable-length sequence pattern fails if the length of the subject +sequence is less than the number of non-star subpatterns. The length of the subject sequence is obtained using the builtin ``len()`` function (i.e., via the ``__len__`` protocol). 
However, the @@ -281,7 +281,14 @@ Matching stops (with a failure) as soon as a subpattern fails. If all subpatterns succeed in matching their corresponding item, the sequence pattern succeeds. -TODO: Describe variable-length sequence patterns. (Brandt?) +A variable-length sequence pattern first matches the leading non-star +subpatterns to the corresponding items of the subject sequence, as for +a fixed-length sequence. If this succeeds, the star subpattern +matches a list formed of the remaining subject items, with items +removed from the end corresponding to the non-star subpatterns +following the star subpattern. The remaining non-star subpatterns are +then matched to the corresponding subject items, as for a fixed-length +sequence. .. _mapping_pattern: @@ -291,13 +298,17 @@ Mapping Patterns Syntax:: - mapping_pattern: '{' items_pattern? '}' + mapping_pattern: '{' [items_pattern] '}' items_pattern: ','.key_value_pattern+ ','? key_value_pattern: | (literal_pattern | constant_pattern) ':' or_pattern - | '**' !"_" NAME + | double_star_pattern + double_star_pattern: '**' capture_pattern + +(Note that ``**_`` is disallowed by this syntax.) -(Note that ``'**_`` is disallowed by this syntax.) +A mapping pattern may contain at most one double star pattern, +and it must be last. A mapping pattern fails if the subject value is not an instance of ``collections.abc.Mapping``. @@ -324,7 +335,9 @@ entered. Class Patterns ~~~~~~~~~~~~~~ -Simplified syntax:: +TODO: Modernize this section. + +Syntax:: class_pattern: | name_or_attr '(' ')' @@ -333,7 +346,6 @@ Simplified syntax:: | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')' keyword_pattern: NAME '=' or_pattern - A class pattern provides support for destructuring arbitrary objects. There are two possible ways of matching on object attributes: by position like ``Point(1, 2)``, and by name like ``Point(x=1, y=2)``.
These @@ -366,8 +378,10 @@ arbitrarily nested patterns by chaining all of the instance checks and attribute lookups appropriately. -Combining multiple patterns (OR patterns) ------------------------------------------ +OR patterns +~~~~~~~~~~~ + +TODO: Modernize this section. Multiple alternative patterns can be combined into one using ``|``. This means the whole pattern matches if at least one alternative matches. @@ -400,11 +414,46 @@ the same set of variables (excluding ``_``). For example:: ... +Walrus patterns +~~~~~~~~~~~~~~~ + +TODO: Modernize this section. + +It is often useful for a pattern to match *and* bind the corresponding +value to a name. For example, it can be useful to write more efficient +matches, or simply to avoid repetition. To simplify such cases, any pattern +(other than the walrus pattern itself) can be preceded by a name and +the walrus operator (``:=``). For example:: + + match get_shape(): + case Line(start := Point(x, y), end) if start == end: + print(f"Zero length line at {x}, {y}") + +The name on the left of the walrus operator can be used in a guard, in +the case block, or after the ``match`` statement. However, the name will +*only* be bound if the sub-pattern succeeds. Another example:: + + match group_shapes(): + case [], [point := Point(x, y), *other]: + print(f"Got {point} in the second group") + process_coordinates(x, y) + ... + +Technically, most such examples can be rewritten using guards and/or nested +``match`` statements, but this will be less readable and/or will produce less +efficient code. Essentially, most of the arguments in PEP 572 apply here +equally. + +The wildcard ``_`` is not a valid name here. + + .. _guards: Guards ------ +TODO: Modernize this section. + Each *top-level* pattern can be followed by a **guard** of the form ``if expression``. A case clause succeeds if the pattern matches and the guard evaluates to a true value. 
For example:: @@ -437,42 +486,13 @@ is a ``SyntaxError`` and ``1 | 2 if 3 | 4`` will be parsed as ``(1 | 2) if (3 | 4)``. -Walrus patterns ---------------- - -It is often useful for a pattern to match *and* bind the corresponding -value to a name. For example, it can be useful to write more efficient -matches, or simply to avoid repetition. To simplify such cases, any pattern -(other than the walrus pattern itself) can be preceded by a name and -the walrus operator (``:=``). For example:: - - match get_shape(): - case Line(start := Point(x, y), end) if start == end: - print(f"Zero length line at {x}, {y}") - -The name on the left of the walrus operator can be used in a guard, in -the case block, or after the ``match`` statement. However, the name will -*only* be bound if the sub-pattern succeeds. Another example:: - - match group_shapes(): - case [], [point := Point(x, y), *other]: - print(f"Got {point} in the second group") - process_coordinates(x, y) - ... - -Technically, most such examples can be rewritten using guards and/or nested -``match`` statements, but this will be less readable and/or will produce less -efficient code. Essentially, most of the arguments in PEP 572 apply here -equally. - -The wildcard ``_`` is not a valid name here. - - .. _runtime: Runtime specification ===================== +TODO: Modernize this section. + The Match Protocol ------------------ @@ -584,6 +604,8 @@ behavior should be considered buggy. The standard library -------------------- +TODO: Make this a top-level section? + To facilitate the use of pattern matching, several changes will be made to the standard library: @@ -604,6 +626,13 @@ it looks beneficial. Appendix A -- Full Grammar ========================== +TODO: Double-check that the syntax sections above match what's written +here (except for trailing lookaheads). + +TODO: Go over the differences with the reference implementation and +resolve them (either by fixing the PEP or by fixing the reference +implementation). 
+ Here is the full grammar for ``match_stmt``. This is an additional alternative for ``compound_stmt``. Remember that ``match`` and ``case`` are soft keywords, i.e. they are not reserved words in other @@ -620,12 +649,12 @@ Other notation used beyond standard EBNF: match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT match_expr: - | star_named_expression ',' star_named_expressions? + | star_named_expression ',' [star_named_expressions] | named_expression case_block: "case" patterns [guard] ':' block guard: 'if' named_expression - patterns: value_pattern ',' [values_pattern] | pattern + patterns: open_sequence_pattern | pattern pattern: walrus_pattern | or_pattern walrus_pattern: NAME ':=' or_pattern or_pattern: '|'.closed_pattern+ @@ -638,8 +667,6 @@ Other notation used beyond standard EBNF: | sequence_pattern | mapping_pattern | class_pattern - values_pattern: ','.value_pattern+ ','? - value_pattern: '*' NAME | pattern literal_pattern: | signed_number !('+' | '-') @@ -659,15 +686,22 @@ Other notation used beyond standard EBNF: attr: name_or_attr '.' NAME name_or_attr: attr | NAME - group_pattern: '(' patterns ')' + group_pattern: '(' pattern ')' - sequence_pattern: '[' [values_pattern] ']' | '(' ')' + sequence_pattern: + | '[' [values_pattern] ']' + | '(' [open_sequence_pattern] ')' + open_sequence_pattern: value_pattern ',' [values_pattern] + values_pattern: ','.value_pattern+ ','? + value_pattern: star_pattern | pattern + star_pattern: '*' (capture_pattern | wildcard_pattern) - mapping_pattern: '{' items_pattern? '}' + mapping_pattern: '{' [items_pattern] '}' items_pattern: ','.key_value_pattern+ ','? 
key_value_pattern: | (literal_pattern | constant_pattern) ':' or_pattern - | '**' !"_" NAME + | double_star_pattern + double_star_pattern: '**' capture_pattern class_pattern: | name_or_attr '(' ')' From 6f1cac7b039430e63ddb97d3498bbe4ffb5123b9 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Mon, 14 Sep 2020 13:29:34 -0700 Subject: [PATCH 19/54] Add 'Overview and terminology'; s/sub-pattern/subpattern/g --- pep-0634.rst | 62 ++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 46 insertions(+), 16 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index ec12b23195a..8b0929b5f00 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -37,6 +37,35 @@ Syntax and Semantics See `Appendix A`_ for the complete grammar. +Overview and terminology +------------------------ + +The pattern matching process takes as input a pattern (following +``case``) and a subject value (following ``match``). Phrases to +describe the process include "the pattern is matched with (or against) +the subject value" and "we match the pattern against (or with) the +subject value". + +The primary outcome of pattern matching is success or failure. In +case of success we may say "the pattern succeeds", "the match +succeeds", or "the pattern matches the subject value". + +In many cases a pattern contains subpatterns, and success or failure +is determined by the success or failure of matching those subpatterns +against the value (e.g., for OR patterns) or against parts of the +value (e.g., for sequence patterns). This process typically processes +the subpatterns from left to right until the overall outcome is +determined. E.g., an OR pattern succeeds at the first succeeding +subpattern, while a sequence pattern fails at the first failing +subpattern. + +A secondary outcome of pattern matching may be one or more name +bindings. We may say "the pattern binds a value to a name".
When +subpatterns are tried until the first success, only the bindings due to +the successful subpattern are valid; when trying until the first +failure, the bindings are merged. Several more rules, explained +below, apply to these cases. + The ``match`` statement ----------------------- @@ -86,9 +115,9 @@ matching patterns, execution continues at the following statement. Name bindings made during a successful pattern match outlive the executed block and can be used after the ``match`` statement. -During failed pattern matches, some sub-patterns may succeed. For +During failed pattern matches, some subpatterns may succeed. For example, while matching the pattern ``(0, x, 1)`` with the value ``[0, -1, 2]``, the sub-pattern ``x`` may succeed if the list elements are +1, 2]``, the subpattern ``x`` may succeed if the list elements are matched from left to right. The implementation may choose to either make persistent bindings for those partial matches or not. User code including a ``match`` statement should not rely on the bindings being @@ -179,7 +208,8 @@ variable in the nearest function scope unless there's an applicable ``nonlocal`` or ``global`` statement.) In a given pattern, a given name may be bound only once. This -disallows e.g. ``case x, x: ...`` but allows ``case [x] | x: ...``. +disallows for example ``case x, x: ...`` but allows ``case [x] | x: +...``. .. _wildcard_pattern: @@ -366,12 +396,12 @@ fails. Otherwise, it continues (see details in the `runtime`_ section). The named class must inherit from ``type``. It may be a single name -or a dotted name (e.g. ``some_mod.SomeClass`` or ``mod.pkg.Class``). +or a dotted name (e.g., ``some_mod.SomeClass`` or ``mod.pkg.Class``). Use ``object(foo=_)`` to check whether the matched object has an attribute ``foo``. -By default, sub-patterns may only match by keyword for -user-defined classes. In order to support positional sub-patterns, a +By default, subpatterns may only match by keyword for +user-defined classes.
In order to support positional subpatterns, a custom ``__match_args__`` attribute is required. The runtime allows matching arbitrarily nested patterns by chaining all of the instance checks and @@ -431,7 +461,7 @@ the walrus operator (``:=``). For example:: The name on the left of the walrus operator can be used in a guard, in the case block, or after the ``match`` statement. However, the name will -*only* be bound if the sub-pattern succeeds. Another example:: +*only* be bound if the subpattern succeeds. Another example:: match group_shapes(): case [], [point := Point(x, y), *other]: @@ -504,14 +534,14 @@ pre-existing metaclass hook) or by using ``typing.Protocol``. The procedure is as following: -* The class object for ``Class`` in ``Class()`` is +* The class object for ``Class`` in ``Class()`` is looked up and ``isinstance(obj, Class)`` is called, where ``obj`` is the subject value. If false, the match fails. -* Otherwise, if any sub-patterns are given in the form of positional +* Otherwise, if any subpatterns are given in the form of positional or keyword arguments, these are matched from left to right, as - follows. The match fails as soon as a sub-pattern fails; if all - sub-patterns succeed, the overall class pattern match succeeds. + follows. The match fails as soon as a subpattern fails; if all + subpatterns succeed, the overall class pattern match succeeds. * If there are match-by-position items and the class has a ``__match_args__`` attribute, the item at position ``i`` @@ -531,7 +561,7 @@ The procedure is as following: * If there are any match-by-keyword items the keywords are looked up as attributes on the subject. If the lookup succeeds, the value is - matched against the corresponding sub-pattern. If the lookup fails, + matched against the corresponding subpattern. If the lookup fails, the match fails. Such a protocol favors simplicity of implementation over flexibility and @@ -540,7 +570,7 @@ performance. 
For other considered alternatives, see "extended matching". For the most commonly-matched built-in types (``bool``, ``bytearray``, ``bytes``, ``dict``, ``float``, ``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``), a -single positional sub-pattern is allowed to be passed to +single positional subpattern is allowed to be passed to the call. Rather than matching any particular attribute on the subject, it instead matches the subject itself. This creates behavior that is useful and intuitive for these objects: @@ -550,14 +580,14 @@ creates behavior that is useful and intuitive for these objects: * ``bool(b)`` matches any ``bool`` and binds it to the name ``b``. -Overlapping sub-patterns ------------------------- +Overlapping subpatterns +----------------------- Certain classes of overlapping matches are detected at runtime and will raise exceptions. In addition to basic checks described in the previous subsection: -* The interpreter will check that two sub-patterns are not targeting the same +* The interpreter will check that two subpatterns are not targeting the same attribute, for example ``Point2d(1, 2, y=3)`` is an error. * It will also check that a mapping pattern does not attempt to match From 77f82e5e57709811bd9265c75d0d6c8ea77242a6 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Mon, 14 Sep 2020 14:00:22 -0700 Subject: [PATCH 20/54] Add/update some TODO sections --- pep-0634.rst | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 8b0929b5f00..9aad885c5ce 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -136,6 +136,12 @@ specified below. Patterns -------- +TODO: I dislike that "or_pattern" can refer to both something that +*definitely* has a ``|`` in it (in the specification of OR patterns) +and to something that merely has *operator precedence* allowing ``|`` +in it (in the specification of walrus patterns). But to fix this we'd +need to come up with a new name for the latter. 
+ The top-level syntax for patterns is as follows:: patterns: open_sequence_pattern | pattern @@ -411,7 +417,9 @@ attribute lookups appropriately. OR patterns ~~~~~~~~~~~ -TODO: Modernize this section. +TODO: Modernize this section. Also, move it earlier (so that the +order in which pattern types are introduced in the top-level grammar +matches the order of the sections?) Multiple alternative patterns can be combined into one using ``|``. This means the whole pattern matches if at least one alternative matches. @@ -447,7 +455,9 @@ the same set of variables (excluding ``_``). For example:: Walrus patterns ~~~~~~~~~~~~~~~ -TODO: Modernize this section. +TODO: Modernize this section. Also, move it earlier (same as OR +patterns TODO). Also, consider changing the syntax from ``v := P`` to +``P as v`` and renaming (e.g. to AS pattern?). It is often useful for a pattern to match *and* bind the corresponding value to a name. For example, it can be useful to write more efficient From a8187687c380b4515f9f1cdc989b39f4386dada7 Mon Sep 17 00:00:00 2001 From: Daniel F Moisset Date: Tue, 15 Sep 2020 00:51:54 +0100 Subject: [PATCH 21/54] Completed first draft of sequence based tutorial --- pep-0636.rst | 271 ++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 257 insertions(+), 14 deletions(-) diff --git a/pep-0636.rst b/pep-0636.rst index aa775576798..461615a224b 100644 --- a/pep-0636.rst +++ b/pep-0636.rst @@ -70,26 +70,26 @@ Other rules I'm following (let me know if I forgot to): Tutorial ======== -Getting Started ---------------- - As an example to motivate this tutorial, you will be writing a text-adventure. That is a form of interactive fiction where the user enters text commands to interact with a fictional world and receives text descriptions of what happens. Commands will be simplified forms of natural language like ``get sword``, ``attack dragon``, ``go north``, ``enter shop`` or ``buy cheese``. 
+Matching sequences +------------------ + Your main loop will need to get input from the user and split it into words, let's say a list of strings like this:: command = input("What are you doing next? ") - words = command.split() + # analyze the result of command.split() -The next step is to interpret the words. Most of our commands will be two words: an +The next step is to interpret the words. Most of our commands will have two words: an action and an object. So you may be tempted to do the following:: - [action, obj] = words - # interpret action, obj + [action, obj] = command.split() + ... # interpret action, obj The problem with that line of code is that it's missing something: what if the user types more or less than 2 words? To prevent this problem you can either check the length @@ -97,27 +97,270 @@ of the list of words, or capture the ``ValueError`` that the statement above wou You can use a matching statement instead:: - match words: + match command.split(): case [action, obj]: - # interpret action, obj + ... # interpret action, obj The ``match`` statement evaluates the **subject** after the ``match`` keyword, and checks it against the **pattern** next to ``case``. A pattern is able to do two different things: - * Verify that the subject has certain structure. In your case, the ``[action, obj]`` - pattern matches any sequence of exactly two elements. This is called **matching** - * It will bind some names in the pattern to component elements of your subject. In - this case, if the list has two elements, it will bind ``action = words[0]`` and - ``obj = words[1]``. This is called **destructuring** +* Verify that the subject has certain structure. In your case, the ``[action, obj]`` + pattern matches any sequence of exactly two elements. This is called **matching** +* It will bind some names in the pattern to component elements of your subject. In + this case, if the list has two elements, it will bind ``action = subject[0]`` and + ``obj = subject[1]``. 
This is called **destructuring** If there's a match, the statements inside the ``case`` clause will be run with the bound variables. If there's no match, nothing happens and the next statement after ``match`` keeps running. +TODO: discuss other sequences, tuples. Discuss syntax with parenthesis. discuss +iterators? discuss [x, x] possibly later on? +Matching multiple patterns +-------------------------- +Even if most commands have the action/object form, you might want to have user commands +of different lengths. For example, you might want to add single verbs with no object like +``look`` or ``quit``. A match statement can (and is likely to) have more than one +``case``:: + + match command.split(): + case [action]: + ... # interpret single-verb action + case [action, obj]: + ... # interpret action, obj + +The ``match`` statement will check patterns from top to bottom. If the pattern doesn't +match the subject, the next pattern will be tried. However, once the *first* +matching ``case`` clause is found, the body of that clause is executed, and all further +``case`` clauses are ignored. This is similar to the way that an ``if/elif/elif/...`` +statement works. + +Matching specific values +------------------------ + +Your code still needs to look at the specific actions and conditionally run +different logic depending on the specific action (e.g., ``quit``, ``attack``, or ``buy``). +You could do that using a chain of ``if/elif/elif/...``, or using a dictionary of +functions, but here we'll leverage pattern matching to solve that task. Instead of a +variable, you can use literal values in patterns (like ``"quit"``, ``42``, or ``None``).
+This allows you to write:: + + match command.split(): + case ["quit"]: + print("Goodbye!") + quit_game() + case ["look"]: + current_room.describe() + case ["get", obj]: + character.get(obj, current_room) + case ["go", direction]: + current_room = current_room.neighbor(direction) + # The rest of your commands go here + +A pattern like ``["get", obj]`` will match only 2-element sequences that have a first +element equal to ``"get"``. When destructuring, it will bind ``obj = subject[1]``. + +As you can see in the ``go`` case, we can also use different variable names in +different patterns. + +FIXME: This *might* be the place to explain a bit that when I say "literal" I mean it +literally, and a "soft constant" will not work :) + +Matching slices +--------------- +A player may be able to drop multiple objects by using a series of commands +``drop key``, ``drop sword``, ``drop cheese``. This interface might be cumbersome, and +you might like to allow dropping multiple items in a single command, like +``drop key sword cheese``. In this case you don't know beforehand how many words will +be in the command, but you can use extended unpacking in patterns in the same way that +it is allowed in assignments:: + + match command.split(): + case ["drop", *objects]: + for obj in objects: + character.drop(obj, current_room) + # The rest of your commands go here + +This will match any sequence having ``"drop"`` as its first element. All remaining +elements will be captured in a ``list`` object which will be bound to the ``objects`` +variable. + +This syntax has restrictions similar to those of sequence unpacking: you cannot have more than one +starred name in a pattern. + +Adding a catch-all +------------------ + +You may want to print an error message saying that the command wasn't recognized when +all the patterns fail. You could use the feature we just learned and write the +following:: + + match command.split(): + case ["quit"]: ...
# Code omitted for brevity + case ["go", direction]: ... + case ["drop", *objects]: ... + ... # Other case clauses + case [*ignored_words]: + print(f"Sorry, I couldn't understand {command!r}") + +Note that you must add this last pattern at the end, otherwise it will match before other +possible patterns that should be considered. This works but it's a bit verbose and +somewhat wasteful: this will make a full copy of the word list, which will be bound to +``ignored_words`` even if it's never used. + +You can use a special pattern, written ``_``, which always matches but +doesn't bind anything. That would allow you to rewrite:: + + match command.split(): + ... # Other case clauses + case [*_]: + print(f"Sorry, I couldn't understand {command!r}") + +This pattern will match any sequence. In this case we can simplify even more and +match any object:: + + match command.split(): + ... # Other case clauses + case _: + print(f"Sorry, I couldn't understand {command!r}") + +TODO: Explain about syntaxerror when having an irrefutable pattern above others? + +How patterns are being composed +------------------------------- + +This is a good moment to step back from the examples and understand how the patterns +that you have been using are built. Patterns can be nested within each other, and we +have been doing that implicitly in the examples above. + +There are some "simple" patterns ("simple" here meaning that they do not contain other +patterns) that we've seen: + +* **Literal patterns** (string literals, number literals, ``True``, ``False``, and + ``None``) +* The **wildcard pattern** ``_`` +* **Capture patterns** (stand-alone names like ``direction``, ``action``, ``objects``). We + never discussed these separately, but used them as part of other patterns. Note that + a capture pattern by itself will always match, and usually makes sense only + as a catch-all at the end of your ``match`` if you desire to bind the name to the + subject.
+ +Until now, the only non-simple pattern we have experimented with is the +sequence pattern. Each element in a sequence pattern can in fact be +any other pattern. This means that you could write a pattern like +``["first", (left, right), *rest]``. This will match subjects which are a sequence of at +least two elements, where the first one is equal to ``"first"`` and the second one is +in turn a sequence of two elements. It will also bind ``left = subject[1][0]``, +``right = subject[1][1]``, and ``rest = subject[2:]``. + +Alternate patterns +------------------ + +Going back to the adventure game example, you may find that you'd like to have several +patterns resulting in the same outcome. For example, you might want the commands +``north`` and ``go north`` to be equivalent. You may also desire to have aliases for +``get X``, ``pick up X`` and ``pick X up`` for any X. + +The ``|`` symbol in patterns combines them as alternatives. You could for example write:: + + match command.split(): + ... # Other case clauses + case ["north"] | ["go", "north"]: + current_room = current_room.neighbor("north") + case ["get", obj] | ["pick", "up", obj] | ["pick", obj, "up"]: + ... # Code for picking up the given object + +This is called an **or pattern** and will produce the expected result. Patterns are +attempted from left to right; this may be relevant to know what is bound if more than +one alternative matches. An important restriction when writing or patterns is that all +alternatives should bind the same variables. So a pattern ``[1, x] | [2, y]`` is not +allowed because it would make it unclear which variable would be bound after a successful +match. + +Capturing matched sub-patterns +------------------------------ + +An older version of our "go" command was written with a ``["go", direction]`` pattern.
+The change we made in our last version using the pattern ``["north"] | ["go", "north"]`` +has some benefits but also some drawbacks in comparison: the latest version allows the +alias, but also has the direction hardcoded, which will force us to actually have +separate patterns for north/south/east/west. This leads to some code duplication, but at +the same time we get better input validation, and we will not be getting into that +branch if the command entered by the user is ``"go figure!"`` instead of a direction. + +We could try to get the best of both worlds doing the following (I'll omit the aliased +version without "go" for brevity):: + + match command.split(): + case ["go", ("north" | "south" | "east" | "west")]: + current_room = current_room.neighbor(...) + # how do I know which direction to go? + +This code is a single branch, and it verifies that the word after "go" is really a +direction. But the code moving the player around needs to know which one was chosen and +has no way to do so. What we need is a pattern that behaves like the or pattern but at +the same time does a capture. We can do so with a **walrus pattern**:: + + match command.split(): + case ["go", direction := ("north" | "south" | "east" | "west")]: + current_room = current_room.neighbor(direction) + +The walrus pattern (named like that because the ``:=`` operator looks like a sideways +walrus) matches whatever pattern is on its right-hand side, but also binds the value to +a name. + +Conditional pattern matching +---------------------------- + +The patterns we have explored above can do some powerful data filtering, but sometimes +you may wish for the full power of a boolean expression. Let's say that you would actually +like to allow a "go" command only in a restricted set of directions based on the possible +exits from ``current_room``. We can achieve that by adding a **guard** to our +case-clause.
Guards consist of the ``if`` keyword followed by any expression:: + + match command.split(): + case ["go", direction] if direction in current_room.exits: + current_room = current_room.neighbor(direction) + case ["go", _]: + print("Sorry, you can't go that way") + +The guard is not part of the pattern; it's part of the case clause. It's only checked if +the pattern matches, and after all the pattern variables have been bound (that's why the +condition can use the ``direction`` variable in the example above). If the pattern +matches and the condition is truthy, the body of the case clause runs normally. If the +pattern matches but the condition is falsy, the match statement proceeds to check the +next ``case`` clause as if the pattern hadn't matched (with the possible side-effect of +having already bound some variables). + +The sequence of these steps must be considered carefully when combining or-patterns and +guards. If you have ``case [x, 100] | [0, x] if x > 10`` and your subject is +``[0, 100]``, the clause will be skipped. This happens because: + * The or-pattern finds the first alternative that matches the subject, which happens to + be ``[x, 100]`` + * ``x`` is bound to 0 + * The condition ``x > 10`` is checked. Given that it's false, the whole case clause is + skipped. The ``[0, x]`` pattern is never attempted. + +Going to the cloud: Mappings +---------------------------- + +TODO: Give the motivating example of network requests, describe JSON-based "protocol" + +TODO: partial matches, double stars + +Matching objects +---------------- + +UI events motivations. describe events in dataclasses. inspiration for event objects +can be taken from https://www.pygame.org/docs/ref/event.html + +example of getting constants from module (like key names for keyboard events) + +customizing match_args?
Copyright ========= From 4a254fa785a42be9138e3ed8372a277bfe3800a4 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Mon, 14 Sep 2020 17:21:36 -0700 Subject: [PATCH 22/54] Flesh out motivation section a bit --- pep-0635.rst | 95 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 91 insertions(+), 4 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index 8e94efdef92..0e92bd6a6b7 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -18,16 +18,103 @@ Resolution: Abstract ======== -This PEP provides the motivation and rationale for PEP 634. +This PEP provides the motivation and rationale for PEP 634 +("Structural Pattern Matching: Specification"). First-time readers +are encouraged to start with PEP 636, which provides a gentler +introduction to the concepts, syntax and semantics of patterns. Motivation ========== -TBD. +(Structural) pattern matching syntax is found in many languages, from +Haskell, Erlang and Scala to Elixir and Ruby. (A proposal for +JavaScript is also under consideration.) + +Python already supports a limited form of this through sequence +unpacking assignments, which the new proposal leverages. + +Several other common Python idioms are also relevant: + +- The ``if ... elif ... elif ... else`` idiom is often used to find + out the type or shape of an object in an ad-hoc fashion, using one + or more checks like ``isinstance(x, cls)``, ``hasattr(x, "attr")``, + ``len(x) == n`` or ``"key" in x`` as guards to select an applicable + block. The block can then assume ``x`` supports the interface + checked by the guard. For example:: + + if isinstance(x, tuple) and len(x) == 2: + host, port = x + mode = "http" + elif isinstance(x, tuple) and len(x) == 3: + host, port, mode = x + # Etc. + + Code like this is more elegantly rendered using ``match``:: + + match x: + case host, port: + mode = "http" + case host, port, mode: + pass + # Etc. 
+ +- AST traversal code often looks for nodes matching a given pattern; + for example, the code to detect a node of the shape "A + B * C" might + look like this:: + + if (isinstance(node, BinOp) and node.op == "+" + and isinstance(node.right, BinOp) and node.right.op == "*"): + a, b, c = node.left, node.right.left, node.right.right + # Handle a + b*c + + Using ``match`` this becomes more readable:: + + match node: + case BinOp("+", a, BinOp("*", b, c)): + # Handle a + b*c + +- TODO: Other compelling examples? + +We believe that adding pattern matching to Python will enable Python +users to write cleaner, more readable code for examples like those +above, and many others. + +Pattern matching and OO +----------------------- + +Pattern matching is complementary to the object-oriented paradigm. +Using OO and inheritance we can easily define a method on a base class +that defines default behavior for a specific operation on that class, +and we can override this default behavior in subclasses. We can also +use the Visitor pattern to separate actions from data. + +But this is not sufficient for all situations. For example, a code +generator may consume an AST, and have many operations where the +generated code needs to vary based not just on the class of a node, +but also on the value of some class attributes, like the ``BinOp`` +example above. The Visitor pattern is insufficiently flexible for +this: it can only select based on the class. + +For a complete example, see +https://github.com/gvanrossum/patma/blob/master/examples/expr.py#L231 + +TODO: Could we say more here? + +Pattern and functional style +---------------------------- + +Most Python applications and libraries are not written in a consistent +OO style -- unlike Java, Python encourages defining functions at the +top level of a module, and for simple data structures, tuples (or +named tuples or lists) and dictionaries are often used exclusively or +mixed with classes or data classes.
+ +Pattern matching is particularly suitable for picking apart such data +structures. As an extreme example, it's easy to write code that picks apart +a JSON data structure using ``match``. -This section should explain why we think pattern matching is a good -addition for Python. +TODO: Example code. Rationale From b299ee753825c591f6ee5cf1362bc308f73482f4 Mon Sep 17 00:00:00 2001 From: Tobias Kohn Date: Tue, 15 Sep 2020 11:05:43 +0200 Subject: [PATCH 23/54] Added motivation and history --- pep-0635.rst | 174 ++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 171 insertions(+), 3 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index 8e94efdef92..0546f96aec6 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -24,10 +24,72 @@ This PEP provides the motivation and rationale for PEP 634. Motivation ========== -TBD. +**This section should explain why we think pattern matching is a good +addition for Python.** + +Since Python is a dynamically typed language, Python code frequently +has to deal with data that comes in different forms and shapes. This, +in turn, gives rise to a high degree of versatility by favouring a +'duck typing' style, imposing only the bare minimum of requirements +on the form and shape of data, i.e. the structure of data objects. +Nonetheless, we find that Python code is often sparkled with +conditions depending on `isinstance`, `len`, `getattr`, `hasattr`, +etc. Despite the benefits of 'duck typing', actual code still +requires to query the format or type of an object in order to choose +an appropriate processing action or to extract relevant bits of +information. + +Unfortunately, the conditions for slightly more complex structures +quickly grow into sequences of partly interdependent bits of +structural tests, hurting readability and maintainability. Pattern +matching offers here a more direct scheme of expressing the minimal +structure that an object needs to have in order to allow for further +processing.
Rather than writing a series of manual tests, patterns +follow a _declarative_ style, which is well-known for improved +readability, maintainability, and for delegating the burden of efficient +execution to the compiler and interpreter. + +The concept of pattern matching is similar to regular expressions, +where succinct patterns describe a textual structure. A dedicated +compiler then transforms these declarative patterns into highly +efficient finite state machines. In contrast to regular expressions, +pattern matching targets Python objects rather than textual data, +and builds on _decision trees_ as the driving motor for finding a +match. Moreover, pattern matching blends the matching of a suitable +pattern with that of a function, i.e. code that is executed in order +to handle and process the information provided by a specific kind of +object. + +One of the simplest forms of pattern matching that we find in other +languages comes in the form of _function overloading_. The type and +number of arguments then determine which implementation of a specific +function will be executed. Object-oriented languages (including +Python) may also use the _visitor pattern_ to differentiate an action +based on the type or class of an object. Both of these approaches, +however, are aimed at 'shallow' structures with little or no direct +support for nested structures or structural information that is not +directly encoded in an object's class or type. For instance, it is +simple to differentiate between an integer, a string, and a tuple, say, +but it becomes quite cumbersome and difficult to differentiate between +tuples of different lengths, or between one containing string elements +vs. one containing numeric elements. This is where pattern matching +shines: for structures that go beyond simple class distinctions.
+ +Although pattern matching is a concept that has been known and used +for decades, we propose a re-interpretation that centres around the +principle of 'duck typing' and builds on existing features of the +Python language such as iterable unpacking. Patterns adopt the syntax +of parameters as far as possible and, to a somewhat lesser degree, +that of targets in iterable unpacking. In contrast to iterable +unpacking, pattern matching is a 'conditional' feature that has to +avoid side-effects, i.e. extracting elements from an abstract iterable +(thus working with actual sequences instead) or assigning to non-local +targets such as object attributes or container elements. Overall, we +followed the guiding principle that patterns be static templates for +the structure and type of objects, i.e. patterns should depend as +little as possible on the surrounding context or current values of +variables (other than the subject to be matched, that is). -This section should explain why we think pattern matching is a good -addition for Python. Rationale @@ -40,6 +102,112 @@ It takes the place of "Rejected ideas" in the standard PEP format. It is organized in sections corresponding to the specification (PEP 634). + +History +======= + +Pattern matching emerged in the late 1970s in the form of tuple unpacking +and as a means to handle recursive data structures such as linked lists or +trees (object-oriented languages use the visitor pattern for handling +recursive data structures). The early proponents of pattern matching +organised structured data in 'tagged tuples' rather than `struct`s as in +_C_ or the objects introduced later. A node in a binary tree would, for +instance, be a tuple with two elements for the left and right branches, +respectively, and a `Node`-tag, written as `Node(left, right)`. In Python +we would probably put the tag inside the tuple as `('Node', left, right)` +or define a data class `Node` to achieve the same effect. 
+ +Using modern syntax, a depth-first search (DFS) would then be written as +follows: +``` +def DFS(node): + match node: + case Node(left, right): + DFS(left) + DFS(right) + case Leaf(value): + handle(value) +``` + +The notion of handling recursive data structures with pattern matching +immediately gave rise to the idea of handling general recursive patterns +with pattern matching. Pattern matching would thus also be used to define +recursive functions such as: +``` +def fib(arg): + match arg: + case 0: + return 1 + case 1: + return 1 + case n: + return fib(n-1) + fib(n-2) +``` + +As pattern matching was repeatedly integrated into new and emerging +programming languages, its syntax slightly evolved and expanded. The two +first cases in the `fib` example above could be written more succinctly as +`case 0 | 1:` with `|` denoting alternative patterns. Moreover, the +underscore `_` was generally accepted as a wildcard, a filler where neither +the structure nor value of parts of a pattern were of substance. Since the +underscore is already frequently used in equivalent capacity in Python's +iterable unpacking (e.g., `_, _, third, *_ = something`) we kept these +universal standards. + +It is noteworthy that the concept of pattern matching has always been +closely linked to the concept of functions. The different case clauses +have always been considered as something like semi-independent functions +where pattern variables take on the role of parameters. This becomes +most apparent when pattern matching is written as an overloaded function, +along the lines of: +``` +def fib( 0 | 1 ): + return 1 +def fib( n ): + return fib(n-1) + fib(n-2) +``` +Even though such a strict separation of case clauses into independent +functions does not make sense in Python, we find that patterns share many +syntactic rules with parameters, such as binding arguments to local +variables only or that variable/parameter names must not be repeated for +a particular pattern/function.
+ +With its emphasis on abstraction and encapsulation, object-oriented +programming posed a serious challenge to pattern matching. In short: in +object-oriented programming, we can no longer view objects as tagged tuples. +The arguments passed into the constructor do not necessarily specify the +attributes or fields of the objects. Moreover, there is no longer a strict +ordering of an object's fields and some of the fields might be private and +thus inaccessible. And on top of this, the given object might actually be +an instance of a subclass with slightly different structure. + +To address this challenge, patterns became increasingly independent of the +original tuple constructors. In a pattern like `Node(left, right)`, `Node` +is no longer a passive tag, but rather a function that can actively check +for any given object whether it has the right structure and extract a `left` +and `right` field. In other words: the `Node`-tag becomes a function that +transforms an object into a tuple or returns `None` to indicate that it is +not possible. + +In Python, we simply use `isinstance()` together with the `__match_args__` +field of a class to check whether an object has the correct structure and +then transform some of its attributes into a tuple. For the `Node` example +above, for instance, we would have `__match_args__ = ('left', 'right')` to +indicate that these two attributes should be extracted to form the tuple. +That is, `case Node(x, y)` would first check whether a given object is an +instance of `Node` and then assign `left` to `x` and `right` to `y`, +respectively. + +Paying tribute to Python's dynamic nature with 'duck typing', however, we +also added a more direct way to specify the presence of, or constraints on +specific attributes. Instead of `Node(x, y)` you could also write +`object(left=x, right=y)`, effectively eliminating the `isinstance()` check +and thus supporting any object with `left` and `right` attributes. 
Or you +would combine these ideas to write `Node(right=y)` so as to require an +instance of `Node` but only extract the value of the `right` attribute. + + + Copyright ========= From 87ef7efc136dbe35da5ea1dc289d1c89c9b804ac Mon Sep 17 00:00:00 2001 From: Tobias Kohn Date: Tue, 15 Sep 2020 16:28:52 +0200 Subject: [PATCH 24/54] Adding to PEP-635 --- pep-0635.rst | 212 +++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 153 insertions(+), 59 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index 0546f96aec6..265b2215eab 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -33,11 +33,11 @@ in turn, gives rise to a high degree of versatility by favouring a 'duck typing' style, imposing only the bare minimum of requirements on the form and shape of data, i.e. the structure of data objects. Nonetheless, we find that Python code is often sparkled with -conditions depending on `isinstance`, `len`, `getattr`, `hasattr`, -etc. Despite the benefits of 'duck typing', actual code still -requires to query the format or type of an object in order to choose -an appropriate processing action or to extract relevant bits of -information. +conditions depending on ``isinstance``, ``len``, ``getattr``, +``hasattr``, etc. Despite the benefits of 'duck typing', actual +code still requires to query the format or type of an object in order +to choose an appropriate processing action or to extract relevant bits +of information. Unfortunately, the conditions for slightly more complex structures quickly grow into sequences of partly interdependent bits of @@ -90,6 +90,18 @@ the structure and type of objects, i.e. patterns should depend as little as possible on the surrounding context or current values of variables (other than the subject to be matched, that is). +Pattern matching is a structure that _maps_ different patterns/templates +to 'function bodies' or actions. This general mapping structure can be +found in different context as well. 
Algol-derived languages usually +provide a switch table that maps ordinal values to actions, whereas Lisp +has a more general mapping from general conditions to actions. Although +all these constructs share a similar overall structure and some syntax, +their intents and motivation differs highly. In particular, pattern +matching as proposed here is not intended as or an extension of a switch +structure, although it is possible to emulate it to a large degree with +the syntax proposed here. + + Rationale @@ -103,55 +115,137 @@ It is organized in sections corresponding to the specification (PEP 634). -History -======= +The ``match`` statement +----------------------- + +TBD. + +The overall syntax of each case clause is similar to that of lambda +functions, although the body of case clauses are blocks of statements +rather than expressions. Compare, for instance a lambda function to +add two values:: + + lambda x, y: x + y + +with a case clause performing the same operation:: + + case x, y: + return x + y + +The case clause would, of course, be embedded in a match statement and +ultimately in a function. Nonetheless, understanding the patterns +following the ``case`` as a generalisation of parameters is a solid +mental model to approach and understand pattern matching. + + + +.. _patterns: + +Patterns +-------- + +Patterns are most aptly described as a generalisation of parameters as +in function definitions. They also share some characteristics with +targets of iterable unpacking. Most importantly, however, patterns are +not expressions. A pattern cannot be evaluated or executed, it is a +static declaration of a structural template. This declarative nature +is a characteristic it shares with ``global`` statements, for instance, +but also with regular expressions or context-free grammars. + +Python's iterable unpacking can assign values to any valid target, +including attributes and subscripts. 
This allows you to write, e.g., +``self.x, self.y = x, y`` in a class's initialiser, or +``a[i+1], a[i] = a[i], a[i+1]`` to swap two elements in a list. The +same approach, however, does not work for patterns due to their +'conditional' nature. It is at the very core of pattern matching that +a pattern may safely fail to match a given subject and reject it. In +order for this to make sense and to reason about patterns, it is +imperative to avoid any side effects (as far as possible within the +bounds of a dynamic language). Patterns can therefore not assign +values to arbitrary targets, but rather bind _local_ variables to +values extracted from the data provided. + +Another consequence of the static declarative nature of patterns is that +they cannot contain expressions. Nonetheless, as some structures are +discerned by specific _values_ (e.g., an object for 'addition' might be +discerned by the ``operator`` field holding the string value ``'+'``), +patterns can contain such values/constants. The overall rules, however, +specifically exclude actual expressions and make sure that only specific +values are integrated into patterns. The value ``-3``, for instance, is +syntactically interpreted as the expression comprising the unary operation +'negate' applied to the positive integer '3' (i.e. Python's syntax does +not support negative numbers as atomic literals). The overall syntax of +patterns is carefully crafted to ensure that entities such as negative +numbers can be included despite the exclusion of expressions in general. + +Nonetheless, it is desirable to express some constant values through named +constants. ``HttpStatus.OK``, for instance, might be much more readable +than the plain number ``200``. This poses a challenge, though, because +the Python compiler cannot reliably infer from context which names are +meant to denote variables/parameters and which are meant to denote named +constants.
Noting that many meaningful constants are organised in specific +modules or enumerations, we follow a pragmatic approach here and interpret +any dotted names as constants (recall that assignments to attributes are +not possible because of side effects, anyway). We acknowledge that this +rule may seem restrictive as it leaves out support for named constants +coming from the current namespace. However, all alternatives turned out +to either introduce much more complex rules or additional syntax. We would +also like to emphasise that better syntactic support for named constants +could still be added in future proposals, thus warranting our focus on a +minimal viable specification. + + + + + +History and Context +=================== Pattern matching emerged in the late 1970s in the form of tuple unpacking and as a means to handle recursive data structures such as linked lists or trees (object-oriented languages use the visitor pattern for handling recursive data structures). The early proponents of pattern matching -organised structured data in 'tagged tuples' rather than `struct`s as in +organised structured data in 'tagged tuples' rather than ``struct``s as in _C_ or the objects introduced later. A node in a binary tree would, for instance, be a tuple with two elements for the left and right branches, -respectively, and a `Node`-tag, written as `Node(left, right)`. In Python -we would probably put the tag inside the tuple as `('Node', left, right)` -or define a data class `Node` to achieve the same effect. +respectively, and a ``Node``-tag, written as ``Node(left, right)``. In +Python we would probably put the tag inside the tuple as +``('Node', left, right)`` or define a data class ``Node`` to achieve the +same effect.
Using modern syntax, a depth-first search (DFS) would then be written as -follows: -``` -def DFS(node): - match node: - case Node(left, right): - DFS(left) - DFS(right) - case Leaf(value): - handle(value) -``` +follows:: + + def DFS(node): + match node: + case Node(left, right): + DFS(left) + DFS(right) + case Leaf(value): + handle(value) The notion of handling recursive data structures with pattern matching immediately gave rise to the idea of handling general recursive patterns with pattern matching. Pattern matching would thus also be used to define -recursive functions such as: -``` -def fib(arg): - match arg: - case 0: - return 1 - case 1: - return 1 - case n: - return fib(n-1) + fib(n-2) -``` +recursive functions such as:: + + def fib(arg): + match arg: + case 0: + return 1 + case 1: + return 1 + case n: + return fib(n-1) + fib(n-2) As pattern matching was repeatedly integrated into new and emerging programming languages, its syntax slightly evolved and expanded. The two -first cases in the `fib` example above could be written more succinctly as -`case 0 | 1:` with `|` denoting alternative patterns. Moreover, the -underscore `_` was generally accepted as a wildcard, a filler where neither +first cases in the ``fib`` example above could be written more succinctly +as ``case 0 | 1:`` with ``|`` denoting alternative patterns. Moreover, the +underscore ``_`` was generally accepted as a wildcard, a filler where neither the structure nor value of parts of a pattern were of substance. Since the underscore is already frequently used in equivalent capacity in Python's -iterable unpacking (e.g., `_, _, third, *_ = something`) we kept these +iterable unpacking (e.g., ``_, _, third, *_ = something``) we kept these universal standards. It is noteworthy that the concept of pattern matching has always been @@ -159,13 +253,13 @@ closely linked to the concept of functions.
The different case clauses have always been considered as something like semi-independent functions where pattern variables take on the role of parameters. This becomes most apparent when pattern matching is written as an overloaded function, -along the lines of: -``` -def fib( 0 | 1 ): - return 1 -def fib( n ): - return fib(n-1) + fib(n-2) -``` +along the lines of:: + + def fib( 0 | 1 ): + return 1 + def fib( n ): + return fib(n-1) + fib(n-2) + Even though such a strict separation of case clauses into independent functions does not make sense in Python, we find that patterns share many syntactic rules with parameters, such as binding arguments to local @@ -182,29 +276,29 @@ thus inaccessible. And on top of this, the given object might actually be an instance of a subclass with slightly different structure. To address this challenge, patterns became increasingly independent of the -original tuple constructors. In a pattern like `Node(left, right)`, `Node` -is no longer a passive tag, but rather a function that can actively check -for any given object whether it has the right structure and extract a `left` -and `right` field. In other words: the `Node`-tag becomes a function that -transforms an object into a tuple or returns `None` to indicate that it is -not possible. - -In Python, we simply use `isinstance()` together with the `__match_args__` +original tuple constructors. In a pattern like ``Node(left, right)``, +``Node`` is no longer a passive tag, but rather a function that can actively +check for any given object whether it has the right structure and extract a +``left`` and ``right`` field. In other words: the ``Node``-tag becomes a +function that transforms an object into a tuple or returns ``None`` to +indicate that it is not possible. + +In Python, we simply use ``isinstance()`` together with the ``__match_args__`` field of a class to check whether an object has the correct structure and then transform some of its attributes into a tuple.
For the `Node` example -above, for instance, we would have `__match_args__ = ('left', 'right')` to +above, for instance, we would have ``__match_args__ = ('left', 'right')`` to indicate that these two attributes should be extracted to form the tuple. -That is, `case Node(x, y)` would first check whether a given object is an -instance of `Node` and then assign `left` to `x` and `right` to `y`, +That is, ``case Node(x, y)`` would first check whether a given object is an +instance of ``Node`` and then assign ``left`` to ``x`` and ``right`` to ``y``, respectively. Paying tribute to Python's dynamic nature with 'duck typing', however, we also added a more direct way to specify the presence of, or constraints on -specific attributes. Instead of `Node(x, y)` you could also write -`object(left=x, right=y)`, effectively eliminating the `isinstance()` check -and thus supporting any object with `left` and `right` attributes. Or you -would combine these ideas to write `Node(right=y)` so as to require an -instance of `Node` but only extract the value of the `right` attribute. +specific attributes. Instead of ``Node(x, y)`` you could also write +``object(left=x, right=y)``, effectively eliminating the ``isinstance()`` +check and thus supporting any object with ``left`` and ``right`` attributes. +Or you would combine these ideas to write ``Node(right=y)`` so as to require +an instance of ``Node`` but only extract the value of the `right` attribute. 
From 110f40f046f263025f695a5eafab1b9bfcce94ec Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Tue, 15 Sep 2020 17:17:56 -0700 Subject: [PATCH 25/54] Fix ReST markup - Use *words*, not _words_, for emphasis - Fix some bullets --- pep-0635.rst | 20 ++++++++++++-------- pep-0636.rst | 3 ++- 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index 7215909dc60..c1d8d5a7ace 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -15,6 +15,7 @@ Post-History: Resolution: + Abstract ======== @@ -24,6 +25,7 @@ are encouraged to start with PEP 636, which provides a gentler introduction to the concepts, syntax and semantics of patterns. + Motivation (Guido's version) ============================ @@ -116,6 +118,8 @@ a JSON data structure using ``match``. TODO: Example code. + + Motivation (Tobias' version) ============================ @@ -140,7 +144,7 @@ structural tests, hurting readability and maintainability. Pattern matching offers here a more direct scheme of expressing the minimal structure that an object needs to have in order to allow for further processing. Rather than writing a series of manual tests, patterns -follow a _declarative_ style, which is well-known for improved +follow a *declarative* style, which is well-known for improved readability, maintainability, and for delegating the burden of efficient execution to the compiler and interpreter. @@ -149,17 +153,17 @@ where succinct patterns describe a textual structure. A dedicated compiler then transforms these declarative patterns into highly efficient finite state machines. In contrast to regular expressions, pattern matching targets Python objects rather than textual data, -and builds on _decision trees_ as the driving motor for finding a +and builds on *decision trees* as the driving motor for finding a match. Moreover, pattern matching blends the matching of a suitable pattern with that of a function, i.e.
code that is executed in order to handle and process the information provided by a specific kind of object. One of the simplest forms of pattern matching that we find in other -languages comes in the form of _function overloading_. The type and +languages comes in the form of *function overloading*. The type and number of arguments then determine which implementation of a specific function will be executed. Object-oriented languages (including -Python) may also use the _visitor pattern_ to differentiate an action +Python) may also use the *visitor pattern* to differentiate an action based on the type or class of an object. Both of these approaches, however, are aimed at 'shallow' structures with little or no direct support for nested structures or structural information that is not @@ -185,7 +189,7 @@ the structure and type of objects, i.e. patterns should depend as little as possible on the surrounding context or current values of variables (other than the subject to be matched, that is). -Pattern matching is a structure that _maps_ different patterns/templates +Pattern matching is a structure that *maps* different patterns/templates to 'function bodies' or actions. This general mapping structure can be found in different context as well. Algol-derived languages usually provide a switch table that maps ordinal values to actions, whereas Lisp @@ -197,6 +201,7 @@ structure, although it is possible to emulate it to a large degree with the syntax proposed here. + Rationale ========= @@ -255,12 +260,12 @@ a pattern may safely fail to match a given subject and reject it. In order for this to make sense and to reason about patterns, it is imperative to avoid any side effects (as far as possible within the bounds of a dynamic language). Patterns can therefore not assign -values to arbitrary targets, but rather bind _local_ variables to +values to arbitrary targets, but rather bind *local* variables to values extracted from the data provided. 
Another consequence of the static declarative nature of patterns is that they cannot contain expressions. Nonetheless, as some structures are -discerned by specific _values_ (e.g., an object for 'addition' might be +discerned by specific *values* (e.g., an object for 'addition' might be discerned by the ``operator`` field holding the string value ``'+'``), patterns can contain such values/constants. The overall rules, however, specifically exclude actual expressions and make sure that only specific @@ -290,7 +295,6 @@ minimal viable specficiation. - History and Context =================== diff --git a/pep-0636.rst b/pep-0636.rst index f8881473ed7..a0d393bca96 100644 --- a/pep-0636.rst +++ b/pep-0636.rst @@ -340,11 +340,12 @@ having already bound some variables). The sequence of these steps must be considered carefully when combining or-patterns and guards. If you have ``case [x, 100] | [0, x] if x > 10`` and your subject is ``[0, 100]``, the clause will be skipped. This happens because: + * The or-pattern finds the first alternative that matches the subject, which happens to be ``[x, 100]`` * ``x`` is bound to 0 * The condition x > 10 is checked. Given that it's false, the whole case clause is - skipped. The ``[0, x]`` pattern is never attempted. + skipped. The ``[0, x]`` pattern is never attempted. Going to the cloud: Mappings ---------------------------- From 3da455739d1ce8c0d4f09334beebaeba6ba622e3 Mon Sep 17 00:00:00 2001 From: Tobias Kohn Date: Wed, 16 Sep 2020 17:36:15 +0200 Subject: [PATCH 26/54] Adding rationales to individual patterns --- pep-0635.rst | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 94 insertions(+) diff --git a/pep-0635.rst b/pep-0635.rst index c1d8d5a7ace..870965e0787 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -294,6 +294,100 @@ minimal viable specficiation. +.. 
_capture_pattern: + +Capture Patterns +~~~~~~~~~~~~~~~~ + +Capture patterns take the form of a name that accepts any value and binds +it to a (local) variable. In that sense, a simple capture pattern is +basically equivalent to a parameter in a function definition (when the +function is called, each parameter binds the respective argument to a local +variable in the function's scope). + +A name used for a capture pattern must not coincide with another capture +pattern in the same pattern. This, again, is similar to parameters, which +equally require each parameter name to be unique within the list of +parameters. It differs, however, from iterable unpacking assignment, where +the repeated use of a variable name as target is permissible (e.g., +``x, x = 1, 2``). The rationale for not supporting ``(x, x)`` in patterns +is its ambiguous reading: it could be seen as in iterable unpacking where +only the second binding to ``x`` survives. But it could be equally seen as +expressing a tuple with two equal elements (which comes with its own issues). +Should the need arise, it is still possible to introduce support for +repeated use of names later on. + + +.. _wildcard_pattern: + +Wildcard Pattern +~~~~~~~~~~~~~~~~ + +The wildcard pattern is a special case of a 'capture' pattern: it accepts +any value, but does not bind it to a variable. The idea behind this rule +is to support repeated use of the wildcard in patterns. While ``(x, x)`` +constitutes an error, ``(_, _)`` is legal. + +Particularly in larger (sequence) patterns, it is important to allow the +pattern to concentrate on values with actual significance while ignoring +anything else. Without a wildcard, it would become necessary to 'invent' +a number of local variables, which would be bound but never used.
Even +when sticking to naming conventions and using ``__1, __2, __3`` to name +irrelevant values, say, this still introduces visual clutter and can hurt +performance (compare the sequence pattern ``(x, y, *z)`` to ``(_, y, *_)``, +where the ``*z`` forces the interpreter to copy a potentially very long +sequence, whereas the second version simply compiles to code along the +lines of ``y = seq[1]``). + +There has been much discussion about the choice of the underscore ``_`` +as a wildcard pattern, i.e. making this one name non-binding. However, the +underscore is already heavily used as an 'ignore value' marker in iterable +unpacking. Since the wildcard pattern ``_`` never binds, this use of the +underscore does not interfere with other uses such as inside the REPL or +internationalisation packages. + +Finally note that the underscore is used as a wildcard pattern in *every* +programming language with pattern matching that we could find. Keeping +in mind that many users of Python also work with other programming +languages, have prior experience when learning Python, or move on to +other languages after having learnt Python, we find that such well +established standards are important and relevant with respect to +readability and learnability. Moreover, concerns that this wildcard +means that a regular name receives special treatment are not strong +enough to introduce syntax that would make Python special. + + +.. _literal_pattern: + +Literal Patterns +~~~~~~~~~~~~~~~~ + + +TBD. + +Literal patterns not only occur as patterns in their own right, but also +as keys in *mapping patterns*. + + +.. _constant_value_pattern: + +Constant Value Patterns +~~~~~~~~~~~~~~~~~~~~~~~ + + +.. _sequence_pattern: + +Sequence Patterns +~~~~~~~~~~~~~~~~~ + + + +.. 
_mapping_pattern: + +Mapping Patterns +~~~~~~~~~~~~~~~~ + + History and Context =================== From ebf907a74fd8e75f481db34a4937c19cad3794dc Mon Sep 17 00:00:00 2001 From: Daniel F Moisset Date: Wed, 16 Sep 2020 23:15:32 +0100 Subject: [PATCH 27/54] Minor editorial fixes, and moving into new branch --- pep-0636.rst | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/pep-0636.rst b/pep-0636.rst index a0d393bca96..30cf1d43fb2 100644 --- a/pep-0636.rst +++ b/pep-0636.rst @@ -23,12 +23,12 @@ This PEP is a tutorial for the pattern matching introduced by PEP 634. PEP 622 proposed syntax for pattern matching, which received detailed discussion both from the community and the Steering Council. A frequent concern was -about how easy would be to explain (and learn) about this feature. This PEP -addresses that concern providing the kind of document which learners could use +about how easy would be to explain (and learn) this feature. This PEP +addresses that concern providing the kind of document which developers could use to learn about pattern matching in Python. This is considered supporting material for PEP 634 (the technical specification -for pattern matching) and PEP 635 (the motivation and rational for having pattern +for pattern matching) and PEP 635 (the motivation and rationale for having pattern matching and design considerations). Meta @@ -145,7 +145,7 @@ Matching specific values Your code still needs to look at the specific actions and conditionally run different logic depending on the specific action (e.g., ``quit``, ``attack``, or ``buy``). You could do that using a chain of ``if/elif/elif/...``, or using a dictionary of -function, but here we'll leverage pattern matching to solve that task. Instead of a +functions, but here we'll leverage pattern matching to solve that task. Instead of a variable, you can use literal values in patterns (like ``"quit"``, ``42``, or ``None``). 
This allows you to write:: @@ -209,7 +209,7 @@ following:: print(f"Sorry, I couldn't understand {command!r}") Note that you must add this last pattern at the end, otherwise it will match before other -possible patterns that should be considered. This works but it's a bit verbose and +possible patterns that could be considered. This works but it's a bit verbose and somewhat wasteful: this will make a full copy of the word list, which will be bound to ``ignored_words`` even if it's never used. @@ -250,8 +250,8 @@ patterns) that we've seen: as a catch-all at the end of your ``match`` if you desire to bind the name to the subject. -Until now, the only non-simple pattern we have experimented with is the -Then we have seen sequence patterns. Each element in a sequence pattern can in fact be +Until now, the only non-simple pattern we have experimented with is the sequence pattern. +Each element in a sequence pattern can in fact be any other pattern. This means that you could write a pattern like ``["first", (left, right), *rest]``. This will match subjects which are a sequence of at least two elements, where the first one is equal to ``"first"`` and the second one is @@ -280,7 +280,8 @@ attempted from left to right; this may be relevant to know what is bound if more one alternative matches. An important restriction when writing or patterns is that all alternatives should bind the same variables. So a pattern ``[1, x] | [2, y]`` is not allowed because it would make unclear which variable would be bound after a successful -match. +match. ``[1, x] | [2, x]`` is perfectly fine and will always bind ``x`` if successful. + Capturing matched sub-patterns ------------------------------ @@ -341,11 +342,11 @@ The sequence of these steps must be considered carefully when combining or-patte guards. If you have ``case [x, 100] | [0, x] if x > 10`` and your subject is ``[0, 100]``, the clause will be skipped. 
This happens because: - * The or-pattern finds the first alternative that matches the subject, which happens to - be ``[x, 100]`` - * ``x`` is bound to 0 - * The condition x > 10 is checked. Given that it's false, the whole case clause is - skipped. The ``[0, x]`` pattern is never attempted. +* The or-pattern finds the first alternative that matches the subject, which happens to + be ``[x, 100]`` +* ``x`` is bound to 0 +* The condition x > 10 is checked. Given that it's false, the whole case clause is + skipped. The ``[0, x]`` pattern is never attempted. Going to the cloud: Mappings ---------------------------- From 44271fea32725f98751608bc836bd6e8e3c2b20b Mon Sep 17 00:00:00 2001 From: Tobias Kohn Date: Thu, 17 Sep 2020 11:17:06 +0200 Subject: [PATCH 28/54] Adding rationale for literal and constant patterns --- pep-0635.rst | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 80 insertions(+), 2 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index 870965e0787..805fa108f0f 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -362,8 +362,44 @@ enough to introduce syntax that would make Python special. Literal Patterns ~~~~~~~~~~~~~~~~ - -TBD. +Literal patterns are a convenient way for imposing constraints on the +value of a subject, rather than its type or structure. Literal patterns +even allow you to emulate a switch statement using pattern matching. On +the flipside, if you think of patterns as building on parameters and +assignment targets, literal patterns are a novel addition (i.e. you would +not write, e.g., ``(2, a, b) = c`` in iterable unpacking). + +Originally, literal patterns came from the idea of expressing unstructured +singleton objects such as ``None``: instead of requiring that a subject has +type ``NoneType``, it makes much more sense to directly write ``None``. +More generally, literal patterns could also be seen as syntactic sugar for +guards. Rather than ``case x if x == 0:``, you can simply write ``case 0:``. 
+ +Generally, the subject is compared to a literal pattern by means of standard +equality (``x == y`` in Python syntax). Consequently, the literal patterns +``1.0`` and ``1`` match exactly the same set of objects, i.e. ``case 1.0:`` +and ``case 1:`` are fully interchangeable. In principle, ``True`` would also +match the same set of objects because ``True == 1`` holds. However, we +believe that many users would be surprised to find that ``case True:`` +matched the object ``1.0``, resulting in some subtle bugs and convoluted +workarounds. We therefore adopted the rule that the three singleton +objects ``None``, ``False`` and ``True`` match by identity (``x is y`` in +Python syntax) rather than equality. Hence, ``case True:`` will match only +``True`` and nothing else. Note that ``case 1:`` would still match ``True``, +though, because the literal pattern ``1`` works by equality and not identity. + +Early ideas to induce a hierarchy on numbers so that ``case 1.0`` would +match both the integer ``1`` and the floating point number ``1.0``, whereas +``case 1:`` would only match the integer ``1`` were eventually dropped in +favour of the simpler and consistent rule based on equality. + +Recall that literal patterns are *not* expressions, but directly denote a +specific value or object. From a syntactical point of view, we have to +ensure that negative and complex numbers can equally be used as patterns, +although they are not atomic literal values (i.e. ``-3+4j`` is syntactically +an expression of the form ``BinOp(UnaryOp('-', 3), '+', 4j)``). +Interpolated *f*-strings, on the other hand, are not literal values, despite +their appearance, and can therefore not be used as literal patterns. Literal patterns not only occur as patterns in their own right, but also as keys in *mapping patterns*. @@ -374,6 +410,48 @@ as keys in *mapping patterns*.
Constant Value Patterns ~~~~~~~~~~~~~~~~~~~~~~~ +It is good programming style to use named constants for parametric values or +to clarify the meaning of particular values. Clearly, it would be desirable +to also write ``case (HttpStatus.OK, body):`` rather than +``case (200, body):``, say. The main issue that arises here is how to +distinguish capture patterns (variables) from constant value patterns. The +general discussion surrounding this issue has brought forward a plethora of +options, which we cannot all fully list here. + +Strictly speaking, constant value patterns are not really necessary, but +could be implemented using guards, i.e. +``case (status, body) if status == HttpStatus.OK:``. Nonetheless, the +convenience of constant value patterns is unquestioned and obvious. + +The observation that constants tend to be written in uppercase letters or +collected in enumeration-like namespaces suggests possible rules to discern +constants syntactically. However, the idea of using upper vs. lower case as +a marker has been met with scepticism since there is no similar precedent +in core Python (although it is common in other languages). We therefore only +adopted the rule that any dotted name (i.e. attribute access) is to be +interpreted as a constant value pattern as exemplified by ``HttpStatus.OK`` +above. This excludes, in particular, local variables from acting as +constants. + +Global variables can only be directly used as constants when defined in other +modules, although there are workarounds to access the current module as a +namespace as well. A proposed rule to use a leading dot (e.g. +``.CONSTANT``) for that purpose was criticised because it was felt that the +dot would not be a visible-enough marker for that purpose. Partly inspired +by use cases in other programming languages, a number of different +markers/sigils were proposed (such as ``^CONSTANT``, ``$CONSTANT``, +``==CONSTANT``, etc.), although there was no obvious or natural choice.
+The current proposal therefore leaves the discussion and possible +introduction of such a 'constant' marker for future PEPs. + +Distinguishing the semantics of names based on whether it is a global +variable (i.e. the compiler would treat global variables as constants rather +than capture patterns) leads to various issues. The addition or alteration +of a global variable in the module could have unintended side effects on +patterns. Moreover, pattern matching could not be used directly inside a +module's scope because all variables would be global, making capture +patterns impossible. + .. _sequence_pattern: From 34234263468d817f2826e873c0a06dbdd5d537e6 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Thu, 17 Sep 2020 14:19:51 -0700 Subject: [PATCH 29/54] Small tweaks; add some TODOs --- pep-0634.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/pep-0634.rst b/pep-0634.rst index 9aad885c5ce..b20a775de58 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -15,6 +15,12 @@ Post-History: Replaces: 622 Resolution: +TODO: Apparently subsubsections should use ^^^, not ~~~: +https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#sections +The amazing thing is actually that ReST allows you to use *any* +punctuation and determines the hierarchy automatically from usage! +https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#sections + Abstract ======== From 8b46c480b46766c66b3ea512314437ffda2b36dc Mon Sep 17 00:00:00 2001 From: Tobias Kohn Date: Fri, 18 Sep 2020 09:09:07 +0200 Subject: [PATCH 30/54] Added class patterns to PEP-635 --- pep-0635.rst | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 101 insertions(+) diff --git a/pep-0635.rst b/pep-0635.rst index 805fa108f0f..a1a2372460c 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -212,6 +212,10 @@ It takes the place of "Rejected ideas" in the standard PEP format. It is organized in sections corresponding to the specification (PEP 634). 
+Overview and terminology +------------------------ + + The ``match`` statement ----------------------- @@ -236,6 +240,10 @@ following the ``case`` as a generalisation of parameters is a solid mental model to approach and understand pattern matching. +Match semantics +~~~~~~~~~~~~~~~ + + .. _patterns: @@ -453,6 +461,11 @@ module's scope because all variables would be global, making capture patterns impossible. +Group Patterns +~~~~~~~~~~~~~~ + + + .. _sequence_pattern: Sequence Patterns @@ -467,6 +480,94 @@ Mapping Patterns +.. _class_pattern: + +Class Patterns +~~~~~~~~~~~~~~ + +Class patterns fulfil two purposes: checking whether a given subject is +indeed an instance of a specific class and extracting data from specific +attributes of the subject. A quick survey revealed that ``isinstance()`` +is indeed one of the most often used functions in Python in terms of +static occurrences in programs. Such instance checks typically precede +a subsequent access to information stored in the object, or a possible +manipulation thereof. A typical pattern might be along the lines of:: + + def DFS(node): + if isinstance(node, Node): + DFS(node.left) + DFS(node.right) + elif isinstance(node, Leaf): + print(node.value) + +The class pattern lets you concisely specify both an instance check and +the relevant attributes (with possible further constraints). It is +thereby very tempting to write, e.g., ``case Node(left, right):`` in the +first case above and ``case Leaf(value):`` in the second. While this +indeed works well for languages with strict algebraic data types, it is +problematic with the structure of Python objects. + +When dealing with general Python objects, we face a potentially very large +number of unordered attributes: an instance of ``Node`` contains a large +number of attributes (most of which are 'private methods' such as, e.g., +``__repr__``). Moreover, the interpreter cannot reliably deduce which of +the attributes comes first and which is second.
For an object that +represents a circle, say, there is no inherently obvious ordering of the +attributes ``x``, ``y`` and ``radius``. + +We envision two possibilities for dealing with this issue: either explicitly +name the attributes of interest or provide an additional mapping that tells +the interpreter which attributes to extract and in which order. Both +approaches are supported. Moreover, explicitly naming the attributes of +interest lets you further specify the required structure of an object; if +an object lacks an attribute specified by the pattern, the match fails. + +- Attributes that are explicitly named pick up the syntax of named arguments. + If an object of class ``Node`` has two attributes ``left`` and ``right`` + as above, the pattern ``Node(left=x, right=y)`` will extract the values of + both attributes and assign them to ``x`` and ``y``, respectively. The data + flow from left to right seems unusual, but is in line with mapping patterns + and has precedents such as assignments via ``as`` in *with*- or + *import*-statements. + + Naming the attributes in question explicitly will be mostly used for more + complex cases where the positional form (below) is insufficient. + +- The class field ``__match_args__`` specifies a number of attributes + together with their ordering, allowing class patterns to rely on positional + sub-patterns without having to explicitly name the attributes in question. + This is particularly handy for smaller objects or instances of data classes, + where the attributes of interest are rather obvious and often have a + well-defined ordering. In a way, ``__match_args__`` is similar to the + declaration of formal parameters, which allows calling functions with + positional arguments rather than naming all the parameters. + + +The syntax of class patterns is based on the idea that de-construction +mirrors the syntax of construction.
This is already the case in virtually +any Python construct, be it assignment targets, function definitions or +iterable unpacking. In all these cases, we find that the syntax for +sending and that for receiving 'data' are virtually identical. + +- Assignment targets such as variables, attributes and subscripts: + ``foo.bar[2] = foo.bar[3]``; + +- Function definitions: a function defined with ``def foo(x, y, z=6)`` + is called as, e.g., ``foo(123, y=45)``, where the actual arguments + provided at the call site are matched against the formal parameters + at the definition site; + +- Iterable unpacking: ``a, b = b, a`` or ``[a, b] = [b, a]`` or + ``(a, b) = (b, a)``, just to name a few equivalent possibilities. + +Using the same syntax for reading and writing, l- and r-values, or +construction and de-construction is widely accepted for its benefits in +thinking about data, its flow and manipulation. This equally extends to +the explicit construction of instances, where class patterns ``c(p, q)`` +deliberately mirror the syntax of creating instances. + + + History and Context =================== From 4d9c62683f5d500051383ce598295cb23f19211f Mon Sep 17 00:00:00 2001 From: Daniel F Moisset Date: Fri, 18 Sep 2020 13:39:00 +0100 Subject: [PATCH 31/54] Add wording fixes suggested in review --- pep-0636.rst | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/pep-0636.rst b/pep-0636.rst index 30cf1d43fb2..c5d0e856b13 100644 --- a/pep-0636.rst +++ b/pep-0636.rst @@ -23,7 +23,7 @@ This PEP is a tutorial for the pattern matching introduced by PEP 634. PEP 622 proposed syntax for pattern matching, which received detailed discussion both from the community and the Steering Council. A frequent concern was -about how easy would be to explain (and learn) this feature. This PEP +about how easy it would be to explain (and learn) this feature.
This PEP addresses that concern providing the kind of document which developers could use to learn about pattern matching in Python. @@ -93,7 +93,7 @@ action and an object. So you may be tempted to do the following:: ... # interpret action, obj The problem with that line of code is that it's missing something: what if the user -types more or less than 2 words? To prevent this problem you can either check the length +types more or fewer than 2 words? To prevent this problem you can either check the length of the list of words, or capture the ``ValueError`` that the statement above would raise. You can use a matching statement instead:: @@ -112,7 +112,7 @@ things: this case, if the list has two elements, it will bind ``action = subject[0]`` and ``obj = subject[1]``. This is called **destructuring** -If there's a match, the statements inside the ``case`` clause will be run with the +If there's a match, the statements inside the ``case`` clause will be executed with the bound variables. If there's no match, nothing happens and the next statement after ``match`` keeps running. @@ -231,8 +231,8 @@ match any object:: TODO: Explain about syntaxerror when having an irrefutable pattern above others? -How patterns are being composed -------------------------------- +How patterns are composed +------------------------- This is a good moment to step back from the examples and understand how the patterns that you have been using are built. Patterns can be nested within each other, and we @@ -286,7 +286,7 @@ match. ``[1, x] | [2, x]`` is perfectly fine and will always bind ``x`` if succe Capturing matched sub-patterns ------------------------------ -An older version of our "go" command was written with a ``["go", direction]`` pattern. +The first version of our "go" command was written with a ``["go", direction]`` pattern. 
The change we did in our last version using the pattern ``["north"] | ["go", "north"]`` has some benefits but also some drawbacks in comparison: the latest version allows the alias, but also has the direction hardcoded, which will force us to actually have @@ -315,8 +315,8 @@ The walrus pattern (named like that because the ``:=`` operator looks like a sid walrus) matches whatever pattern is on its right hand side, but also binds the value to a name. -Conditional pattern matching ----------------------------- +Adding conditions to patterns +----------------------------- The patterns we have explored above can do some powerful data filtering, but sometimes you may wish for the full power of a boolean expression. Let's say that you would actually From 43ae8e87f50b3984359c6de6a6338a501a225d7d Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Wed, 23 Sep 2020 10:40:42 -0700 Subject: [PATCH 32/54] Typos and such --- pep-0635.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index a1a2372460c..a8fbf09df49 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -73,7 +73,7 @@ Several other common Python idioms are also relevant: Using ``match`` this becomes more readable:: match node: - case BinOp("+", a, BinOp("*", b, c): + case BinOp("+", a, BinOp("*", b, c)): # Handle a + b*c - TODO: Other compelling examples? @@ -131,7 +131,7 @@ has to deal with data that comes in different forms and shapes. This, in turn, gives rise to a high degree of versatility by favouring a 'duck typing' style, imposing only the bare minimum of requirements on the form and shape of data, i.e. the structure of data objects. -Nonetheless, we find that Python code is often sparkled with +Nonetheless, we find that Python code is often sprinkled with conditions depending on ``isinstance``, ``len``, ``getattr``, ``hasattr``, etc. 
Despite the benefits of 'duck typing', actual code still requires to query the format or type of an object in order From a33d22df7d65e816d4496d1b9715452615fc9072 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Wed, 23 Sep 2020 13:31:26 -0700 Subject: [PATCH 33/54] Use ^^^ for sub-sub-headings --- pep-0634.rst | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index b20a775de58..0850cac58f2 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -15,12 +15,6 @@ Post-History: Replaces: 622 Resolution: -TODO: Apparently subsubsections should use ^^^, not ~~~: -https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#sections -The amazing thing is actually that ReST allows you to use *any* -punctuation and determines the hierarchy automatically from usage! -https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#sections - Abstract ======== @@ -109,7 +103,7 @@ other context as variable or argument names. Match semantics -~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^ TODO: Make the language about choosing a block more precise. @@ -168,7 +162,7 @@ The top-level syntax for patterns is as follows:: .. _literal_pattern: Literal Patterns -~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^ Syntax:: @@ -204,7 +198,7 @@ value expressed by the literal, using the following comparisons rules: .. _capture_pattern: Capture Patterns -~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^ Syntax:: @@ -226,7 +220,7 @@ disallows for example ``case x, x: ...`` but allows ``case [x] | x: .. _wildcard_pattern: Wildcard Pattern -~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^ Syntax:: @@ -237,7 +231,7 @@ A wildcard pattern always succeeds. It binds no name. .. _constant_value_pattern: Constant Value Patterns -~~~~~~~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^^^^^^^ TODO: Rename to Value Patterns? (But ``value[s]_pattern`` is already a grammatical rule.) @@ -260,7 +254,7 @@ subject value (using the ``==`` operator). 
Group Patterns -~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^ Syntax: @@ -277,7 +271,7 @@ add parentheses around patterns to emphasize the intended grouping. .. _sequence_pattern: Sequence Patterns -~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^ Syntax:: @@ -336,7 +330,7 @@ sequence. .. _mapping_pattern: Mapping Patterns -~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^ Syntax:: @@ -375,7 +369,7 @@ entered. .. _class_pattern: Class Patterns -~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^ TODO: Modernize this section. @@ -421,7 +415,7 @@ attribute lookups appropriately. OR patterns -~~~~~~~~~~~ +^^^^^^^^^^^ TODO: Modernize this section. Also, move it earlier (so that the order in which pattern types are introduced in the top-level grammar @@ -459,7 +453,7 @@ the same set of variables (excluding ``_``). For example:: Walrus patterns -~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^ TODO: Modernize this section. Also, move it earlier (same as OR patterns TODO). Also, consider changing the syntax from ``v := P`` to From bec9f8a1b34ccecef87bc6aadbcbbedb239657df Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Wed, 23 Sep 2020 14:58:54 -0700 Subject: [PATCH 34/54] Modernize class pattern, and more - Disallow duplicate keys in mapping pattern - Get rid of "The Match Protocol", "Overlapping subpatterns" and "Special attribute __match_args__"; these are subsumed in the sections above - Add TODOs to "Exceptions and side effects" --- pep-0634.rst | 208 ++++++++++++++++++++------------------------------- 1 file changed, 81 insertions(+), 127 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 0850cac58f2..62ae55a98ba 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -346,6 +346,11 @@ Syntax:: A mapping pattern may contain at most one double star pattern, and it must be last. +A mapping pattern may not contain duplicate key values. +(If all key patterns are literal patterns this is considered a +syntax error; otherwise this is a runtime error and will +raise ``TypeError``.) 
+ A mapping pattern fails if the subject value is not an instance of ``collections.abc.Mapping``. @@ -371,47 +376,82 @@ entered. Class Patterns ^^^^^^^^^^^^^^ -TODO: Modernize this section. - Syntax:: class_pattern: - | name_or_attr '(' ')' - | name_or_attr '(' ','.pattern+ ','? ')' - | name_or_attr '(' ','.keyword_pattern+ ','? ')' - | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')' + | name_or_attr '(' [pattern_arguments ','?] ')' + pattern_arguments: + | positional_patterns [',' keyword_patterns] + | keyword_patterns + positional_patterns: ','.pattern+ + keyword_patterns: ','.keyword_pattern+ keyword_pattern: NAME '=' or_pattern -A class pattern provides support for destructuring arbitrary objects. -There are two possible ways of matching on object attributes: by position -like ``Point(1, 2)``, and by name like ``Point(x=1, y=2)``. These -two can be combined, but a positional match cannot follow a match by name. -Each item in a class pattern can be an arbitrary pattern. A simple -example:: - - match shape: - case Point(x, y): - ... - case Rectangle(x0, y0, x1, y1, painted=True): - ... - -Whether a match succeeds or not is determined by the equivalent of an -``isinstance`` call. If the subject (``shape``, in the example) is not -an instance of the named class (``Point`` or ``Rectangle``), the match -fails. Otherwise, it continues (see details in the `runtime`_ -section). - -The named class must inherit from ``type``. It may be a single name -or a dotted name (e.g., ``some_mod.SomeClass`` or ``mod.pkg.Class``). -Use ``object(foo=_)`` to check whether the -matched object has an attribute ``foo``. - -By default, subpatterns may only match by keyword for -user-defined classes. In order to support positional subpatterns, a -custom ``__match_args__`` attribute is required. -The runtime allows matching -arbitrarily nested patterns by chaining all of the instance checks and -attribute lookups appropriately. 
+(Note that positional patterns may be unparenthesized walrus patterns, +but keyword patterns may not.) + +A class pattern may not repeat the same keyword multiple times. + +If ``name_or_attr`` is not an instance of the builtin ``type``, +``TypeError`` is raised. + +A class pattern fails if the target is not an instance of ``name_or_attr``. +This is tested using ``isinstance()``. + +If no arguments are present, the pattern succeeds if the ``isinstance()`` +check succeeds. Otherwise: + +- If only keyword patterns are present, they are processed as follows, + one by one: + + - The keyword is looked up as an attribute on the target. + - If this raises an exception other than ``AttributeError``, + the exception bubbles up. + - If this raises ``AttributeError``, the class pattern fails. + - Otherwise, the subpattern associated with the keyword is matched + against the attribute value. If this fails, the class pattern fails. + If it succeeds, the match proceeds to the next keyword. + - If all keyword patterns succeed, the class pattern as a whole succeeds. + +- If any positional patterns are present, they are converted to keyword + patterns (see below) and treated as additional keyword patterns, + preceding the syntactic keyword patterns (if any). + +Positional patterns are converted to keyword patterns using the +``__match_args__`` attribute on the class designated by ``name_or_attr``, +as follows: + +- For a number of built-in types (specified below), + a single positional subpattern is accepted which will match + the entire target; for these types no keyword patterns are accepted. +- The equivalent of ``getattr(cls, "__match_args__", ())`` is called. +- If this raises an exception, the exception bubbles up. +- If the returned value is not a list or tuple, the conversion fails + and ``TypeError`` is raised. +- If there are more positional patterns than the length of + ``__match_args__`` (as obtained using ``len()``), ``TypeError`` is raised.
+- Otherwise, positional pattern ``i`` is converted to a keyword pattern + using ``__match_args__[i]`` as the keyword, + provided the latter is a string; + if it is not, ``TypeError`` is raised. +- For duplicate keywords, ``TypeError`` is raised. + +Once the positional patterns have been converted to keyword patterns, +the match proceeds as if there were only keyword patterns. + +As mentioned above, for the following built-in types the handling of + positional +subpatterns is different: +``bool``, ``bytearray``, ``bytes``, ``dict``, ``float``, +``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``. + +This behavior is roughly equivalent to the following:: + + class C: + __match_args__ = ["__match_self_prop__"] + @property + def __match_self_prop__(self): + return self OR patterns @@ -531,107 +571,21 @@ is a ``SyntaxError`` and ``1 | 2 if 3 | 4`` will be parsed as Runtime specification ===================== -TODO: Modernize this section. - -The Match Protocol ------------------- - -The equivalent of an ``isinstance`` call is used to decide whether -a given class pattern matches a subject and to extract the corresponding -attributes. Classes requiring different matching semantics (such as -duck-typing) can do so by defining ``__instancecheck__`` (a -pre-existing metaclass hook) or by using ``typing.Protocol``. - -The procedure is as follows: - -* The class object for ``Class`` in ``Class()`` is - looked up and ``isinstance(obj, Class)`` is called, where ``obj`` is - the subject value. If false, the match fails. - -* Otherwise, if any subpatterns are given in the form of positional - or keyword arguments, these are matched from left to right, as - follows. The match fails as soon as a subpattern fails; if all - subpatterns succeed, the overall class pattern match succeeds. 
- -* If there are match-by-position items and the class has a - ``__match_args__`` attribute, the item at position ``i`` - is matched against the value looked up by attribute - ``__match_args__[i]``. For example, a pattern ``Point2d(5, 8)``, - where ``Point2d.__match_args__ == ["x", "y"]``, is translated - (approximately) into ``obj.x == 5 and obj.y == 8``. - -* If there are more positional items than the length of - ``__match_args__``, a ``TypeError`` is raised. - -* If the ``__match_args__`` attribute is absent on the matched class, - and one or more positional item appears in a match, - ``TypeError`` is also raised. We don't fall back on - using ``__slots__`` or ``__annotations__`` -- "In the face of ambiguity, - refuse the temptation to guess." - -* If there are any match-by-keyword items the keywords are looked up - as attributes on the subject. If the lookup succeeds, the value is - matched against the corresponding subpattern. If the lookup fails, - the match fails. - -Such a protocol favors simplicity of implementation over flexibility and -performance. For other considered alternatives, see "extended matching". - -For the most commonly-matched built-in types (``bool``, -``bytearray``, ``bytes``, ``dict``, ``float``, -``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``), a -single positional subpattern is allowed to be passed to -the call. Rather than matching any particular attribute -on the subject, it instead matches the subject itself. This -creates behavior that is useful and intuitive for these objects: - -* ``int(0)`` matches ``0`` (but not ``0.0``). -* ``tuple((0, 1, 2))`` matches ``(0, 1, 2)`` (but not ``[0, 1, 2]``). -* ``bool(b)`` matches any ``bool`` and binds it to the name ``b``. - - -Overlapping subpatterns ------------------------ - -Certain classes of overlapping matches are detected at -runtime and will raise exceptions. 
In addition to basic checks -described in the previous subsection: - -* The interpreter will check that two subpatterns are not targeting the same - attribute, for example ``Point2d(1, 2, y=3)`` is an error. - -* It will also check that a mapping pattern does not attempt to match - the same key more than once. - - -Special attribute ``__match_args__`` ------------------------------------- - -The ``__match_args__`` attribute is always looked up on the type -object named in the pattern. If present, it must be a list or tuple -of strings naming the allowed positional arguments. - -In deciding what names should be available for matching, the -recommended practice is that class patterns should be the mirror of -construction; that is, the set of available names and their types -should resemble the arguments to ``__init__()``. - -Only match-by-name will work by default, and classes should define -``__match_args__`` as a class attribute if they would like to support -match-by-position. Additionally, dataclasses and named tuples will -support match-by-position out of the box. See below for more details. - Exceptions and side effects --------------------------- +TODO: Arguably the first paragraph below is duplicate from the class pattern spec. + While matching each case, the ``match`` statement may trigger execution of other -functions (for example ``__getitem__()``, ``__len__()`` or +functions (for example ``__getitem__()``, ``__getattribute__``, ``__len__()`` or a property). Almost every exception caused by those propagates outside of the ``match`` statement normally. The only case where an exception is not propagated is an ``AttributeError`` raised while trying to lookup an attribute while matching attributes of a Class Pattern; that case results in just a matching failure, and the rest of the statement proceeds normally. +TODO: Write this more strictly. (Also, isn't there another section about caching?) 
+ The only side-effect carried on explicitly by the matching process is the binding of names. However, the process relies on attribute access, instance checks, ``len()``, equality and item access on the subject and some of From f1489042cf0f90fd52a8b9dbb8fcc0ce85a8ae13 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Wed, 23 Sep 2020 15:18:36 -0700 Subject: [PATCH 35/54] Modernize OR patterns --- pep-0634.rst | 41 ++++++++++------------------------------- 1 file changed, 10 insertions(+), 31 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 62ae55a98ba..8f147714733 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -457,39 +457,18 @@ This behavior is roughly equivalent to the following:: OR patterns ^^^^^^^^^^^ -TODO: Modernize this section. Also, move it earlier (so that the -order in which pattern types are introduced in the top-level grammar -matches the order of the sections?) - -Multiple alternative patterns can be combined into one using ``|``. This means -the whole pattern matches if at least one alternative matches. -Alternatives are tried from left to right and have a short-circuit property, -subsequent patterns are not tried if one matched. Examples:: - - match something: - case 0 | 1 | 2: - print("Small number") - case [] | [_]: - print("A short sequence") - case str() | bytes(): - print("Something string-like") - case _: - print("Something else") +Syntax:: -The alternatives may bind variables, as long as each alternative binds -the same set of variables (excluding ``_``). For example:: + or_pattern: '|'.closed_pattern+ - match something: - case 1 | x: # Error! - ... - case x | 1: # Error! - ... - case one := [1] | two := [2]: # Error! - ... - case Foo(arg=x) | Bar(arg=x): # Valid, both arms bind 'x' - ... - case [x] | x: # Valid, both arms bind 'x' - ... +When two or more patterns are separated by vertical bars (``|``), +this is called an OR pattern. (A single closed pattern is just that.) + +Each subpattern must bind the same set of names. 
+ +An OR pattern matches each of its subpatterns in turn to the target, +until one succeeds. The OR pattern is then deemed to succeed. +If none of the subpatterns succeed the OR pattern fails. Walrus patterns From 68c7226ea9d39073ed7e2fe4fd2c4f36d7e9bc6e Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Wed, 23 Sep 2020 15:28:41 -0700 Subject: [PATCH 36/54] Modernize walrus pattern Also tweak remark about scope --- pep-0634.rst | 42 ++++++++++++------------------------------ 1 file changed, 12 insertions(+), 30 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 8f147714733..01c76f32ddc 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -142,11 +142,13 @@ and to something that merely has *operator precedence* allowing ``|`` in it (in the specification of walrus patterns). But to fix this we'd need to come up with a new name for the latter. +TODO: move descriptions of walrus and OR patterns up. + The top-level syntax for patterns is as follows:: patterns: open_sequence_pattern | pattern pattern: walrus_pattern | or_pattern - walrus_pattern: NAME ':=' or_pattern + walrus_pattern: capture_pattern ':=' or_pattern or_pattern: '|'.closed_pattern+ closed_pattern: | literal_pattern @@ -210,8 +212,8 @@ The single underscore (``_``) is not a capture pattern (this is what A capture pattern always succeeds. It binds the subject value to the name using the scoping rules for name binding established for the walrus operator in PEP 572. (Summary: the name becomes a local -variable in the nearest function scope unless there's an applicable -``nonlocal`` or ``global`` statement.) +variable in the closest containing function scope unless there's an +applicable ``nonlocal`` or ``global`` statement.) In a given pattern, a given name may be bound only once. This disallows for example ``case x, x: ...`` but allows ``case [x] | x: @@ -474,36 +476,16 @@ If none of the subpatterns succeed the OR pattern fails. Walrus patterns ^^^^^^^^^^^^^^^ -TODO: Modernize this section. 
Also, move it earlier (same as OR -patterns TODO). Also, consider changing the syntax from ``v := P`` to -``P as v`` and renaming (e.g. to AS pattern?). - -It is often useful for a pattern to match *and* bind the corresponding -value to a name. For example, it can be useful to write more efficient -matches, or simply to avoid repetition. To simplify such cases, any pattern -(other than the walrus pattern itself) can be preceded by a name and -the walrus operator (``:=``). For example:: - - match get_shape(): - case Line(start := Point(x, y), end) if start == end: - print(f"Zero length line at {x}, {y}") - -The name on the left of the walrus operator can be used in a guard, in -the case block, or after the ``match`` statement. However, the name will -*only* be bound if the subpattern succeeds. Another example:: +Syntax:: - match group_shapes(): - case [], [point := Point(x, y), *other]: - print(f"Got {point} in the second group") - process_coordinates(x, y) - ... + walrus_pattern: capture_pattern ':=' or_pattern -Technically, most such examples can be rewritten using guards and/or nested -``match`` statements, but this will be less readable and/or will produce less -efficient code. Essentially, most of the arguments in PEP 572 apply here -equally. +(Note: the name on the left may not be ``_``.) -The wildcard ``_`` is not a valid name here. +A walrus pattern matches the OR pattern on the right of the ``:=`` +operator against the target. If this fails, the walrus pattern fails. +Otherwise, the walrus pattern binds the target to the name on the left +of the ``:=`` operator and succeeds. .. 
_guards: From a03c44a9b56d2c50aa9814f8a3970a6fb554f798 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Wed, 23 Sep 2020 15:41:33 -0700 Subject: [PATCH 37/54] Fix markup errors --- pep-0634.rst | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 01c76f32ddc..d1af8b588a5 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -407,12 +407,16 @@ check succeeds. Otherwise: one by one: - The keyword is looked up as an attribute on the target. + - If this raises an exception other than ``AttributeError``, the exception bubbles up. + - If this raises ``AttributeError`` the class pattern fails. + - Otherwise, the subpattern associated with the keyword is matched against the attribute value. If this fails, the class pattern fails. If it succeeds, the match proceeds to the next keyword. + - If all keyword patterns succeed, the class pattern as a whole succeeds. - If any positional patterns are present, they are converted to keyword @@ -442,8 +446,7 @@ Once the positional patterns have been converted to keyword patterns, the match proceeds as if there were only keyword patterns. As mentioned above, for the following built-in types the handling of - positional -subpatterns is different: +positional subpatterns is different: ``bool``, ``bytearray``, ``bytes``, ``dict``, ``float``, ``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``. From f6944599103465c2c375e80c47a1cbc9b56ce5f2 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Wed, 23 Sep 2020 15:47:07 -0700 Subject: [PATCH 38/54] Modernize description of guards --- pep-0634.rst | 42 ++++++++++-------------------------------- 1 file changed, 10 insertions(+), 32 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index d1af8b588a5..7e0d2df13f1 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -496,38 +496,16 @@ of the ``:=`` operator and succeeds. Guards ------ -TODO: Modernize this section. 
- -Each *top-level* pattern can be followed by a **guard** of the form -``if expression``. A case clause succeeds if the pattern matches and the guard -evaluates to a true value. For example:: - - match input: - case [x, y] if x > MAX_INT and y > MAX_INT: - print("Got a pair of large numbers") - case x if x > MAX_INT: - print("Got a large number") - case [x, y] if x == y: - print("Got equal items") - case _: - print("Not an outstanding input") - -If evaluating a guard raises an exception, it is propagated onwards rather -than fail the case clause. Names that appear in a pattern are bound before the -guard succeeds. So this will work:: - - values = [0] - - match values: - case [x] if x: - ... # This is not executed - case _: - ... - print(x) # This will print "0" - -Note that guards are not allowed for nested patterns, so that ``[x if x > 0]`` -is a ``SyntaxError`` and ``1 | 2 if 3 | 4`` will be parsed as -``(1 | 2) if (3 | 4)``. +Syntax:: + + case_block: "case" patterns [guard] ':' block + guard: 'if' named_expression + +If a guard is present on a case block, once all patterns succeed, +the expression in the guard is evaluated. +If this raises an exception, the exception bubbles up. +Otherwise, if the condition is "truthy" the block is selected; +if it is "falsy" the next case block (if any) is tried. .. _runtime: From d342c4b79fc1153bbeee4206bd7f3ae8bb325709 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Wed, 23 Sep 2020 15:53:57 -0700 Subject: [PATCH 39/54] Move guards, walrus and OR to the top --- pep-0634.rst | 98 ++++++++++++++++++++++++++-------------------------- 1 file changed, 49 insertions(+), 49 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 7e0d2df13f1..a1a036cec65 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -131,6 +131,23 @@ The precise pattern binding rules vary per pattern type and are specified below. +.. 
_guards: + +Guards +^^^^^^ + +Syntax:: + + case_block: "case" patterns [guard] ':' block + guard: 'if' named_expression + +If a guard is present on a case block, once all patterns succeed, +the expression in the guard is evaluated. +If this raises an exception, the exception bubbles up. +Otherwise, if the condition is "truthy" the block is selected; +if it is "falsy" the next case block (if any) is tried. + + .. _patterns: Patterns @@ -161,6 +178,38 @@ The top-level syntax for patterns is as follows:: | class_pattern +Walrus patterns +^^^^^^^^^^^^^^^ + +Syntax:: + + walrus_pattern: capture_pattern ':=' or_pattern + +(Note: the name on the left may not be ``_``.) + +A walrus pattern matches the OR pattern on the right of the ``:=`` +operator against the target. If this fails, the walrus pattern fails. +Otherwise, the walrus pattern binds the target to the name on the left +of the ``:=`` operator and succeeds. + + +OR patterns +^^^^^^^^^^^ + +Syntax:: + + or_pattern: '|'.closed_pattern+ + +When two or more patterns are separated by vertical bars (``|``), +this is called an OR pattern. (A single closed pattern is just that.) + +Each subpattern must bind the same set of names. + +An OR pattern matches each of its subpatterns in turn to the target, +until one succeeds. The OR pattern is then deemed to succeed. +If none of the subpatterns succeed the OR pattern fails. + + .. _literal_pattern: Literal Patterns @@ -459,55 +508,6 @@ This behavior is roughly equivalent to the following:: return self -OR patterns -^^^^^^^^^^^ - -Syntax:: - - or_pattern: '|'.closed_pattern+ - -When two or more patterns are separated by vertical bars (``|``), -this is called an OR pattern. (A single closed pattern is just that.) - -Each subpattern must bind the same set of names. - -An OR pattern matches each of its subpatterns in turn to the target, -until one succeeds. The OR pattern is then deemed to succeed. -If none of the subpatterns succeed the OR pattern fails. 
- - -Walrus patterns -^^^^^^^^^^^^^^^ - -Syntax:: - - walrus_pattern: capture_pattern ':=' or_pattern - -(Note: the name on the left may not be ``_``.) - -A walrus pattern matches the OR pattern on the right of the ``:=`` -operator against the target. If this fails, the walrus pattern fails. -Otherwise, the walrus pattern binds the target to the name on the left -of the ``:=`` operator and succeeds. - - -.. _guards: - -Guards ------- - -Syntax:: - - case_block: "case" patterns [guard] ':' block - guard: 'if' named_expression - -If a guard is present on a case block, once all patterns succeed, -the expression in the guard is evaluated. -If this raises an exception, the exception bubbles up. -Otherwise, if the condition is "truthy" the block is selected; -if it is "falsy" the next case block (if any) is tried. - - .. _runtime: Runtime specification From 0b1c43ce75f82f48bc7d26b16b54fc7ccb0f79e3 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Wed, 23 Sep 2020 15:56:42 -0700 Subject: [PATCH 40/54] Make stdlib section toplevel Add a TODO, remove another --- pep-0634.rst | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index a1a036cec65..04963008d8c 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -159,8 +159,6 @@ and to something that merely has *operator precedence* allowing ``|`` in it (in the specification of walrus patterns). But to fix this we'd need to come up with a new name for the latter. -TODO: move descriptions of walrus and OR patterns up. - The top-level syntax for patterns is as follows:: patterns: open_sequence_pattern | pattern @@ -538,9 +536,7 @@ of what methods are called or how many times. User code relying on that behavior should be considered buggy. The standard library --------------------- - -TODO: Make this a top-level section? 
+==================== To facilitate the use of pattern matching, several changes will be made to the standard library: @@ -551,6 +547,9 @@ the standard library: will be the same as the order of corresponding arguments in the generated ``__init__()`` method. This includes the situations where attributes are inherited from a superclass. + +TODO: Is it possible to exclude dataclass fields from ``__init__``? +If so, should those be excluded from ``__match_args__``? In addition, a systematic effort will be put into going through existing standard library classes and adding ``__match_args__`` where From 71ce943f8568d1da01fd7e49745becd5cf1b1b21 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Wed, 23 Sep 2020 16:12:16 -0700 Subject: [PATCH 41/54] Deal with a few more TODOs - Remove redundant text about exceptions - Side effects becomes a toplevel section - Reflow stdlib section --- pep-0634.rst | 59 ++++++++++++++++++++-------------------------------- 1 file changed, 22 insertions(+), 37 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 04963008d8c..4d5d1395450 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -157,7 +157,8 @@ TODO: I dislike that "or_pattern" can refer to both something that *definitely* has a ``|`` in it (in the specification of OR patterns) and to something that merely has *operator precedence* allowing ``|`` in it (in the specification of walrus patterns). But to fix this we'd -need to come up with a new name for the latter. +need to come up with a new name for the latter. (Scala cops out +with Pattern1, Pattern2, Pattern3...) The top-level syntax for patterns is as follows:: @@ -506,50 +507,34 @@ This behavior is roughly equivalent to the following:: return self -.. _runtime: +Side effects +============ -Runtime specification -===================== +The only side-effect produced explicitly by the matching process is +the binding of names. 
However, the process relies on attribute +access, instance checks, ``len()``, equality and item access on the +subject and some of its components. It also evaluates constant value +patterns and the class name of class patterns. While none of those +typically create any side-effects, in theory they could. This +proposal intentionally leaves out any specification of what methods +are called or how many times. This behavior is therefore undefined +and user code should not rely on it. -Exceptions and side effects ---------------------------- - -TODO: Arguably the first paragraph below is duplicate from the class pattern spec. - -While matching each case, the ``match`` statement may trigger execution of other -functions (for example ``__getitem__()``, ``__getattribute__``, ``__len__()`` or -a property). Almost every exception caused by those propagates outside of the -``match`` statement normally. The only case where an exception is not propagated is -an ``AttributeError`` raised while trying to lookup an attribute while matching -attributes of a Class Pattern; that case results in just a matching failure, -and the rest of the statement proceeds normally. - -TODO: Write this more strictly. (Also, isn't there another section about caching?) - -The only side-effect carried on explicitly by the matching process is the binding of -names. However, the process relies on attribute access, -instance checks, ``len()``, equality and item access on the subject and some of -its components. It also evaluates constant value patterns and the left side of -class patterns. While none of those typically create any side-effects, some of -these objects could. This proposal intentionally leaves out any specification -of what methods are called or how many times. User code relying on that -behavior should be considered buggy. 
The standard library ==================== -To facilitate the use of pattern matching, several changes will be made to -the standard library: +To facilitate the use of pattern matching, several changes will be +made to the standard library: -* Namedtuples and dataclasses will have auto-generated ``__match_args__``. +- Namedtuples and dataclasses will have auto-generated + ``__match_args__``. -* For dataclasses the order of attributes in the generated ``__match_args__`` - will be the same as the order of corresponding arguments in the generated - ``__init__()`` method. This includes the situations where attributes are - inherited from a superclass. - -TODO: Is it possible to exclude dataclass fields from ``__init__``? -If so, should those be excluded from ``__match_args__``? +- For dataclasses the order of attributes in the generated + ``__match_args__`` will be the same as the order of corresponding + arguments in the generated ``__init__()`` method. This includes the + situations where attributes are inherited from a superclass. Fields + with ``init=False`` are excluded from ``__match_args__``. In addition, a systematic effort will be put into going through existing standard library classes and adding ``__match_args__`` where From 1a313d2bf8c23d7505db14563746fb049005078a Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Wed, 23 Sep 2020 16:24:33 -0700 Subject: [PATCH 42/54] Add main TODOs --- pep-0634.rst | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/pep-0634.rst b/pep-0634.rst index 4d5d1395450..a58bbec1189 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -31,6 +31,14 @@ design choices are in PEP 635. First-time readers are encouraged to start with PEP 636, which provides a gentler introduction to the concepts, syntax and semantics of patterns. +TODO: Maybe we should add simple examples back to each section? +There's no rule saying a spec can't include examples, and currently +it's *very* dry. 
+ +TODO: Go over the feedback from the SC and make sure everything's +somehow incorporated (either here or in PEP 635, which has to answer +why we didn't budge on most of the SC's initial requests). + Syntax and Semantics ==================== From 8b3f98cabe9855f1eefeaacdc4e12de96553b61c Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Wed, 23 Sep 2020 16:31:14 -0700 Subject: [PATCH 43/54] Add TODO: disallow open pattern w. guard --- pep-0634.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/pep-0634.rst b/pep-0634.rst index a58bbec1189..10cf8162033 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -78,6 +78,8 @@ below, apply to these cases. The ``match`` statement ----------------------- +TODO: disallow open pattern with guard. + A ``match`` statement has the following top-level syntax:: match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT From 4d775de0a84befaad71338403b36c68e82e3ee30 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Fri, 25 Sep 2020 08:59:57 -0700 Subject: [PATCH 44/54] Twiddle TODOs --- pep-0634.rst | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 10cf8162033..024516d0dc1 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -78,8 +78,6 @@ below, apply to these cases. The ``match`` statement ----------------------- -TODO: disallow open pattern with guard. - A ``match`` statement has the following top-level syntax:: match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT @@ -163,13 +161,6 @@ if it is "falsy" the next case block (if any) is tried. Patterns -------- -TODO: I dislike that "or_pattern" can refer to both something that -*definitely* has a ``|`` in it (in the specification of OR patterns) -and to something that merely has *operator precedence* allowing ``|`` -in it (in the specification of walrus patterns). But to fix this we'd -need to come up with a new name for the latter. (Scala cops out -with Pattern1, Pattern2, Pattern3...) 
- The top-level syntax for patterns is as follows:: patterns: open_sequence_pattern | pattern @@ -190,6 +181,8 @@ The top-level syntax for patterns is as follows:: Walrus patterns ^^^^^^^^^^^^^^^ +TODO: Change to or_pattern 'as' capture_pattern (and rename)? + Syntax:: walrus_pattern: capture_pattern ':=' or_pattern From 502f0a960692071f5a686455e6f1dd3e8dc39b70 Mon Sep 17 00:00:00 2001 From: Tobias Kohn Date: Sat, 26 Sep 2020 12:55:12 +0200 Subject: [PATCH 45/54] Rewrote sections on match statement rationale --- pep-0635.rst | 143 +++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 126 insertions(+), 17 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index a8fbf09df49..fc50b1538cf 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -220,29 +220,138 @@ Overview and terminology The ``match`` statement ----------------------- -TBD. - -The overall syntax of each case clause is similar to that of lambda -functions, although the body of case clauses are blocks of statements -rather than expressions. Compare, for instance a lambda function to -add two values:: - - lambda x, y: x + y +The match statement evaluates an expression to yield a subject, finds the +first pattern that matches the subject and executes the associated block +of code. Syntactically, the match statement thus takes an expression and +a sequence of case clauses, where each case clause comprises a pattern and +a block of code. + +Since case clauses comprise a block of code, they adhere to the existing +indentation scheme with the syntactic structure of +``<keyword> ...: <(indented) block>``, which in turn makes it a (compound) +statement. The chosen keyword ``case`` reflects its widespread use in +pattern matching languages, ignoring those languages that use other +syntactic means such as a symbol like ``|`` because it would not fit +established Python structures. The syntax of patterns following the +keyword is discussed below. 
+ +Given that the case clauses follow the structure of a compound statement, +the match statement itself naturally becomes a compound statement +as well, following the same syntactic structure. This naturally leads to +``match <expr>: <case_clause>+``. Note that the match statement determines +a quasi-scope in which the evaluated subject is kept alive (although not in +a local variable), similar to how a with statement might keep a resource +alive during execution of its block. Furthermore, control flows from the +match statement to a case clause and then leaves the block of the match +statement. The block of the match statement thus has both syntactic and +semantic meaning. + +Various suggestions have sought to eliminate or avoid the naturally arising +"double indentation" of a case clause's code block. Unfortunately, all such +proposals of *flat indentation schemes* come at the expense of violating +Python's established structural paradigm, leading to additional syntactic +rules: + +- *Do not indent case clauses.* + The idea is to align case clauses with the ``match``, i.e.:: + + match expression: + case pattern_1: + ... + case pattern_2: + ... + + This may look awkward to the eye of a Python programmer, because + everywhere else a colon is followed by an indent. The ``match`` would + neither follow the syntactic scheme of simple nor composite statements + but rather establish a category of its own. + +- *Put the expression on a line after ``match``.* + The idea is to use the expression yielding the subject as a statement + to avoid the singularity of ``match`` having no actual block despite + the colons:: + + match: + expression + case pattern_1: + ... + case pattern_2: + ... + + This was ultimately rejected because the first block would be another + novelty in Python's grammar: a block whose only content is a single + expression rather than a sequence of statements.
Attempts to amend this + issue by adding or repurposing yet another keyword along the lines of + ``match: return expression`` did not yield any satisfactory solution. + +Although flat indentation would save some horizontal space, the cost of +increased complexity or unusual rules is too high. It would also complicate +life for simple-minded code editors. Finally, the horizontal space issue can +be alleviated by allowing "half-indent" (i.e. two spaces instead of four) +for match statements. + +In sample programs using match, written as part of the development of this +PEP, a noticeable improvement in code brevity is observed, more than making +up for the additional indentation level. + + +*Make it an expression.* Some suggestions centered around the idea of +making ``match`` an expression rather than a statement. However, this +would fit poorly with Python's statement-oriented nature and lead to +unusually long and complex expressions with the need to invent new +syntactic constructs or break well-established syntactic rules. An +obvious consequence of ``match`` as an expression would be that case +clauses could no longer have arbitrary blocks of code attached, but only +a single expression. Overall, the strong limitations could in no way +offset the slight simplification in some special use cases. -with a case clause performing the same operation:: - - case x, y: - return x + y - -The case clause would, of course, be embedded in a match statement and -ultimately in a function. Nonetheless, understanding the patterns -following the ``case`` as a generalisation of parameters is a solid -mental model to approach and understand pattern matching. Match semantics ~~~~~~~~~~~~~~~ +The patterns of different case clauses might overlap in that more than +one case clause would match a given subject. The first-to-match rule +ensures that the selection of a case clause for a given subject is +unambiguous.
Furthermore, case clauses can have increasingly general +patterns matching wider classes of subjects. The first-to-match rule +then ensures that the most precise pattern can be chosen (although it +is the programmer's responsibility to order the case clauses correctly). + +In a statically typed language, the match statement would be compiled to +a decision tree to select a matching pattern quickly and very efficiently. +This would, however, require that all patterns be purely declarative and +static, running against the established dynamic semantics of Python. The +proposed semantics thus represent a path incorporating the best of both +worlds: patterns are tried in a strictly sequential order so that each +case clause constitutes an actual stement. At the same time, we allow +the interpreter to cache any information about the subject or change the +order in which subpatterns are tried. In other words: if the interpreter +has found that the subject is not an instance of a class ``C``, it can +directly skip case clauses testing for this again, without having to +perform repeated instance-checks. If a guard stipulates that a variable +``x`` must be positive, say (i.e. ``if x > 0``), the interpreter might +check this directly after binding ``x`` and before any further +subpatterns are considered. + + +*Binding and scoping.* In many pattern matching implementations, each +case clause would establish a separate scope of its own. Variables bound +by a pattern would then only be visible inside the corresponding case block. +In Python, however, this does not make sense. Establishing separate scopes +would essentially mean that each case clause is a separate function without +direct access to the variables in the surrounding scope (without having to +resort to ``nonlocal`` that is). Moreover, a case clause could no longer +influence any surrounding control flow through standard statement such as +``return`` or ``break``. 
Hence, such script scoping would lead to +unintuitive and surprising behavior. + +A direct consequence of this is that any variable bindings outlive the +respective case or match statements. Even patterns that only match a +subject partially might bind local variables (this is, in fact, necessary +for guards to function properly). However, this escaping of variable +bindings is in line with existing Python structures such as for loops and +with statements. .. _patterns: From 51bd14181d3d5bba8e6b352a87bd625fe72f4294 Mon Sep 17 00:00:00 2001 From: Tobias Kohn Date: Tue, 29 Sep 2020 09:10:24 +0200 Subject: [PATCH 46/54] Cleaning up pep-0635 --- pep-0635.rst | 198 +++++++++++++-------------------------------------- 1 file changed, 50 insertions(+), 148 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index fc50b1538cf..2cf4db50a22 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -120,87 +120,6 @@ TODO: Example code. -Motivation (Tobias' version) -============================ - -**This section should explain why we think pattern matching is a good -addition for Python.** - -Since Python is a dynamically typed language, Python code frequently -has to deal with data that comes in different forms and shapes. This, -in turn, gives rise to a high degree of versatility by favouring a -'duck typing' style, imposing only the bare minimum of requirements -on the form and shape of data, i.e. the structure of data objects. -Nonetheless, we find that Python code is often sprinkled with -conditions depending on ``isinstance``, ``len``, ``getattr``, -``hasattr``, etc. Despite the benefits of 'duck typing', actual -code still requires to query the format or type of an object in order -to choose an appropriate processing action or to extract relevant bits -of information. - -Unfortunately, the conditions for slightly more complex structures -quickly grow into sequences of partly interdependent bits of -structural tests, hurting readbility and maintability. 
Pattern -matching offers here a more direct scheme of expressing the minimal -structure that an objects needs to have in order to allow for further -processing. Rather than writing a series of manual tests, patterns -follow a *declarative* style, which is well-known for improved -readability, maintability, and for delegating the burden of efficient -execution on the compiler and interpreter. - -The concept of pattern matching is similar to regular expressions, -where succinct patterns describe a textual structure. A dedicated -compiler then transforms these declarative patterns into highly -efficient finite state machines. In contrast to regular expressions, -pattern matching targets Python objects rather than textual data, -and builds on *decision trees* as the driving motor for finding a -match. Moreover, pattern matching blends the matching of a suitable -pattern with that of a function, i.e. code that is executed in order -to handle and process the information provided by a specific kind of -object. - -One of the simplest forms of pattern matching that we find in other -languages comes in the form of *function overloading*. The type and -number of arguments then determine which implementation of a specific -function will be executed. Object-oriented languages (including -Python) may also use the *visitor pattern* to differentiate an action -based on the type or class of an object. Both of these approaches, -however, are aimed at 'shallow' structures with little or no direct -support for nested structures or structural information that is not -directly encoded in an object's class or type. For instance, it is -simple to differentiate between an integer, a string, and a tuple, say, -but becomes quite cumbersome and difficult to differentiate between -tuples of different lengths, or between one containing string elements -vs. one containing numeric elements. This is where pattern matching -shines: for structures that go beyond simple class distinctions. 
- -Although pattern matching is a concept that has been known and used -for decades, we propose a re-interpretation that centres around the -principle of 'duck typing' and builds on existing features of the -Python language such as iterable unpacking. Patterns adopt the syntax -of parameters as far as possible and, to a somewhat lesser degree, -that of targets in iterable unpacking. In contrast to iterable -unpacking, pattern matching is a 'conditional' feature that has to -avoid side-effects, i.e. extracting elements from an abstract iterable -(thus working with actual sequences instead) or assigning to non-local -targets such as object attributes or container elements. Overall, we -followed the guiding principle that patterns be static templates for -the structure and type of objects, i.e. patterns should depend as -little as possible on the surrounding context or current values of -variables (other than the subject to be matched, that is). - -Pattern matching is a structure that *maps* different patterns/templates -to 'function bodies' or actions. This general mapping structure can be -found in different context as well. Algol-derived languages usually -provide a switch table that maps ordinal values to actions, whereas Lisp -has a more general mapping from general conditions to actions. Although -all these constructs share a similar overall structure and some syntax, -their intents and motivation differs highly. In particular, pattern -matching as proposed here is not intended as or an extension of a switch -structure, although it is possible to emulate it to a large degree with -the syntax proposed here. - - Rationale ========= @@ -359,56 +278,27 @@ with statements. Patterns -------- -Patterns are most aptly described as a generalisation of parameters as -in function definitions. They also share some characteristics with -targets of iterable unpacking. Most importantly, however, patterns are -not expressions. 
A pattern cannot be evaluated or executed, it is a -static declaration of a structural template. This declarative nature -is a characteristic it shares with ``global`` statements, for instance, -but also with regular expressions or context-free grammars. - -Python's iterable unpacking can assign values to any valid target, -including attributes and subscripts. This allows you to write, e.g., -``self.x, self.y = x, y`` in a class' initialisator, or -``a[i+1], a[i] = a[i], a[i+1]`` to swap two elements in a list. The -same approach, however, does not work for patterns due to their -'conditional' nature. It is at the very core of pattern matching that -a pattern may safely fail to match a given subject and reject it. In -order for this to make sense and to reason about patterns, it is -imperative to avoid any side effects (as far as possible within the -bounds of a dynamic language). Patterns can therefore not assign -values to arbitrary targets, but rather bind *local* variables to -values extracted from the data provided. - -Another consequence of the static declarative nature of patterns is that -they cannot contain expressions. Nonetheless, as some structures are -discerned by specific *values* (e.g., an object for 'addition' might be -discerned by the ``operator`` field holding the string value ``'+'``), -patterns can contain such values/constants. The overall rules, however, -specifically exclude actual expressions and make sure that only specific -values are integrated into patterns. The value ``-3``, for instance, is -syntactically interpreted as the expression comprising the unary operation -'negate' applied to the positive integer '3' (i.e. Python's syntax does -not support negative numbers as atomic literals). The overall syntax of -patterns is carefully crafted to ensure that entities such as negative -numbers can be included despite the exclusion of expressions in general. 
- -Nonetheless, it is desirable to express some constant values through named -constants. ``HttpStatus.OK``, for instance, might be much more readable -than the plain number ``200``. This poses a challenge, though, because -the Python compiler cannot infer reliable from context, which names are -meant to denote variables/parameters and which are meant to denote named -constants. Noting that many meaningful constants are organised in specific -modules or enumerations, we follow a pragmatic approach here and interpret -any dotted names as constants (recall that assignments to attributes are -not possible because of side effects, anyway). We acknowledge that this -rule may seem restrictive as it leaves out support for named constants -coming from the current namespace. However, all alternatives turned out -to either introduce much more complex rules or additional syntax. We would -also like to emphasise that better syntactic support for named constants -could still be added in future proposals, thus warranting our focus on a -minimal viable specficiation. - +Patterns fulfill two purposes: they impose (structural) constraints on +the subject and they specify which data values should be extracted from +the subject and bound to variables. In iterable unpacking, which can be +seen as a prototype to pattern matching in Python, there is only one +*structural pattern* to express sequences while there is a rich set of +*binding patterns* to assign a value to a specific variable or field. +Full pattern matching differs from this in that there is more variety +in structual patterns but only a minimum of binding patterns. + +Patterns differ from assignment targets (as in iterable unpacking) in that +they impose additional constraints on the structure of the subject and in +that a subject might savely fail to match a specific pattern at any point +(in iterable unpacking, this constitutes an error). 
The latter means that +pattern should avoid side effects wherever possible, including binding +values to attributes or subscripts. + +Although the structural patterns might superficially look like expressions, +it is important to keep in mind that there is a clear distinction. In fact, +no pattern is or contains an expression. It is more productive to think of +patterns as declarative elements similar to the formal parameters in a +function definition. .. _capture_pattern: @@ -417,10 +307,11 @@ Capture Patterns ~~~~~~~~~~~~~~~~ Capture patterns take on the form of a name that accepts any value and binds -it to a (local) variable. In that sense, a simple capture pattern is -basically equivalent to a parameter in a function definition (when the -function is called, eacg parameter binds the respective argument to a local -variable in the function's scope). +it to a (local) variable (unless the name is declared as ``nonlocal`` or +``global``). In that sense, a simple capture pattern is basically equivalent +to a parameter in a function definition (when the function is called, each +parameter binds the respective argument to a local variable in the function's +scope). A name used for a capture pattern must not coincide with another capture pattern in the same pattern. This, again, is similar to parameters, which @@ -451,18 +342,28 @@ anything else. Without a wildcard, it would become necessary to 'invent' a number of local variables, which would be bound but never used. Even when sticking to naming conventions and using ``__1, __2, __3`` to name irrelevant values, say, this still introduces visual clutter and can hurt -performance (compare the sqeuence pattern ``(x, y, *z)`` to ``(_, y, *_)``, +performance (compare the sequence pattern ``(x, y, *z)`` to ``(_, y, *_)``, where the ``*z`` forces the interpreter to copy a potentially very long sequence, whereas the second version simply compiles to code along the lines of ``y = seq[1]``). 
There has been much discussion about the choice of the underscore as ``_`` -as a wildcard pattern, i.e. making this one name not-binding. However, the +as a wildcard pattern, i.e. making this one name non-binding. However, the underscore is already heavily used as an 'ignore value' marker in iterable unpacking. Since the wildcard pattern ``_`` never binds, this use of the underscore does not interfere with other uses such as inside the REPL or internationalisation packages. +It has been proposed to use ``...`` (i.e., the ellipsis token) or ``*`` +(star) as a wildcard. However, both these look as if an arbitrary number +of items is omitted:: + + case [a, ..., z]: ... + case [a, *, z]: ... + +Both look like the would match a sequence of at two or more items, +capturing the first and last values. + Finally note that the underscore is as a wildcard pattern in *every* programming language with pattern matching that we could find. Keeping in mind that many users of Python also work with other programming @@ -497,9 +398,9 @@ equality (``x == y`` in Python syntax). Consequently, the literal patterns ``1.0`` and ``1`` match exactly the same set of objects, i.e. ``case 1.0:`` and ``case 1:`` are fully interchangable. In principle, ``True`` would also match the same set of objects because ``True == 1`` holds. However, we -believe that many users would be surprised findings that ``case True:`` +believe that many users would be surprised finding that ``case True:`` matched the object ``1.0``, resulting in some subtle bugs and convoluted -work arounds. We therefore adopted the rule that the three singleton +workarounds. We therefore adopted the rule that the three singleton objects ``None``, ``False`` and ``True`` match by identity (``x is y`` in Python syntax) rather than equality. Hence, ``case True:`` will match only ``True`` and nothing else. 
Note that ``case 1:`` would still match ``True``, @@ -550,16 +451,17 @@ interpreted as a constant value pattern as exemplified by ``HttpStatus.OK`` above. This excludes, in particular, local variables from acting as constants. -Global variables can only be directly used as constent when defined in other +Global variables can only be directly used as constant when defined in other modules, although there are work arounds to access the current module as a namespace as well. A proposed rule to use a leading dot (e.g. ``.CONSTANT``) for that purpose was critisised because it was felt that the dot would not be a visible-enough marker for that purpose. Partly inspired by use cases in other programming languages, a number of different -markers/sigils was proposed (such as ``^CONSTANT``, ``$CONSTANT``, -``==CONSTANT``, etc.), although there was no obvious or natural choice. -The current proposal therefore leaves the discussion and possible -introduction of such a 'constant' marker for future PEPs. +markers/sigils were proposed (such as ``^CONSTANT``, ``$CONSTANT``, +``==CONSTANT``, ``CONSTANT?``, or the word enclosed in backticks), although +there was no obvious or natural choice. The current proposal therefore +leaves the discussion and possible introduction of such a 'constant' marker +for future PEPs. Distinguishing the semantics of names based on whether it is a global variable (i.e. the compiler would treat global variables as constants rather @@ -741,8 +643,8 @@ along the lines of:: Even though such a strict separation of case clauses into independent functions does not make sense in Python, we find that patterns share many -syntactic rules with parameters, such as binding arguments to local -variables only or that variable/parameter names must not be repeated for +syntactic rules with parameters, such as binding arguments to unqualified +names only or that variable/parameter names must not be repeated for a particular pattern/function. 
With its emphasis on abstraction and encapsulation, object-oriented @@ -759,8 +661,8 @@ original tuple constructors. In a pattern like ``Node(left, right)``, ``Node`` is no longer a passive tag, but rather a function that can actively check for any given object whether it has the right structure and extract a ``left`` and ``right`` field. In other words: the ``Node``-tag becomes a -function that transforms an object into a tuple or returns ``None`` to -indicate that it is not possible. +function that transforms an object into a tuple or returns some failure +indicator if it is not possible. In Python, we simply use ``isinstance()`` together with the ``__match_args__`` field of a class to check whether an object has the correct structure and From 8cf80e748fd5e4e6afe776283e2fd3e1619d5e2c Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Tue, 29 Sep 2020 11:51:41 -0700 Subject: [PATCH 47/54] Apply various tweaks I proposed None of these should be controversial. --- pep-0635.rst | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index 2cf4db50a22..e83259e0c1f 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -139,7 +139,7 @@ Overview and terminology The ``match`` statement ----------------------- -The match statement evaluates an expression to yield a subject, finds the +The match statement evaluates an expression to produce a subject, finds the first pattern that matches the subject and executes the associated block of code. Syntactically, the match statement thus takes an expression and a sequence of case clauses, where each case clause comprises a pattern and @@ -289,7 +289,7 @@ in structual patterns but only a minimum of binding patterns. 
Patterns differ from assignment targets (as in iterable unpacking) in that they impose additional constraints on the structure of the subject and in -that a subject might savely fail to match a specific pattern at any point +that a subject might safely fail to match a specific pattern at any point (in iterable unpacking, this constitutes an error). The latter means that pattern should avoid side effects wherever possible, including binding values to attributes or subscripts. @@ -334,13 +334,13 @@ Wildcard Pattern The wildcard pattern is a special case of a 'capture' pattern: it accepts any value, but does not bind it to a variable. The idea behind this rule is to support repeated use of the wildcard in patterns. While ``(x, x)`` -constitutes an error, ``(_, _)`` is legal. +is an error, ``(_, _)`` is legal. Particularly in larger (sequence) patterns, it is important to allow the pattern to concentrate on values with actual significance while ignoring anything else. Without a wildcard, it would become necessary to 'invent' a number of local variables, which would be bound but never used. Even -when sticking to naming conventions and using ``__1, __2, __3`` to name +when sticking to naming conventions and using e.g. ``_1, _2, _3`` to name irrelevant values, say, this still introduces visual clutter and can hurt performance (compare the sequence pattern ``(x, y, *z)`` to ``(_, y, *_)``, where the ``*z`` forces the interpreter to copy a potentially very long @@ -352,7 +352,7 @@ as a wildcard pattern, i.e. making this one name non-binding. However, the underscore is already heavily used as an 'ignore value' marker in iterable unpacking. Since the wildcard pattern ``_`` never binds, this use of the underscore does not interfere with other uses such as inside the REPL or -internationalisation packages. +the ``gettext`` module. It has been proposed to use ``...`` (i.e., the ellipsis token) or ``*`` (star) as a wildcard. 
However, both these look as if an arbitrary number @@ -370,7 +370,7 @@ in mind that many users of Python also work with other programming languages, have prior experience when learning Python, or moving on to other languages after having learnt Python, we find that such well established standards are important and relevant with respect to -readability and learnability. Moreover, concerns that this wildcard +readability and learnability. In our view, concerns that this wildcard means that a regular name received special treatment are not strong enough to introduce syntax that would make Python special. @@ -431,8 +431,8 @@ Constant Value Patterns It is good programming style to use named constants for parametric values or to clarify the meaning of particular values. Clearly, it would be desirable to also write ``case (HttpStatus.OK, body):`` rather than -``case (200, body):``, say. The main issue that arises here is how to -discern capture patterns (variables) and constant value patterns. The +``case (200, body):``, for example. The main issue that arises here is how to +distinguish capture patterns (variables) from constant value patterns. The general discussion surrounding this issue has brought forward a plethora of options, which we cannot all fully list here. @@ -447,12 +447,12 @@ constants syntactically. However, the idea of using upper vs. lower case as a marker has been met with scepticism since there is no similar precedence in core Python (although it is common in other languages). We therefore only adopted the rule that any dotted name (i.e. attribute access) is to be -interpreted as a constant value pattern as exemplified by ``HttpStatus.OK`` -above. This excludes, in particular, local variables from acting as +interpreted as a constant value pattern like ``HttpStatus.OK`` +above. This precludes, in particular, local variables from acting as constants. 
Global variables can only be directly used as constant when defined in other -modules, although there are work arounds to access the current module as a +modules, although there are workarounds to access the current module as a namespace as well. A proposed rule to use a leading dot (e.g. ``.CONSTANT``) for that purpose was critisised because it was felt that the dot would not be a visible-enough marker for that purpose. Partly inspired From cdab17dc95e8b4791251852f0a180b2d2be72fa5 Mon Sep 17 00:00:00 2001 From: Tobias Kohn Date: Wed, 30 Sep 2020 22:54:22 +0200 Subject: [PATCH 48/54] Added sequence patterns --- pep-0635.rst | 150 +++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 111 insertions(+), 39 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index e83259e0c1f..f83387adb20 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -171,7 +171,7 @@ proposals of *flat indentation schemes* come at the expense of violating Python's establish structural paradigm, leading to additional syntactic rules: -- *Do no indent case clauses.* +- *Unindented case clauses.* The idea is to align case clauses with the ``match``, i.e.:: match expression: @@ -185,7 +185,7 @@ rules: neither follow the syntactic scheme of simple nor composite statements but rather establish a category of its own. -- *Put the expression on a line after ``match``.* +- *Putting the expression on a separate line after ``match``.* The idea is to use the expression yielding the subject as a statement to avoid the singularity of ``match`` having no actual block despite the colons:: @@ -294,6 +294,12 @@ that a subject might safely fail to match a specific pattern at any point pattern should avoid side effects wherever possible, including binding values to attributes or subscripts. +A corner stone of pattern matching is the possibility of arbitrarily +*nesting and combining patterns*. 
The nesting allows for expressing deep +tree structures (for an example of nested class patterns, see the motivation +section above). At any level of the nesting, several patterns can be +combined to form alternatives. + Although the structural patterns might superficially look like expressions, it is important to keep in mind that there is a clear distinction. In fact, no pattern is or contains an expression. It is more productive to think of @@ -325,6 +331,19 @@ expressing a tuple with two equal elements (which comes with its own issues). Should the need arise, then it is still possible to introduce support for repeated use of names later on. +There were calls to explicitly mark capture patterns and thus identify them +as binding targets. According to that idea, a capture pattern would be +written as, e.g. ``?x`` or ``$x``. The aim of such explicit capture markers +is to let an unmarked name be a constant value pattern (see below). However, +this is based on the misconception that pattern matching was an extension of +*switch* statements, placing the emphasis on fast switching based on +(ordinal) values. Such a *switch* statement has indeed been proposed for +Python before (see PEP 275 and PEP 3103). Pattern matching, on the other +hand, builds a generalized concept of iterable unpacking. Binding values +extracted from a data structure is at the very core of the concept. Explicit +markers for capture patterns would thus betray the objective of the proposed +pattern matching syntax. + .. _wildcard_pattern: @@ -365,8 +384,10 @@ Both look like the would match a sequence of at two or more items, capturing the first and last values. Finally note that the underscore is as a wildcard pattern in *every* -programming language with pattern matching that we could find. 
Keeping -in mind that many users of Python also work with other programming +programming language with pattern matching that we could find +(including *C#*, *Elixir*, *Erlang*, *F#*, *Grace*, *Haskell*, +*Mathematica*, *OCaml*, *Ruby*, *Rust*, *Scala*, *Swift*, and *Thorn*). +Keeping in mind that many users of Python also work with other programming languages, have prior experience when learning Python, or moving on to other languages after having learnt Python, we find that such well established standards are important and relevant with respect to @@ -382,16 +403,7 @@ Literal Patterns Literal patterns are a convenient way for imposing constraints on the value of a subject, rather than its type or structure. Literal patterns -even allow you to emulate a switch statement using pattern matching. On -the flipside, if you think of patterns as building on parameters and -assignment targets, literal patterns are a novel addition (i.e. you would -not write, e.g., ``(2, a, b) = c`` in iterable unpacking). - -Originally, literal patterns came from the idea of expressing unstructured -singleton objects such as ``None``: instead of requiring that a subject has -type ``NoneType``, it makes much more sense to directly write ``None``. -More generally, literal patterns could also be seen as syntactic sugar for -guards. Rather than ``case x if x == 0:``, you can simply write ``case 0:``. +even allow you to emulate a switch statement using pattern matching. Generally, the subject is compared to a literal pattern by means of standard equality (``x == y`` in Python syntax). Consequently, the literal patterns @@ -408,16 +420,24 @@ though, because the literal pattern ``1`` works by equality and not identity. 
Early ideas to induce a hierarchy on numbers so that ``case 1.0`` would match both the integer ``1`` and the floating point number ``1.0``, whereas -``case 1:`` would only match the integer ``1`` very eventually dropped in -favour of the simpler and consistent rule based on equality. +``case 1:`` would only match the integer ``1`` were eventually dropped in +favour of the simpler and consistent rule based on equality. Moreover, any +additional checks whether the subject is an instance of ``numbers.Integral`` +would come at a high runtime cost to introduce what would essentially be +novel in Python. When needed, the explicit syntax ``case int(1):`` might +be used. Recall that literal patterns are *not* expressions, but directly denote a specific value or object. From a syntactical point of view, we have to ensure that negative and complex numbers can equally be used as patterns, -although they are not atomic literal values (i.e. ``-3+4j`` is syntactically -an expression of the form ``BinOp(UnaryOp('-', 3), '+', 4j)``). -Interpolated *f*-strings, on the other hand, are not literal values, despite -their appearance and can therefore not be used as literal patterns. +although they are not atomic literal values (i.e. the seeming literal value +``-3+4j`` would syntactically be an expression of the form +``BinOp(UnaryOp('-', 3), '+', 4j)``, but as expressions are not part of +patterns, we added syntactic support for such complex value literals without +having to resort to full expressions). Interpolated *f*-strings, on the +other hand, are not literal values, despite their appearance and can +therefore not be used as literal patterns (string concatenation, however, +is supported). Literal patterns not only occur as patterns in their own right, but also as keys in *mapping patterns*. @@ -475,6 +495,8 @@ patterns impossible. 
Group Patterns ~~~~~~~~~~~~~~ +Allowing users to explicitly specify the grouping is particularly helpful +in case of alternatives or sequence patterns written as tuples. .. _sequence_pattern: @@ -482,6 +504,47 @@ Group Patterns Sequence Patterns ~~~~~~~~~~~~~~~~~ +Sequence patterns follow as closely as possible the already established +syntax and semantics of iterable unpacking. Of course, subpatterns take +the place of assignment targets (variables, attributes and subscript). +Moreover, the sequence pattern only matches a narrow set of possible +subjects, whereas iterable unpacking can be applied to any iterable. + +- As in iterable unpacking, we do not distinguish between 'tuple' and + 'list' notation. ``[a, b, c]``, ``(a, b, c)`` and ``a, b, c`` are all + equivalent. + +- A starred pattern will capture a sub-sequence of arbitrary length, + mirroring iterable unpacking as well. Only one starred item may be + present in any sequence pattern. In theory, patterns such as ``(*_, 3, *_)`` + could be understood as expressing any sequence containing the value ``3``. + In practise, however, this would only work for a very narrow set of use + cases and lead to inefficient backtracking or even ambiguities otherwise. + +- The sequence pattern does *not* iterate through an iterable subject. All + elements are accessed through subscripting and slicing, and the subject must + be an instance of ``collections.abc.Sequence`` (including, in particular, + lists and tuples, but excluding strings and bytes). + +A sequence pattern cannot just iterate through any iterable object. The +consumation of elements from the iteration would have to be undone if the +overall pattern fails, which is not possible. + +Relying on ``len()`` and subscripting and slicing alone does not work to +identify sequences because sequences share the protocol with more general +maps (dictionaries) in this regard. 
It would be surprising if a sequence +pattern also matched dictionaries or other custom objects that implement +the mapping protocol (i.e. ``__getitem__``). The interpreter therefore +performs an instance check to ensure that the subject in question really +is a sequence (of known type). + +String and bytes objects have a dual nature: they are both 'atomic' objects +in their own right, as well as sequences (with a strongly recursive nature +in that a string is a sequence of strings). The typical behaviour and use +cases for strings and bytes seems different enough from that of tuples and +lists to warrant a clear distinction. Strings and bytes are therefore not +matched by a sequence pattern, limiting the sequence pattern to a very +narrow and specific understanding of 'sequence'. .. _mapping_pattern: @@ -491,6 +554,7 @@ Mapping Patterns + .. _class_pattern: Class Patterns @@ -504,12 +568,20 @@ static occurrences in programs. Such instance checks typically precede a subsequent access to information stored in the object, or a possible manipulation thereof. A typical pattern might be along the lines of:: - def DFS(node): + def traverse_tree(node): if isinstance(node, Node): - DFS(node.left) - DFS(node.right) + traverse_tree(node.left) + traverse_tree(node.right) elif isinstance(node, Leaf): print(node.value) + +In many cases, however, class patterns occur nested as in the example +given in the motivation:: + + if (isinstance(node, BinOp) and node.op == "+" + and isinstance(node.right, BinOp) and node.right.op == "*"): + a, b, c = node.left, node.right.left, node.right.right + # Handle a + b*c The class pattern lets you to concisely specify both an instance-check as well as relevant attributes (with possible further constraints). It is @@ -521,8 +593,8 @@ problematic with the structure of Python objects. 
When dealing with general Python objects, we face a potentially very large number of unordered attributes: an instance of ``Node`` contains a large number of attributes (most of which are 'private methods' such as, e.g., -``__repr__``). Moreover, the interpreter cannot reliable deduce which of -the attributes comes first and which is second. For an object that +``__repr__``). Moreover, the interpreter cannot reliably deduce which of +the attributes comes first and which comes second. For an object that represents a circle, say, there is no inherently obvious ordering of the attributes ``x``, ``y`` and ``radius``. @@ -547,7 +619,7 @@ an object lacks an attribute specified by the pattern, the match fails. - The class field ``__match_args__`` specifies a number of attributes together with their ordering, allowing class patterns to rely on positional sub-patterns without having to explicitly name the attributes in question. - This is particularly handy for smaller objects or instances data classes, + This is particularly handy for smaller objects or instances of data classes, where the attributes of interest are rather obvious and often have a well-defined ordering. In a way, ``__match_args__`` is similar to the declaration of formal parameters, which allows to call functions with @@ -584,20 +656,20 @@ History and Context Pattern matching emerged in the late 1970s in the form of tuple unpacking and as a means to handle recursive data structures such as linked lists or -trees (object-oriented languages use the visitor pattern for handling +trees (object-oriented languages usually use the visitor pattern for handling recursive data structures). The early proponents of pattern matching -organised structured data in 'tagged tuples' rather than ``struct``s as in -_C_ or the objects introduced later. A node in a binary tree would, for +organised structured data in 'tagged tuples' rather than ``struct`` as in +*C* or the objects introduced later. 
A node in a binary tree would, for instance, be a tuple with two elements for the left and right branches, -respectively, and a ``Node``-tag, written as ``Node(left, right)``. In +respectively, and a ``Node`` tag, written as ``Node(left, right)``. In Python we would probably put the tag inside the tuple as ``('Node', left, right)`` or define a data class `Node` to achieve the same effect. -Using modern syntax, a depth-first search (DFS) would then be written as +Using modern syntax, a depth-first tree traversal would then be written as follows:: - def DFS(node): + def traverse_tree(node): node match: case Node(left, right): DFS(left) @@ -606,7 +678,8 @@ follows:: handle(value) The notion of handling recursive data structures with pattern matching -immediately gave rise to the idea of handling general recursive patterns +immediately gave rise to the idea of handling more general recursive +'patterns' (i.e. recursion beyond recursive data structures) with pattern matching. Pattern matching would thus also be used to define recursive functions such as:: @@ -623,7 +696,7 @@ As pattern matching was repeatedly integrated into new and emerging programming languages, its syntax slightly evolved and expanded. The two first cases in the ``fib`` example above could be written more succinctly as ``case 0 | 1:`` with ``|`` denoting alternative patterns. Moreover, the -underscore ``_`` was generally accepted as a wildcard, a filler where neither +underscore ``_`` was widely adopted as a wildcard, a filler where neither the structure nor value of parts of a pattern were of substance. Since the underscore is already frequently used in equivalent capacity in Python's iterable unpacking (e.g., ``_, _, third, _* = something``) we kept these @@ -634,12 +707,11 @@ closely linked to the concept of functions. The different case clauses have always been considered as something like semi-indepedent functions where pattern variables take on the role of parameters. 
This becomes most apparent when pattern matching is written as an overloaded function, -along the lines of:: +along the lines of (Standard ML):: - def fib( 0 | 1 ): - return 1 - def fib( n ): - return fib(n-1) + fib(n-2) + fun fib 0 = 1 + | fib 1 = 1 + | fib n = fib (n-1) + fib (n-2) Even though such a strict separation of case clauses into independent functions does not make sense in Python, we find that patterns share many From d6fdb240a825fc6832427c756727f3f0e9635c17 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Wed, 30 Sep 2020 20:52:59 -0700 Subject: [PATCH 49/54] More target->subject --- pep-0634.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 024516d0dc1..b4a366f6ec2 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -190,8 +190,8 @@ Syntax:: (Note: the name on the left may not be ``_``.) A walrus pattern matches the OR pattern on the right of the ``:=`` -operator against the target. If this fails, the walrus pattern fails. -Otherwise, the walrus pattern binds the target to the name on the left +operator against the subject. If this fails, the walrus pattern fails. +Otherwise, the walrus pattern binds the subject to the name on the left of the ``:=`` operator and succeeds. @@ -207,7 +207,7 @@ this is called an OR pattern. (A single closed pattern is just that.) Each subpattern must bind the same set of names. -An OR pattern matches each of its subpatterns in turn to the target, +An OR pattern matches each of its subpatterns in turn to the subject, until one succeeds. The OR pattern is then deemed to succeed. If none of the subpatterns succeed the OR pattern fails. @@ -448,7 +448,7 @@ A class pattern may not repeat the same keyword multiple times. If ``name_or_attr`` is not an instance of the builtin ``type``, ``TypeError`` is raised. -A class pattern fails if the target is not an instance of ``name_or_attr``. +A class pattern fails if the subject is not an instance of ``name_or_attr``. 
This is tested using ``isinstance()``. If no arguments are present, the pattern succeeds if the ``isinstance()`` @@ -457,7 +457,7 @@ check succeeds. Otherwise: - If only keyword patterns are present, they are processed as follows, one by one: - - The keyword is looked up as an attribute on the target. + - The keyword is looked up as an attribute on the subject. - If this raises an exception other than ``AttributeError``, the exception bubbles up. @@ -480,7 +480,7 @@ as follows: - For a number of built-in types (specified below), a single positional subpattern is accepted which will match - the entire target; for these types no keyword patterns are accepted. + the entire subject; for these types no keyword patterns are accepted. - The equivalent of ``getattr(cls, "__match_args__", ()))`` is called. - If this raises an exception the exception bubbles up. - If the returned value is not a list or tuple, the conversion fails From a0be30004021fa3a7d8094d2c23b0e80c6caee6f Mon Sep 17 00:00:00 2001 From: Tobias Kohn Date: Thu, 1 Oct 2020 12:17:53 +0200 Subject: [PATCH 50/54] Added some more patterns and examples --- pep-0635.rst | 242 ++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 222 insertions(+), 20 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index f83387adb20..fd718c5aebe 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -214,7 +214,7 @@ PEP, a noticeable improvement in code brevity is observed, more than making up for the additional indentation level. -*Make it an expression.* Some suggestions centered around the idea of +*Statement v Expression.* Some suggestions centered around the idea of making ``match`` an expression rather than a statement. 
However, this would fit poorly with Python's statement-oriented nature and lead to unusually long and complex expressions with the need to invent new @@ -294,11 +294,10 @@ that a subject might safely fail to match a specific pattern at any point pattern should avoid side effects wherever possible, including binding values to attributes or subscripts. -A corner stone of pattern matching is the possibility of arbitrarily -*nesting and combining patterns*. The nesting allows for expressing deep +A cornerstone of pattern matching is the possibility of arbitrarily +*nesting patterns*. The nesting allows for expressing deep tree structures (for an example of nested class patterns, see the motivation -section above). At any level of the nesting, several patterns can be -combined to form alternatives. +section above) as well as alternatives. Although the structural patterns might superficially look like expressions, it is important to keep in mind that there is a clear distinction. In fact, @@ -307,6 +306,109 @@ patterns as declarative elements similar to the formal parameters in a function definition. +Walrus patterns +~~~~~~~~~~~~~~~ + + + +OR patterns +~~~~~~~~~~~ + +The OR pattern allows you to combine 'structurally equivalent' alternatives +into a new pattern, i.e. several patterns can share a common handler. If any +one of an OR pattern's subpatterns matches the given subject, the entire OR +pattern succeeds. + +Statically typed languages prohibit the binding of names (capture patterns) +inside an OR pattern because of potential conflicts concerning the types of +variables. As a dynamically typed language, Python can be less restrictive +here and allow capture patterns inside OR patterns. However, each subpattern +must bind the same set of variables so as not to leave potentially undefined +names. With two alternatives ``P | Q``, this means that if *P* binds the +variables *u* and *v*, *Q* must bind exactly the same variables *u* and *v*. 
+
+There was some discussion on whether to use the bar ``|`` or the keyword
+``or`` in order to separate alternatives. The OR pattern does not fully fit
+the existing semantics and usage of either of these two symbols. However,
+``|`` is the symbol of choice in all programming languages with support of
+the OR pattern and is even used in that capacity for regular expressions in
+Python as well. Moreover, ``|`` is not only used for bitwise OR, but also
+for set unions and dict merging (:pep:`584`).
+Other alternatives were considered as well, but none of these would allow
+OR-patterns to be nested inside other patterns:
+
+- *Using a comma*::
+
+    case 401, 403, 404:
+        print("Some HTTP error")
+
+  This looks too much like a tuple -- we would have to find a different way
+  to spell tuples, and the construct would have to be parenthesized inside
+  the argument list of a class pattern. In general, commas already have many
+  different meanings in Python, we shouldn't add more.
+
+- *Using stacked cases*::
+
+    case 401:
+    case 403:
+    case 404:
+        print("Some HTTP error")
+
+  This is how this would be done in *C*, using its fall-through semantics
+  for cases. However, we don't want to mislead people into thinking that
+  match/case uses fall-through semantics (which are a common source of bugs
+  in *C*). Also, this would be a novel indentation pattern, which might make
+  it harder to support in IDEs and such (it would break the simple rule "add
+  an indentation level after a line ending in a colon"). Finally, this
+  would not support OR patterns nested inside other patterns.
+
+- *Using ``case in`` followed by a comma-separated list*::
+
+    case in 401, 403, 404:
+        print("Some HTTP error")
+
+  This would not work for OR patterns nested inside other patterns, like::
+
+    case Point(0|1, 0|1):
+        print("A corner of the unit square")
+
+
+*AND and NOT patterns.*
+This proposal defines an OR-pattern (|) to match one of several alternates;
+why not also an AND-pattern (``&``) or even a NOT-pattern (``!``)?
+Especially given that some other languages (``F#`` for example) support
+AND-patterns.
+
+However, it is not clear how useful this would be. The semantics for matching
+dictionaries, objects and sequences already incorporates an implicit 'and':
+all attributes and elements mentioned must be present for the match to
+succeed. Guard conditions can also support many of the use cases that a
+hypothetical 'and' operator would be used for.
+
+A negation of a match pattern using the operator ``!`` as a prefix would match
+exactly if the pattern itself does not match. For instance, ``!(3 | 4)``
+would match anything except ``3`` or ``4``. However, there is evidence from
+other languages that this is rarely useful and primarily used as double
+negation ``!!`` to control variable scopes and prevent variable bindings
+(which does not apply to Python).
+
+In the end, it was decided that this would make the syntax more complex
+without adding a significant benefit.
+
+
+Example::
+
+    def simplify(expr):
+        match expr:
+            case ('/', 0, 0):
+                return expr
+            case ('*' | '/', 0, _):
+                return 0
+            case ('+' | '-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*' | '/', x, 1):
+                return x
+        return expr
+
+
 .. _capture_pattern:
 
 Capture Patterns
@@ -338,11 +440,26 @@ is to let an unmarked name be a constant value pattern (see below). However,
 this is based on the misconception that pattern matching was an extension of
 *switch* statements, placing the emphasis on fast switching based on
 (ordinal) values. Such a *switch* statement has indeed been proposed for
-Python before (see PEP 275 and PEP 3103). Pattern matching, on the other
+Python before (see :pep:`275` and :pep:`3103`). Pattern matching, on the other
 hand, builds a generalized concept of iterable unpacking. Binding values
-extracted from a data structure is at the very core of the concept. Explicit
-markers for capture patterns would thus betray the objective of the proposed
-pattern matching syntax.
+extracted from a data structure is at the very core of the concept and hence
+the most common use case. Explicit markers for capture patterns would thus
+betray the objective of the proposed pattern matching syntax and simplify
+a secondary use case at the expense of additional syntactic clutter for
+core cases.
+
+Example::
+
+    def average(*args):
+        match args:
+            case [x, y]:  # captures the two elements of a sequence
+                return (x + y) / 2
+            case [x]:  # captures the only element of a sequence
+                return x
+            case []:
+                return 0
+            case x:  # captures the entire sequence
+                return sum(x) / len(x)
 
 
 .. _wildcard_pattern:
@@ -383,6 +500,11 @@ of items is omitted::
 Both look like the would match a sequence of at two or more items, capturing
 the first and last values.
 
+A single wildcard clause (i.e. ``case _:``) is semantically equivalent to
+an ``else:``. It accepts any subject without binding it to a variable or
+performing any other operation. However, the wildcard pattern is in
+contrast to ``else`` usable as a subpattern in nested patterns.
+
 Finally note that the underscore is as a wildcard pattern in *every*
 programming language with pattern matching that we could find
 (including *C#*, *Elixir*, *Erlang*, *F#*, *Grace*, *Haskell*,
@@ -395,6 +517,17 @@ readability and learnability. In our view, concerns that this wildcard means
 that a regular name received special treatment are not strong enough to
 introduce syntax that would make Python special.
 
+Example::
+
+    def is_closed(sequence):
+        match sequence:
+            case [_]:  # any sequence with a single element
+                return True
+            case [start, *_, end]:  # a sequence with at least two elements
+                return start == end
+            case _:  # anything
+                return False
+
 
 .. _literal_pattern:
 
@@ -442,6 +575,26 @@ is supported).
 Literal patterns not only occur as patterns in their own right, but also as
 keys in *mapping patterns*.
 
+Example::
+
+    def simplify(expr):
+        match expr:
+            case ('+', 0, x):
+                return x
+            case ('+' | '-', x, 0):
+                return x
+            case ('and', True, x):
+                return x
+            case ('and', False, x):
+                return False
+            case ('or', False, x):
+                return x
+            case ('or', True, x):
+                return True
+            case ('not', ('not', x)):
+                return x
+        return expr
+
 
 .. _constant_value_pattern:
 
@@ -491,12 +644,26 @@ patterns. Moreover, pattern matching could not be used directly inside a
 module's scope because all variables would be global, making capture
 patterns impossible.
 
+Example::
+
+    def handle_reply(reply):
+        match reply:
+            case (HttpStatus.OK, MimeType.TEXT, body):
+                process_text(body)
+            case (HttpStatus.OK, MimeType.APPL_ZIP, body):
+                text = deflate(body)
+                process_text(text)
+            case (HttpStatus.MOVED_PERMANENTLY, new_URI):
+                resend_request(new_URI)
+            case (HttpStatus.NOT_FOUND):
+                raise ResourceNotFound()
+
 
 Group Patterns
 ~~~~~~~~~~~~~~
 
 Allowing users to explicitly specify the grouping is particularly helpful
-in case of alternatives or sequence patterns written as tuples.
+in case of OR patterns.
 
 
 .. _sequence_pattern:
@@ -507,12 +674,16 @@ Sequence Patterns
 Sequence patterns follow as closely as possible the already established
 syntax and semantics of iterable unpacking. Of course, subpatterns take
 the place of assignment targets (variables, attributes and subscript).
-Moreover, the sequence pattern only matches a narrow set of possible
-subjects, whereas iterable unpacking can be applied to any iterable.
+Moreover, the sequence pattern only matches a carefully selected set of
+possible subjects, whereas iterable unpacking can be applied to any
+iterable.
 
 - As in iterable unpacking, we do not distinguish between 'tuple' and
   'list' notation. ``[a, b, c]``, ``(a, b, c)`` and ``a, b, c`` are all
-  equivalent.
+  equivalent. While this means we have a redundant notation and checking
+  specifically for lists or tuples requires more effort (e.g.
+  ``case list([a, b, c])``), we mimick iterable unpacking as much as
+  possible.
 
 - A starred pattern will capture a sub-sequence of arbitrary length,
   mirroring iterable unpacking as well. Only one starred item may be
@@ -524,10 +695,11 @@ subjects, whereas iterable unpacking can be applied to any iterable.
 - The sequence pattern does *not* iterate through an iterable subject. All
   elements are accessed through subscripting and slicing, and the subject must
   be an instance of ``collections.abc.Sequence`` (including, in particular,
-  lists and tuples, but excluding strings and bytes).
+  lists and tuples, but excluding strings and bytes, as well as sets and
+  dictionaries).
 
 A sequence pattern cannot just iterate through any iterable object. The
-consumation of elements from the iteration would have to be undone if the
+consumption of elements from the iteration would have to be undone if the
 overall pattern fails, which is not possible.
 
 Relying on ``len()`` and subscripting and slicing alone does not work to
@@ -541,10 +713,12 @@ is a sequence (of known type).
 String and bytes objects have a dual nature: they are both 'atomic' objects
 in their own right, as well as sequences (with a strongly recursive nature
 in that a string is a sequence of strings). The typical behaviour and use
-cases for strings and bytes seems different enough from that of tuples and
-lists to warrant a clear distinction. Strings and bytes are therefore not
-matched by a sequence pattern, limiting the sequence pattern to a very
-narrow and specific understanding of 'sequence'.
+cases for strings and bytes are different enough from that of tuples and
+lists to warrant a clear distinction. It is in fact often unintuitive and
+unintended that strings pass for sequences as evidenced by regular questions
+and complaints. Strings and bytes are therefore not matched by a sequence
+pattern, limiting the sequence pattern to a very specific understanding of
+'sequence'.
 
 
 .. _mapping_pattern:
@@ -552,7 +726,35 @@ narrow and specific understanding of 'sequence'.
 Mapping Patterns
 ~~~~~~~~~~~~~~~~
 
-
+Dictionaries or mappings in general are one of the most important and most
+widely used data structures in Python. In contrast to sequences mappings
+are built for fast direct access to arbitrary elements (identified by a key).
+In most use cases an element is retrieved from a dictionary by a known key
+without regard for any ordering or other key-value pairs stored in the same
+dictionary. Particularly common are string keys.
+
+The mapping pattern reflects the common usage of dictionary lookup: it allows
+the user to extract some values from a mapping by means of constant/known
+keys and have the values match given subpatterns. Moreover, the mapping
+pattern does not check for the presence of additional keys. Should it be
+necessary to impose an upper bound on the mapping and ensure that no
+additional keys are present, then the usual double-star-pattern ``**rest``
+can be used. The special case ``**_`` with a wildcard, however, is not
+supported as it would not have any effect, but might lead to a wrong
+understanding of the mapping pattern's semantics.
+
+To avoid overly expensive matching algorithms, keys must be literals or
+constant values.
+ +Example:: + + def change_red_to_blue(json_obj): + match json_obj: + case { 'color': ('red' | '#FF0000') }: + json_obj['color'] = 'blue' + case { 'children': children }: + for child in children: + change_red_to_blue(child) .. _class_pattern: From 402e2df7506fc7470a6fcd55c9d89161c60a67cf Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Thu, 1 Oct 2020 16:40:16 -0700 Subject: [PATCH 51/54] Resolve discrepancies between syntax sections and Appending A --- pep-0634.rst | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index b4a366f6ec2..2d6d8373d8b 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -78,7 +78,7 @@ below, apply to these cases. The ``match`` statement ----------------------- -A ``match`` statement has the following top-level syntax:: +Syntax:: match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT match_expr: @@ -181,7 +181,7 @@ The top-level syntax for patterns is as follows:: Walrus patterns ^^^^^^^^^^^^^^^ -TODO: Change to or_pattern 'as' capture_pattern (and rename)? +TODO: Change to ``or_pattern 'as' capture_pattern`` (and rename)? Syntax:: @@ -549,9 +549,6 @@ it looks beneficial. Appendix A -- Full Grammar ========================== -TODO: Double-check that the syntax sections above match what's written -here (except for trailing lookaheads). - TODO: Go over the differences with the reference implementation and resolve them (either by fixing the PEP or by fixing the reference implementation). @@ -579,7 +576,7 @@ Other notation used beyond standard EBNF: patterns: open_sequence_pattern | pattern pattern: walrus_pattern | or_pattern - walrus_pattern: NAME ':=' or_pattern + walrus_pattern: capture_pattern ':=' or_pattern or_pattern: '|'.closed_pattern+ closed_pattern: | literal_pattern @@ -627,10 +624,12 @@ Other notation used beyond standard EBNF: double_star_pattern: '**' capture_pattern class_pattern: - | name_or_attr '(' ')' - | name_or_attr '(' ','.pattern+ ','? 
')' - | name_or_attr '(' ','.keyword_pattern+ ','? ')' - | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')' + | name_or_attr '(' [pattern_arguments ','?] ')' + pattern_arguments: + | positional_patterns [',' keyword_patterns] + | keyword_patterns + positional_patterns: ','.pattern+ + keyword_patterns: ','.keyword_pattern+ keyword_pattern: NAME '=' or_pattern From 698164c3f40f527e10d43e8826f0edc54b993dd4 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Thu, 1 Oct 2020 16:42:58 -0700 Subject: [PATCH 52/54] Add notes about PEP incompleteness --- pep-0634.rst | 3 +++ pep-0635.rst | 7 +++++-- pep-0636.rst | 3 +++ 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/pep-0634.rst b/pep-0634.rst index 2d6d8373d8b..0097995ce04 100644 --- a/pep-0634.rst +++ b/pep-0634.rst @@ -19,6 +19,9 @@ Resolution: Abstract ======== +**NOTE:** This draft is incomplete and not intended for review yet. +We're checking it into the peps repo for the convenience of the authors. + This PEP provides the technical specification for the ``match`` statement. It replaces PEP 622, which is hereby split in three parts: diff --git a/pep-0635.rst b/pep-0635.rst index fd718c5aebe..8f56f027c4f 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -19,6 +19,9 @@ Resolution: Abstract ======== +**NOTE:** This draft is incomplete and not intended for review yet. +We're checking it into the peps repo for the convenience of the authors. + This PEP provides the motivation and rationale for PEP 634 ("Structural Pattern Matching: Specification"). First-time readers are encouraged to start with PEP 636, which provides a gentler @@ -26,8 +29,8 @@ introduction to the concepts, syntax and semantics of patterns. -Motivation (Guido's version) -============================ +Motivation +========== (Structural) pattern matching syntax is found in many languages, from Haskell, Erlang and Scala to Elixir and Ruby. 
(A proposal for diff --git a/pep-0636.rst b/pep-0636.rst index c5d0e856b13..fb72aa824a4 100644 --- a/pep-0636.rst +++ b/pep-0636.rst @@ -19,6 +19,9 @@ Resolution: Abstract ======== +**NOTE:** This draft is incomplete and not intended for review yet. +We're checking it into the peps repo for the convenience of the authors. + This PEP is a tutorial for the pattern matching introduced by PEP 634. PEP 622 proposed syntax for pattern matching, which received detailed discussion From 71f3c214e9c9cf54df86fa77cc71bda10f86c70b Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Thu, 1 Oct 2020 16:30:09 -0700 Subject: [PATCH 53/54] Remove trailing whitespace and fix British spelling --- pep-0635.rst | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index 8f56f027c4f..34820cd14c0 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -176,36 +176,36 @@ rules: - *Unindented case clauses.* The idea is to align case clauses with the ``match``, i.e.:: - + match expression: case pattern_1: ... case pattern_2: ... - + This may look awkward to the eye of a Python programmer, because everywhere else colon is followed by an indent. The ``match`` would neither follow the syntactic scheme of simple nor composite statements but rather establish a category of its own. - + - *Putting the expression on a separate line after ``match``.* The idea is to use the expression yielding the subject as a statement to avoid the singularity of ``match`` having no actual block despite the colons:: - + match: expression case pattern_1: ... case pattern_2: ... - + This was ultimately rejected because the first block would be another novelty in Python's grammar: a block whose only content is a single expression rather than a sequence of statements. Attempts to amend this issue by adding or repurposing yet another keyword along the lines of ``match: return expression`` did not yield any satisfactory solution. 
- + Although flat indentation would save some horizontal space, the cost of increased complexity or unusual rules is too high. It would also complicate life for simple-minded code editors. Finally, the horizontal space issue can @@ -226,7 +226,7 @@ obvious consequence of ``match`` as an expression would be that case clauses could no longer have abitrary blocks of code attached, but only a single expression. Overall, the strong limitations could in no way offset the slight simplification in some special use cases. - + Match semantics @@ -499,7 +499,7 @@ of items is omitted:: case [a, ..., z]: ... case [a, *, z]: ... - + Both look like the would match a sequence of at two or more items, capturing the first and last values. @@ -510,8 +510,8 @@ contrast to ``else`` usable as a subpattern in nested patterns. Finally note that the underscore is as a wildcard pattern in *every* programming language with pattern matching that we could find -(including *C#*, *Elixir*, *Erlang*, *F#*, *Grace*, *Haskell*, -*Mathematica*, *OCaml*, *Ruby*, *Rust*, *Scala*, *Swift*, and *Thorn*). +(including *C#*, *Elixir*, *Erlang*, *F#*, *Grace*, *Haskell*, +*Mathematica*, *OCaml*, *Ruby*, *Rust*, *Scala*, *Swift*, and *Thorn*). Keeping in mind that many users of Python also work with other programming languages, have prior experience when learning Python, or moving on to other languages after having learnt Python, we find that such well @@ -557,7 +557,7 @@ though, because the literal pattern ``1`` works by equality and not identity. Early ideas to induce a hierarchy on numbers so that ``case 1.0`` would match both the integer ``1`` and the floating point number ``1.0``, whereas ``case 1:`` would only match the integer ``1`` were eventually dropped in -favour of the simpler and consistent rule based on equality. Moreover, any +favor of the simpler and consistent rule based on equality. 
Moreover, any additional checks whether the subject is an instance of ``numbers.Integral`` would come at a high runtime cost to introduce what would essentially be novel in Python. When needed, the explicit syntax ``case int(1):`` might @@ -567,7 +567,7 @@ Recall that literal patterns are *not* expressions, but directly denote a specific value or object. From a syntactical point of view, we have to ensure that negative and complex numbers can equally be used as patterns, although they are not atomic literal values (i.e. the seeming literal value -``-3+4j`` would syntactically be an expression of the form +``-3+4j`` would syntactically be an expression of the form ``BinOp(UnaryOp('-', 3), '+', 4j)``, but as expressions are not part of patterns, we added syntactic support for such complex value literals without having to resort to full expressions). Interpolated *f*-strings, on the @@ -715,7 +715,7 @@ is a sequence (of known type). String and bytes objects have a dual nature: they are both 'atomic' objects in their own right, as well as sequences (with a strongly recursive nature -in that a string is a sequence of strings). The typical behaviour and use +in that a string is a sequence of strings). The typical behavior and use cases for strings and bytes are different enough from that of tuples and lists to warrant a clear distinction. It is in fact often unintuitive and unintended that strings pass for sequences as evidenced by regular questions @@ -779,7 +779,7 @@ manipulation thereof. A typical pattern might be along the lines of:: traverse_tree(node.right) elif isinstance(node, Leaf): print(node.value) - + In many cases, however, class patterns occur nested as in the example given in the motivation:: @@ -817,7 +817,7 @@ an object lacks an attribute specified by the pattern, the match fails. flow from left to right seems unusual, but is in line with mapping patterns and has precedents such as assignments via ``as`` in *with*- or *import*-statements. 
- + Naming the attributes in question explicitly will be mostly used for more complex cases where the positional form (below) is insufficient. @@ -837,7 +837,7 @@ any Python construct, be assignment targets, function definitions or iterable unpacking. In all these cases, we find that the syntax for sending and that for receiving 'data' are virtually identical. -- Assignment targets such as variables, attributes and subscripts: +- Assignment targets such as variables, attributes and subscripts: ``foo.bar[2] = foo.bar[3]``; - Function definitions: a function defined with ``def foo(x, y, z=6)`` @@ -938,7 +938,7 @@ original tuple constructors. In a pattern like ``Node(left, right)``, ``Node`` is no longer a passive tag, but rather a function that can actively check for any given object whether it has the right structure and extract a ``left`` and ``right`` field. In other words: the ``Node``-tag becomes a -function that transforms an object into a tuple or returns some failure +function that transforms an object into a tuple or returns some failure indicator if it is not possible. In Python, we simply use ``isinstance()`` together with the ``__match_args__`` From 6f18499b154baa61babec653275ce25fc151f439 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Thu, 1 Oct 2020 16:57:46 -0700 Subject: [PATCH 54/54] Fix more trailing whitespace --- pep-0635.rst | 2 +- pep-0636.rst | 28 ++++++++++++++-------------- 2 files changed, 15 insertions(+), 15 deletions(-) diff --git a/pep-0635.rst b/pep-0635.rst index 34820cd14c0..e2f7cd2c8c6 100644 --- a/pep-0635.rst +++ b/pep-0635.rst @@ -684,7 +684,7 @@ iterable. - As in iterable unpacking, we do not distinguish between 'tuple' and 'list' notation. ``[a, b, c]``, ``(a, b, c)`` and ``a, b, c`` are all equivalent. While this means we have a redundant notation and checking - specifically for lists or tuples requires more effort (e.g. + specifically for lists or tuples requires more effort (e.g. 
``case list([a, b, c])``), we mimick iterable unpacking as much as possible. diff --git a/pep-0636.rst b/pep-0636.rst index fb72aa824a4..8ac56d41f71 100644 --- a/pep-0636.rst +++ b/pep-0636.rst @@ -103,7 +103,7 @@ You can use a matching statement instead:: match command.split(): case [action, obj]: - ... # interpret action, obj + ... # interpret action, obj The ``match`` statement evaluates the **subject** after the ``match`` keyword, and checks it against the **pattern** next to ``case``. A pattern is able to do two different @@ -112,7 +112,7 @@ things: * Verify that the subject has certain structure. In your case, the ``[action, obj]`` pattern matches any sequence of exactly two elements. This is called **matching** * It will bind some names in the pattern to component elements of your subject. In - this case, if the list has two elements, it will bind ``action = subject[0]`` and + this case, if the list has two elements, it will bind ``action = subject[0]`` and ``obj = subject[1]``. This is called **destructuring** If there's a match, the statements inside the ``case`` clause will be executed with the @@ -134,13 +134,13 @@ of different lengths. For example you might want to add single verbs with no obj case [action]: ... # interpret single-verb action case [action, obj]: - ... # interpret action, obj + ... # interpret action, obj The ``match`` statement will check patterns from top to bottom. If the pattern doesn't match the subject, the next pattern will be tried. However, once the *first* matching ``case`` clause is found, the body of that clause is executed, and all further ``case`` clauses are ignored. This is similar to the way that an ``if/elif/elif/...`` -statement works. +statement works. Matching specific values ------------------------ @@ -168,7 +168,7 @@ A pattern like ``["get", obj]`` will match only 2-element sequences that have a element equal to ``"get"``. When destructuring, it will bind ``obj = subject[1]``. 
As you can see in the ``go`` case, we also can use different variable names in -different patterns. +different patterns. FIXME: This *might* be the place to explain a bit that when I say "literal" I mean it literally, and a "soft constant" will not work :) @@ -178,7 +178,7 @@ Matching slices A player may be able to drop multiple objects by using a series of commands ``drop key``, ``drop sword``, ``drop cheese``. This interface might be cumbersome, and -you might like to allow dropping multiple items in a single command, like +you might like to allow dropping multiple items in a single command, like ``drop key sword cheese``. In this case you don't know beforehand how many words will be in the command, but you can use extended unpacking in patterns in the same way that they are allowed in assignments:: @@ -255,19 +255,19 @@ patterns) that we've seen: Until now, the only non-simple pattern we have experimented with is the sequence pattern. Each element in a sequence pattern can in fact be -any other pattern. This means that you could write a pattern like +any other pattern. This means that you could write a pattern like ``["first", (left, right), *rest]``. This will match subjects which are a sequence of at -least two elements, where the first one is equal to ``"first"`` and the second one is -in turn a sequence of two elements. It will also bind ``left=subject[1][0]``, +least two elements, where the first one is equal to ``"first"`` and the second one is +in turn a sequence of two elements. It will also bind ``left=subject[1][0]``, ``right=subject[1][1]``, and ``rest = subject[2:]`` Alternate patterns ------------------ Going back to the adventure game example, you may find that you'd like to have several -patterns resulting in the same outcome. For example, you might want the commands -``north`` and ``go north`` be equivalent. You may also desire to have aliases for -``get X``, ``pick up X`` and ``pick X up`` for any X. +patterns resulting in the same outcome. 
For example, you might want the commands +``north`` and ``go north`` be equivalent. You may also desire to have aliases for +``get X``, ``pick up X`` and ``pick X up`` for any X. The ``|`` symbol in patterns combines them as alternatives. You could for example write:: @@ -331,7 +331,7 @@ case-clause. Guards consist of the ``if`` keyword followed by any expression:: case ["go", direction] if direction in current_room.exits: current_room = current_room.neighbor(direction) case ["go", _]: - print("Sorry, you can't go that way") + print("Sorry, you can't go that way") The guard is not part of the pattern, it's part of the case clause. It's only checked if the pattern matches, and after all the pattern variables have been bound (that's why the @@ -342,7 +342,7 @@ next ``case`` clause as if the pattern hadn't matched (with the possible side-ef having already bound some variables). The sequence of these steps must be considered carefully when combining or-patterns and -guards. If you have ``case [x, 100] | [0, x] if x > 10`` and your subject is +guards. If you have ``case [x, 100] | [0, x] if x > 10`` and your subject is ``[0, 100]``, the clause will be skipped. This happens because: * The or-pattern finds the first alternative that matches the subject, which happens to