diff --git a/pep-0622.rst b/pep-0622.rst index 3a56daf19955..ec0c09c2f53a 100644 --- a/pep-0622.rst +++ b/pep-0622.rst @@ -10,12 +10,13 @@ Author: Brandt Bucher , Talin BDFL-Delegate: Discussions-To: Python-Dev -Status: Draft +Status: Superseded Type: Standards Track Content-Type: text/x-rst Created: 23-Jun-2020 Python-Version: 3.10 Post-History: 23-Jun-2020, 8-Jul-2020 +Superseded-By: 634 Resolution: diff --git a/pep-0634.rst b/pep-0634.rst new file mode 100644 index 000000000000..0097995ce040 --- /dev/null +++ b/pep-0634.rst @@ -0,0 +1,653 @@ +PEP: 634 +Title: Structural Pattern Matching: Specification +Version: $Revision$ +Last-Modified: $Date$ +Author: Brandt Bucher , + Guido van Rossum +BDFL-Delegate: +Discussions-To: Python-Dev +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 12-Sep-2020 +Python-Version: 3.10 +Post-History: +Replaces: 622 +Resolution: + + +Abstract +======== + +**NOTE:** This draft is incomplete and not intended for review yet. +We're checking it into the peps repo for the convenience of the authors. + +This PEP provides the technical specification for the ``match`` +statement. It replaces PEP 622, which is hereby split in three parts: + +- PEP 634: Specification +- PEP 635: Motivation and Rationale +- PEP 636: Tutorial + +This PEP is intentionally devoid of commentary; all explanations of +design choices are in PEP 635. First-time readers are encouraged to +start with PEP 636, which provides a gentler introduction to the +concepts, syntax and semantics of patterns. + +TODO: Maybe we should add simple examples back to each section? +There's no rule saying a spec can't include examples, and currently +it's *very* dry. + +TODO: Go over the feedback from the SC and make sure everything's +somehow incorporated (either here or in PEP 635, which has to answer +why we didn't budge on most of the SC's initial requests). + + +Syntax and Semantics +==================== + +See `Appendix A`_ for the complete grammar. 
+ +Overview and terminology +------------------------ + +The pattern matching process takes as input a pattern (following +``case``) and a subject value (following ``match``). Phrases to +describe the process include "the pattern is matched with (or against) +the subject value" and "we match the pattern against (or with) the +subject value". + +The primary outcome of pattern matching is success or failure. In +case of success we may say "the pattern succeeds", "the match +succeeds", or "the pattern matches the subject value". + +In many cases a pattern contains subpatterns, and success or failure +is determined by the success or failure of matching those subpatterns +against the value (e.g., for OR patterns) or against parts of the +value (e.g., for sequence patterns). The subpatterns are typically +processed from left to right until the overall outcome is +determined. E.g., an OR pattern succeeds at the first succeeding +subpattern, while a sequence pattern fails at the first failing +subpattern. + +A secondary outcome of pattern matching may be one or more name +bindings. We may say "the pattern binds a value to a name". When +subpatterns are tried until the first success, only the bindings due to +the successful subpattern are valid; when they are tried until the first +failure, the bindings are merged. Several more rules, explained +below, apply to these cases. + + +The ``match`` statement +----------------------- + +Syntax:: + + match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT + match_expr: + | star_named_expression ',' star_named_expressions? + | named_expression + case_block: "case" patterns [guard] ':' block + guard: 'if' named_expression + +The rules ``star_named_expression``, ``star_named_expressions``, +``named_expression`` and ``block`` are part of the `standard Python +grammar `_. + +The rule ``patterns`` is specified below.
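The outcomes described in this overview (success or failure, plus name bindings), together with first-match case selection with guards (specified under Match semantics and Guards), can be modelled in plain Python. The sketch below is not the reference implementation. It assumes a pattern is a function that returns a dict of bindings on success and ``None`` on failure, and every helper name in it (``capture``, ``literal``, ``or_pattern``, ``fixed_sequence``, ``select_case``) is hypothetical.

```python
from collections.abc import Sequence
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical model: a pattern maps a subject to a dict of bindings on
# success, or to None on failure.
Bindings = Dict[str, Any]
Pattern = Callable[[Any], Optional[Bindings]]


def capture(name: str) -> Pattern:
    # Always succeeds and binds the subject to `name`.
    return lambda subject: {name: subject}


def literal(value: Any) -> Pattern:
    # Succeeds, binding nothing, if the subject compares equal to `value`.
    return lambda subject: {} if subject == value else None


def or_pattern(*alternatives: Pattern) -> Pattern:
    # Tries alternatives left to right and keeps only the winner's bindings.
    def match(subject: Any) -> Optional[Bindings]:
        for alternative in alternatives:
            bindings = alternative(subject)
            if bindings is not None:
                return bindings  # stop at the first success
        return None
    return match


def fixed_sequence(*items: Pattern) -> Pattern:
    # Matches items left to right, stops at the first failure, merges bindings.
    def match(subject: Any) -> Optional[Bindings]:
        if not isinstance(subject, Sequence) or isinstance(
                subject, (str, bytes, bytearray)):
            return None
        if len(subject) != len(items):
            return None
        merged: Bindings = {}
        for item_pattern, item in zip(items, subject):
            bindings = item_pattern(item)
            if bindings is None:
                return None  # stop at the first failing subpattern
            merged.update(bindings)
        return merged
    return match


def select_case(subject: Any,
                cases: List[Tuple[Pattern, Optional[Callable[[Bindings], bool]], str]]
                ) -> Optional[Tuple[str, Bindings]]:
    # The first case whose pattern and guard both succeed is selected.
    for pattern, guard, label in cases:
        bindings = pattern(subject)
        if bindings is None:
            continue  # pattern failed: try the next case
        if guard is not None and not guard(bindings):
            continue  # a falsy guard moves on to the next case
        return label, bindings
    return None  # no case matched


# Roughly models:  case (0, x, 1 | 2) if x > 0: ...  then  case (0, x, 1 | 2): ...
shape = fixed_sequence(literal(0), capture("x"), or_pattern(literal(1), literal(2)))
cases = [(shape, lambda b: b["x"] > 0, "positive"), (shape, None, "other")]
print(select_case([0, 5, 2], cases))   # ('positive', {'x': 5})
print(select_case((0, -5, 1), cases))  # ('other', {'x': -5})
print(select_case([0, 5, 3], cases))   # None
print(select_case("0x2", cases))       # None: str is not matched as a sequence
```

Because ``or_pattern`` returns as soon as one alternative succeeds, bindings from alternatives that were never tried cannot leak into the result. ``fixed_sequence`` instead merges the bindings of every item it has matched.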
+ +For context, ``match_stmt`` is a new alternative for +``compound_stmt``:: + + compound_stmt: + | if_stmt + ... + | match_stmt + + +The ``match`` and ``case`` keywords are soft keywords, i.e. they are +not reserved words in other grammatical contexts (including at the +start of a line if there is no colon where expected). This implies +that they are recognized as keywords when part of a ``match`` +statement or ``case`` block only, and are allowed to be used in all +other contexts as variable or argument names. + + +Match semantics +^^^^^^^^^^^^^^^ + +TODO: Make the language about choosing a block more precise. + +The overall semantics for choosing the match is to choose the first +matching pattern (including guard) and execute the corresponding +block. The remaining patterns are not tried. If there are no +matching patterns, execution continues at the following statement. + +Name bindings made during a successful pattern match outlive the +executed block and can be used after the ``match`` statement. + +During failed pattern matches, some subpatterns may succeed. For +example, while matching the pattern ``(0, x, 1)`` with the value ``[0, +1, 2]``, the subpattern ``x`` may succeed if the list elements are +matched from left to right. The implementation may choose to either +make persistent bindings for those partial matches or not. User code +including a ``match`` statement should not rely on the bindings being +made for a failed match, but also shouldn't assume that variables are +unchanged by a failed match. This part of the behavior is left +intentionally unspecified so different implementations can add +optimizations, and to prevent introducing semantic restrictions that +could limit the extensibility of this feature. + +The precise pattern binding rules vary per pattern type and are +specified below. + + +.. 
_guards: + +Guards +^^^^^^ + +Syntax:: + + case_block: "case" patterns [guard] ':' block + guard: 'if' named_expression + +If a guard is present on a case block, once all patterns succeed, +the expression in the guard is evaluated. +If this raises an exception, the exception bubbles up. +Otherwise, if the condition is "truthy" the block is selected; +if it is "falsy" the next case block (if any) is tried. + + +.. _patterns: + +Patterns +-------- + +The top-level syntax for patterns is as follows:: + + patterns: open_sequence_pattern | pattern + pattern: walrus_pattern | or_pattern + walrus_pattern: capture_pattern ':=' or_pattern + or_pattern: '|'.closed_pattern+ + closed_pattern: + | literal_pattern + | capture_pattern + | wildcard_pattern + | constant_pattern + | group_pattern + | sequence_pattern + | mapping_pattern + | class_pattern + + +Walrus patterns +^^^^^^^^^^^^^^^ + +TODO: Change to ``or_pattern 'as' capture_pattern`` (and rename)? + +Syntax:: + + walrus_pattern: capture_pattern ':=' or_pattern + +(Note: the name on the left may not be ``_``.) + +A walrus pattern matches the OR pattern on the right of the ``:=`` +operator against the subject. If this fails, the walrus pattern fails. +Otherwise, the walrus pattern binds the subject to the name on the left +of the ``:=`` operator and succeeds. + + +OR patterns +^^^^^^^^^^^ + +Syntax:: + + or_pattern: '|'.closed_pattern+ + +When two or more patterns are separated by vertical bars (``|``), +this is called an OR pattern. (A single closed pattern is just that.) + +Each subpattern must bind the same set of names. + +An OR pattern matches each of its subpatterns in turn to the subject, +until one succeeds. The OR pattern is then deemed to succeed. +If none of the subpatterns succeed the OR pattern fails. + + +.. 
_literal_pattern: + +Literal Patterns +^^^^^^^^^^^^^^^^ + +Syntax:: + + literal_pattern: + | signed_number + | signed_number '+' NUMBER + | signed_number '-' NUMBER + | strings + | 'None' + | 'True' + | 'False' + signed_number: NUMBER | '-' NUMBER + +The rule ``strings`` and the token ``NUMBER`` are defined in the +standard Python grammar. + +Triple-quoted strings are supported. Raw strings and byte strings +are supported. F-strings are not supported. + +The forms ``signed_number '+' NUMBER`` and ``signed_number '-' +NUMBER`` are only permitted to express complex numbers; they require a +real number on the left and an imaginary number on the right. + +A literal pattern succeeds if the subject value compares equal to the +value expressed by the literal, using the following comparison rules: + +- Numbers and strings are compared using the ``==`` operator. + +- The singleton literals ``None``, ``True`` and ``False`` are compared + using the ``is`` operator. + + +.. _capture_pattern: + +Capture Patterns +^^^^^^^^^^^^^^^^ + +Syntax:: + + capture_pattern: !"_" NAME + +The single underscore (``_``) is not a capture pattern (this is what +``!"_"`` expresses). It is treated as a `wildcard pattern`_. + +A capture pattern always succeeds. It binds the subject value to the +name using the scoping rules for name binding established for the +walrus operator in PEP 572. (Summary: the name becomes a local +variable in the closest containing function scope unless there's an +applicable ``nonlocal`` or ``global`` statement.) + +In a given pattern, a given name may be bound only once. This +disallows for example ``case x, x: ...`` but allows ``case [x] | x: +...``. + +.. _wildcard_pattern: + +Wildcard Pattern +^^^^^^^^^^^^^^^^ + +Syntax:: + + wildcard_pattern: "_" + +A wildcard pattern always succeeds. It binds no name. + +.. _constant_value_pattern: + +Constant Value Patterns +^^^^^^^^^^^^^^^^^^^^^^^ + +TODO: Rename to Value Patterns?
(But ``value[s]_pattern`` is already +a grammatical rule.) + +Syntax:: + + constant_pattern: attr + attr: name_or_attr '.' NAME + name_or_attr: attr | NAME + +The dotted name in the pattern is looked up using the standard Python +name resolution rules. However, when the same constant pattern occurs +multiple times in the same ``match`` statement, the interpreter may cache +the first value found and reuse it, rather than repeat the same +lookup. (To clarify, this cache is strictly tied to a given execution +of a given ``match`` statement.) + +The pattern succeeds if the value found thus compares equal to the +subject value (using the ``==`` operator). + + +Group Patterns +^^^^^^^^^^^^^^ + +Syntax:: + + group_pattern: '(' pattern ')' + +(For the syntax of ``pattern``, see Patterns above. Note that it +contains no comma -- a parenthesized series of items with at least one +comma is a sequence pattern, as is ``()``.) + +A parenthesized pattern has no additional syntax. It allows users to +add parentheses around patterns to emphasize the intended grouping. + + +.. _sequence_pattern: + +Sequence Patterns +^^^^^^^^^^^^^^^^^ + +Syntax:: + + sequence_pattern: + | '[' [values_pattern] ']' + | '(' [open_sequence_pattern] ')' + open_sequence_pattern: value_pattern ',' [values_pattern] + values_pattern: ','.value_pattern+ ','? + value_pattern: star_pattern | pattern + star_pattern: '*' (capture_pattern | wildcard_pattern) + +(Note that a single parenthesized pattern without a trailing comma is +a group pattern, not a sequence pattern. However, a single pattern +enclosed in ``[...]`` is still a sequence pattern.) + +There is no semantic difference between a sequence pattern using +``[...]``, a sequence pattern using ``(...)``, and an open sequence +pattern. + +A sequence pattern may contain at most one star subpattern. The star +subpattern may occur in any position.
If no star subpattern is +present, the sequence pattern is a fixed-length sequence pattern; +otherwise it is a variable-length sequence pattern. + +A sequence pattern fails if the subject value is not an instance of +``collections.abc.Sequence``. It also fails if the subject value is +an instance of ``str``, ``bytes`` or ``bytearray``. + +A fixed-length sequence pattern fails if the length of the subject +sequence is not equal to the number of subpatterns. + +A variable-length sequence pattern fails if the length of the subject +sequence is less than the number of non-star subpatterns. + +The length of the subject sequence is obtained using the builtin +``len()`` function (i.e., via the ``__len__`` protocol). However, the +interpreter may cache this value in a similar manner as described for +constant value patterns. + +A fixed-length sequence pattern matches the subpatterns to +corresponding items of the subject sequence, from left to right. +Matching stops (with a failure) as soon as a subpattern fails. If all +subpatterns succeed in matching their corresponding item, the sequence +pattern succeeds. + +A variable-length sequence pattern first matches the leading non-star +subpatterns to the corresponding items of the subject sequence, as for +a fixed-length sequence. If this succeeds, the star subpattern +matches a list formed of the remaining subject items, with items +removed from the end corresponding to the non-star subpatterns +following the star subpattern. The non-star subpatterns following the +star subpattern are then matched to the corresponding trailing subject +items, as for a fixed-length sequence. + + +.. _mapping_pattern: + +Mapping Patterns +^^^^^^^^^^^^^^^^ + +Syntax:: + + mapping_pattern: '{' [items_pattern] '}' + items_pattern: ','.key_value_pattern+ ','? + key_value_pattern: + | (literal_pattern | constant_pattern) ':' or_pattern + | double_star_pattern + double_star_pattern: '**' capture_pattern + +(Note that ``**_`` is disallowed by this syntax.)
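The mapping-pattern rules in this section can be sketched in plain Python: an ``isinstance()`` check against ``collections.abc.Mapping``, key lookup through the two-argument form of ``get()``, and an optional ``**`` capture of the remaining items. This is a simplified model that assumes every value subpattern is a literal compared with ``==``. The function name ``match_mapping`` and the ``_MISSING`` sentinel are hypothetical. Because ``get()`` is used instead of ``__getitem__``, a ``defaultdict`` gains no new keys during a match.

```python
from collections import defaultdict
from collections.abc import Mapping
from typing import Any, Dict, Optional

# Hypothetical sentinel: distinguishes "key absent" from a stored None.
_MISSING = object()


def match_mapping(subject: Any, expected: Dict[Any, Any],
                  rest_name: Optional[str] = None) -> Optional[Dict[str, Any]]:
    """Return bindings on success, None on failure (literal values only)."""
    if not isinstance(subject, Mapping):
        return None
    for key, value in expected.items():
        # Two-argument get() never calls __missing__, so no keys are created.
        actual = subject.get(key, _MISSING)
        if actual is _MISSING or actual != value:
            return None
    bindings: Dict[str, Any] = {}
    if rest_name is not None:
        # Models ``**rest``: a new dict holding the unmatched items.
        bindings[rest_name] = {k: v for k, v in subject.items()
                               if k not in expected}
    return bindings


# Roughly models ``case {"kind": "point", **rest}``.
print(match_mapping({"kind": "point", "x": 1, "y": 2}, {"kind": "point"}, "rest"))
# {'rest': {'x': 1, 'y': 2}}

counts = defaultdict(int)
print(match_mapping(counts, {"hits": 0}))  # None: "hits" was never present
print("hits" in counts)                    # False: no key was created
```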
+ +A mapping pattern may contain at most one double star pattern, +and it must be last. + +A mapping pattern may not contain duplicate key values. +(If all key patterns are literal patterns this is considered a +syntax error; otherwise this is a runtime error and will +raise ``TypeError``.) + +A mapping pattern fails if the subject value is not an instance of +``collections.abc.Mapping``. + +A mapping pattern succeeds if every key given in the mapping pattern +matches the corresponding item of the subject mapping. If a ``'**' +NAME`` form is present, that name is bound to a ``dict`` containing +remaining key-value pairs from the subject mapping. + +If duplicate keys are detected in the mapping pattern, the pattern is +considered invalid, and a ``ValueError`` is raised. + +Key-value pairs are matched using the two-argument form of the +subject's ``get()`` method. As a consequence, matched key-value pairs +must already be present in the mapping, and not created on-the-fly by +``__missing__`` or ``__getitem__``. For example, +``collections.defaultdict`` instances will only be matched by patterns +with keys that were already present when the ``match`` block was +entered. + + +.. _class_pattern: + +Class Patterns +^^^^^^^^^^^^^^ + +Syntax:: + + class_pattern: + | name_or_attr '(' [pattern_arguments ','?] ')' + pattern_arguments: + | positional_patterns [',' keyword_patterns] + | keyword_patterns + positional_patterns: ','.pattern+ + keyword_patterns: ','.keyword_pattern+ + keyword_pattern: NAME '=' or_pattern + +(Note that positional patterns may be unparenthesized walrus patterns, +but keyword patterns may not.) + +A class pattern may not repeat the same keyword multiple times. + +If ``name_or_attr`` is not an instance of the builtin ``type``, +``TypeError`` is raised. + +A class pattern fails if the subject is not an instance of ``name_or_attr``. +This is tested using ``isinstance()``. 
+ +If no arguments are present, the pattern succeeds if the ``isinstance()`` +check succeeds. Otherwise: + +- If only keyword patterns are present, they are processed as follows, + one by one: + + - The keyword is looked up as an attribute on the subject. + + - If this raises an exception other than ``AttributeError``, + the exception bubbles up. + + - If this raises ``AttributeError``, the class pattern fails. + + - Otherwise, the subpattern associated with the keyword is matched + against the attribute value. If this fails, the class pattern fails. + If it succeeds, the match proceeds to the next keyword. + + - If all keyword patterns succeed, the class pattern as a whole succeeds. + +- If any positional patterns are present, they are converted to keyword + patterns (see below) and treated as additional keyword patterns, + preceding the syntactic keyword patterns (if any). + +Positional patterns are converted to keyword patterns using the +``__match_args__`` attribute on the class designated by ``name_or_attr``, +as follows: + +- For a number of built-in types (specified below), + a single positional subpattern is accepted which will match + the entire subject; for these types no keyword patterns are accepted. +- The equivalent of ``getattr(cls, "__match_args__", ())`` is evaluated. +- If this raises an exception, the exception bubbles up. +- If the returned value is not a list or tuple, the conversion fails + and ``TypeError`` is raised. +- If there are more positional patterns than the length of + ``__match_args__`` (as obtained using ``len()``), ``TypeError`` is raised. +- Otherwise, positional pattern ``i`` is converted to a keyword pattern + using ``__match_args__[i]`` as the keyword, + provided the latter is a string; + if it is not, ``TypeError`` is raised. +- For duplicate keywords, ``TypeError`` is raised. + +Once the positional patterns have been converted to keyword patterns, +the match proceeds as if there were only keyword patterns.
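The conversion of positional subpatterns to keyword subpatterns described above can be sketched in plain Python. This is a simplified model, not the reference implementation. Subpatterns are represented by plain values compared with ``==``. An attribute name of ``None`` stands for "match against the subject itself", which models the built-in types listed in the next paragraph. The names ``convert_positional``, ``match_class``, ``SELF_MATCHING`` and ``Point`` are hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

# Built-in types whose single positional subpattern matches the whole subject.
SELF_MATCHING = (bool, bytearray, bytes, dict, float, frozenset,
                 int, list, set, str, tuple)


def convert_positional(cls: type, positional: Sequence[Any],
                       keywords: Dict[str, Any]) -> List[Tuple[Optional[str], Any]]:
    """Turn positional subpatterns into (attribute, subpattern) pairs."""
    if positional and cls in SELF_MATCHING:
        if len(positional) != 1 or keywords:
            raise TypeError(f"{cls.__name__}() accepts one positional subpattern only")
        return [(None, positional[0])]  # None: match the subject itself
    match_args = getattr(cls, "__match_args__", ())  # exceptions bubble up
    if not isinstance(match_args, (list, tuple)):
        raise TypeError("__match_args__ must be a list or tuple")
    if len(positional) > len(match_args):
        raise TypeError("too many positional subpatterns")
    pairs: List[Tuple[Optional[str], Any]] = []
    for i, subpattern in enumerate(positional):
        if not isinstance(match_args[i], str):
            raise TypeError("__match_args__ items must be strings")
        pairs.append((match_args[i], subpattern))
    pairs.extend(keywords.items())  # syntactic keyword patterns come last
    names = [name for name, _ in pairs]
    if len(set(names)) != len(names):
        raise TypeError("duplicate keyword in class pattern")
    return pairs


def match_class(subject: Any, cls: Any, positional: Sequence[Any] = (),
                keywords: Optional[Dict[str, Any]] = None) -> bool:
    if not isinstance(cls, type):
        raise TypeError(f"{cls!r} is not a type")
    if not isinstance(subject, cls):
        return False
    for name, expected in convert_positional(cls, positional, keywords or {}):
        if name is None:
            value = subject
        else:
            try:
                value = getattr(subject, name)
            except AttributeError:
                return False  # other exceptions bubble up
        if value != expected:
            return False
    return True


class Point:
    __match_args__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y


print(match_class(Point(1, 2), Point, [1], {"y": 2}))  # True
print(match_class(Point(1, 2), Point, [1, 3]))          # False
print(match_class(5, int, [5]))                          # True
```

Running the ``isinstance()`` check before the conversion is one reading of the order in which the rules above are listed. Consequently, a subject of the wrong type fails without the ``__match_args__`` checks ever running.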
+ +As mentioned above, for the following built-in types the handling of +positional subpatterns is different: +``bool``, ``bytearray``, ``bytes``, ``dict``, ``float``, +``frozenset``, ``int``, ``list``, ``set``, ``str``, and ``tuple``. + +This behavior is roughly equivalent to the following:: + + class C: + __match_args__ = ["__match_self_prop__"] + @property + def __match_self_prop__(self): + return self + + +Side effects +============ + +The only side-effect produced explicitly by the matching process is +the binding of names. However, the process relies on attribute +access, instance checks, ``len()``, equality and item access on the +subject and some of its components. It also evaluates constant value +patterns and the class name of class patterns. While none of those +typically create any side-effects, in theory they could. This +proposal intentionally leaves out any specification of what methods +are called or how many times. This behavior is therefore undefined +and user code should not rely on it. + + +The standard library +==================== + +To facilitate the use of pattern matching, several changes will be +made to the standard library: + +- Namedtuples and dataclasses will have auto-generated + ``__match_args__``. + +- For dataclasses the order of attributes in the generated + ``__match_args__`` will be the same as the order of corresponding + arguments in the generated ``__init__()`` method. This includes the + situations where attributes are inherited from a superclass. Fields + with ``init=False`` are excluded from ``__match_args__``. + +In addition, a systematic effort will be put into going through +existing standard library classes and adding ``__match_args__`` where +it looks beneficial. + + +.. _Appendix A: + +Appendix A -- Full Grammar +========================== + +TODO: Go over the differences with the reference implementation and +resolve them (either by fixing the PEP or by fixing the reference +implementation). 
+ +Here is the full grammar for ``match_stmt``. This is an additional +alternative for ``compound_stmt``. Remember that ``match`` and +``case`` are soft keywords, i.e. they are not reserved words in other +grammatical contexts (including at the start of a line if there is no +colon where expected). By convention, hard keywords use single quotes +while soft keywords use double quotes. + +Other notation used beyond standard EBNF: + +- ``SEP.RULE+`` is shorthand for ``RULE (SEP RULE)*`` +- ``!RULE`` is a negative lookahead assertion + +:: + + match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT + match_expr: + | star_named_expression ',' [star_named_expressions] + | named_expression + case_block: "case" patterns [guard] ':' block + guard: 'if' named_expression + + patterns: open_sequence_pattern | pattern + pattern: walrus_pattern | or_pattern + walrus_pattern: capture_pattern ':=' or_pattern + or_pattern: '|'.closed_pattern+ + closed_pattern: + | literal_pattern + | capture_pattern + | wildcard_pattern + | constant_pattern + | group_pattern + | sequence_pattern + | mapping_pattern + | class_pattern + + literal_pattern: + | signed_number !('+' | '-') + | signed_number '+' NUMBER + | signed_number '-' NUMBER + | strings + | 'None' + | 'True' + | 'False' + signed_number: NUMBER | '-' NUMBER + + capture_pattern: !"_" NAME !('.' | '(' | '=') + + wildcard_pattern: "_" + + constant_pattern: attr !('.' | '(' | '=') + attr: name_or_attr '.' NAME + name_or_attr: attr | NAME + + group_pattern: '(' pattern ')' + + sequence_pattern: + | '[' [values_pattern] ']' + | '(' [open_sequence_pattern] ')' + open_sequence_pattern: value_pattern ',' [values_pattern] + values_pattern: ','.value_pattern+ ','? + value_pattern: star_pattern | pattern + star_pattern: '*' (capture_pattern | wildcard_pattern) + + mapping_pattern: '{' [items_pattern] '}' + items_pattern: ','.key_value_pattern+ ','? 
+ key_value_pattern: + | (literal_pattern | constant_pattern) ':' or_pattern + | double_star_pattern + double_star_pattern: '**' capture_pattern + + class_pattern: + | name_or_attr '(' [pattern_arguments ','?] ')' + pattern_arguments: + | positional_patterns [',' keyword_patterns] + | keyword_patterns + positional_patterns: ','.pattern+ + keyword_patterns: ','.keyword_pattern+ + keyword_pattern: NAME '=' or_pattern + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: diff --git a/pep-0635.rst b/pep-0635.rst new file mode 100644 index 000000000000..e2f7cd2c8c63 --- /dev/null +++ b/pep-0635.rst @@ -0,0 +1,977 @@ +PEP: 635 +Title: Structural Pattern Matching: Motivation and Rationale +Version: $Revision$ +Last-Modified: $Date$ +Author: Tobias Kohn , + Guido van Rossum +BDFL-Delegate: +Discussions-To: Python-Dev +Status: Draft +Type: Informational +Content-Type: text/x-rst +Created: 12-Sep-2020 +Python-Version: 3.10 +Post-History: +Resolution: + + + +Abstract +======== + +**NOTE:** This draft is incomplete and not intended for review yet. +We're checking it into the peps repo for the convenience of the authors. + +This PEP provides the motivation and rationale for PEP 634 +("Structural Pattern Matching: Specification"). First-time readers +are encouraged to start with PEP 636, which provides a gentler +introduction to the concepts, syntax and semantics of patterns. + + + +Motivation +========== + +(Structural) pattern matching syntax is found in many languages, from +Haskell, Erlang and Scala to Elixir and Ruby. (A proposal for +JavaScript is also under consideration.) + +Python already supports a limited form of this through sequence +unpacking assignments, which the new proposal leverages. 
+ +Several other common Python idioms are also relevant: + +- The ``if ... elif ... elif ... else`` idiom is often used to find + out the type or shape of an object in an ad-hoc fashion, using one + or more checks like ``isinstance(x, cls)``, ``hasattr(x, "attr")``, + ``len(x) == n`` or ``"key" in x`` as guards to select an applicable + block. The block can then assume ``x`` supports the interface + checked by the guard. For example:: + + if isinstance(x, tuple) and len(x) == 2: + host, port = x + mode = "http" + elif isinstance(x, tuple) and len(x) == 3: + host, port, mode = x + # Etc. + + Code like this is more elegantly rendered using ``match``:: + + match x: + case host, port: + mode = "http" + case host, port, mode: + pass + # Etc. + +- AST traversal code often looks for nodes matching a given pattern, + for example the code to detect a node of the shape "A + B * C" might + look like this:: + + if (isinstance(node, BinOp) and node.op == "+" + and isinstance(node.right, BinOp) and node.right.op == "*"): + a, b, c = node.left, node.right.left, node.right.right + # Handle a + b*c + + Using ``match``, this becomes more readable:: + + match node: + case BinOp("+", a, BinOp("*", b, c)): + # Handle a + b*c + +- TODO: Other compelling examples? + +We believe that adding pattern matching to Python will enable Python +users to write cleaner, more readable code for examples like those +above, and many others. + +Pattern matching and OO +----------------------- + +Pattern matching is complementary to the object-oriented paradigm. +Using OO and inheritance, we can easily define a method on a base class +that defines default behavior for a specific operation on that class, +and we can override this default behavior in subclasses. We can also +use the Visitor pattern to separate actions from data. + +But this is not sufficient for all situations.
For example, a code +generator may consume an AST, and have many operations where the +generated code needs to vary based not just on the class of a node, +but also on the values of some of its attributes, like the ``BinOp`` +example above. The Visitor pattern is insufficiently flexible for +this: it can only select based on the class. + +For a complete example, see +https://github.com/gvanrossum/patma/blob/master/examples/expr.py#L231 + +TODO: Could we say more here? + +Pattern matching and functional style +------------------------------------- + +Most Python applications and libraries are not written in a consistent +OO style -- unlike Java, Python encourages defining functions at the +top-level of a module, and for simple data structures, tuples (or +named tuples or lists) and dictionaries are often used exclusively or +mixed with classes or data classes. + +Pattern matching is particularly suitable for picking apart such data +structures. As an extreme example, it's easy to write code that picks +apart a JSON data structure using ``match``. + +TODO: Example code. + + + + +Rationale +========= + +TBD. + +This section should provide the rationale for individual design decisions. +It takes the place of "Rejected ideas" in the standard PEP format. +It is organized in sections corresponding to the specification (PEP 634). + + +Overview and terminology +------------------------ + + + +The ``match`` statement +----------------------- + +The match statement evaluates an expression to produce a subject, finds the +first pattern that matches the subject and executes the associated block +of code. Syntactically, the match statement thus takes an expression and +a sequence of case clauses, where each case clause comprises a pattern and +a block of code. + +Since case clauses comprise a block of code, they adhere to the existing +indentation scheme with the syntactic structure of +`` ...: <(indented) block>``, which in turn makes it a (compound) +statement.
The chosen keyword ``case`` reflects its widespread use in +pattern matching languages. Languages that use other syntactic means, +such as a symbol like ``|``, were set aside because such syntax would not +fit established Python structures. The syntax of patterns following the +keyword is discussed below. + +Given that the case clauses follow the structure of a compound statement, +the match statement itself naturally becomes a compound statement +as well, following the same syntactic structure. This leads to +``match : +``. Note that the match statement determines +a quasi-scope in which the evaluated subject is kept alive (although not in +a local variable), similar to how a with statement might keep a resource +alive during execution of its block. Furthermore, control flows from the +match statement to a case clause and then leaves the block of the match +statement. The block of the match statement thus has both syntactic and +semantic meaning. + +Various suggestions have sought to eliminate or avoid the naturally arising +"double indentation" of a case clause's code block. Unfortunately, all such +proposals of *flat indentation schemes* come at the expense of violating +Python's established structural paradigm, leading to additional syntactic +rules: + +- *Unindented case clauses.* + The idea is to align case clauses with the ``match``, i.e.:: + + match expression: + case pattern_1: + ... + case pattern_2: + ... + + This may look awkward to the eye of a Python programmer, because + everywhere else a colon is followed by an indent. The ``match`` would + neither follow the syntactic scheme of simple nor composite statements + but rather establish a category of its own. + +- *Putting the expression on a separate line after ``match``.* + The idea is to use the expression yielding the subject as a statement + to avoid the singularity of ``match`` having no actual block despite + the colons:: + + match: + expression + case pattern_1: + ... + case pattern_2: + ...
+ + This was ultimately rejected because the first block would be another + novelty in Python's grammar: a block whose only content is a single + expression rather than a sequence of statements. Attempts to amend this + issue by adding or repurposing yet another keyword along the lines of + ``match: return expression`` did not yield any satisfactory solution. + +Although flat indentation would save some horizontal space, the cost of +increased complexity or unusual rules is too high. It would also complicate +life for simple-minded code editors. Finally, the horizontal space issue can +be alleviated by allowing "half-indent" (i.e. two spaces instead of four) +for match statements. + +In sample programs using match, written as part of the development of this +PEP, a noticeable improvement in code brevity is observed, more than making +up for the additional indentation level. + + +*Statement vs. Expression.* Some suggestions centered around the idea of +making ``match`` an expression rather than a statement. However, this +would fit poorly with Python's statement-oriented nature and lead to +unusually long and complex expressions with the need to invent new +syntactic constructs or break well-established syntactic rules. An +obvious consequence of ``match`` as an expression would be that case +clauses could no longer have arbitrary blocks of code attached, but only +a single expression. Overall, the strong limitations could in no way +offset the slight simplification in some special use cases. + + + +Match semantics +~~~~~~~~~~~~~~~ + +The patterns of different case clauses might overlap in that more than +one case clause would match a given subject. The first-to-match rule +ensures that the selection of a case clause for a given subject is +unambiguous. Furthermore, case clauses can have increasingly general +patterns matching wider classes of subjects.
The first-to-match rule +then ensures that the most precise pattern can be chosen (although it +is the programmer's responsibility to order the case clauses correctly). + +In a statically typed language, the match statement might be compiled to +a decision tree to select a matching pattern quickly and very efficiently. +This would, however, require that all patterns be purely declarative and +static, running against the established dynamic semantics of Python. The +proposed semantics thus represent a path incorporating the best of both +worlds: patterns are tried in a strictly sequential order so that each +case clause constitutes an actual statement. At the same time, we allow +the interpreter to cache any information about the subject or change the +order in which subpatterns are tried. For example, if the interpreter +has found that the subject is not an instance of a class ``C``, it can +directly skip case clauses testing for this again, without having to +perform repeated instance-checks. If a guard stipulates that a variable +``x`` must be positive, say (i.e. ``if x > 0``), the interpreter might +check this directly after binding ``x`` and before any further +subpatterns are considered. + + +*Binding and scoping.* In many pattern matching implementations, each +case clause would establish a separate scope of its own. Variables bound +by a pattern would then only be visible inside the corresponding case block. +In Python, however, this does not make sense. Establishing separate scopes +would essentially mean that each case clause is a separate function without +direct access to the variables in the surrounding scope (without having to +resort to ``nonlocal``, that is). Moreover, a case clause could no longer +influence any surrounding control flow through standard statements such as +``return`` or ``break``. Hence, such strict scoping would lead to +unintuitive and surprising behavior.

A direct consequence of this is that any variable bindings outlive the
respective case or match statements. Even patterns that only match a
subject partially might bind local variables (this is, in fact, necessary
for guards to function properly). However, this escaping of variable
bindings is in line with existing Python structures such as for loops and
with statements.


.. _patterns:

Patterns
--------

Patterns fulfill two purposes: they impose (structural) constraints on
the subject and they specify which data values should be extracted from
the subject and bound to variables. In iterable unpacking, which can be
seen as a prototype of pattern matching in Python, there is only one
*structural pattern* to express sequences while there is a rich set of
*binding patterns* to assign a value to a specific variable or field.
Full pattern matching differs from this in that there is more variety
in structural patterns but only a minimum of binding patterns.

Patterns differ from assignment targets (as in iterable unpacking) in that
they impose additional constraints on the structure of the subject and in
that a subject might safely fail to match a specific pattern at any point
(in iterable unpacking, this constitutes an error). The latter means that
patterns should avoid side effects wherever possible, including binding
values to attributes or subscripts.

A cornerstone of pattern matching is the possibility of arbitrarily
*nesting patterns*. The nesting allows for expressing deep
tree structures (for an example of nested class patterns, see the motivation
section above) as well as alternatives.

Although the structural patterns might superficially look like expressions,
it is important to keep in mind that there is a clear distinction. In fact,
no pattern is or contains an expression. It is more productive to think of
patterns as declarative elements similar to the formal parameters in a
function definition.


Walrus patterns
~~~~~~~~~~~~~~~



OR patterns
~~~~~~~~~~~

The OR pattern allows you to combine 'structurally equivalent' alternatives
into a new pattern, i.e. several patterns can share a common handler. If any
one of an OR pattern's subpatterns matches the given subject, the entire OR
pattern succeeds.

Statically typed languages prohibit the binding of names (capture patterns)
inside an OR pattern because of potential conflicts concerning the types of
variables. As a dynamically typed language, Python can be less restrictive
here and allow capture patterns inside OR patterns. However, each subpattern
must bind the same set of variables so as not to leave potentially undefined
names. With two alternatives ``P | Q``, this means that if *P* binds the
variables *u* and *v*, *Q* must bind exactly the same variables *u* and *v*.

There was some discussion on whether to use the bar ``|`` or the keyword
``or`` in order to separate alternatives. The OR pattern does not fully fit
the existing semantics and usage of either of these two symbols. However,
``|`` is the symbol of choice in all programming languages with support for
the OR pattern and is even used in that capacity for regular expressions in
Python as well. Moreover, ``|`` is not only used for bitwise OR, but also
for set unions and dict merging (:pep:`584`).
Other alternatives were considered as well, but none of these would allow
OR-patterns to be nested inside other patterns:

- *Using a comma*::

    case 401, 403, 404:
        print("Some HTTP error")

  This looks too much like a tuple -- we would have to find a different way
  to spell tuples, and the construct would have to be parenthesized inside
  the argument list of a class pattern. In general, commas already have many
  different meanings in Python; we shouldn't add more.

- *Using stacked cases*::

    case 401:
    case 403:
    case 404:
        print("Some HTTP error")

  This is how this would be done in *C*, using its fall-through semantics
  for cases. However, we don't want to mislead people into thinking that
  match/case uses fall-through semantics (which are a common source of bugs
  in *C*). Also, this would be a novel indentation pattern, which might make
  it harder to support in IDEs and such (it would break the simple rule "add
  an indentation level after a line ending in a colon"). Finally, this
  would not support OR patterns nested inside other patterns.

- *Using* ``case in`` *followed by a comma-separated list*::

    case in 401, 403, 404:
        print("Some HTTP error")

  This would not work for OR patterns nested inside other patterns, like::

    case Point(0|1, 0|1):
        print("A corner of the unit square")


*AND and NOT patterns.*
This proposal defines an OR-pattern (``|``) to match one of several
alternatives; why not also an AND-pattern (``&``) or even a NOT-pattern
(``!``), especially given that some other languages (``F#`` for example)
support AND-patterns?

However, it is not clear how useful this would be. The semantics for matching
dictionaries, objects and sequences already incorporates an implicit 'and':
all attributes and elements mentioned must be present for the match to
succeed. Guard conditions can also support many of the use cases that a
hypothetical 'and' operator would be used for.

A negation of a match pattern using the operator ``!`` as a prefix would match
exactly if the pattern itself does not match. For instance, ``!(3 | 4)``
would match anything except ``3`` or ``4``. However, there is evidence from
other languages that this is rarely useful and primarily used as double
negation ``!!`` to control variable scopes and prevent variable bindings
(which does not apply to Python).
+ +In the end, it was decided that this would make the syntax more complex +without adding a significant benefit. + + +Example:: + + def simplify(expr): + match expr: + case ('/', 0, 0): + return expr + case ('*' | '/', 0, _): + return 0 + case ('+' | '-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*' | '/', x, 1): + return x + return expr + + +.. _capture_pattern: + +Capture Patterns +~~~~~~~~~~~~~~~~ + +Capture patterns take on the form of a name that accepts any value and binds +it to a (local) variable (unless the name is declared as ``nonlocal`` or +``global``). In that sense, a simple capture pattern is basically equivalent +to a parameter in a function definition (when the function is called, each +parameter binds the respective argument to a local variable in the function's +scope). + +A name used for a capture pattern must not coincide with another capture +pattern in the same pattern. This, again, is similar to parameters, which +equally require each parameter name to be unique within the list of +parameters. It differs, however, from iterable unpacking assignment, where +the repeated use of a variable name as target is permissible (e.g., +``x, x = 1, 2``). The rationale for not supporting ``(x, x)`` in patterns +is its ambiguous reading: it could be seen as in iterable unpacking where +only the second binding to ``x`` survives. But it could be equally seen as +expressing a tuple with two equal elements (which comes with its own issues). +Should the need arise, then it is still possible to introduce support for +repeated use of names later on. + +There were calls to explicitly mark capture patterns and thus identify them +as binding targets. According to that idea, a capture pattern would be +written as, e.g. ``?x`` or ``$x``. The aim of such explicit capture markers +is to let an unmarked name be a constant value pattern (see below). 
However, +this is based on the misconception that pattern matching was an extension of +*switch* statements, placing the emphasis on fast switching based on +(ordinal) values. Such a *switch* statement has indeed been proposed for +Python before (see :pep:`275` and :pep:`3103`). Pattern matching, on the other +hand, builds a generalized concept of iterable unpacking. Binding values +extracted from a data structure is at the very core of the concept and hence +the most common use case. Explicit markers for capture patterns would thus +betray the objective of the proposed pattern matching syntax and simplify +a secondary use case at the expense of additional syntactic clutter for +core cases. + +Example:: + + def average(*args): + match args: + case [x, y]: # captures the two elements of a sequence + return (x + y) / 2 + case [x]: # captures the only element of a sequence + return x + case []: + return 0 + case x: # captures the entire sequence + return sum(x) / len(x) + + +.. _wildcard_pattern: + +Wildcard Pattern +~~~~~~~~~~~~~~~~ + +The wildcard pattern is a special case of a 'capture' pattern: it accepts +any value, but does not bind it to a variable. The idea behind this rule +is to support repeated use of the wildcard in patterns. While ``(x, x)`` +is an error, ``(_, _)`` is legal. + +Particularly in larger (sequence) patterns, it is important to allow the +pattern to concentrate on values with actual significance while ignoring +anything else. Without a wildcard, it would become necessary to 'invent' +a number of local variables, which would be bound but never used. Even +when sticking to naming conventions and using e.g. 
``_1, _2, _3`` to name
irrelevant values, this still introduces visual clutter and can hurt
performance (compare the sequence pattern ``(x, y, *z)`` to ``(_, y, *_)``,
where the ``*z`` forces the interpreter to copy a potentially very long
sequence, whereas the second version simply compiles to code along the
lines of ``y = seq[1]``).

There has been much discussion about the choice of the underscore ``_``
as a wildcard pattern, i.e. making this one name non-binding. However, the
underscore is already heavily used as an 'ignore value' marker in iterable
unpacking. Since the wildcard pattern ``_`` never binds, this use of the
underscore does not interfere with other uses such as inside the REPL or
the ``gettext`` module.

It has been proposed to use ``...`` (i.e., the ellipsis token) or ``*``
(star) as a wildcard. However, both these look as if an arbitrary number
of items is omitted::

    case [a, ..., z]: ...
    case [a, *, z]: ...

Both look as if they would match a sequence of two or more items,
capturing the first and last values.

A single wildcard clause (i.e. ``case _:``) is semantically equivalent to
an ``else:``. It accepts any subject without binding it to a variable or
performing any other operation. Unlike ``else``, however, the wildcard
pattern can also be used as a subpattern in nested patterns.

Finally, note that the underscore is used as a wildcard pattern in *every*
programming language with pattern matching that we could find
(including *C#*, *Elixir*, *Erlang*, *F#*, *Grace*, *Haskell*,
*Mathematica*, *OCaml*, *Ruby*, *Rust*, *Scala*, *Swift*, and *Thorn*).
Keeping in mind that many users of Python also work with other programming
languages, have prior experience when learning Python, or move on to
other languages after having learnt Python, we find that such well
established standards are important and relevant with respect to
readability and learnability.
In our view, concerns that this wildcard
means that a regular name receives special treatment are not strong
enough to introduce syntax that would make Python special.

Example::

    def is_closed(sequence):
        match sequence:
            case [_]:               # any sequence with a single element
                return True
            case [start, *_, end]:  # a sequence with at least two elements
                return start == end
            case _:                 # anything
                return False


.. _literal_pattern:

Literal Patterns
~~~~~~~~~~~~~~~~

Literal patterns are a convenient way for imposing constraints on the
value of a subject, rather than its type or structure. Literal patterns
even allow you to emulate a switch statement using pattern matching.

Generally, the subject is compared to a literal pattern by means of standard
equality (``x == y`` in Python syntax). Consequently, the literal patterns
``1.0`` and ``1`` match exactly the same set of objects, i.e. ``case 1.0:``
and ``case 1:`` are fully interchangeable. In principle, ``True`` would also
match the same set of objects because ``True == 1`` holds. However, we
believe that many users would be surprised to find that ``case True:``
matched the object ``1.0``, resulting in some subtle bugs and convoluted
workarounds. We therefore adopted the rule that the three singleton
objects ``None``, ``False`` and ``True`` match by identity (``x is y`` in
Python syntax) rather than equality. Hence, ``case True:`` will match only
``True`` and nothing else. Note that ``case 1:`` would still match ``True``,
though, because the literal pattern ``1`` works by equality and not identity.

Early ideas to induce a hierarchy on numbers so that ``case 1.0`` would
match both the integer ``1`` and the floating point number ``1.0``, whereas
``case 1:`` would only match the integer ``1``, were eventually dropped in
favor of the simpler and more consistent rule based on equality.
Moreover, any
additional check of whether the subject is an instance of ``numbers.Integral``
would come at a high runtime cost to introduce what would essentially be
novel in Python. When needed, the explicit syntax ``case int(1):`` might
be used.

Recall that literal patterns are *not* expressions, but directly denote a
specific value or object. From a syntactical point of view, we have to
ensure that negative and complex numbers can equally be used as patterns,
although they are not atomic literal values (i.e. the seeming literal value
``-3+4j`` would syntactically be an expression of the form
``BinOp(UnaryOp('-', 3), '+', 4j)``, but as expressions are not part of
patterns, we added syntactic support for such complex value literals without
having to resort to full expressions). Interpolated *f*-strings, on the
other hand, are not literal values, despite their appearance, and therefore
cannot be used as literal patterns (string concatenation, however,
is supported).

Literal patterns not only occur as patterns in their own right, but also
as keys in *mapping patterns*.

Example::

    def simplify(expr):
        match expr:
            case ('+', 0, x):
                return x
            case ('+' | '-', x, 0):
                return x
            case ('and', True, x):
                return x
            case ('and', False, x):
                return False
            case ('or', False, x):
                return x
            case ('or', True, x):
                return True
            case ('not', ('not', x)):
                return x
        return expr


.. _constant_value_pattern:

Constant Value Patterns
~~~~~~~~~~~~~~~~~~~~~~~

It is good programming style to use named constants for parametric values or
to clarify the meaning of particular values. Clearly, it would be desirable
to also write ``case (HttpStatus.OK, body):`` rather than
``case (200, body):``, for example. The main issue that arises here is how to
distinguish capture patterns (variables) from constant value patterns.
The
general discussion surrounding this issue has brought forward a plethora of
options, not all of which we can list here.

Strictly speaking, constant value patterns are not really necessary, but
could be emulated using guards, e.g.
``case (status, body) if status == HttpStatus.OK:``. Nonetheless, the
convenience of constant value patterns is unquestioned and obvious.

The observation that constants tend to be written in uppercase letters or
collected in enumeration-like namespaces suggests possible rules to discern
constants syntactically. However, the idea of using upper vs. lower case as
a marker has been met with scepticism since there is no similar precedent
in core Python (although it is common in other languages). We therefore only
adopted the rule that any dotted name (i.e. attribute access) is to be
interpreted as a constant value pattern like ``HttpStatus.OK``
above. This precludes, in particular, local variables from acting as
constants.

Global variables can only be directly used as constants when defined in other
modules, although there are workarounds to access the current module as a
namespace as well. A proposed rule to use a leading dot (e.g.
``.CONSTANT``) for that purpose was criticised because it was felt that the
dot would not be a visible-enough marker for that purpose. Partly inspired
by use cases in other programming languages, a number of different
markers/sigils were proposed (such as ``^CONSTANT``, ``$CONSTANT``,
``==CONSTANT``, ``CONSTANT?``, or the word enclosed in backticks), although
there was no obvious or natural choice. The current proposal therefore
leaves the discussion and possible introduction of such a 'constant' marker
for future PEPs.

Distinguishing the semantics of names based on whether they refer to global
variables (i.e. the compiler would treat global variables as constants rather
than capture patterns) leads to various issues.
The addition or alteration
of a global variable in the module could have unintended side effects on
patterns. Moreover, pattern matching could not be used directly inside a
module's scope because all variables would be global, making capture
patterns impossible.

Example::

    def handle_reply(reply):
        match reply:
            case (HttpStatus.OK, MimeType.TEXT, body):
                process_text(body)
            case (HttpStatus.OK, MimeType.APPL_ZIP, body):
                text = deflate(body)
                process_text(text)
            case (HttpStatus.MOVED_PERMANENTLY, new_URI):
                resend_request(new_URI)
            case (HttpStatus.NOT_FOUND, *_):
                raise ResourceNotFound()


Group Patterns
~~~~~~~~~~~~~~

Allowing users to explicitly specify the grouping is particularly helpful
in the case of OR patterns.


.. _sequence_pattern:

Sequence Patterns
~~~~~~~~~~~~~~~~~

Sequence patterns follow as closely as possible the already established
syntax and semantics of iterable unpacking. Of course, subpatterns take
the place of assignment targets (variables, attributes and subscripts).
Moreover, the sequence pattern only matches a carefully selected set of
possible subjects, whereas iterable unpacking can be applied to any
iterable.

- As in iterable unpacking, we do not distinguish between 'tuple' and
  'list' notation. ``[a, b, c]``, ``(a, b, c)`` and ``a, b, c`` are all
  equivalent. While this means we have a redundant notation and checking
  specifically for lists or tuples requires more effort (e.g.
  ``case list([a, b, c])``), we mimic iterable unpacking as much as
  possible.

- A starred pattern will capture a sub-sequence of arbitrary length,
  mirroring iterable unpacking as well. Only one starred item may be
  present in any sequence pattern. In theory, patterns such as ``(*_, 3, *_)``
  could be understood as expressing any sequence containing the value ``3``.
  In practice, however, this would only work for a very narrow set of use
  cases and lead to inefficient backtracking or even ambiguities otherwise.

- The sequence pattern does *not* iterate through an iterable subject. All
  elements are accessed through subscripting and slicing, and the subject must
  be an instance of ``collections.abc.Sequence`` (including, in particular,
  lists and tuples, but excluding strings and bytes, as well as sets and
  dictionaries).

A sequence pattern cannot just iterate through any iterable object. The
consumption of elements from the iteration would have to be undone if the
overall pattern fails, which is not possible.

Relying on ``len()``, subscripting and slicing alone does not work to
identify sequences because sequences share the protocol with more general
maps (dictionaries) in this regard. It would be surprising if a sequence
pattern also matched dictionaries or other custom objects that implement
the mapping protocol (i.e. ``__getitem__``). The interpreter therefore
performs an instance check to ensure that the subject in question really
is a sequence (of known type).

String and bytes objects have a dual nature: they are both 'atomic' objects
in their own right, as well as sequences (with a strongly recursive nature
in that a string is a sequence of strings). The typical behavior and use
cases for strings and bytes are different enough from those of tuples and
lists to warrant a clear distinction. It is in fact often unintuitive and
unintended that strings pass for sequences, as evidenced by regular questions
and complaints. Strings and bytes are therefore not matched by a sequence
pattern, limiting the sequence pattern to a very specific understanding of
'sequence'.


.. _mapping_pattern:

Mapping Patterns
~~~~~~~~~~~~~~~~

Dictionaries or mappings in general are one of the most important and most
widely used data structures in Python.
In contrast to sequences, mappings
are built for fast direct access to arbitrary elements (identified by a key).
In most use cases an element is retrieved from a dictionary by a known key
without regard for any ordering or other key-value pairs stored in the same
dictionary. Particularly common are string keys.

The mapping pattern reflects the common usage of dictionary lookup: it allows
the user to extract some values from a mapping by means of constant/known
keys and have the values match given subpatterns. Moreover, the mapping
pattern does not check for the presence of additional keys. Should it be
necessary to impose an upper bound on the mapping and ensure that no
additional keys are present, then the usual double-star pattern ``**rest``
can be used to capture the remaining items (e.g. together with a guard such
as ``if not rest``). The special case ``**_`` with a wildcard, however, is
not supported as it would have no effect, but might lead to a
misunderstanding of the mapping pattern's semantics.

To avoid overly expensive matching algorithms, keys must be literals or
constant values.

Example::

    def change_red_to_blue(json_obj):
        match json_obj:
            case { 'color': ('red' | '#FF0000') }:
                json_obj['color'] = 'blue'
            case { 'children': children }:
                for child in children:
                    change_red_to_blue(child)


.. _class_pattern:

Class Patterns
~~~~~~~~~~~~~~

Class patterns fulfil two purposes: checking whether a given subject is
indeed an instance of a specific class and extracting data from specific
attributes of the subject. A quick survey revealed that ``isinstance()``
is indeed one of the most often used functions in Python in terms of
static occurrences in programs. Such instance checks typically precede
access to information stored in the object, or a possible
manipulation thereof.
A typical pattern might be along the lines of::

    def traverse_tree(node):
        if isinstance(node, Node):
            traverse_tree(node.left)
            traverse_tree(node.right)
        elif isinstance(node, Leaf):
            print(node.value)

In many cases, however, class patterns occur nested as in the example
given in the motivation::

    if (isinstance(node, BinOp) and node.op == "+"
            and isinstance(node.right, BinOp) and node.right.op == "*"):
        a, b, c = node.left, node.right.left, node.right.right
        # Handle a + b*c

The class pattern lets you concisely specify both an instance check and
the relevant attributes (with possible further constraints). It is
thereby very tempting to write, e.g., ``case Node(left, right):`` in the
first case above and ``case Leaf(value):`` in the second. While this
indeed works well for languages with strict algebraic data types, it is
problematic with the structure of Python objects.

When dealing with general Python objects, we face a potentially very large
number of unordered attributes: an instance of ``Node`` contains a large
number of attributes (most of which are special methods such as
``__repr__``). Moreover, the interpreter cannot reliably deduce which of
the attributes comes first and which comes second. For an object that
represents a circle, say, there is no inherently obvious ordering of the
attributes ``x``, ``y`` and ``radius``.

We envision two possibilities for dealing with this issue: either explicitly
name the attributes of interest or provide an additional mapping that tells
the interpreter which attributes to extract and in which order. Both
approaches are supported. Moreover, explicitly naming the attributes of
interest lets you further specify the required structure of an object; if
an object lacks an attribute specified by the pattern, the match fails.

- Attributes that are explicitly named pick up the syntax of named arguments.
  If an object of class ``Node`` has two attributes ``left`` and ``right``
  as above, the pattern ``Node(left=x, right=y)`` will extract the values of
  both attributes and assign them to ``x`` and ``y``, respectively. The data
  flow from left to right seems unusual, but is in line with mapping patterns
  and has precedents such as assignments via ``as`` in *with*- or
  *import*-statements.

  Naming the attributes in question explicitly will mostly be used for more
  complex cases where the positional form (below) is insufficient.

- The class field ``__match_args__`` specifies a number of attributes
  together with their ordering, allowing class patterns to rely on positional
  sub-patterns without having to explicitly name the attributes in question.
  This is particularly handy for smaller objects or instances of data classes,
  where the attributes of interest are rather obvious and often have a
  well-defined ordering. In a way, ``__match_args__`` is similar to the
  declaration of formal parameters, which allows functions to be called with
  positional arguments rather than naming all the parameters.


The syntax of class patterns is based on the idea that de-construction
mirrors the syntax of construction. This is already the case in virtually
any Python construct, be it assignment targets, function definitions or
iterable unpacking. In all these cases, we find that the syntax for
sending and that for receiving 'data' are virtually identical.

- Assignment targets such as variables, attributes and subscripts:
  ``foo.bar[2] = foo.bar[3]``;

- Function definitions: a function defined with ``def foo(x, y, z=6)``
  is called as, e.g., ``foo(123, y=45)``, where the actual arguments
  provided at the call site are matched against the formal parameters
  at the definition site;

- Iterable unpacking: ``a, b = b, a`` or ``[a, b] = [b, a]`` or
  ``(a, b) = (b, a)``, just to name a few equivalent possibilities.


Using the same syntax for reading and writing, l- and r-values, or
construction and de-construction is widely accepted for its benefits in
thinking about data, its flow and manipulation. This equally extends to
the explicit construction of instances, where class patterns ``c(p, q)``
deliberately mirror the syntax of creating instances.



History and Context
===================

Pattern matching emerged in the late 1970s in the form of tuple unpacking
and as a means to handle recursive data structures such as linked lists or
trees (object-oriented languages usually use the visitor pattern for handling
recursive data structures). The early proponents of pattern matching
organised structured data in 'tagged tuples' rather than ``struct`` as in
*C* or the objects introduced later. A node in a binary tree would, for
instance, be a tuple with two elements for the left and right branches,
respectively, and a ``Node`` tag, written as ``Node(left, right)``. In
Python we would probably put the tag inside the tuple as
``('Node', left, right)`` or define a data class ``Node`` to achieve the
same effect.

Using modern syntax, a depth-first tree traversal would then be written as
follows::

    def traverse_tree(node):
        match node:
            case Node(left, right):
                traverse_tree(left)
                traverse_tree(right)
            case Leaf(value):
                handle(value)

The notion of handling recursive data structures with pattern matching
immediately gave rise to the idea of handling more general recursive
'patterns' (i.e. recursion beyond recursive data structures)
with pattern matching. Pattern matching would thus also be used to define
recursive functions such as::

    def fib(arg):
        match arg:
            case 0:
                return 1
            case 1:
                return 1
            case n:
                return fib(n-1) + fib(n-2)

As pattern matching was repeatedly integrated into new and emerging
programming languages, its syntax slightly evolved and expanded.
The first
two cases in the ``fib`` example above could be written more succinctly
as ``case 0 | 1:`` with ``|`` denoting alternative patterns. Moreover, the
underscore ``_`` was widely adopted as a wildcard, a filler where neither
the structure nor the value of parts of a pattern were of substance. Since the
underscore is already frequently used in an equivalent capacity in Python's
iterable unpacking (e.g., ``_, _, third, *_ = something``), we kept these
universal standards.

It is noteworthy that the concept of pattern matching has always been
closely linked to the concept of functions. The different case clauses
have always been considered something like semi-independent functions
where pattern variables take on the role of parameters. This becomes
most apparent when pattern matching is written as an overloaded function,
along the lines of (Standard ML)::

    fun fib 0 = 1
      | fib 1 = 1
      | fib n = fib (n-1) + fib (n-2)

Even though such a strict separation of case clauses into independent
functions does not make sense in Python, we find that patterns share many
syntactic rules with parameters, such as binding arguments to unqualified
names only or that variable/parameter names must not be repeated for
a particular pattern/function.

With its emphasis on abstraction and encapsulation, object-oriented
programming posed a serious challenge to pattern matching. In short: in
object-oriented programming, we can no longer view objects as tagged tuples.
The arguments passed into the constructor do not necessarily specify the
attributes or fields of the objects. Moreover, there is no longer a strict
ordering of an object's fields and some of the fields might be private and
thus inaccessible. And on top of this, the given object might actually be
an instance of a subclass with slightly different structure.

To address this challenge, patterns became increasingly independent of the
original tuple constructors.
In a pattern like ``Node(left, right)``,
``Node`` is no longer a passive tag, but rather a function that can actively
check for any given object whether it has the right structure and extract a
``left`` and ``right`` field. In other words: the ``Node``-tag becomes a
function that transforms an object into a tuple or returns some failure
indicator if it is not possible.

In Python, we simply use ``isinstance()`` together with the ``__match_args__``
field of a class to check whether an object has the correct structure and
then transform some of its attributes into a tuple. For the ``Node`` example
above, we would have ``__match_args__ = ('left', 'right')`` to
indicate that these two attributes should be extracted to form the tuple.
That is, ``case Node(x, y)`` would first check whether a given object is an
instance of ``Node`` and then assign ``left`` to ``x`` and ``right`` to ``y``,
respectively.

Paying tribute to Python's dynamic nature with 'duck typing', however, we
also added a more direct way to specify the presence of, or constraints on,
specific attributes. Instead of ``Node(x, y)`` you could also write
``object(left=x, right=y)``, effectively eliminating the ``isinstance()``
check and thus supporting any object with ``left`` and ``right`` attributes.
Or you could combine these ideas to write ``Node(right=y)`` so as to require
an instance of ``Node`` but only extract the value of the ``right`` attribute.



Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: diff --git a/pep-0636.rst b/pep-0636.rst new file mode 100644 index 000000000000..8ac56d41f71a --- /dev/null +++ b/pep-0636.rst @@ -0,0 +1,385 @@ +PEP: 636 +Title: Structural Pattern Matching: Tutorial +Version: $Revision$ +Last-Modified: $Date$ +Author: Daniel F Moisset , + Tobias Kohn +Sponsor: Guido van Rossum +BDFL-Delegate: +Discussions-To: Python-Dev +Status: Draft +Type: Informational +Content-Type: text/x-rst +Created: 12-Sep-2020 +Python-Version: 3.10 +Post-History: +Resolution: + + +Abstract +======== + +**NOTE:** This draft is incomplete and not intended for review yet. +We're checking it into the peps repo for the convenience of the authors. + +This PEP is a tutorial for the pattern matching introduced by PEP 634. + +PEP 622 proposed syntax for pattern matching, which received detailed discussion +both from the community and the Steering Council. A frequent concern was +about how easy it would be to explain (and learn) this feature. This PEP +addresses that concern providing the kind of document which developers could use +to learn about pattern matching in Python. + +This is considered supporting material for PEP 634 (the technical specification +for pattern matching) and PEP 635 (the motivation and rationale for having pattern +matching and design considerations). + +Meta +==== + +This section is intended to get in sync about style and language with +co-authors. It should be removed from the released PEP + +The following are design decisions I made while writing this: + +1. Who is the target audience? +I'm considering "People with general Python experience" (i.e. who shouldn't be surprised +at anything in the Python tutorial), but not necessarily involved with the +design/development or Python. I'm assuming someone who hasn't been exposed to pattern +matching in other languages. + +2. 
How detailed should this document be? +I considered a range from "very superficial" (like the detail level you might find about +statements in the Python tutorial) to "terse but complete" like +https://github.com/gvanrossum/patma/#tutorial +to "long and detailed". I chose the latter; we can always trim down from that. + +3. What kind of examples to use? +I tried to write examples that resemble code I might actually write using pattern matching. I +avoided going for a full application (because the examples I have in mind are too large for a PEP), but +I tried to follow ideas related to a single project to thread the storytelling more +easily. This is probably the most controversial thing here, and if the rest of +the authors dislike it, we can change to a more formal explanatory style. + +Other rules I'm following (let me know if I forgot any): + +* I'm not going to reference/compare with other languages +* I'm not trying to convince the reader that this is a good idea (that's the job of + PEP 635), just explain how to use it +* I'm not trying to cover every corner case (that's the job of PEP 634), just cover + how to use the full functionality in the "normal" cases. +* I talk to the learner in the second person + +Tutorial +======== + +As an example to motivate this tutorial, you will be writing a text adventure. That is +a form of interactive fiction where the user enters text commands to interact with a +fictional world and receives text descriptions of what happens. Commands will be +simplified forms of natural language like ``get sword``, ``attack dragon``, ``go north``, +``enter shop``, or ``buy cheese``. + +Matching sequences +------------------ + +Your main loop will need to get input from the user and split it into words, giving +a list of strings like this:: + + command = input("What are you doing next? ") + # analyze the result of command.split() + +The next step is to interpret the words. Most of our commands will have two words: an +action and an object.
So you may be tempted to do the following:: + + [action, obj] = command.split() + ... # interpret action, obj + +The problem with that line of code is that it's missing something: what if the user +types more or fewer than two words? To prevent this problem, you can either check the length +of the list of words or catch the ``ValueError`` that the statement above would raise. + +You can use a ``match`` statement instead:: + + match command.split(): + case [action, obj]: + ... # interpret action, obj + +The ``match`` statement evaluates the **subject** after the ``match`` keyword and checks +it against the **pattern** after ``case``. A pattern is able to do two different +things: + +* Verify that the subject has a certain structure. In your case, the ``[action, obj]`` + pattern matches any sequence of exactly two elements. This is called **matching**. +* Bind some names in the pattern to component elements of your subject. In + this case, if the list has two elements, it will bind ``action = subject[0]`` and + ``obj = subject[1]``. This is called **destructuring**. + +If there's a match, the statements inside the ``case`` clause will be executed with the +bound variables. If there's no match, nothing happens and execution continues with the +statement after the ``match`` statement. + +TODO: discuss other sequences, tuples. Discuss syntax with parentheses. Discuss +iterators? Discuss [x, x] possibly later on? + +Matching multiple patterns +-------------------------- + +Even if most commands have the action/object form, you might want to have user commands +of different lengths. For example, you might want to add single verbs with no object like +``look`` or ``quit``. A ``match`` statement can (and is likely to) have more than one +``case``:: + + match command.split(): + case [action]: + ... # interpret single-verb action + case [action, obj]: + ... # interpret action, obj + +The ``match`` statement will check patterns from top to bottom.
If the pattern doesn't +match the subject, the next pattern will be tried. However, once the *first* +matching ``case`` clause is found, the body of that clause is executed, and all further +``case`` clauses are ignored. This is similar to the way that an ``if/elif/elif/...`` +statement works. + +Matching specific values +------------------------ + +Your code still needs to look at the specific action and conditionally run +different logic depending on which one it is (e.g., ``quit``, ``attack``, or ``buy``). +You could do that using a chain of ``if/elif/elif/...``, or using a dictionary of +functions, but here we'll leverage pattern matching to solve that task. Instead of a +variable, you can use literal values in patterns (like ``"quit"``, ``42``, or ``None``). +This allows you to write:: + + match command.split(): + case ["quit"]: + print("Goodbye!") + quit_game() + case ["look"]: + current_room.describe() + case ["get", obj]: + character.get(obj, current_room) + case ["go", direction]: + current_room = current_room.neighbor(direction) + # The rest of your commands go here + +A pattern like ``["get", obj]`` will match only 2-element sequences that have a first +element equal to ``"get"``. When destructuring, it will bind ``obj = subject[1]``. + +As you can see in the ``go`` case, we can also use different variable names in +different patterns. + +FIXME: This *might* be the place to explain a bit that when I say "literal" I mean it +literally, and a "soft constant" will not work :) + +Matching slices +--------------- + +A player may be able to drop multiple objects by using a series of commands +``drop key``, ``drop sword``, ``drop cheese``. This interface might be cumbersome, and +you might like to allow dropping multiple items in a single command, such as +``drop key sword cheese``.
In this case, you don't know beforehand how many words will +be in the command, but you can use extended unpacking in patterns in the same way that +it is allowed in assignments:: + + match command.split(): + case ["drop", *objects]: + for obj in objects: + character.drop(obj, current_room) + # The rest of your commands go here + +This will match any sequence having ``"drop"`` as its first element. All remaining +elements will be captured in a ``list`` object, which will be bound to the ``objects`` +variable. + +This syntax has the same restriction as sequence unpacking: you cannot have more than one +starred name in a pattern. + +Adding a catch-all +------------------ + +You may want to print an error message saying that the command wasn't recognized when +all the other patterns fail. You could use the feature we just learned about and write the +following:: + + match command.split(): + case ["quit"]: ... # Code omitted for brevity + case ["go", direction]: ... + case ["drop", *objects]: ... + ... # Other case clauses + case [*ignored_words]: + print(f"Sorry, I couldn't understand {command!r}") + +Note that you must add this last pattern at the end; otherwise it will match before the other +patterns get a chance to be considered. This works, but it's a bit verbose and +somewhat wasteful: this will make a full copy of the word list, which will be bound to +``ignored_words`` even if it's never used. + +You can use a special pattern, written ``_``, which always matches but +doesn't bind anything. This allows you to rewrite:: + + match command.split(): + ... # Other case clauses + case [*_]: + print(f"Sorry, I couldn't understand {command!r}") + +This pattern will match any sequence. In this case, we can simplify even more and +match any object:: + + match command.split(): + ... # Other case clauses + case _: + print(f"Sorry, I couldn't understand {command!r}") + +TODO: Explain the ``SyntaxError`` raised when an irrefutable pattern appears above others?
+ +How patterns are composed +------------------------- + +This is a good moment to step back from the examples and understand how the patterns +that you have been using are built. Patterns can be nested within each other, and we +have been doing that implicitly in the examples above. + +There are some "simple" patterns ("simple" here meaning that they do not contain other +patterns) that we've seen: + +* **Literal patterns** (string literals, number literals, ``True``, ``False``, and + ``None``) +* The **wildcard pattern** ``_`` +* **Capture patterns** (stand-alone names like ``direction``, ``action``, ``objects``). We + never discussed these separately, but used them as part of other patterns. Note that + a capture pattern by itself will always match, and usually makes sense only + as a catch-all at the end of your ``match`` if you want to bind the name to the + subject. + +Until now, the only non-simple pattern we have experimented with is the sequence pattern. +Each element in a sequence pattern can in fact be +any other pattern. This means that you could write a pattern like +``["first", (left, right), *rest]``. This will match subjects which are a sequence of at +least two elements, where the first one is equal to ``"first"`` and the second one is +in turn a sequence of two elements. It will also bind ``left = subject[1][0]``, +``right = subject[1][1]``, and ``rest = subject[2:]``. + +Alternate patterns +------------------ + +Going back to the adventure game example, you may find that you'd like to have several +patterns resulting in the same outcome. For example, you might want the commands +``north`` and ``go north`` to be equivalent. You may also desire to have aliases for +``get X``, ``pick up X``, and ``pick X up`` for any X. + +The ``|`` symbol in patterns combines them as alternatives. You could, for example, write:: + + match command.split(): + ...
# Other case clauses + case ["north"] | ["go", "north"]: + current_room = current_room.neighbor("north") + case ["get", obj] | ["pick", "up", obj] | ["pick", obj, "up"]: + ... # Code for picking up the given object + +This is called an **or pattern** and will produce the expected result. Alternatives are +attempted from left to right; this matters for knowing what is bound if more than +one alternative could match. An important restriction when writing or patterns is that all +alternatives must bind the same variables. So a pattern ``[1, x] | [2, y]`` is not +allowed, because it would be unclear which variable would be bound after a successful +match. ``[1, x] | [2, x]`` is perfectly fine and will always bind ``x`` if successful. + + +Capturing matched sub-patterns +------------------------------ + +The first version of our "go" command was written with a ``["go", direction]`` pattern. +The change we made in our last version using the pattern ``["north"] | ["go", "north"]`` +has some benefits but also some drawbacks in comparison: the latest version allows the +alias, but also has the direction hardcoded, which will force us to actually have +separate patterns for north/south/east/west. This leads to some code duplication, but at +the same time we get better input validation, and we will not be getting into that +branch if the command entered by the user is ``"go figure!"`` instead of a direction. + +We could try to get the best of both worlds by doing the following (I'll omit the aliased +version without "go" for brevity):: + + match command.split(): + case ["go", ("north" | "south" | "east" | "west")]: + current_room = current_room.neighbor(...) + # how do I know which direction to go? + +This code is a single branch, and it verifies that the word after "go" is really a +direction. But the code moving the player around needs to know which one was chosen and +has no way to do so.
What we need is a pattern that behaves like the or pattern but at +the same time does a capture. We can do so with a **walrus pattern**:: + + match command.split(): + case ["go", direction := ("north" | "south" | "east" | "west")]: + current_room = current_room.neighbor(direction) + +The walrus pattern (named that way because the ``:=`` operator looks like a sideways +walrus) matches whatever pattern is on its right-hand side, and also binds the matched value to +a name. + +Adding conditions to patterns +----------------------------- + +The patterns we have explored above can do some powerful data filtering, but sometimes +you may wish for the full power of a boolean expression. Let's say that you would actually +like to allow a "go" command only in a restricted set of directions based on the possible +exits from ``current_room``. We can achieve that by adding a **guard** to our +case clause. Guards consist of the ``if`` keyword followed by any expression:: + + match command.split(): + case ["go", direction] if direction in current_room.exits: + current_room = current_room.neighbor(direction) + case ["go", _]: + print("Sorry, you can't go that way") + +The guard is not part of the pattern; it's part of the case clause. It's only checked if +the pattern matches, and after all the pattern variables have been bound (that's why the +condition can use the ``direction`` variable in the example above). If the pattern +matches and the condition is truthy, the body of the case clause runs normally. If the +pattern matches but the condition is falsy, the match statement proceeds to check the +next ``case`` clause as if the pattern hadn't matched (with the possible side effect of +having already bound some variables). + +The sequence of these steps must be considered carefully when combining or-patterns and +guards. If you have ``case [x, 100] | [0, x] if x > 10`` and your subject is +``[0, 100]``, the clause will be skipped.
This happens because: + +* The or-pattern finds the first alternative that matches the subject, which happens to + be ``[x, 100]``. +* ``x`` is bound to ``0``. +* The condition ``x > 10`` is checked. Given that it's false, the whole case clause is + skipped. The ``[0, x]`` pattern is never attempted. + +Going to the cloud: Mappings +---------------------------- + +TODO: Give the motivating example of network requests, describe JSON-based "protocol" + +TODO: partial matches, double stars + +Matching objects +---------------- + +UI events motivations. Describe events in dataclasses. Inspiration for event objects +can be taken from https://www.pygame.org/docs/ref/event.html + +Example of getting constants from a module (like key names for keyboard events) + +Customizing ``__match_args__``? + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: