Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new "type system concepts" section #1743

Merged
merged 48 commits into from
Jun 24, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
b323afb
initial draft of new concepts section
carljm May 22, 2024
bd4416e
review feedback
carljm May 22, 2024
3c332bf
prefer 'assignable to' over 'consistent subtype of'
carljm May 22, 2024
6603f43
minor tweaks
carljm May 22, 2024
4e2a1ce
Merge branch 'main' into concepts
carljm May 22, 2024
137ab2a
terms have types
carljm May 22, 2024
24e6ee5
review comments
carljm May 22, 2024
9e9ebb4
use asymmetric example of assignable-to
carljm May 22, 2024
1271558
Any is equivalent to Any
carljm May 22, 2024
2ddd4ad
add a short para on the gradual guarantee
carljm May 22, 2024
971e9f2
conciseness tweak
carljm May 22, 2024
10fda4e
use 'assignable to' in Any description
carljm May 22, 2024
f4110bb
incorporate some of Eric Traut's feedback
carljm May 23, 2024
ac6273e
more on union
carljm May 24, 2024
15c0d56
add an example of bounded gradual type
carljm May 24, 2024
efdbc5f
add section on attributes and methods
carljm May 24, 2024
e4537ac
more feedback and tweaks
carljm May 26, 2024
c3bbb52
Merge branch 'main' into concepts
carljm Jun 1, 2024
6bebab4
review comments
carljm Jun 1, 2024
2900cce
a bit more on gradual unions
carljm Jun 1, 2024
182a058
a few more review comments
carljm Jun 1, 2024
351136e
add terms to glossary
carljm Jun 1, 2024
cd03de8
Update glossary.rst
carljm Jun 2, 2024
07941d7
Update glossary.rst
carljm Jun 2, 2024
c55d40a
review comments on glossary
carljm Jun 2, 2024
b1775b1
re-apply review comment
carljm Jun 2, 2024
1a71a72
Apply suggestions from code review
carljm Jun 2, 2024
c18d9e1
audit remainder of type spec for terminology usage
carljm Jun 2, 2024
cca3bcd
review comments
carljm Jun 2, 2024
3d6b406
some review comments
carljm Jun 2, 2024
5d40036
one more review tweak on protocol wording
carljm Jun 2, 2024
1e0d118
Merge branch 'main' into concepts
carljm Jun 12, 2024
cbe6e23
add equivalent, narrow, and wide to glossary
carljm Jun 12, 2024
8954595
add table of type relations
carljm Jun 12, 2024
efaa7ab
explicitly allow inference on missing function annotations
carljm Jun 12, 2024
12ae9eb
some review comments
carljm Jun 12, 2024
ccfef86
Update docs/spec/callables.rst
carljm Jun 12, 2024
e990bda
more review comments
carljm Jun 12, 2024
045d7c2
link 'assignable' to glossary more often in callables doc
carljm Jun 14, 2024
dd6ffd1
add nominal/structural to concepts and glossary
carljm Jun 14, 2024
673a467
don't use 'compatible' in callables doc
carljm Jun 14, 2024
0f5fba4
equivalence of gradual types and union simplification
carljm Jun 14, 2024
a6b3ab0
simplify description of structural subtyping
carljm Jun 15, 2024
e5943a4
define equivalence of gradual types in glossary
carljm Jun 15, 2024
effcdec
more review comments
carljm Jun 17, 2024
9ebaef8
review comments
carljm Jun 19, 2024
6a3b716
Update glossary.rst
carljm Jun 20, 2024
ab7f9ac
review comments
carljm Jun 20, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
263 changes: 170 additions & 93 deletions docs/spec/concepts.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,97 +3,174 @@
Type system concepts
====================

.. _`union-types`:
Static, dynamic, and gradual typing
carljm marked this conversation as resolved.
Show resolved Hide resolved
-----------------------------------

A **statically-typed** programming language runs a type-checker before running
a program. The program is required to be well-typed according to the language's
type system. The type system assigns a type to all expressions in the language
and verifies that their uses obey the typing rules. Normally, a program that is
not well-typed (i.e., one that contains a type error) will not run. Java and
C++ are examples of statically-typed object-oriented languages.

A **dynamically-typed** programming language does not run a type-checker before
running a program. Instead, it will check the types of values before performing
carljm marked this conversation as resolved.
Show resolved Hide resolved
operations on them at runtime. This is not to say that the language is
"untyped". Values at runtime have a type and their uses obey typing rules. Not
every operation will be checked, but certain primitive operations in the
language such as attribute access or arithmetic will be. Python was
historically a dynamically-typed language.

**Gradual typing** is the name for a specific way to combine static and dynamic
carljm marked this conversation as resolved.
Show resolved Hide resolved
typing. It allows mixing static and dynamic checking at a fine level of
granularity. For instance, in Python, individual variables, parameters, and
return values can be either statically or dynamically typed. Data structures
such as objects can have a mix of static and dynamic checking. As an example,
a Python dictionary could choose to have static checking of the key type but
carljm marked this conversation as resolved.
Show resolved Hide resolved
dynamic checking of the value type.

In gradual typing, a dynamically typed value is indicated by a special
carljm marked this conversation as resolved.
Show resolved Hide resolved
"unknown" or "dynamic" type. In Python, the unknown type is spelled
:ref:`Any`. It indicates to the static type checker that this value should not
be subject to static checking. The system should not signal a static type
error for use of an expression with type :ref:`Any`. Instead, the expression's
value will be dynamically checked, according to the Python runtime's usual
checks on the types of runtime values.

This specification describes a gradual type system for Python.

Static, dynamic, and gradual types
----------------------------------

We will refer to types that do not contain :ref:`Any` as a sub-part as **static
carljm marked this conversation as resolved.
Show resolved Hide resolved
carljm marked this conversation as resolved.
Show resolved Hide resolved
types**. The dynamic type :ref:`Any` is the **dynamic type**. Any type other
than :ref:`Any` that does contain :ref:`Any` as a sub-part is a **gradual
type**.

Static types
~~~~~~~~~~~~

A static type can intuitively be understood as denoting a set of runtime
carljm marked this conversation as resolved.
Show resolved Hide resolved
values. For instance, the Python static type ``object`` is the set of all
Python objects. The Python static type ``bool`` is the set of values ``{ True,
False }``. The Python static type ``str`` is the set of all Python strings;
more precisely, the set of all Python objects whose runtime type (i.e.
``__class__`` attribute) is either ``str`` or a class that inherits ``str``,
including transitively; i.e. a type with ``str`` in its method resolution
carljm marked this conversation as resolved.
Show resolved Hide resolved
order. Static types can also be specified in other ways. For example,
:ref:`Protocols` specify a static type which is the set of all objects which
share a certain set of attributes and/or methods.

The dynamic type
~~~~~~~~~~~~~~~~

The dynamic type :ref:`Any` represents an unknown static type. It denotes some
carljm marked this conversation as resolved.
Show resolved Hide resolved
unknown set of values.

At first glance, this may appear similar to the static type ``object``, which
carljm marked this conversation as resolved.
Show resolved Hide resolved
represents the set of all Python objects. But it is quite different.
carljm marked this conversation as resolved.
Show resolved Hide resolved

If a value has the static type ``object``, a static type checker should ensure
that all operations on the value are valid for all Python objects, or else emit
a static type error. This allows very few operations! For example, if ``x`` is
typed as ``object``, ``x.foo`` should be a static type error, because not all
carljm marked this conversation as resolved.
Show resolved Hide resolved
Python objects have an attribute ``foo``.

A value typed as :ref:`Any`, on the other hand, should be assumed to have
carljm marked this conversation as resolved.
Show resolved Hide resolved
_some_ specific static type, but _which_ static type is not known. A static
checker should not emit any errors that depend on assuming a particular static
type; a static checker should instead assume that the runtime is responsible
for checking the type of operations on this value, as in a dynamically-typed
language.

The subtype relation
--------------------

A static type ``B`` is a **subtype** of another static type ``A`` if (and only
carljm marked this conversation as resolved.
Show resolved Hide resolved
if) the set of values represented by ``B`` is a subset of the set of values
represented by ``A``. Because the subset relation on sets is transitive and
reflexive, the subtype relation is also transitive (if ``A`` is a subtype of
``B`` and ``B`` is a subtype of ``C``, then ``A`` is a subtype of ``C``) and
reflexive (``A`` is always a subtype of ``A``).
carljm marked this conversation as resolved.
Show resolved Hide resolved

The **supertype** relation is the inverse of subtype: ``A`` is a supertype of
``B`` if (and only if) ``B`` is a subtype of ``A``; or equivalently, if (and
carljm marked this conversation as resolved.
Show resolved Hide resolved
only if) the set of values represented by ``A`` is a superset of the values
represented by ``B``. The supertype relation is also transitive and reflexive.

We also define an **equivalence** relation on static types: the types ``A`` and
``B`` are equivalent (or "the same type") if (and only if) ``A`` is a subtype
of ``B`` and ``B`` is a subtype of ``A``. This means that the set of values
represented by ``A`` is both a superset and a subset of the values represented
by ``B``, meaning ``A`` and ``B`` must represent the same set of values.

We may describe a type ``B`` as "narrower" than a type ``A`` if ``B`` is a
subtype of ``A`` and ``B`` is not equivalent to ``A``.

The consistency relation
------------------------

Since :ref:`Any` (the dynamic type) represents an unknown static type, it does
not represent any known single set of values, and thus it is not in the domain
of the subtype, supertype, or equivalence relations on static types described
above.

We define a **materialization** relation from gradual types to gradual and
carljm marked this conversation as resolved.
Show resolved Hide resolved
static types as follows: if replacing one or more occurrences of ``Any`` in
gradual type ``A`` with some gradual or static type results in the gradual or
static type ``B``, then ``B`` is a materialization of ``A``. For instance,
``tuple[int, str]`` (a static type) and ``tuple[Any, str]`` (a gradual type)
are both materializations of ``tuple[Any, Any]``. If ``B`` is a materialization
of ``A``, we can say that ``B`` is a more precise type than ``A``.

We define a **consistency** relation over all types (the dynamic type, gradual
types, and static types.)

A static type ``A`` is consistent with another static type ``B`` only if they
carljm marked this conversation as resolved.
Show resolved Hide resolved
are the same type (``A`` is equivalent to ``B``.)

The dynamic type ``Any`` is consistent with every type, and every type is
consistent with ``Any``.

A gradual type ``C`` is consistent with a gradual or static type ``D``, and
carljm marked this conversation as resolved.
Show resolved Hide resolved
``D`` is consistent with ``C``, if ``D`` is a materialization of ``C``.

The consistency relation is not transitive. ``tuple[int, int]`` and
carljm marked this conversation as resolved.
Show resolved Hide resolved
``tuple[str, int]`` are both consistent with ``tuple[Any, int]]``, but
carljm marked this conversation as resolved.
Show resolved Hide resolved
``tuple[str, int]`` is not consistent with ``tuple[int, int]``.

The consistency relation is symmetric. If ``A`` is consistent with ``B``, ``B``
carljm marked this conversation as resolved.
Show resolved Hide resolved
is also consistent with ``A``. It is also reflexive: ``A`` is always consistent
with ``A``.

The consistent subtype relation
-------------------------------

Given the consistency relation and the subtyping relation, we define the
carljm marked this conversation as resolved.
Show resolved Hide resolved
**consistent subtype** relation over all types. A type ``A`` is a consistent
carljm marked this conversation as resolved.
Show resolved Hide resolved
subtype of a type ``B`` if there exists a materialization ``A'`` of ``A`` and a
materialization ``B'`` of ``B``, where ``A'`` and ``B'`` are both static types,
and ``A'`` is a subtype of ``B'``.

For example, ``Any`` is a consistent subtype of ``int``, because ``int`` is a
carljm marked this conversation as resolved.
Show resolved Hide resolved
materialization of ``Any``, and ``int`` is a subtype of ``int``. The same
materialization also gives that ``int`` is a consistent subtype of ``Any``.

Consistent subtyping defines assignability
------------------------------------------

Consistent subtyping defines "assignability" for Python. An expression can be
assigned to a variable (including passed as a parameter, and also returned from
a function) if it is a consistent subtype of the variable's type annotation
(respectively, parameter's type annotation or return type annotation).

Union types
-----------

Since accepting a small, limited set of expected types for a single
argument is common, the type system supports union types, created with the
``|`` operator.
Example::

def handle_employees(e: Employee | Sequence[Employee]) -> None:
if isinstance(e, Employee):
e = [e]
...

A type factored by ``T1 | T2 | ...`` is a supertype
of all types ``T1``, ``T2``, etc., so that a value that
is a member of one of these types is acceptable for an argument
annotated by ``T1 | T2 | ...``.

One common case of union types are *optional* types. By default,
``None`` is an invalid value for any type, unless a default value of
``None`` has been provided in the function definition. Examples::

def handle_employee(e: Employee | None) -> None: ...

A past version of this specification allowed type checkers to assume an optional
type when the default value is ``None``, as in this code::

def handle_employee(e: Employee = None): ...

This would have been treated as equivalent to::

def handle_employee(e: Employee | None = None) -> None: ...

This is no longer the recommended behavior. Type checkers should move
towards requiring the optional type to be made explicit.

Support for singleton types in unions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A singleton instance is frequently used to mark some special condition,
in particular in situations where ``None`` is also a valid value
for a variable. Example::

_empty = object()

def func(x=_empty):
if x is _empty: # default argument value
return 0
elif x is None: # argument was provided and it's None
return 1
else:
return x * 2

To allow precise typing in such situations, the user should use
a union type in conjunction with the ``enum.Enum`` class provided
by the standard library, so that type errors can be caught statically::

from enum import Enum

class Empty(Enum):
token = 0
_empty = Empty.token

def func(x: int | None | Empty = _empty) -> int:

boom = x * 42 # This fails type check

if x is _empty:
return 0
elif x is None:
return 1
else: # At this point typechecker knows that x can only have type int
return x * 2

Since the subclasses of ``Enum`` cannot be further subclassed,
the type of variable ``x`` can be statically inferred in all branches
of the above example. The same approach is applicable if more than one
singleton object is needed: one can use an enumeration that has more than
one value::

class Reason(Enum):
timeout = 1
error = 2

def process(response: str | Reason = '') -> str:
if response is Reason.timeout:
return 'TIMEOUT'
elif response is Reason.error:
return 'ERROR'
else:
# response can be only str, all other possible values exhausted
return 'PROCESSED: ' + response
We can say that a type ``A`` is "assignable to" a type ``B`` if ``A`` is a
consistent subtype of ``B``.

References
----------

The concepts presented here are derived from the research literature in gradual
typing. See::

* `Victor Lanvin. A semantic foundation for gradual set-theoretic types. Computer science. Université Paris Cité, 2021. English. NNT : 2021UNIP7159. tel-03853222 <https://theses.hal.science/tel-03853222/file/va_Lanvin_Victor.pdf>`_
110 changes: 101 additions & 9 deletions docs/spec/special-types.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,16 +8,13 @@ Special types in annotations
``Any``
-------

A special kind of type is ``Any``. Every type is consistent with
``Any``. It can be considered a type that has all values and all methods.
Note that ``Any`` and builtin type ``object`` are completely different.
``Any`` is the dynamic type. It represents some unknown static type, whose use
carljm marked this conversation as resolved.
Show resolved Hide resolved
should not be checked statically.

When the type of a value is ``object``, the type checker will reject
almost all operations on it, and assigning it to a variable (or using
it as a return value) of a more specialized type is a type error. On
the other hand, when a value has type ``Any``, the type checker will
allow all operations on it, and a value of type ``Any`` can be assigned
to a variable (or used as a return value) of a more constrained type.
Every type is consistent with ``Any``, every type is a consistent subtype of
``Any``, and ``Any`` is a consistent subtype of every type.

See :ref:`type-system-concepts` for more discussion of ``Any``.

A function parameter without an annotation is assumed to be annotated with
``Any``. If a generic type is used without specifying type parameters,
Expand Down Expand Up @@ -218,3 +215,98 @@ subtype of ``type[Base]``::
def new_pro_user(pro_user_class: type[ProUser]):
user = new_user(pro_user_class) # OK
...

.. _`union-types`:

Union types
carljm marked this conversation as resolved.
Show resolved Hide resolved
-----------

Since accepting a small, limited set of expected types for a single
carljm marked this conversation as resolved.
Show resolved Hide resolved
argument is common, the type system supports union types, created with the
``|`` operator.
Example::

def handle_employees(e: Employee | Sequence[Employee]) -> None:
if isinstance(e, Employee):
e = [e]
...

A type factored by ``T1 | T2 | ...`` is a supertype
of all types ``T1``, ``T2``, etc., so that a value that
is a member of one of these types is acceptable for an argument
annotated by ``T1 | T2 | ...``.

One common case of union types are *optional* types. By default,
``None`` is an invalid value for any type, unless a default value of
``None`` has been provided in the function definition. Examples::

def handle_employee(e: Employee | None) -> None: ...

A past version of this specification allowed type checkers to assume an optional
type when the default value is ``None``, as in this code::

def handle_employee(e: Employee = None): ...

This would have been treated as equivalent to::

def handle_employee(e: Employee | None = None) -> None: ...

This is no longer the recommended behavior. Type checkers should move
towards requiring the optional type to be made explicit.

Support for singleton types in unions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A singleton instance is frequently used to mark some special condition,
in particular in situations where ``None`` is also a valid value
for a variable. Example::

_empty = object()

def func(x=_empty):
if x is _empty: # default argument value
return 0
elif x is None: # argument was provided and it's None
return 1
else:
return x * 2

To allow precise typing in such situations, the user should use
a union type in conjunction with the ``enum.Enum`` class provided
by the standard library, so that type errors can be caught statically::

from enum import Enum

class Empty(Enum):
token = 0
_empty = Empty.token

def func(x: int | None | Empty = _empty) -> int:

boom = x * 42 # This fails type check

if x is _empty:
return 0
elif x is None:
return 1
else: # At this point typechecker knows that x can only have type int
return x * 2

Since the subclasses of ``Enum`` cannot be further subclassed,
the type of variable ``x`` can be statically inferred in all branches
of the above example. The same approach is applicable if more than one
singleton object is needed: one can use an enumeration that has more than
one value::

class Reason(Enum):
timeout = 1
error = 2

def process(response: str | Reason = '') -> str:
if response is Reason.timeout:
return 'TIMEOUT'
elif response is Reason.error:
return 'ERROR'
else:
# response can be only str, all other possible values exhausted
return 'PROCESSED: ' + response