Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new "type system concepts" section #1743

Merged
merged 48 commits into from
Jun 24, 2024
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
b323afb
initial draft of new concepts section
carljm May 22, 2024
bd4416e
review feedback
carljm May 22, 2024
3c332bf
prefer 'assignable to' over 'consistent subtype of'
carljm May 22, 2024
6603f43
minor tweaks
carljm May 22, 2024
4e2a1ce
Merge branch 'main' into concepts
carljm May 22, 2024
137ab2a
terms have types
carljm May 22, 2024
24e6ee5
review comments
carljm May 22, 2024
9e9ebb4
use asymmetric example of assignable-to
carljm May 22, 2024
1271558
Any is equivalent to Any
carljm May 22, 2024
2ddd4ad
add a short para on the gradual guarantee
carljm May 22, 2024
971e9f2
conciseness tweak
carljm May 22, 2024
10fda4e
use 'assignable to' in Any description
carljm May 22, 2024
f4110bb
incorporate some of Eric Traut's feedback
carljm May 23, 2024
ac6273e
more on union
carljm May 24, 2024
15c0d56
add an example of bounded gradual type
carljm May 24, 2024
efdbc5f
add section on attributes and methods
carljm May 24, 2024
e4537ac
more feedback and tweaks
carljm May 26, 2024
c3bbb52
Merge branch 'main' into concepts
carljm Jun 1, 2024
6bebab4
review comments
carljm Jun 1, 2024
2900cce
a bit more on gradual unions
carljm Jun 1, 2024
182a058
a few more review comments
carljm Jun 1, 2024
351136e
add terms to glossary
carljm Jun 1, 2024
cd03de8
Update glossary.rst
carljm Jun 2, 2024
07941d7
Update glossary.rst
carljm Jun 2, 2024
c55d40a
review comments on glossary
carljm Jun 2, 2024
b1775b1
re-apply review comment
carljm Jun 2, 2024
1a71a72
Apply suggestions from code review
carljm Jun 2, 2024
c18d9e1
audit remainder of type spec for terminology usage
carljm Jun 2, 2024
cca3bcd
review comments
carljm Jun 2, 2024
3d6b406
some review comments
carljm Jun 2, 2024
5d40036
one more review tweak on protocol wording
carljm Jun 2, 2024
1e0d118
Merge branch 'main' into concepts
carljm Jun 12, 2024
cbe6e23
add equivalent, narrow, and wide to glossary
carljm Jun 12, 2024
8954595
add table of type relations
carljm Jun 12, 2024
efaa7ab
explicitly allow inference on missing function annotations
carljm Jun 12, 2024
12ae9eb
some review comments
carljm Jun 12, 2024
ccfef86
Update docs/spec/callables.rst
carljm Jun 12, 2024
e990bda
more review comments
carljm Jun 12, 2024
045d7c2
link 'assignable' to glossary more often in callables doc
carljm Jun 14, 2024
dd6ffd1
add nominal/structural to concepts and glossary
carljm Jun 14, 2024
673a467
don't use 'compatible' in callables doc
carljm Jun 14, 2024
0f5fba4
equivalence of gradual types and union simplification
carljm Jun 14, 2024
a6b3ab0
simplify description of structural subtyping
carljm Jun 15, 2024
e5943a4
define equivalence of gradual types in glossary
carljm Jun 15, 2024
effcdec
more review comments
carljm Jun 17, 2024
9ebaef8
review comments
carljm Jun 19, 2024
6a3b716
Update glossary.rst
carljm Jun 20, 2024
ab7f9ac
review comments
carljm Jun 20, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
307 changes: 213 additions & 94 deletions docs/spec/concepts.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,97 +3,216 @@
Type system concepts
====================

.. _`union-types`:

Union types
-----------

Since accepting a small, limited set of expected types for a single
argument is common, the type system supports union types, created with the
``|`` operator.
Example::

def handle_employees(e: Employee | Sequence[Employee]) -> None:
if isinstance(e, Employee):
e = [e]
...

A type factored by ``T1 | T2 | ...`` is a supertype
of all types ``T1``, ``T2``, etc., so that a value that
is a member of one of these types is acceptable for an argument
annotated by ``T1 | T2 | ...``.

One common case of union types are *optional* types. By default,
``None`` is an invalid value for any type, unless a default value of
``None`` has been provided in the function definition. Examples::

def handle_employee(e: Employee | None) -> None: ...

A past version of this specification allowed type checkers to assume an optional
type when the default value is ``None``, as in this code::

def handle_employee(e: Employee = None): ...

This would have been treated as equivalent to::

def handle_employee(e: Employee | None = None) -> None: ...

This is no longer the recommended behavior. Type checkers should move
towards requiring the optional type to be made explicit.

Support for singleton types in unions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A singleton instance is frequently used to mark some special condition,
in particular in situations where ``None`` is also a valid value
for a variable. Example::

_empty = object()

def func(x=_empty):
if x is _empty: # default argument value
return 0
elif x is None: # argument was provided and it's None
return 1
else:
return x * 2

To allow precise typing in such situations, the user should use
a union type in conjunction with the ``enum.Enum`` class provided
by the standard library, so that type errors can be caught statically::

from enum import Enum

class Empty(Enum):
token = 0
_empty = Empty.token

def func(x: int | None | Empty = _empty) -> int:

boom = x * 42 # This fails type check

if x is _empty:
return 0
elif x is None:
return 1
else: # At this point typechecker knows that x can only have type int
return x * 2

Since the subclasses of ``Enum`` cannot be further subclassed,
the type of variable ``x`` can be statically inferred in all branches
of the above example. The same approach is applicable if more than one
singleton object is needed: one can use an enumeration that has more than
one value::

class Reason(Enum):
timeout = 1
error = 2

def process(response: str | Reason = '') -> str:
if response is Reason.timeout:
return 'TIMEOUT'
elif response is Reason.error:
return 'ERROR'
else:
# response can be only str, all other possible values exhausted
return 'PROCESSED: ' + response
Static, dynamic, and gradual typing
carljm marked this conversation as resolved.
Show resolved Hide resolved
-----------------------------------

A **statically-typed** programming language runs a type-checker before running
a program. The program is required to be well-typed according to the language's
type system. The type system assigns a type to all expressions in the language
and verifies that their uses obey the typing rules. Normally, a program that is
not well-typed (i.e., one that contains a type error) will not run. Java and
C++ are examples of statically-typed object-oriented languages.

A **dynamically-typed** programming language does not run a type-checker before
running a program. Instead, it will check the types of values before performing
carljm marked this conversation as resolved.
Show resolved Hide resolved
operations on them at runtime. This is not to say that the language is
"untyped". Values at runtime have a type and their uses obey typing rules. Not
every operation will be checked, but certain primitive operations in the
language such as attribute access or arithmetic will be. Python was
carljm marked this conversation as resolved.
Show resolved Hide resolved
historically a dynamically-typed language.

**Gradual typing** is the name for a specific way to combine static and dynamic
carljm marked this conversation as resolved.
Show resolved Hide resolved
typing. Type-annotated Python allows mixing static and dynamic checking at a
carljm marked this conversation as resolved.
Show resolved Hide resolved
fine level of granularity. Individual variables, parameters, and returns can
carljm marked this conversation as resolved.
Show resolved Hide resolved
optionally be given a static type. Data structures such as objects can have a
carljm marked this conversation as resolved.
Show resolved Hide resolved
mix of static and dynamic-only checking. For example, a dictionary could be
carljm marked this conversation as resolved.
Show resolved Hide resolved
annotated to have static checking of the key type but only dynamic checking of
carljm marked this conversation as resolved.
Show resolved Hide resolved
the value type.

In gradual typing, a dynamically typed term is indicated by a special "unknown"
carljm marked this conversation as resolved.
Show resolved Hide resolved
or "dynamic" type. In Python, the unknown type is spelled :ref:`Any`. It
carljm marked this conversation as resolved.
Show resolved Hide resolved
indicates to the static type checker that this term should not be subject to
carljm marked this conversation as resolved.
Show resolved Hide resolved
static checking. The system should not signal a static type error for use of a
term with type :ref:`Any`. Instead, the term's type will be dynamically
carljm marked this conversation as resolved.
Show resolved Hide resolved
checked, according to the Python runtime's usual checks on the types of runtime
values.

This specification describes a gradual type system for Python.

Static, dynamic, and gradual types
----------------------------------

We will refer to types that do not contain :ref:`Any` as a sub-part as **static
carljm marked this conversation as resolved.
Show resolved Hide resolved
carljm marked this conversation as resolved.
Show resolved Hide resolved
types**.

:ref:`Any` itself is the **dynamic type**.
carljm marked this conversation as resolved.
Show resolved Hide resolved

A **gradual type** can be a static type, :ref:`Any` itself, or a type that
contains :ref:`Any` as a sub-part.

Static types
~~~~~~~~~~~~

A static type can intuitively be understood as denoting a set of runtime
carljm marked this conversation as resolved.
Show resolved Hide resolved
values. For instance, the Python static type ``object`` is the set of all
Python objects. The Python static type ``bool`` is the set of values ``{ True,
False }``. The Python static type ``str`` is the set of all Python strings;
more precisely, the set of all Python objects whose runtime type (i.e.
``__class__`` attribute) is either ``str`` or a class that inherits ``str``,
including transitively; i.e. a type with ``str`` in its method resolution
carljm marked this conversation as resolved.
Show resolved Hide resolved
order.

Static types can also be specified in other ways. For example, :ref:`Protocols`
carljm marked this conversation as resolved.
Show resolved Hide resolved
specify a static type which is the set of all objects which share a certain set
of attributes and/or methods.

The dynamic type
~~~~~~~~~~~~~~~~

The dynamic type :ref:`Any` represents an unknown static type. It denotes some
carljm marked this conversation as resolved.
Show resolved Hide resolved
unknown set of values.

At first glance, this may appear similar to the static type ``object``, which
carljm marked this conversation as resolved.
Show resolved Hide resolved
represents the set of all Python objects. But it is quite different.
carljm marked this conversation as resolved.
Show resolved Hide resolved

If a term has the static type ``object``, a static type checker should ensure
carljm marked this conversation as resolved.
Show resolved Hide resolved
that operations on the term are valid for all Python objects, or else emit a
static type error. This allows very few operations! For example, if ``x`` is
typed as ``object``, ``x.foo`` should be a static type error, because not all
carljm marked this conversation as resolved.
Show resolved Hide resolved
Python objects have an attribute ``foo``.

A term typed as :ref:`Any`, on the other hand, should be assumed to have
_some_ specific static type, but _which_ static type is not known. A static
checker should not emit any errors that depend on assuming a particular static
type; a static checker should instead assume that the runtime is responsible
for checking the type of operations on this term, as in a dynamically-typed
language.

The gradual guarantee
~~~~~~~~~~~~~~~~~~~~~

:ref:`Any` allows gradually adding static types to a dynamically-typed program.
For a fully dynamically-typed program, a static checker has the type :ref:`Any`
for all terms, and should emit no errors. Adding more static type annotations
to the program (making the program more statically typed) may result in static
carljm marked this conversation as resolved.
Show resolved Hide resolved
type errors, if the program is not correct or if the static type annotations
aren't able to fully represent the runtime types. Removing static type
annotations (making the program more dynamic) should not result in additional
static type errors. This is often referred to as the **gradual guarantee**.

In Python's type system, we don't take the gradual guarantee as a strict
requirement, but it's a useful guideline.

The subtype relation
--------------------

A static type ``B`` is a **subtype** of another static type ``A`` if and only
carljm marked this conversation as resolved.
Show resolved Hide resolved
if the set of values represented by ``B`` is a subset of the set of values
represented by ``A``. Because the subset relation on sets is transitive and
reflexive, the subtype relation is also transitive (if ``C`` is a subtype of
``B`` and ``B`` is a subtype of ``A``, then ``C`` is a subtype of ``A``) and
reflexive (``A`` is always a subtype of ``A``).
carljm marked this conversation as resolved.
Show resolved Hide resolved

The **supertype** relation is the inverse of subtype: ``A`` is a supertype of
``B`` if and only if ``B`` is a subtype of ``A``; or equivalently, if and only
if the set of values represented by ``A`` is a superset of the values
represented by ``B``. The supertype relation is also transitive and reflexive.

We also define an **equivalence** relation on static types: the types ``A`` and
``B`` are equivalent (or "the same type") if and only if ``A`` is a subtype
of ``B`` and ``B`` is a subtype of ``A``. This means that the set of values
represented by ``A`` is both a superset and a subset of the values represented
by ``B``, meaning ``A`` and ``B`` must represent the same set of values.

We may describe a type ``B`` as "narrower" than a type ``A`` (or as a "strict
carljm marked this conversation as resolved.
Show resolved Hide resolved
carljm marked this conversation as resolved.
Show resolved Hide resolved
subtype" of ``A``) if ``B`` is a subtype of ``A`` and ``B`` is not equivalent
to ``A``.

The consistency relation
------------------------

Since :ref:`Any` represents an unknown static type, it does not represent any
known single set of values, and thus it is not in the domain of the subtype,
carljm marked this conversation as resolved.
Show resolved Hide resolved
supertype, or equivalence relations on static types described above. (We do
find it convenient to say, however, that ``Any`` is both subtype and supertype
carljm marked this conversation as resolved.
Show resolved Hide resolved
of -- that is, equivalent to -- only itself. This can allow us to simplify
redundant multiple occurrences of ``Any`` out of more complex types.)
carljm marked this conversation as resolved.
Show resolved Hide resolved

To relate gradual types more generally, we define a **materialization**
erictraut marked this conversation as resolved.
Show resolved Hide resolved
relation. The intuition for materialization is that it transforms a "more
carljm marked this conversation as resolved.
Show resolved Hide resolved
dynamic" type to a "more static" type. Given a gradual type ``A``, if we
replace zero or more occurrences of ``Any`` in ``A`` with some gradual type
AlexWaygood marked this conversation as resolved.
Show resolved Hide resolved
(which can be different for each occurrence of ``Any``), the resulting gradual
type ``B`` is a materialization of ``A``.

For instance, ``tuple[int, str]`` (a static type) and ``tuple[Any, str]`` (a
gradual type) are both materializations of ``tuple[Any, Any]``. ``tuple[int,
str]`` is also a materialization of ``tuple[Any, str]``.

If ``B`` is a materialization of ``A``, we can say that ``B`` is a "more
carljm marked this conversation as resolved.
Show resolved Hide resolved
static" type than ``A``, and ``A`` is a "more dynamic" type than ``B``.

The materialization relation is both transitive and reflexive, so it defines a
preorder on gradual types.

We also define a **consistency** relation on gradual types.

A static type ``A`` is consistent with another static type ``B`` if and only if
carljm marked this conversation as resolved.
Show resolved Hide resolved
they are the same type (``A`` is equivalent to ``B``.)

A gradual type ``A`` is consistent with a gradual type ``B``, and ``B`` is
consistent with ``A``, if and only if there exists some static type ``C`` which
is a materialization of both ``A`` and ``B``.

The dynamic type ``Any`` is consistent with every type, and every type is
consistent with ``Any``. (This must follow from the above definitions of
carljm marked this conversation as resolved.
Show resolved Hide resolved
materialization and consistency, but is worth stating explicitly.)
carljm marked this conversation as resolved.
Show resolved Hide resolved

The consistency relation is not transitive. ``tuple[int, int]`` is consistent
with ``tuple[Any, int]``, and ``tuple[Any, int]`` is consistent with
``tuple[str, int]``, but ``tuple[int, int]`` is not consistent with
``tuple[str, int]``.

The consistency relation is symmetric. If ``A`` is consistent with ``B``, ``B``
carljm marked this conversation as resolved.
Show resolved Hide resolved
is also consistent with ``A``. It is also reflexive: ``A`` is always consistent
with ``A``.

The assignable-to (or consistent subtyping) relation
carljm marked this conversation as resolved.
Show resolved Hide resolved
----------------------------------------------------

Given the materialization relation and the subtyping relation, we can define the
**consistent subtype** relation over all types. A type ``A`` is a consistent
carljm marked this conversation as resolved.
Show resolved Hide resolved
subtype of a type ``B`` if there exists a materialization ``A'`` of ``A`` and a
materialization ``B'`` of ``B``, where ``A'`` and ``B'`` are both static types,
and ``A'`` is a subtype of ``B'``.

Consistent subtyping defines "assignability" for Python. An expression can be
assigned to a variable (including passed as a parameter or returned from a
carljm marked this conversation as resolved.
Show resolved Hide resolved
function) if it is a consistent subtype of the variable's type annotation
carljm marked this conversation as resolved.
Show resolved Hide resolved
(respectively, parameter's type annotation or return type annotation).

We can say that a type ``A`` is "assignable to" a type ``B`` if ``A`` is a
consistent subtype of ``B``.

In the remainder of this specification, we will usually prefer the term
**assignable to** over "consistent subtype of". The two are synonymous, but
"assignable to" is shorter, and may communicate a clearer intuition to many
readers.

For example, ``Any`` is assignable to ``int``, because ``int`` is a
carljm marked this conversation as resolved.
Show resolved Hide resolved
materialization of ``Any``, and ``int`` is a subtype of ``int``. The same
materialization also gives that ``int`` is assignable to ``Any``.
carljm marked this conversation as resolved.
Show resolved Hide resolved

The assignable-to relation is not generally symmetric, though. If ``B`` is a
carljm marked this conversation as resolved.
Show resolved Hide resolved
subtype of ``A``, then ``tuple[Any, B]`` is assignable to ``tuple[int, A]``,
because ``tuple[Any, B]`` can materialize to ``tuple[int, B]``, which is a
subtype of ``tuple[int, A]``. But ``tuple[int, A]`` is not assignable to
``tuple[Any, B]``.

References
----------

The concepts presented here are derived from the research literature in gradual
typing. See e.g.:

* `Victor Lanvin. A semantic foundation for gradual set-theoretic types. <https://theses.hal.science/tel-03853222/file/va_Lanvin_Victor.pdf>`_ Computer science. Université Paris Cité, 2021. English. NNT : 2021UNIP7159. tel-03853222
Loading