Using specialised class directly

Abel Cheung edited this page Apr 21, 2023 · 3 revisions

TL;DR

To mitigate the type error, either:

  1. Quote all subscripted annotations as strings
  2. Use from __future__ import annotations

Description of problem, part 1

Starting from Python 3.7, classes support subscript notation (PEP 560: Core support for typing module and generic types) to denote, e.g., a collection of some other type. Python's built-in data types (list, dict, etc.) have supported this since Python 3.9, as in list[int] or dict[str, str].

types-lxml uses this notation extensively to 'specialise' classes in lxml, because some classes behave differently depending on their initialisation arguments. However, when writing functions or methods that take such classes as arguments, errors like the following are unavoidable unless precautions are taken:

TypeError: 'type' object is not subscriptable
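
The error is not specific to lxml. As a minimal sketch, using two hypothetical stand-in classes (neither is part of lxml or types-lxml), any class that lacks __class_getitem__ fails in the same way when subscripted, while a class that defines it does not:

```python
class Plain:
    """Hypothetical stand-in for a runtime class without subscript support."""


class Subscriptable:
    """Hypothetical stand-in that opts in to subscripts (PEP 560)."""

    def __class_getitem__(cls, item):
        # Real generic classes return a specialised alias here; returning
        # the class itself is enough for this illustration.
        return cls


def try_subscript(cls):
    """Return the result of cls[int], or the TypeError message on failure."""
    try:
        return cls[int]
    except TypeError as exc:
        return str(exc)


print(try_subscript(Plain))          # a message saying the class is not subscriptable
print(try_subscript(Subscriptable))  # the class object itself, no error
```

The exact wording of the message varies slightly between Python versions, which is why the sketch does not hard-code it.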

Example

For example, an Element Tree can contain different kinds of elements:

  • Normal XML elements (lxml.etree._Element and friends)
  • HTML elements (lxml.html.HtmlElement and friends)
  • Objectified Element (lxml.objectify.ObjectifiedElement)

This difference depends on the parser argument passed to certain lxml functions that produce an Element Tree, such as:

  • lxml.etree.ElementTree() factory function
  • lxml.etree.parse() module function

In turn, this determines the type of the .parser property and the type of the root element inside the tree.
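
The idea of a subscript that travels from the parser to the tree and its root element can be sketched with ordinary generic classes. Every name below (Element, HtmlElement, Parser, Tree, parse) is a hypothetical stand-in invented for illustration; this is not how lxml or the types-lxml stubs are actually written:

```python
from typing import Generic, Type, TypeVar


class Element:
    """Hypothetical stand-in for a plain XML element class."""


class HtmlElement(Element):
    """Hypothetical stand-in for an HTML element class."""


ElementT = TypeVar("ElementT", bound=Element)


class Parser(Generic[ElementT]):
    """Hypothetical parser whose subscript names the element type it produces."""

    def __init__(self, element_class: Type[ElementT]) -> None:
        self.element_class = element_class


class Tree(Generic[ElementT]):
    """Hypothetical tree whose subscript names the type of its root element."""

    def __init__(self, root: ElementT, parser: Parser[ElementT]) -> None:
        self._root = root
        self.parser = parser

    def getroot(self) -> ElementT:
        return self._root


def parse(parser: Parser[ElementT]) -> Tree[ElementT]:
    """Build a tree whose root element type follows the parser's subscript."""
    return Tree(parser.element_class(), parser)


html_tree = parse(Parser(HtmlElement))  # a type checker infers Tree[HtmlElement]
print(type(html_tree.getroot()).__name__)  # HtmlElement
```

Unlike lxml's compiled classes, these stand-ins inherit from typing.Generic, so they accept subscripts at runtime.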

Description of problem, part 2

However, the lxml runtime will probably never support subscripted usage due to its nature: lxml is implemented in Cython and maintains compatibility with very old Python versions. This leads to a conflict when the aforementioned classes are subscripted directly in annotations, for instance in function arguments, as the following example illustrates:

from lxml.etree import _Element, _ElementTree, XMLParser

def get_parser(tree: _ElementTree[_Element]) -> XMLParser[_Element]:
    ...

Using the above code leads to the TypeError mentioned earlier, because the lxml classes don't actually support subscripts at runtime.

The fix

A similar problem has already been asked and answered on StackOverflow, but that deals with native data types. Our situation is a little bit different (and simpler).

1. Quote subscripted annotations as strings

Modifying the above code example:

from lxml.etree import _Element, _ElementTree, XMLParser

def get_parser(tree: "_ElementTree[_Element]") -> "XMLParser[_Element]":
    ...

This allows the Python interpreter to skip evaluating the annotation, while static type checkers can still understand it.
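
To see what the interpreter actually stores, here is a small sketch that does not need lxml installed. Plain is a hypothetical stand-in for a class that cannot be subscripted at runtime:

```python
class Plain:
    """Hypothetical stand-in for an lxml class without runtime subscript support."""


def get_parser(tree: "Plain[int]") -> "Plain[int]":
    # The quoted annotations are stored as plain strings and never evaluated.
    return tree


print(get_parser.__annotations__)
# {'tree': 'Plain[int]', 'return': 'Plain[int]'}
```

The function is defined and callable without error, because Plain[int] only ever exists as text.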

2. Use from __future__ import annotations

This was established in PEP 563 (Postponed Evaluation of Annotations). It is effectively the same as automatically applying method (1) to all annotations in the same file.
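
A minimal sketch of the same idea, again with a hypothetical stand-in class named Plain instead of a real lxml class. The future import has to be the first statement in the file:

```python
from __future__ import annotations


class Plain:
    """Hypothetical stand-in for an lxml class without runtime subscript support."""


def get_parser(tree: Plain[int]) -> Plain[int]:
    # Unquoted, yet not evaluated: the future import stores annotations as strings.
    return tree


print(get_parser.__annotations__)
# {'tree': 'Plain[int]', 'return': 'Plain[int]'}
```

Note that code which evaluates annotations at runtime, such as typing.get_type_hints(), would still try to subscript the class and hit the same TypeError.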

Scope of subscript usage

Here are the classes that make use of subscripts in annotations:

Class | Description
lxml.etree._ElementTree | As described above. Its subscript denotes the type of element contained in the ElementTree; more specifically, the root node of the ElementTree.
lxml.etree.XMLParser, lxml.etree.HTMLParser | Document / content parsers. They are the main factor deciding which kind of elements are produced. More on this in the next section.
lxml.builder.ElementMaker | Element factory. Similar to parsers, the subscript denotes what kind of elements would be produced.
lxml.sax.ElementTreeContentHandler | lxml adapter for the official Python SAX event handler. Its subscript denotes the kind of Element Tree that would be produced in the .etree property.
lxml.etree._IDDict | Dictionary-like class that maps XML:ID attribute names to the corresponding elements. The subscript indicates the type of element.
lxml.etree._ElementUnicodeResult | One of the possible outputs (a string) when evaluating XPath expressions, as described in the official documentation. This string subclass has a .getparent() method, which gives access to the original element that produced the string. Its subscript represents the type of the original element.
lxml.etree.ParserTarget | (Temporarily abandoned in types-lxml) Custom parser target support.

Caveat

Not all parsers use subscripts

The table above mentions that lxml.etree.XMLParser and lxml.etree.HTMLParser use a subscript to denote the type of element they are supposed to produce. But that doesn't necessarily apply to all subclasses: the parsers in the lxml.html submodule (html.HTMLParser and html.XHTMLParser) have no subscripts.

The html submodule parsers are designed to always produce lxml.html.HtmlElement and friends. This can be changed with the .set_element_class_lookup() method, but such a change degenerates the parser into a common XML parser, and using the html submodule parsers becomes moot.

No automatic change of subscript

As mentioned before, the parser.set_element_class_lookup() method allows a parser to produce a different kind of element. This is actually done in, say, the ObjectifiedElement parser. But due to limitations of Python's typing features, the annotation can't change automatically to reflect this. It has to be modified manually:

from typing import TYPE_CHECKING, cast
from lxml.etree import XMLParser
from lxml.objectify import ObjectifiedElement, ObjectifyElementClassLookup

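p = XMLParser()  # type is XMLParser[_Element]
if TYPE_CHECKING:
    # Seen only by the type checker: declare the new element type by hand
    p = cast('XMLParser[ObjectifiedElement]', p)
else:
    # Executed only at runtime: actually switch the element class lookup
    p.set_element_class_lookup(ObjectifyElementClassLookup())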
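
The cast in this pattern only informs the type checker. As a small sketch with a hypothetical stand-in class (not a real lxml parser), typing.cast() returns its argument unchanged and never evaluates the quoted type, so the subscript inside the string cannot raise:

```python
from typing import cast


class Plain:
    """Hypothetical stand-in for a parser class without runtime subscript support."""


p = Plain()
# cast() is a no-op at runtime; the string is never evaluated, so Plain[int]
# is not actually subscripted here.
q = cast("Plain[int]", p)

print(q is p)  # True
```

This is why the cast changes nothing about the parser object itself; the element class lookup still has to be set separately.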