Using specialised class directly
To mitigate the type error, either:
- Quote all subscripted annotations as strings
- Use from __future__ import annotations
Starting from Python 3.7, classes support subscript notation (PEP 560: Core support for typing module and generic types) to denote, e.g., a collection of some other types. Python native data types (list, dict etc.) have supported this since 3.9, such as list[int], dict[str, str] etc.
It is used extensively in types-lxml for 'specialising' classes in lxml. Sometimes, classes behave differently depending on initialisation arguments. However, when writing functions or methods that take such classes as arguments, errors like the following are unavoidable unless precautions are taken:
TypeError: 'type' object is not subscriptable
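The error can be reproduced without lxml installed. The following is a minimal sketch in which `RuntimeOnlyTree` is a hypothetical stand-in (not an lxml class) for any runtime class that defines no `__class_getitem__`. Subscripting it is the same expression that an eagerly evaluated annotation would execute. The exact wording of the message differs between Python versions, so the sketch only captures it rather than assuming a particular text.

```python
class RuntimeOnlyTree:
    """Hypothetical stand-in for a runtime class that defines no __class_getitem__."""


def subscript_error_message() -> str:
    """Return the message of the TypeError raised by subscripting, or '' if none."""
    try:
        RuntimeOnlyTree[int]  # what an eagerly evaluated annotation would execute
    except TypeError as exc:
        return str(exc)
    return ""


message = subscript_error_message()
print(message)
```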
For example, an Element Tree can contain different kinds of elements:
- Normal XML elements (lxml.etree._Element and friends)
- HTML elements (lxml.html.HtmlElement and friends)
- Objectified elements (lxml.objectify.ObjectifiedElement)
This difference depends on the input parser argument of certain lxml functions that produce an Element Tree, such as:
- lxml.etree.ElementTree() factory function
- lxml.etree.parse() module function

In turn, this affects the type of the .parser property and the type of the root element inside the tree.
However, the lxml runtime will probably never support subscripted usage due to its nature: lxml is implemented in Cython and maintains compatibility with very old Python versions. This leads to a conflict when the aforementioned classes are used directly in annotations, for example in function arguments, as illustrated in the following example:
from lxml.etree import _Element, _ElementTree, XMLParser

def get_parser(tree: _ElementTree[_Element]) -> XMLParser[_Element]:
    ...
Using the above code leads to the TypeError message mentioned before, because lxml classes don't actually support subscripts at runtime.
A similar problem has already been asked and answered on StackOverflow, but that deals with native data types. Our situation is a little bit different (and simpler).
Modifying the above code example by quoting the annotations:
from lxml.etree import _Element, _ElementTree, XMLParser

def get_parser(tree: "_ElementTree[_Element]") -> "XMLParser[_Element]":
    ...
This allows the Python interpreter to skip evaluating the annotation, while static type checkers can still understand it.
The second method, from __future__ import annotations, is established in PEP 563 (Postponed Evaluation of Annotations). It is effectively the same as automatically applying method (1), quoting annotations as strings, to all annotations in the same file.
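Both methods can be compared side by side in a small sketch. It assumes a hypothetical stand-in class, `RuntimeOnlyTree`, instead of a real lxml class, so it runs without lxml. Because a `from __future__` import must appear at the top of a module, the second method is shown by executing a small module source string (`MODULE_SOURCE`, also hypothetical) rather than by placing the import in this snippet.

```python
class RuntimeOnlyTree:
    """Hypothetical stand-in for a runtime class that defines no __class_getitem__."""


# Method 1: the quoted annotation is stored as a plain string,
# so the subscript expression inside it is never evaluated here.
def get_root(tree: "RuntimeOnlyTree[int]") -> "RuntimeOnlyTree[int]":
    return tree


quoted = get_root.__annotations__["tree"]

# Method 2: with the future import, every annotation in the module
# is stored as a string automatically, without manual quoting.
MODULE_SOURCE = '''
from __future__ import annotations

def get_root(tree: RuntimeOnlyTree[int]) -> RuntimeOnlyTree[int]:
    return tree
'''
namespace = {"RuntimeOnlyTree": RuntimeOnlyTree}
exec(MODULE_SOURCE, namespace)
postponed = namespace["get_root"].__annotations__["tree"]

print(quoted)
print(postponed)
```

In both cases the annotation survives only as text. Anything that later resolves such strings at runtime, for example `typing.get_type_hints()`, would evaluate the subscript and run into the same TypeError.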
Here are the classes making use of subscripts in annotations:

Class | Description |
---|---|
lxml.etree._ElementTree | As described above. Its subscript denotes the type of element contained in the ElementTree; more specifically, the root node of the ElementTree. |
lxml.etree.XMLParser, lxml.etree.HTMLParser | Document / content parsers. They are the main factor deciding which kind of elements are produced. More on this in the next section. |
lxml.builder.ElementMaker | Element factory. Similar to parsers, the subscript denotes what kind of elements would be produced. |
lxml.sax.ElementTreeContentHandler | lxml adapter for the official Python SAX event handler. Its subscript denotes the kind of Element Tree that would be produced in the .etree property. |
lxml.etree._IDDict | Dictionary-like class that contains a mapping of XML:ID attribute name to the corresponding element. The subscript indicates the type of element. |
lxml.etree._ElementUnicodeResult | One of the possible outputs (string) when evaluating XPath expressions, as described in the official documentation. This string subclass has a .getparent() method, which gives access to the original element that produced the string. Its subscript represents the type of the original element. |
lxml.etree.ParserTarget | (Temporarily abandoned in types-lxml) Custom parser target support |

The table above mentions that lxml.etree.XMLParser and lxml.etree.HTMLParser use a subscript to denote the type of element they are supposed to produce. But that doesn't necessarily apply to all subclasses. Parsers in the lxml.html submodule (html.HTMLParser and html.XHTMLParser) have no subscripts.
html submodule parsers are designed to always produce lxml.html.HtmlElement and friends. This can be changed with the .set_element_class_lookup() method, but such a change degenerates the parser into a common XML parser, and using html submodule parsers becomes moot.
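The relationship between a subscripted base parser and a subclass without subscripts can be modelled with plain `typing.Generic`. This is only a toy sketch: `Element`, `HtmlElement`, `BaseParser` and `HtmlOnlyParser` are hypothetical names, not the definitions used by lxml or types-lxml. The subclass binds the type parameter once, so users never write a subscript for it.

```python
from typing import Generic, Type, TypeVar


class Element:
    """Toy element class (hypothetical, not lxml's)."""


class HtmlElement(Element):
    """Toy HTML element class (hypothetical)."""


E = TypeVar("E", bound=Element)


class BaseParser(Generic[E]):
    """Generic parser: the subscript says which element type it produces."""

    def __init__(self, element_class: Type[E]) -> None:
        self._element_class = element_class

    def make_element(self) -> E:
        return self._element_class()


class HtmlOnlyParser(BaseParser[HtmlElement]):
    """Subclass with the type parameter fixed, so it takes no subscript."""

    def __init__(self) -> None:
        super().__init__(HtmlElement)


parser = HtmlOnlyParser()
element = parser.make_element()  # a type checker infers HtmlElement here
```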
As mentioned before, the parser.set_element_class_lookup() method allows producing a different kind of element. This is actually done in, say, the ObjectifiedElement parser. But due to limitations of Python typing, the annotation can't change automatically to reflect this. It has to be modified manually:
from typing import TYPE_CHECKING, cast
from lxml.etree import XMLParser
from lxml.objectify import ObjectifiedElement, ObjectifyElementClassLookup

p = XMLParser()  # type is XMLParser[_Element]
if TYPE_CHECKING:
    p = cast('XMLParser[ObjectifiedElement]', p)
else:
    p.set_element_class_lookup(ObjectifyElementClassLookup())
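One possible way to keep the runtime change and the manual annotation change together is a small helper function. The sketch below is an assumption-laden illustration: it uses toy classes (`Element`, `ObjectifiedElement`, `Parser`, `set_element_class`, `make_objectify_parser` are all hypothetical names) so that it runs without lxml. The toy `Parser` is generic at runtime, unlike the lxml classes discussed here, which is why the return annotation does not need quoting.

```python
from typing import Generic, Type, TypeVar, cast


class Element:
    """Toy element class (hypothetical, not lxml's)."""


class ObjectifiedElement(Element):
    """Toy objectified element class (hypothetical)."""


E = TypeVar("E", bound=Element)


class Parser(Generic[E]):
    """Toy parser whose produced element class can be swapped at runtime."""

    def __init__(self) -> None:
        self._element_class: Type[Element] = Element

    def set_element_class(self, element_class: Type[Element]) -> None:
        # The runtime behaviour changes, but the static parameter E cannot follow.
        self._element_class = element_class

    def make_element(self) -> E:
        return cast(E, self._element_class())


def make_objectify_parser() -> Parser[ObjectifiedElement]:
    parser: Parser[Element] = Parser()
    parser.set_element_class(ObjectifiedElement)
    # cast() returns the same object; only the static type changes.
    return cast("Parser[ObjectifiedElement]", parser)


objectify_parser = make_objectify_parser()
produced = objectify_parser.make_element()
```

Since `cast()` does nothing at runtime, the helper is only as correct as the promise it encodes: the cast should sit right next to the call that changes the element class.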