Smart string usage
Smart string is a private `str` subclass documented in the return types of XPath evaluation results. Quoting directly from the lxml documentation:
> XPath string results are 'smart' in that they provide a `getparent()` method that knows their origin:
>
> - for attribute values, `result.getparent()` returns the Element that carries them. An example is `//foo/@attribute`, where the parent would be a foo Element.
> - for the `text()` function (as in `//text()`), it returns the Element that contains the text or tail that was returned.
The actual class is named `_ElementUnicodeResult` in the source code. On Python 2.x and PyPy this `str` subclass is backed by some other concrete classes, but they can be ignored as far as type checking is concerned.
The following are breaking changes since `2023.2.11`.
Historically the class was named `SmartStr` in the annotation package, which is more user friendly but needed to be imported manually for typing. Because it was underused, it was decided to break compatibility and revert to the concrete class name (`_ElementUnicodeResult`) instead.
Because the `getparent()` method needs to know the original element type, smart string was changed into a `Generic` class that takes the element type as its subscript, as in `_ElementUnicodeResult[_Element]`.
Version | Usage |
---|---|
2023.02.11 or earlier | SmartStr |
Afterwards | _ElementUnicodeResult[_Element] |
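As a minimal sketch of how the subscripted form might appear in user code, the hypothetical helper `parent_tag` below annotates its argument with `_ElementUnicodeResult[_Element]`. It assumes both names are importable from `lxml.etree` in the same way as in the examples on this page. They are imported for type checking only, on the assumption that the subscript may not be evaluable at runtime.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Needed by the type checker only; nothing is imported at runtime.
    from lxml.etree import _Element, _ElementUnicodeResult


def parent_tag(result: _ElementUnicodeResult[_Element]) -> Optional[str]:
    """Return the tag of the element a smart string came from, if any.

    Hypothetical helper for illustration, not part of lxml or its stubs.
    """
    parent = result.getparent()  # Optional[_Element] for the type checker
    if parent is None:
        return None
    tag = parent.tag
    # Only report ordinary string tags.
    return tag if isinstance(tag, str) else None
```

Changing the subscript (for example to `HtmlElement`) would change the type the checker infers for `parent`, while the function body stays the same.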
There are 2 occasions where this class is primarily useful. See further down for examples of both types of usage.
- `XPath` selection result
- `HtmlElement.text_content()` result (which uses `XPath` internally)
However, this class is almost never used directly in type annotations, since an XPath result is too versatile to be annotated (`str`, `float`, `bool`, lists of them, as well as lists of `_Element` and namespace tuples).
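To make that variety concrete, here is a rough sketch of a helper that names the runtime shape of an evaluation result. The function names `describe_item` and `describe_xpath_result` are hypothetical, and the `hasattr` checks are a duck-typing assumption, not an API provided by lxml.

```python
from __future__ import annotations

from typing import Any


def describe_item(item: Any) -> str:
    """Name the kind of a scalar result or of one member of a list result."""
    # bool is tested first so it is never reported under another label.
    if isinstance(item, bool):
        return "bool"
    if isinstance(item, float):
        return "float"
    if isinstance(item, str):
        # Smart strings are str subclasses that also offer getparent().
        return "smart str" if hasattr(item, "getparent") else "str"
    if isinstance(item, tuple):
        return "namespace tuple"
    if hasattr(item, "tag"):
        # Assumption: anything carrying a tag attribute is an element.
        return "element"
    return "other"


def describe_xpath_result(result: Any) -> str:
    """Name the overall shape of an XPath evaluation result."""
    if isinstance(result, list):
        if not result:
            return "empty list"
        kinds = sorted({describe_item(item) for item in result})
        return "list of " + ", ".join(kinds)
    return describe_item(result)
```

A static type checker cannot make these distinctions from the XPath expression alone, so runtime checks of this kind are left to the caller.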
Users are therefore expected to narrow down the XPath selection result themselves. The first example below shows how to handle smart strings in a selection result.
```python
# Assumption: the subscripted class may not be evaluable at runtime,
# so annotation evaluation is postponed.
from __future__ import annotations

from typing import TypeGuard  # (or from typing_extensions)

from lxml import etree


def is_smart_str(s: str) -> TypeGuard[etree._ElementUnicodeResult[etree._Element]]:
    # Smart strings expose getparent(); a plain str does not
    return hasattr(s, 'getparent')


tree = etree.parse(<...some html file...>)
for result in tree.xpath('//div/span/text()'):
    if is_smart_str(result):
        # At this point,
        # result -> _ElementUnicodeResult[_Element],
        # parent -> Optional[_Element]
        parent = result.getparent()
        if parent is not None:
            print(parent.tag)  # 'span'
```
The second example shows the `HtmlElement.text_content()` result:

```python
from lxml import html

tree = html.parse('index.html')  # _ElementTree[HtmlElement]
form = tree.getroot().forms[0]  # FormElement
form_content = form.text_content()  # _ElementUnicodeResult[FormElement]
# parent is identified as Optional[FormElement] during type checking,
# but at runtime it is always None due to an implementation detail
parent = form_content.getparent()
```