@bollwyvl
Contributor

This follows the advice offered by the DeprecationWarning:

.../pytest_check_links/plugin.py:428: DeprecationWarning: Testing an element's truth value will 
    raise an exception in future versions.  Use specific 'len(elem)' or 'elem is not None' test instead.
        if self.parent.check_anchors and self.parsed:

Observed with:

  • html5lib 1.1
  • pytest-check-links 0.10.1
  • pytest 8.1.2
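
For reference, a minimal sketch of the pattern the warning asks for. It uses the standard library's `xml.etree.ElementTree` as a stand-in for the parsed document (an assumption for illustration), and the helper names `has_parsed_document` and `has_children` are hypothetical, not part of the plugin:

```python
import xml.etree.ElementTree as ET


def has_parsed_document(parsed):
    """Return True when a parse result exists, even if it has no children."""
    # Explicit identity check: does not depend on Element truthiness.
    return parsed is not None


def has_children(parsed):
    """Return True only when a parse result exists and has child elements."""
    # Explicit length check, as suggested by the warning text.
    return parsed is not None and len(parsed) > 0


# A childless root element: historically falsy under `if elem:`,
# which is why the implicit truth test is being deprecated.
empty = ET.fromstring("<html/>")
full = ET.fromstring("<html><body/></html>")
```

With this, a guard like `if self.parent.check_anchors and self.parsed is not None:` keeps treating a childless document as parsed, which the implicit truth test would not.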

@bollwyvl bollwyvl marked this pull request as ready for review May 18, 2024 13:20
@bollwyvl
Contributor Author

One disadvantage of this change is that the html5lib lxml backend (which could maybe be used to get better per-document parse performance) would not work with it. It's worth exploring that backend (or perhaps some other tool, e.g. htmlpyever) in a separate PR.

@bollwyvl
Contributor Author

Sadly, looks like htmlpyever isn't particularly maintained.

html5-parser looks fast (and tests against the html5lib test suite), but doesn't distribute wheels and has a finicky home-grown build system that isn't going to work in most setups. I couldn't even get it to build for older Pythons on Windows.

Maybe the play is to extract all the HTML parsing into an interface that could be swapped out by e.g. an entry_points name, so --check-links-parser=html5lib(_etree|lxml|dom), and leave a place for a more involved parser if someone wanted to do the work. Either way, we'd probably need a proper benchmark setup (probably good to have anyway).
