This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

Replace beautiful soup 4 with Python standard library #58

Closed
wants to merge 1 commit into from


cn-kali-team

Installing Beautiful Soup 4 also requires me to install lxml support. I think a simpler method should be used to parse HTML, so I submitted a branch that does not depend on beautifulsoup4.

@tristanlatr
Collaborator

Hi @cn-kali-team,

Thanks for the PR. Honestly, I don't know what to think about parsing the HTML with the standard library...
Beautiful Soup can be very useful when dealing with half-broken HTML. Maybe this should be the behaviour only when Beautiful Soup cannot be imported?
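
The fallback idea above could look roughly like the sketch below. It is only an illustration, not code from this PR or from python-Wappalyzer: the names `script_sources` and `_ScriptSrcCollector` are hypothetical, and collecting `<script src=...>` values is just a stand-in task. It uses Beautiful Soup when the import succeeds and falls back to the standard library `html.parser` otherwise.

```python
from html.parser import HTMLParser

try:
    # Optional dependency: used only when it is installed.
    from bs4 import BeautifulSoup
    HAVE_BS4 = True
except ImportError:
    HAVE_BS4 = False


class _ScriptSrcCollector(HTMLParser):
    """Hypothetical helper: collects the src attribute of <script> tags."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def script_sources(html):
    """Return script URLs, preferring Beautiful Soup when available."""
    if HAVE_BS4:
        # "html.parser" is Beautiful Soup's pure-Python backend, so lxml is not needed.
        soup = BeautifulSoup(html, "html.parser")
        return [tag["src"] for tag in soup.find_all("script", src=True) if tag["src"]]
    # Fallback path: standard library only.
    collector = _ScriptSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


if __name__ == "__main__":
    page = '<script src="a.js"></script><p>unclosed<script src="b.js"></script>'
    print(script_sources(page))
```

Both branches return the same kind of result, so callers would not need to know which parser was used.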

Does this fix #48?

Thanks again!

@tristanlatr changed the base branch from master to without-lxml-fallback on March 8, 2022 19:38
@tristanlatr mentioned this pull request on Mar 9, 2022
@tristanlatr
Collaborator

We provide a way to use python-Wappalyzer without lxml. This should only be used when lxml cannot be installed: the standard library DOM parser will fail on broken HTML, resulting in incomplete results.
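
To illustrate the kind of failure meant here, the sketch below feeds the same broken markup to a strict DOM parser and to the lenient event-based `html.parser`. Which parser the fallback actually uses is not stated in this thread, so `xml.dom.minidom` is an assumption chosen for illustration, and the helper names are hypothetical.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Typical "half-broken" HTML: <p> and <br> are never closed.
BROKEN_HTML = "<html><body><p>unclosed paragraph<br></body></html>"
CLEAN_HTML = "<html><body><p>ok</p></body></html>"


def strict_parse_ok(markup):
    """Return True if the strict DOM parser (assumed: minidom) accepts the markup."""
    try:
        minidom.parseString(markup)
    except ExpatError:
        return False
    return True


class TagCounter(HTMLParser):
    """Counts start tags; html.parser does not raise on unclosed tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


def lenient_tag_count(markup):
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    return counter.count


if __name__ == "__main__":
    print(strict_parse_ok(CLEAN_HTML))
    print(strict_parse_ok(BROKEN_HTML))
    print(lenient_tag_count(BROKEN_HTML))
```

A strict parser rejects the whole document at the first mismatched tag, which is one way a page could end up with incomplete detection results.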

To use it, install python-Wappalyzer with the pip option --no-deps, then install the required packages manually (pip install requests aiohttp cached_property dom_query pytest).

Thanks @cn-kali-team


2 participants