This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

Invalid regex in Wappalyzer/data/technologies.json: Symfony: html #81

Open
arielf opened this issue Jan 18, 2023 · 0 comments


arielf commented Jan 18, 2023

The following code works with python3.9, but python3.11 (correctly) warns about a bad regex:

   from Wappalyzer import Wappalyzer, WebPage

   url = "http://yahoo.com"  # the URL used in the run below
   WPL = Wappalyzer.latest()
   webpage = WebPage.new_from_url(url)
   web_record = WPL.analyze_with_versions_and_categories(webpage)

Running this with python3.11 on "http://yahoo.com", I get:

.../python3.11/site-packages/Wappalyzer/Wappalyzer.py:226: UserWarning: Caught 'unbalanced parenthesis at position 119' compiling regex:

['(?:<div class="sf-toolbar[^>]+?>[^]+<span class="sf-toolbar-value">([\\d.])+|<div id="sfwdt[^"]+" class="[^"]*sf-toolbar)', 'version:\\1']
----------------------------------^^^ invalid?

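The 'position 119' seems to be a delayed reaction to the core issue.

Indeed, the sub-regex [^]+ just before is the culprit. JavaScript accepts [^] as "any character", but Python's re treats a ] that immediately follows [^ as a literal, so the character class runs on to the next ] and swallows the ( of the version group. The ) meant to close that group then closes the outer (?: group instead, and the final ) at position 119 is left unbalanced.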
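
A minimal check, assuming only Python's standard re module: it compiles the pattern exactly as reported in the warning, then the same pattern with the one-character change proposed in this issue. The helper name compile_error is made up for illustration and is not part of Wappalyzer.

```python
import re

# Pattern as it appears in the warning (before the fix).
BAD = (r'(?:<div class="sf-toolbar[^>]+?>[^]+<span class="sf-toolbar-value">'
       r'([\d.])+|<div id="sfwdt[^"]+" class="[^"]*sf-toolbar)')

# Same pattern with the proposed change: [^]+ becomes [^<]+
FIXED = BAD.replace("[^]+", "[^<]+", 1)


def compile_error(pattern):
    """Return the re.error message for `pattern`, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# The issue reports "unbalanced parenthesis at position 119" for BAD.
print("bad:  ", compile_error(BAD))
print("fixed:", compile_error(FIXED))
```

Python may also emit a FutureWarning about a possible nested set for the bad pattern, because the runaway character class contains a literal [.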

The problem is in the data-file:
Wappalyzer/data/technologies.json (towards the end, technologies are alphabetically sorted)

The rule for "Symfony": "html": should be (one char change):

"html": "(?:<div class=\"sf-toolbar[^>]+?>[^<]+<span class=\"sf-toolbar-value\">([\\d.])+|<div id=\"sfwdt[^\"]+\" class=\"[^\"]*sf-toolbar)\\;version:\\1",
------------------------------------------^^^^ the fix

Fixed in this PR:
#80
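
To catch similar rules before they reach users, a small sketch like the following could be run over the rule strings. It assumes only what is visible above: each rule is a regex optionally followed by "\;"-separated tags such as version:\1. The function name and the sample data are hypothetical, and loading the real technologies.json is left out because its exact layout is not shown here.

```python
import re


def find_bad_patterns(rules):
    """Yield (name, regex, error) for every rule whose regex fails to compile.

    `rules` maps a technology name to a rule string: a regex, optionally
    followed by backslash-semicolon separated tags (for example a version tag).
    """
    for name, rule in rules.items():
        regex = rule.split("\\;")[0]  # drop tags after the first "\;"
        try:
            re.compile(regex)
        except re.error as exc:
            yield name, regex, str(exc)


# Hypothetical sample data, shaped like the Symfony rule above.
sample_rules = {
    "Broken": r'<div>[^]+<span>([\d.])+\;version:\1',
    "Fixed": r'<div>[^<]+<span>([\d.])+\;version:\1',
}

for name, regex, error in find_bad_patterns(sample_rules):
    print(f"{name}: {error}")
```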
