This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

created test for valid selector that does not increase time #79

Open
wants to merge 4 commits into base: master

Conversation

brandonscholet

My repo, wappybird, implements your library. I started pulling the updated Wappalyzer technology files. They have had issues with invalid JSON, so I started pulling the current release instead, but the tally selector is malformed. I talked to the maintainer of soupsieve, and they provided a function to check for valid selectors and skip any that are not. This replaces the crude try/except code.

I can update your repo to pull the current technologies if you would like. Or feel free to pull from wappybird.

Also, the pip package is out of date and incompatible with the updated technologies files.

@tristanlatr
Collaborator

Thanks @brandonscholet.
Can you provide a test for an invalid selector, please?

@brandonscholet
Author

The current release of npm-Wappalyzer has this broken selector:
iframe[scr*='//airtable.com/'], a[href*='//airtable.com/][target='_blank']
The second '//airtable.com/ value is missing its closing quote.
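
A test along the lines requested above could look like the following sketch. It assumes soupsieve is installed and relies on soupsieve.compile raising SelectorSyntaxError for malformed input. The helper name is_valid_selector is hypothetical and is not the function the soupsieve maintainer provided (that function is not shown in this thread); the REPAIRED selector is likewise only an assumed fix, included for contrast.

```python
import soupsieve


def is_valid_selector(selector: str) -> bool:
    """Return True if soupsieve can compile the selector (hypothetical helper)."""
    try:
        soupsieve.compile(selector)
    except soupsieve.SelectorSyntaxError:
        return False
    return True


# Selector quoted from the current npm-Wappalyzer release (closing quote missing).
BROKEN = "iframe[scr*='//airtable.com/'], a[href*='//airtable.com/][target='_blank']"
# Assumed repair: only the missing quote is restored, nothing else is changed.
REPAIRED = "iframe[scr*='//airtable.com/'], a[href*='//airtable.com/'][target='_blank']"


def test_invalid_selector_is_rejected():
    assert not is_valid_selector(BROKEN)


def test_valid_selector_is_accepted():
    assert is_valid_selector(REPAIRED)
```

Both test functions follow pytest naming, so they could be dropped into an existing test module; whether the library should skip such selectors silently or log them is a separate design choice.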

@brandonscholet
Author

brandonscholet commented Jan 17, 2023

This function pulls the latest release into the technologies file. Upstream has shipped broken selectors in the past two releases.

import io
import json
import os
from zipfile import ZipFile

import requests
from Wappalyzer import Wappalyzer, WebPage

TECHNOLOGIES_FILE = os.path.expanduser('~/.python-Wappalyzer/technologies.json')

def update_technologies_from_latest(technologies_file=TECHNOLOGIES_FILE):
    print("updating technologies")
    technologies = {}
    categories = {}

    # fetch the metadata of the latest release
    latest_release = requests.get('https://api.github.com/repos/wappalyzer/wappalyzer/releases/latest').json()
    # download the release source as a zip archive and open it in memory
    zip_response = requests.get(latest_release['zipball_url'])
    myzip = ZipFile(io.BytesIO(zip_response.content))

    # parse files
    for listed_file in myzip.namelist():
        # merge every technology file into one dict
        if "src/technologies" in listed_file and listed_file.endswith(".json"):
            tech_json = json.loads(myzip.read(listed_file).decode('UTF-8'))
            technologies = {**technologies, **tech_json}
        # load the categories file
        if "src/categories.json" in listed_file:
            categories = json.loads(myzip.read(listed_file).decode('UTF-8'))
    # merge into one object
    combined_object = {'categories': categories, 'technologies': technologies}

    # write to file, creating the target directory if it does not exist yet
    os.makedirs(os.path.dirname(technologies_file), exist_ok=True)
    with open(technologies_file, 'w', encoding='utf-8') as tfile:
        json.dump(combined_object, tfile)
    print("done!\n")
    return technologies_file

technologies_file = update_technologies_from_latest()
webpage = WebPage.new_from_url("https://example.com", verify=False, timeout=60)
wappalyzer = Wappalyzer.latest(technologies_file=technologies_file)
techs = wappalyzer.analyze_with_versions_and_categories(webpage)
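
The merge step of the function above can be tried without network access. The sketch below builds a small in-memory archive and combines its files the same way. The names merge_technologies and build_fake_release, the release-root/ prefix, and the sample entries are all made up for illustration and are not taken from a real Wappalyzer release.

```python
import io
import json
from zipfile import ZipFile


def merge_technologies(archive: ZipFile) -> dict:
    """Combine src/technologies/*.json and src/categories.json from an archive."""
    technologies = {}
    categories = {}
    for name in archive.namelist():
        if "src/technologies/" in name and name.endswith(".json"):
            # Later files overwrite earlier ones if a key repeats.
            technologies.update(json.loads(archive.read(name).decode("utf-8")))
        elif name.endswith("src/categories.json"):
            categories = json.loads(archive.read(name).decode("utf-8"))
    return {"categories": categories, "technologies": technologies}


def build_fake_release() -> ZipFile:
    """Build an in-memory archive with hypothetical sample data."""
    buffer = io.BytesIO()
    with ZipFile(buffer, "w") as archive:
        archive.writestr("release-root/src/technologies/a.json",
                         json.dumps({"Airtable": {"cats": [1]}}))
        archive.writestr("release-root/src/technologies/b.json",
                         json.dumps({"Bootstrap": {"cats": [2]}}))
        archive.writestr("release-root/src/categories.json",
                         json.dumps({"1": {"name": "Example"}}))
        archive.writestr("release-root/README.md", "not JSON")
    buffer.seek(0)
    return ZipFile(buffer)


combined = merge_technologies(build_fake_release())
print(sorted(combined["technologies"]))  # ['Airtable', 'Bootstrap']
```

Keeping the merge separate from the download makes it testable on its own, and the README entry shows that non-JSON files in the archive are ignored.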

@brandonscholet
Author

Looking back, the print statements should probably be removed.
