Jaegrqualm/practice-html-scraper
## README

This is a simple HTML scraper written in Python. It currently has two hardcoded URLs: one for distrowatch.com and one for the top 100 distros at distrowatch.com. Directory and file paths are also hardcoded.

The aim is to collect the package versions listed on the pages of the top 100 distros. Output is a comma-delimited .csv file that most spreadsheet programs can open. The output is poorly formatted and contains many extra cells where they shouldn't be, because the DistroWatch pages themselves are formatted irregularly.

To make the version numbers easier to work with, there is also a converter from version numbers to ordinal integers.
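As a rough illustration of what a version-to-ordinal conversion can look like, here is a minimal sketch. It is not the converter from this repository, whose scheme is not described above. The function name `version_to_ordinal` and its `fields` and `width` parameters are hypothetical, and the packing scheme (each numeric component placed in a fixed-width decimal slot) is only one possible approach.

```python
import re


def version_to_ordinal(version: str, fields: int = 3, width: int = 4) -> int:
    """Pack the numeric parts of a version string into one sortable integer.

    Hypothetical scheme: each of the first `fields` numeric components gets
    a slot of `width` decimal digits, so "5.9" sorts below "5.15".
    """
    base = 10 ** width
    # Pull out runs of digits; suffixes such as "-rc" or "b" are ignored.
    numbers = [int(n) for n in re.findall(r"\d+", version)][:fields]
    # Pad short versions ("5.9") with zeros so every result has the same scale.
    numbers += [0] * (fields - len(numbers))
    ordinal = 0
    for n in numbers:
        # Clamp oversized components (e.g. date stamps) so slots never overflow.
        ordinal = ordinal * base + min(n, base - 1)
    return ordinal


if __name__ == "__main__":
    for v in ["5.9", "5.15.2", "1.2.3-rc1", ""]:
        print(repr(v), version_to_ordinal(v))
```

With this scheme, integers compare in the same order as the versions they came from, which plain string comparison does not guarantee ("5.9" sorts after "5.15" as text). The trade-off is that pre-release suffixes and any components beyond the first `fields` are dropped.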
About
A basic HTML scraper with hardcoded URLs.