
Jaegrqualm/practice-html-scraper


## README
 
This is a simple HTML scraper written in Python.
It currently has two hardcoded URLs: one for distrowatch.com and one for distrowatch.com's top 100 distros page.
Directory and file names are hardcoded as well.
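
The following is a minimal sketch of what "hardcoded URLs" plus a fetch step can look like using only the standard library (`urllib.request`, `html.parser`). It is not the repository's actual code. The exact URL of the top-100 page is not given here, so only the site root is hardcoded, and `OUTPUT_PATH` is a hypothetical file name. `extract_links` collects every `<a href>` on a page as an absolute URL, which is one way to get from an index page to the individual distro pages.

```python
from __future__ import annotations

import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hardcoded inputs, in the spirit of this README. Only the site root is
# known; OUTPUT_PATH is a hypothetical placeholder, not the project's file.
BASE_URL = "https://distrowatch.com/"
OUTPUT_PATH = "distro_versions.csv"


def fetch_html(url: str) -> str:
    """Download a page and decode it using the charset the server reports."""
    req = urllib.request.Request(url, headers={"User-Agent": "practice-html-scraper"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href=...> tag on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href (e.g. <a name="...">)
                self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str = BASE_URL) -> list[str]:
    """Return all links in `html`, resolved against `base_url`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links(fetch_html(BASE_URL))` lists every link on the front page; filtering that list down to distro pages is left open because the page structure is not described here.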

The aim is to collect the package versions listed on the pages of the top 100 distros.
Output is a comma-delimited .csv file that most Excel-style spreadsheet programs can open.
The output is poorly formatted and contains many stray cells, because distrowatch's pages are formatted irregularly.
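
A sketch of the table-to-CSV step, assuming the package versions sit in ordinary HTML `<table>` rows (the real page layout is not described here, so treat this as illustrative). It groups the text of each `<td>`/`<th>` by `<tr>` and writes one CSV row per table row. Rows with more cells than others simply produce longer CSV lines, which is one way irregular source markup turns into stray cells. `TableRowParser` and `html_table_to_csv` are hypothetical names.

```python
from __future__ import annotations

import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the whitespace-normalized text of <td>/<th> cells, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def _end_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self) -> None:
        self._end_cell()
        if self._row:  # drop empty rows
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()  # tolerate a missing </td>
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._end_row()  # flush a row left open at end of input


def html_table_to_csv(html: str) -> str:
    """Convert every table row in `html` into comma-delimited CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    csv.writer(buf).writerows(parser.rows)  # quotes cells containing commas
    return buf.getvalue()
```

To save the result, open the file with `newline=""` (as the `csv` documentation recommends), e.g. `open(path, "w", newline="", encoding="utf-8")`, and write the returned text. Using the `csv` module instead of joining cells with commas keeps a version string such as `1,2` from splitting into two cells.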

Because version numbers are awkward to compare or chart directly, there is also a converter that maps version numbers to ordinal integers.
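
One possible way to map version strings to ordinal integers, sketched here without claiming it matches the project's converter: split each version into numeric and alphabetic pieces, sort the distinct versions by those pieces, and number them 1, 2, 3, ... from oldest to newest. Comparing numbers numerically is what makes `5.10` rank above `5.9`, which a plain string sort gets wrong. The function names are hypothetical.

```python
from __future__ import annotations

import re


def version_key(version: str) -> tuple:
    """Split e.g. '5.15.0-rc2' into comparable pieces.

    Numeric runs compare as integers; letter runs compare case-insensitively
    and sort after numbers at the same position. Separators are ignored.
    """
    parts = re.findall(r"[0-9]+|[A-Za-z]+", version)
    return tuple((0, int(p), "") if p.isdigit() else (1, 0, p.lower()) for p in parts)


def versions_to_ordinals(versions: list[str]) -> dict[str, int]:
    """Map each distinct version string to its rank (1 = oldest).

    Versions with identical keys (e.g. '1.0' and '1.00') share a rank.
    """
    ranks: dict[str, int] = {}
    rank = 0
    previous = None
    for v in sorted(set(versions), key=version_key):
        key = version_key(v)
        if key != previous:
            rank += 1
            previous = key
        ranks[v] = rank
    return ranks
```

For instance, `["5.10", "5.9", "5.15.2"]` becomes `{"5.9": 1, "5.10": 2, "5.15.2": 3}`. This key does not treat pre-release tags specially, so `1.0rc1` ranks after `1.0`; distro version strings follow many different conventions, so any single ordering rule is an approximation.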

About

A basic HTML scraper with hardcoded URLs.
