Universal Proxy Scraper BETA v 0.1.6

Need some proxies but don't want to scrape them manually? Just give this script the domain!
Give the project a star!

Report Bug · Request Feature

About The Project

[Screenshots: example of the code imported as a module, and its output]

Hi there! The purpose of this script is to showcase the features that will be implemented in the future Universal Proxy Scraper module. For now it will be developed as a standalone script until v 1.0.0 comes out; that version will be the first one deployed as a module :)

Modules used

Just built-in modules! (Python >= 3.0)

Getting Started

Let's get to it!

Script usage

Setting up the list

To set up the websites you want to get the proxies from, place every URL you want to scrape in a list file, one per line, like so:

http://free-proxy.cz/es/
https://free-proxy-list.net/
http://www.freeproxylists.net/
https://hidemy.name/es/proxy-list/#list

(For reference, see the test_urls.txt file included in this repository.)
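
The script reads this file for you, but if you're curious, here's a minimal sketch of how such a file can be loaded into a Python list using only built-in functionality ('load_urls' is just an illustrative name, not part of the project):

# Minimal sketch: load a file of URLs (one per line) into a list.
# 'load_urls' is an illustrative helper, not part of the project.
def load_urls(path):
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]  # skip blank lines

urls = load_urls("test_urls.txt")
print(urls)  # ['http://free-proxy.cz/es/', 'https://free-proxy-list.net/', ...]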

Using via command-line

Using it from the command line is pretty simple :D For example:

path/to/the/script: python main.py -h


██╗   ██╗   ██████╗ ███████╗    █████╗  ██████╗
██║   ██║   ██╔══██╗██╔════╝██╗██╔══██╗██╔═████╗
██║   ██║   ██████╔╝███████╗╚═╝╚█████╔╝██║██╔██║
██║   ██║   ██╔═══╝ ╚════██║██╗██╔══██╗████╔╝██║
╚██████╔╝██╗██║██╗  ███████║╚═╝╚█████╔╝╚██████╔╝
 ╚═════╝ ╚═╝╚═╝╚═╝  ╚══════╝    ╚════╝  ╚═════╝

            Proxy
Universal           Scraper | Your ideal proxy scraper ;)
       by: @freshSauce
           0.1.6

usage: main.py [-h] -f FILE [-o] [-q QUANTITY] [-v] [-p]

Command-line option for the Universal Scraper

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  name of the file with the sites
  -o, --output          if used, stores the scraped proxies
  -q QUANTITY, --quantity QUANTITY
                        number of proxies to be scraped (10 by default)
  -v, --verify          if used, verify every single proxy and returns the live ones
  -p, --print           if used, prints out the obtained list of proxies


As you can see, there are quite a few options you can use :) (a rough sketch of an equivalent argument parser follows this list):

  • file (required, value needed): path or name of the file that contains all the websites you want to scrape.
  • output (optional, no value needed): if used, writes a file named "output.txt" with every scraped proxy.
  • quantity (optional, value needed, 10 by default): sets the number of proxies to be scraped.
  • verify (optional, no value needed): if used, verifies every scraped proxy and returns only those that are alive.
  • print (optional, no value needed): if used, prints out the list that contains all the proxies.
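
The snippet below is a rough sketch of what an equivalent argparse parser could look like; it is not taken from the project's source:

import argparse

# Rough sketch of an equivalent parser -- not the project's actual source code.
parser = argparse.ArgumentParser(description='Command-line option for the Universal Scraper')
parser.add_argument('-f', '--file', required=True, help='name of the file with the sites')
parser.add_argument('-o', '--output', action='store_true', help='store the scraped proxies in output.txt')
parser.add_argument('-q', '--quantity', type=int, default=10, help='number of proxies to scrape')
parser.add_argument('-v', '--verify', action='store_true', help='verify every proxy and keep only the live ones')
parser.add_argument('-p', '--print', action='store_true', help='print out the obtained list of proxies')
args = parser.parse_args()
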
Example with every single argument
path/to/the/script: python main.py -f test_urls.txt -p -o -v -q 5

██╗   ██╗   ██████╗ ███████╗    █████╗  ██████╗
██║   ██║   ██╔══██╗██╔════╝██╗██╔══██╗██╔═████╗
██║   ██║   ██████╔╝███████╗╚═╝╚█████╔╝██║██╔██║
██║   ██║   ██╔═══╝ ╚════██║██╗██╔══██╗████╔╝██║
╚██████╔╝██╗██║██╗  ███████║╚═╝╚█████╔╝╚██████╔╝
 ╚═════╝ ╚═╝╚═╝╚═╝  ╚══════╝    ╚════╝  ╚═════╝

            Proxy
Universal           Scraper | Your ideal proxy scraper ;)
       by: @freshSauce
           0.1.6

Connection to http://free-proxy.cz/es/ timed out
Proxies obtained !!!
['172.67.181.214:80', '172.67.80.190:80', '45.82.139.34:4443', '188.168.56.82:55443', '150.129.54.111:6667']
Everything is done !!! Wanna get more proxies? (Y[es]/N[o]): n
Have a nice day !!!

Setting up our code

In order to use it from our own code, we have to import it as a module, like so:

from main import ProxyScraper

There's no need to import it as 'main'; you can rename the script and import it under whatever name you gave it. Once you've done that, you can use it as you please.

# Storing it on a variable
proxy_scraper = ProxyScraper('test_urls.txt')

proxy_list = proxy_scraper.Proxies()

# Iterating through each proxy

for proxy in ProxyScraper('test_urls.txt').Proxies():
    ...

# Saving the proxies to a file

proxy_scraper = ProxyScraper('test_urls.txt', output=True)

proxy_list = proxy_scraper.Proxies() # This will give you the scraped proxies and save them into a file.
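
Each entry in the returned list is an 'ip:port' string, as seen in the sample output above. Here's a small sketch, using only built-in modules, of routing a request through one of the scraped proxies (the target URL is just an example):

import random
import urllib.request

# Sketch: route a request through one of the scraped proxies.
# Assumes each entry is an 'ip:port' string, as in the sample output above.
proxy = random.choice(proxy_list)
handler = urllib.request.ProxyHandler({'http': 'http://' + proxy})
opener = urllib.request.build_opener(handler)
response = opener.open('http://example.com', timeout=10)
print(response.status)  # 200 if the proxy worked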

Usage

It's pretty easy to use! Just make sure to pass the URLs correctly and you're ready to go!

from main import ProxyScraper

proxy_list = ProxyScraper('test_urls.txt').Proxies() # Will save the proxies list on a variable

ProxyScraper('test_urls.txt', output=True).Proxies() # Will save the output into an output file

proxy_list = ProxyScraper('test_urls.txt').Proxies(quantity=15) # Will save 15 of the scraped proxies into a variable (10 by default)

proxy_list = ProxyScraper('test_urls.txt', check=True).Proxies(quantity=15) # Will save 15 of the scraped proxies and will check each one of them
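
If you're wondering what the check/verify option does conceptually, here's a rough, hypothetical sketch of a liveness check built only on built-in modules; it is not the project's actual implementation, and 'is_alive' is an illustrative name:

import urllib.request

# Hypothetical liveness check -- not the project's actual implementation.
def is_alive(proxy, timeout=5):
    handler = urllib.request.ProxyHandler({'http': 'http://' + proxy})
    opener = urllib.request.build_opener(handler)
    try:
        opener.open('http://example.com', timeout=timeout)
        return True
    except Exception:
        return False

live_proxies = [p for p in proxy_list if is_alive(p)]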

Hope it is useful for you!

Contributing

Wanna contribute to the project? Great! Please follow the steps below to submit any feature or bug fix :) You can also send me your ideas on Telegram; any submission is greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the AGPL-3.0 License. See LICENSE for more information.

Contact

Telegram: - @freshSauce

Project Link: https://github.com/freshSauce/UniversalProxyScraper

Changelog

0.1.6

  • Added custom exceptions plus minor changes.

0.1.5

  • Added command-line support (yeah, no 0.1.3 nor 0.1.4, heh)

0.1.2

  • Added support for the first specific site: spys.one.

Now, I want to say that, if needed, I will create specific scripts for specific sites. This doesn't mean I won't keep looking for a 'universal' solution; it's just that sites like that one are quite different from the others.

Module created for that site.

0.1.1

  • Added support for some sites that write content via JavaScript, such as 'document.write'.
  • Added handlers for some exceptions.

0.1.0

  • Added proxy checker function
  • Fixed some typos on the script documentation.