
Python crawler for extracting all internal and external links from a website. It also supports deep crawls.


giovanni-caiazzo/py-url-crawler


This is a fork of dsuc

URL Crawler

Python crawler for extracting internal and external links from a URL. It can deep-crawl sites too.
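The README does not say how a link is classified as internal or external. The following is a minimal sketch, assuming a link counts as internal when its host matches the host of the page it was found on. It uses only the standard library (html.parser, urllib.parse). LinkExtractor and split_links are hypothetical names and are not taken from link_crawler.py.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def split_links(page_url: str, html: str) -> tuple[set[str], set[str]]:
    """Hypothetical helper: return (internal, external) absolute URLs.

    Assumption: a link is internal when its host equals page_url's host.
    """
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc
    internal: set[str] = set()
    external: set[str] = set()
    for href in parser.hrefs:
        # Resolve relative links against the page URL and drop any #fragment
        absolute = urldefrag(urljoin(page_url, href)).url
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, ...
        if parsed.netloc == host:
            internal.add(absolute)
        else:
            external.add(absolute)
    return internal, external


html = (
    '<a href="/about">About</a> <a href="/about#team">Team</a> '
    '<a href="https://example.org/x">X</a> <a href="mailto:a@b.c">Mail</a>'
)
internal, external = split_links("http://testsite.com/", html)
print(sorted(internal))  # ['http://testsite.com/about']
print(sorted(external))  # ['https://example.org/x']
```

Stripping the fragment means /about and /about#team are reported as the same page. Skipping non-HTTP schemes keeps mailto: and javascript: links out of both sets.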

Usage

Installation

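Clone the repository and cd into the directory. Then create a virtual environment and install the packages listed in requirements.txt:

    git clone https://github.com/giovanni-caiazzo/py-url-crawler.git
    cd py-url-crawler
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt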

Examples
  • Normal crawl: python3 link_crawler.py -d -u http://testsite.com
  • Normal crawl with base path: python3 link_crawler.py -d -u http://testsite.com -b /resources
  • Show external links: python3 link_crawler.py -d -u http://testsite.com -e
  • Deep crawl: python3 link_crawler.py -d -u http://testsite.com
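A deep crawl can be pictured as a breadth-first traversal over internal links. The sketch below illustrates that idea and makes three assumptions. First, the base path (-b) is taken to restrict which URLs are followed to those whose path starts with the given prefix; the README does not say this. Second, only same-host links are followed. Third, the caller supplies a get_links function that fetches and parses a page. The names deep_crawl, get_links, base_path and max_pages are hypothetical and are not taken from the project.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urlparse


def deep_crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    base_path: str = "",
    max_pages: int = 500,
) -> set[str]:
    """Hypothetical BFS crawl; returns the set of pages visited.

    get_links(url) must return the absolute links found on `url`.
    Assumption: only same-host URLs whose path starts with base_path
    are followed (the start URL itself is always visited).
    """
    host = urlparse(start_url).netloc

    def in_scope(url: str) -> bool:
        parsed = urlparse(url)
        return parsed.netloc == host and parsed.path.startswith(base_path)

    visited: set[str] = set()
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # the same URL may be queued more than once
        visited.add(url)
        for link in get_links(url):
            if link not in visited and in_scope(link):
                queue.append(link)
    return visited


# A fake in-memory site stands in for real HTTP fetching.
site = {
    "http://testsite.com/": [
        "http://testsite.com/a",
        "http://testsite.com/resources/x",
        "https://other.org/",
    ],
    "http://testsite.com/a": ["http://testsite.com/b", "http://testsite.com/"],
    "http://testsite.com/b": [],
    "http://testsite.com/resources/x": ["http://testsite.com/resources/y"],
    "http://testsite.com/resources/y": [],
}


def fake_links(url: str) -> list[str]:
    return site.get(url, [])


print(sorted(deep_crawl("http://testsite.com/", fake_links)))
print(sorted(deep_crawl("http://testsite.com/", fake_links, base_path="/resources")))
```

In practice get_links would download each page (for example with urllib.request.urlopen) and extract its <a href> values. The max_pages cap is one simple way to keep a deep crawl of a large site bounded.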
