scrapy-site-downloader

Overview

Template project for downloading a site with Scrapy. It crawls a given website and saves the HTML of each page it scrapes, restricted to the domain and URL filters you configure.

Steps to run

  1. Clone this repository and cd into it
  2. Install the dependencies using the following command:
    pip install -r requirements.txt
    
  3. Configure the crawler/spiders/site.py file for the site you want to crawl
  4. Start the downloader using the following command (be sure to run this from the repository root!):
    scrapy crawl site
    
  5. Refer to the Scrapy documentation for best practices and other configuration options
  6. When the crawler finishes, the HTML files will be located in the /html directory
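
To make the idea of "domain and URL filters" from the overview concrete, here is a minimal standard-library sketch of the kind of decisions a site downloader makes: whether a URL should be crawled, and where its HTML would be saved. It is not the code in crawler/spiders/site.py. The names `ALLOWED_DOMAIN`, `ALLOW_PATTERNS`, `DENY_PATTERNS`, `should_crawl`, and `html_path_for`, the `example.com` values, and the relative `html` output directory are all assumptions made for illustration. In Scrapy itself, comparable filtering is usually expressed through a spider's `allowed_domains` attribute and the `allow`/`deny` arguments of `LinkExtractor`.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical settings for illustration only; the real values belong in
# crawler/spiders/site.py and may be structured differently.
ALLOWED_DOMAIN = "example.com"
ALLOW_PATTERNS = [r"^/docs/"]       # crawl only paths matching one of these
DENY_PATTERNS = [r"\.(pdf|zip)$"]   # never crawl paths matching one of these


def should_crawl(url: str) -> bool:
    """Return True if the URL is on the allowed domain and passes the path filters."""
    parts = urlparse(url)
    host = parts.hostname or ""
    # Accept the domain itself and any of its subdomains.
    if host != ALLOWED_DOMAIN and not host.endswith("." + ALLOWED_DOMAIN):
        return False
    path = parts.path or "/"
    # Deny rules take precedence over allow rules.
    if any(re.search(pattern, path) for pattern in DENY_PATTERNS):
        return False
    return any(re.search(pattern, path) for pattern in ALLOW_PATTERNS)


def html_path_for(url: str, out_dir: str = "html") -> str:
    """Map a URL to a relative file path under out_dir, ending in .html."""
    path = urlparse(url).path.strip("/") or "index"
    if not path.endswith(".html"):
        path += ".html"
    return str(PurePosixPath(out_dir) / path)


if __name__ == "__main__":
    for candidate in (
        "https://example.com/docs/intro",
        "https://example.com/docs/manual.pdf",
        "https://other.org/docs/intro",
    ):
        print(candidate, should_crawl(candidate), html_path_for(candidate))
```

Giving deny rules precedence over allow rules keeps unwanted file types out even when they sit under an allowed path prefix. Whether the template behaves the same way depends on how its spider is configured.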