Silver Surfer of Qartvelian Web Media
Install the requirements:
pip install -r requirements.txt
Navigate to the QartNewSurfer toolset directory:
cd qartnewsurfer
Then run the desired surfer:
scrapy crawl <surf_board>
surf_board | surfer (target website) |
---|---|
onge | on.ge |
Full shell command example with attributes:
scrapy crawl onge -a category=1 -a start_page=2 -a max_page=5
attribute | description | example |
---|---|---|
category | post category, as an integer ID (check the categories available for the surf board) | category=2 |
start_page | page index to start scraping from (default: 0) | start_page=50 |
max_page | page index to stop scraping at (default: the last page reachable through the platform's "next" link) | max_page=110 |
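Scrapy passes each -a name=value pair to the spider as a keyword argument, and the default Spider constructor copies these onto the spider as attributes. The values always arrive as strings, so a surfer has to convert category, start_page and max_page to integers itself. The sketch below illustrates that conversion and the paging stop condition in plain Python. The helper names parse_surf_args and should_follow_next are hypothetical, and it assumes that max_page is inclusive and that omitting category means "no category filter"; it is not the actual onge surfer code.

```python
from typing import Optional


def parse_surf_args(category: Optional[str] = None,
                    start_page: str = "0",
                    max_page: Optional[str] = None) -> dict:
    """Convert Scrapy -a values (always strings) into typed settings.

    Hypothetical helper: the real surfers may parse these differently.
    """
    parsed = {
        # Integer category ID; None is assumed to mean "no category filter".
        "category": int(category) if category is not None else None,
        # Default of 0 mirrors the documented start_page default.
        "start_page": int(start_page),
        # None means "keep following the platform's 'next' link".
        "max_page": int(max_page) if max_page is not None else None,
    }
    if parsed["max_page"] is not None and parsed["max_page"] < parsed["start_page"]:
        raise ValueError("max_page must not be smaller than start_page")
    return parsed


def should_follow_next(current_page: int, max_page: Optional[int]) -> bool:
    """Return True if the page after current_page should be requested.

    Assumes max_page is inclusive: with max_page=5, page 5 is still
    requested (from page 4), but nothing after it.
    """
    return max_page is None or current_page < max_page
```

For the full command example above, parse_surf_args(category="1", start_page="2", max_page="5") returns {"category": 1, "start_page": 2, "max_page": 5}. Keeping the stop condition in a separate function makes the "default is the last next page" behaviour explicit: when max_page is None, paging ends only when the platform stops offering a "next" link.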
To save the scraped items to a file, use the -O option, which overwrites any existing file:
scrapy crawl onge -O pages.json
This will generate a pages.json file containing all scraped items, serialized in JSON.
To append to an existing file instead, use -o. Appending to a JSON file, however, produces invalid JSON, so when appending, consider a line-based serialization format such as JSON Lines:
scrapy crawl onge -o pages.jl
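JSON Lines stores each scraped item as an independent JSON document on its own line, so several appended runs still form a readable file, whereas two JSON arrays written one after another do not parse as a single document. The sketch below reads such a file item by item and contrasts it with the appended-JSON case. The helper names parse_jsonl, read_jsonl and is_valid_json are hypothetical, and the item fields used in the example are placeholders, since the actual fields produced by the surfers are not listed here.

```python
import json
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one scraped item per non-empty line of JSON Lines text."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline
            yield json.loads(line)


def read_jsonl(path: str) -> Iterator[dict]:
    """Stream items from a JSON Lines file such as pages.jl."""
    with open(path, encoding="utf-8") as fh:
        yield from parse_jsonl(fh)


def is_valid_json(text: str) -> bool:
    """Return True if text parses as a single JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Two appended JSON Lines runs remain parseable item by item...
run_1 = ['{"title": "first"}\n']   # placeholder item fields
run_2 = ['{"title": "second"}\n']
items = list(parse_jsonl(run_1 + run_2))

# ...while two appended JSON arrays are not one valid JSON document.
appended_json = '[{"title": "first"}]\n[{"title": "second"}]'
appended_ok = is_valid_json(appended_json)
```

To process the crawl output, iterate over read_jsonl("pages.jl"); because it is a generator, items are streamed one line at a time instead of loading the whole file into memory.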
Output options can be combined with attributes:
scrapy crawl onge -O quotes-humor.json -a category=1 -a start_page=2 -a max_page=5