This is a Jandan Spider.
Stay Simple, Stay Naive.
Just for studying. Please don't consume too much of Jandan's network traffic.
- Send each request with a User Agent selected at random from the User Agent List
- Update HTTP proxy IPs with multiple processes and check the status of each IP automatically
- Analyze the original picture URL and download popular pictures into the ooxx directory
- Save all items into data.dat
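The random User Agent selection listed above could look roughly like the sketch below. It is only an illustration, not the project's actual code: the class name RandomUserAgentMiddleware and the contents of USER_AGENT_LIST are assumptions, and the real list used by this spider is not shown here. It relies only on the fact that a Scrapy downloader middleware is a plain class with a process_request(request, spider) method.

```python
import random

# Hypothetical sample values; the project's real User Agent List may differ.
USER_AGENT_LIST = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/41.0.2228.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36",
]


class RandomUserAgentMiddleware(object):
    """Downloader middleware sketch: pick a random User Agent per request."""

    def __init__(self, user_agents=None):
        # Fall back to the sample list when no list is supplied.
        self.user_agents = list(user_agents or USER_AGENT_LIST)

    def process_request(self, request, spider):
        # Overwrite the User-Agent header with a randomly chosen entry.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None lets Scrapy continue handling the request normally.
        return None
```

A middleware like this would be enabled through Scrapy's DOWNLOADER_MIDDLEWARES setting, using whatever module path the class actually lives in.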
- Python 2.7
- Scrapy
- Multiprocessing
- Proxy by mapleray
- Windows: Double-click run.bat
- Linux or OS X: Run the command scrapy crawl JandanPicture
Haipz @haipz.com