Commit 1f87d0d
Note the approach used for multi-page crawling: scrapy.Request issues the page requests
JLUVicent committed Sep 16, 2021
1 parent 12a44a0 commit 1f87d0d
Showing 1 changed file with 21 additions and 2 deletions.
scrapy_dangdang_040/scrapy_dangdang_040/spiders/dang.py
@@ -4,9 +4,14 @@

 class DangSpider(scrapy.Spider):
     name = 'dang'
-    allowed_domains = ['http://category.dangdang.com/cp01.01.02.00.00.00.html']
+
+    # For multi-page downloads, the scope of allowed_domains must be widened; normally only the bare domain is listed
+    allowed_domains = ['category.dangdang.com']
     start_urls = ['http://category.dangdang.com/cp01.01.02.00.00.00.html']
+
+    base_url = 'http://category.dangdang.com/pg'
+    page = 1

     def parse(self, response):
         print("-----------------------------")

@@ -36,4 +41,18 @@ def parse(self, response):

             # As soon as one book is assembled, hand it to the pipelines
             yield book
-            pass
+
+
+        # Multi-page crawling
+        # The crawling logic is the same for every page, so we only need to issue the next page request and call parse again
+        # http://category.dangdang.com/pg2-cp01.01.02.00.00.00.html
+        # http://category.dangdang.com/pg3-cp01.01.02.00.00.00.html
+
+        if self.page < 100:
+            self.page = self.page + 1
+            url = self.base_url + str(self.page) + '-cp01.01.02.00.00.00.html'
+
+            # How to call the parse method again:
+            # scrapy.Request is Scrapy's GET request
+            # url is the request address; callback is the function to execute, written without parentheses
+            yield scrapy.Request(url=url, callback=self.parse)
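For context, the yield book above hands each scraped item to the project's pipelines. Neither the item class nor the pipeline is part of this diff; the following is a minimal sketch assuming the conventionally generated names (ScrapyDangdang040Item, ScrapyDangdang040Pipeline) and hypothetical fields src, name, and price:

# items.py -- hypothetical field names; the real definition is not shown in this commit
import scrapy

class ScrapyDangdang040Item(scrapy.Item):
    src = scrapy.Field()    # cover image URL (assumed field)
    name = scrapy.Field()   # book title (assumed field)
    price = scrapy.Field()  # price string (assumed field)


# pipelines.py -- minimal sketch: append each item to a JSON-lines file
import json

class ScrapyDangdang040Pipeline:
    def open_spider(self, spider):
        self.fp = open('book.json', 'a', encoding='utf-8')

    def process_item(self, item, spider):
        self.fp.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item

    def close_spider(self, spider):
        self.fp.close()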

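Note that a pipeline only receives items if it is enabled in settings.py. A sketch, assuming the module path implied by the project name:

# settings.py -- the number sets pipeline priority (lower runs first)
ITEM_PIPELINES = {
    'scrapy_dangdang_040.pipelines.ScrapyDangdang040Pipeline': 300,
}

With that in place, running scrapy crawl dang from the project root fetches the first page via start_urls, then pages 2 through 100 via the scrapy.Request calls above.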