A Python crawler that scrapes travel notes from travel websites (such as mafengwo and tripAdvisor) and saves them to a MongoDB database.
This project uses Python 2.7, MongoDB, PyMongo, and Django 1.11.
$ git clone https://github.com/pf12345/python-for-travel-notes.git
Go to the code folder and open the settings file:
./tourism/settings.py
Find line 83, change "DBNAME" to your database name, and create a collection named "tourism" in that database.
Go to the code folder and enter:
$ python manage.py runserver
Open your browser and visit http://127.0.0.1:8000
For example:
- mafengwo (蚂蜂窝): to save the article http://www.mafengwo.cn/i/5311724.html, open your browser and visit http://127.0.0.1:8000/saveMafengwo/5311724
- tripAdvisor (猫途鹰): to save the article https://www.tripadvisor.cn/TourismBlog-t5010.html?p=37085, open your browser and visit http://127.0.0.1:8000/saveTripAdvisor/5010
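The mapping from an article URL to its save URL in the examples above can be sketched in a few lines of Python. This is only an illustration, not code from the project: the function name save_url_for and the two regular expressions are assumptions inferred from the two example URLs, and real article URLs may follow other patterns.

```python
import re

# Address of the local Django development server used in this README.
BASE_URL = "http://127.0.0.1:8000"

# Assumed URL shapes, inferred only from the two examples above:
# (pattern that captures the article id, matching save route).
ARTICLE_PATTERNS = [
    (re.compile(r"mafengwo\.cn/i/(\d+)\.html"), "saveMafengwo"),
    (re.compile(r"tripadvisor\.cn/TourismBlog-t(\d+)\.html"), "saveTripAdvisor"),
]


def save_url_for(article_url, base_url=BASE_URL):
    """Return the local save URL for a supported article URL (hypothetical helper)."""
    for pattern, route in ARTICLE_PATTERNS:
        match = pattern.search(article_url)
        if match:
            return "%s/%s/%s" % (base_url, route, match.group(1))
    raise ValueError("unrecognized article URL: %s" % article_url)


print(save_url_for("http://www.mafengwo.cn/i/5311724.html"))
# -> http://127.0.0.1:8000/saveMafengwo/5311724
print(save_url_for("https://www.tripadvisor.cn/TourismBlog-t5010.html?p=37085"))
# -> http://127.0.0.1:8000/saveTripAdvisor/5010
```

Note that for tripAdvisor only the number after "TourismBlog-t" is used; the "p" query parameter is ignored, as in the example.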
Go to the code folder and enter one of the following:
// Automatically crawl mafengwo (蚂蜂窝) travel notes
$ cd tourism
$ cd autoCrawler
$ python mafengwo.py
// Automatically crawl ctrip (携程) travel notes (start again from the code folder)
$ cd tourism
$ cd autoCrawler
$ python ctrip.py
- Home page (travel note list): http://127.0.0.1:8000/
- Detail page: http://127.0.0.1:8000/detail/:id
- Save a mafengwo travel note: http://127.0.0.1:8000/saveMafengwo/:id
- Save a tripAdvisor (猫途鹰) travel note: http://127.0.0.1:8000/saveTripAdvisor/:id
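
The routes above can also be requested from a script instead of a browser. The following is a minimal sketch, assuming the development server is running at 127.0.0.1:8000; the names ROUTES, route_url, and visit are hypothetical and not part of the project. Since the response format of each route is not documented here, the sketch only reports the HTTP status code.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7, as used by this project

BASE_URL = "http://127.0.0.1:8000"

# The four routes listed above; "{id}" stands for the ":id" placeholder.
ROUTES = {
    "home": "/",
    "detail": "/detail/{id}",
    "save_mafengwo": "/saveMafengwo/{id}",
    "save_tripadvisor": "/saveTripAdvisor/{id}",
}


def route_url(name, item_id=None, base_url=BASE_URL):
    """Build the full URL for a named route, filling in the id if required."""
    path = ROUTES[name]
    if "{id}" in path:
        if item_id is None:
            raise ValueError("route %r needs an id" % name)
        path = path.replace("{id}", str(item_id))
    return base_url + path


def visit(name, item_id=None, opener=urlopen):
    """Request a route and return the HTTP status code of the response."""
    response = opener(route_url(name, item_id))
    try:
        return response.getcode()
    finally:
        response.close()
```

With the server running, visit("save_mafengwo", 5311724) requests the same URL as the mafengwo browser example. The opener parameter exists only so the function can be tried without a live server.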