python-for-travel-notes

A Python crawler that scrapes travel notes from travel websites such as mafengwo (蚂蜂窝) and tripAdvisor, and saves them to a MongoDB database.

Install

The project uses Python 2.7, MongoDB, PyMongo and Django 1.11.

Get the code

$ git clone https://github.com/pf12345/python-for-travel-notes.git

Configure MongoDB

In the code folder, open ./tourism/settings.py in an editor.

Around line 83, change "DBNAME" to your database name, then create a collection named "tourism" in that database.
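If you prefer to create the collection from Python, here is a minimal sketch using PyMongo. It assumes PyMongo 3.7 or later (for `list_collection_names`) and a mongod you can reach; `ensure_collection` and `your_db_name` are hypothetical names for illustration, not part of this project. MongoDB also creates a collection implicitly on its first insert, so this step mainly makes the setup explicit.

```python
def ensure_collection(db, name="tourism"):
    """Create collection `name` in `db` unless it already exists.

    `db` is a pymongo Database (or any object with the same two methods).
    Returns True if the collection was created, False if it already existed.
    """
    # create_collection raises if the collection exists, so check first.
    if name in db.list_collection_names():  # available in PyMongo >= 3.7
        return False
    db.create_collection(name)
    return True

# Usage (assumes pymongo is installed and mongod runs on localhost:27017):
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017/")["your_db_name"]  # your DBNAME
#   ensure_collection(db)
```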

Run server

Go to the code folder and enter:

$ python manage.py runserver

Open your browser and visit http://127.0.0.1:8000

Crawling an article

For example:

mafengwo (蚂蜂窝):
to save the article http://www.mafengwo.cn/i/5311724.html, open your browser and visit http://127.0.0.1:8000/saveMafengwo/5311724
tripAdvisor (猫途鹰):
to save the article https://www.tripadvisor.cn/TourismBlog-t5010.html?p=37085, open your browser and visit http://127.0.0.1:8000/saveTripAdvisor/5010 (the id is the number after "-t" in the article URL)
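The id in each save URL is taken from the article URL. A small hypothetical helper (not part of this project) that derives the save URL from an article URL, following the two examples above; it assumes the dev server runs at http://127.0.0.1:8000 and, as in the tripAdvisor example, ignores the `p` query parameter.

```python
import re

BASE = "http://127.0.0.1:8000"  # assumption: default Django dev server address

# mafengwo: the id is the number in /i/<id>.html
MAFENGWO_RE = re.compile(r"mafengwo\.cn/i/(\d+)\.html")
# tripAdvisor: the id is the number after "-t" (5010 in TourismBlog-t5010.html)
TRIPADVISOR_RE = re.compile(r"TourismBlog-t(\d+)\.html")


def save_url(article_url):
    """Return the local URL that saves `article_url`; raise ValueError otherwise."""
    m = MAFENGWO_RE.search(article_url)
    if m:
        return "%s/saveMafengwo/%s" % (BASE, m.group(1))
    m = TRIPADVISOR_RE.search(article_url)
    if m:
        return "%s/saveTripAdvisor/%s" % (BASE, m.group(1))
    raise ValueError("unsupported article URL: %s" % article_url)
```

For instance, `save_url("http://www.mafengwo.cn/i/5311724.html")` returns "http://127.0.0.1:8000/saveMafengwo/5311724", the same URL as in the example.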

Auto crawler

Go to the code folder and enter:

# Auto-crawl mafengwo (蚂蜂窝) travel notes
$ cd tourism/autoCrawler
$ python mafengwo.py

# Auto-crawl ctrip (携程) travel notes
$ cd tourism/autoCrawler
$ python ctrip.py
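How autoCrawler/mafengwo.py and ctrip.py discover articles is not described here, so the following does not reproduce them. It is a hypothetical alternative for when you already have a list of article ids: it calls the save endpoints of the running server one id at a time with a pause in between. `save_all`, `http_get` and the one-second default delay are illustrative assumptions; the endpoint names come from the URL list.

```python
import time

try:  # Python 3
    from urllib.request import urlopen
except ImportError:  # Python 2.7, which this project targets
    from urllib2 import urlopen

BASE = "http://127.0.0.1:8000"  # assumption: default Django dev server address


def http_get(url):
    """Fetch `url` from the running server and return the HTTP status code."""
    resp = urlopen(url)
    try:
        return resp.getcode()
    finally:
        resp.close()


def save_all(ids, endpoint="saveMafengwo", fetch=http_get, delay=1.0):
    """Call BASE/<endpoint>/<id> once per article id, pausing between requests.

    Returns a dict mapping each id to its status code, or to the exception
    raised while fetching it (so one failure does not stop the batch).
    """
    results = {}
    for i, article_id in enumerate(ids):
        if i and delay:
            time.sleep(delay)  # throttle requests to the server and target site
        url = "%s/%s/%s" % (BASE, endpoint, article_id)
        try:
            results[article_id] = fetch(url)
        except Exception as exc:  # record the error and continue
            results[article_id] = exc
    return results
```

With the server running, `save_all([5311724])` saves the mafengwo example article, and `save_all([5010], endpoint="saveTripAdvisor")` saves the tripAdvisor one. The `fetch` parameter exists so the loop can be exercised without a live server.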

URLs

Home page (list of travel notes): http://127.0.0.1:8000/
Detail page: http://127.0.0.1:8000/detail/:id
Save a mafengwo (蚂蜂窝) travel note: http://127.0.0.1:8000/saveMafengwo/:id
Save a tripAdvisor (猫途鹰) travel note: http://127.0.0.1:8000/saveTripAdvisor/:id

Repository contents:
app
removeFile
static
templates
tourism
.gitignore
README.md
carSchool
db.sqlite3
manage.py
package.json
webpack.config.js
