Scraping doctoralia.com using scrapy

Installing scrapy

This code is written for Python 3.6 and uses scrapy 1.4.0.

You should install scrapy with pip: pip install --user scrapy

Note: on my system, python points to python3.6 and pip points to pip3. Check what they point to on yours before running.

This spider starts from http://www.doctoralia.com.br/medicos and follows the next pages until it reaches 500 occurrences.
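The crawl pattern described above (start at one page, follow next-page links, stop at a fixed item count) can be sketched without Scrapy. This is a minimal illustration, not the repository's spider: `crawl_until`, `fetch_page`, and the `Page` tuple are hypothetical names, and `fetch_page` stands in for whatever downloads and parses a page.

```python
from typing import Callable, Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical page model: (items found on the page, URL of the next page or None).
Page = Tuple[List[Dict[str, str]], Optional[str]]


def crawl_until(
    start_url: str,
    fetch_page: Callable[[str], Page],
    limit: int = 500,
) -> Iterator[Dict[str, str]]:
    """Yield items page by page, following next-page links, up to `limit` items."""
    seen: Set[str] = set()  # guards against next-page links that loop back
    url: Optional[str] = start_url
    count = 0
    while url is not None and url not in seen and count < limit:
        seen.add(url)
        items, url = fetch_page(url)
        for item in items:
            if count >= limit:
                return  # cap reached in the middle of a page
            yield item
            count += 1
```

For a quick check, `fetch_page` can be a lookup into a dict of canned pages. In Scrapy itself, a similar cap can be set with the `CLOSESPIDER_ITEMCOUNT` setting, although requests already in flight may push the final count slightly past the limit; whether this spider uses that setting is not stated here.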

To run this spider, open a terminal in this repository, then type:

cd doctoralia
scrapy crawl doctoralia

Your csv file with the scraped data should be at doctoralia.com/doctoralia/doctoralia_data.csv

This spider was designed to get only the first 500 occurrences, so it will not scrape the entire website;
I have only included instructions for running this spider locally, but we could also use scrapinghub's cloud, which has excellent tools for debugging and data checking;
The csv structure should be: <Doctor's Name>, <Doctor's Speciality>, <Doctor's Locations>, where the fields are:

Field	Content Type
Doctor's Name	String
Doctor's Speciality	List of strings with all specialities
Doctor's Locations	List of strings with all locations
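As a rough sketch of how the resulting file could be read back, the snippet below parses each row into a name plus two lists. It rests on assumptions not stated above: that the file starts with a header row, and that each list is stored in a single cell with its values joined by commas (the default behaviour of Scrapy's CSV exporter for multi-valued fields). The function name `read_doctors` and the dictionary keys are hypothetical.

```python
import csv
from typing import Dict, Iterator, List, TextIO, Union

Row = Dict[str, Union[str, List[str]]]


def split_cell(cell: str, sep: str = ",") -> List[str]:
    """Split a joined multi-value cell into a clean list of strings."""
    return [part.strip() for part in cell.split(sep) if part.strip()]


def read_doctors(fileobj: TextIO, has_header: bool = True) -> Iterator[Row]:
    """Yield one dict per doctor from a name/specialities/locations CSV."""
    reader = csv.reader(fileobj)
    if has_header:
        next(reader, None)  # assumption: first row holds column names
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed rows
        yield {
            "name": row[0].strip(),
            "specialities": split_cell(row[1]),
            "locations": split_cell(row[2]),
        }
```

Usage would look like `with open("doctoralia_data.csv", newline="", encoding="utf-8") as f: rows = list(read_doctors(f))`. If the spider writes lists in another form, `split_cell` is the only part that needs to change.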
