Skip to content

Latest commit

 

History

History
32 lines (14 loc) · 795 Bytes

README.md

File metadata and controls

32 lines (14 loc) · 795 Bytes

hh_scraping

my crawler for top russian recruiting website https://hh.ru/

Crawler based on scrapy framework https://scrapy.org/

Added bash script for easy using.

You just need :

$sh start.sh

(from project directory)

In scrapy parsers calling 'spiders' , so spider is HH_crawler/spiders/HH_spider.py

This spider crawl pages and scraping data from every vacancy to file by command :

$scrapy crawl hh_spider -o filename.csv

Selectors in spider tuned to scrape main skill tags and salary from every vacancy

Then scraped data calculating and analysing in HH_crawler/statistics.py :

$python3 statistics.py

statistics.py using pandas and some built-in python libs