I built a web scraper with BeautifulSoup to parse data science job listings from Indeed in twenty cities across the US. The scraper pulled 5,000 job postings per city. I used this data to find out which factors most directly increased salaries for data scientists. Job listings were categorized as either above or below the mean salary. Salary predictions were generated with a random forest model and, separately, with support vector machines. L1 regularization was employed. The web scraper and all models were built with Python (BeautifulSoup, Scikit-Learn, NLTK, Pandas).
- Jupyter Notebook: "Project 3 - Web Scraping Indeed Job Listings Jupyter Notebook.ipynb"
- Presentation: "Presentation Web Scraping Indeed Data Science Positions.pdf"
- Executive Summary: "Exec Summary - Web Scraping Indeed Job Listings.pdf"
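
The above/below-mean labeling described in the summary can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the function name `label_by_mean`, the sample salaries, and the choice to treat a salary exactly equal to the mean as "below" are all assumptions made for the example.

```python
from statistics import mean


def label_by_mean(salaries):
    """Return 1 for each salary strictly above the mean, 0 otherwise.

    Assumption: a salary exactly equal to the mean is labeled 0 ("below").
    """
    threshold = mean(salaries)
    return [1 if salary > threshold else 0 for salary in salaries]


# Made-up annual salaries in USD, for illustration only (not project data).
example_salaries = [85000, 120000, 95000, 150000, 70000]
labels = label_by_mean(example_salaries)
print(labels)
```

Labels like these turn the salary question into a binary classification target, which is the kind of target a random forest or support vector machine classifier can be trained on.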