This tool displays the skill requirements of O*NET occupation classifications using scraped LinkedIn job postings.
- Python 3.8+
- PySpark
- Plotly
- Pandas
Clone the repository and install dependencies:
git clone https://github.com/zachpinto/onet-linkedin-occupation-classifications.git
cd onet-linkedin-occupation-classifications
pip install -r requirements.txt
- Download the scraped LinkedIn job postings dataset from Kaggle (note: I did not scrape this data myself; I acquired it from Kaggle)
- Download "All Career Clusters" from O*NET Online
- Download "Alternate Titles" and "Sample of Reported Titles" from O*NET Online
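Before running the processing scripts, it can help to confirm the downloads landed where the pipeline expects them. A minimal sketch of such a check; the filenames here are assumptions for illustration, not names taken from the repository:

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical layout check -- the exact filenames are assumptions;
# adjust them to match what you actually downloaded.
EXPECTED = {
    "data/raw": ["job_postings.csv"],             # Kaggle LinkedIn dump (assumed name)
    "data/external": ["All_Career_Clusters.csv",  # O*NET downloads (assumed names)
                      "Alternate_Titles.xlsx",
                      "Sample_of_Reported_Titles.xlsx"],
}

def missing_files(root: str = ".") -> list[str]:
    """Return the expected data files that are not present under root."""
    base = Path(root)
    return [
        str(Path(folder) / name)
        for folder, names in EXPECTED.items()
        for name in names
        if not (base / folder / name).exists()
    ]

if __name__ == "__main__":
    for path in missing_files():
        print(f"missing: {path}")
```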
Process the LinkedIn and occupation data files:
python src/data/process_linkedin_data.py
python src/data/process_occupation_data.py
Create a new dataset from the LinkedIn data and occupation classifications:
python src/model/occupation_classification.py
python src/data/split_career_clusters_and_pathways.py
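The classification step above presumably maps LinkedIn job titles onto O*NET-SOC codes via the alternate-title lists. A toy pandas sketch of that idea; the column names, matching rule, and data are assumptions for illustration, not the actual logic of `src/model/occupation_classification.py`:

```python
import pandas as pd

# Toy stand-ins for the real inputs; the script's actual columns may differ.
postings = pd.DataFrame({"title": ["Senior Data Scientist", "Registered Nurse"]})
onet_titles = pd.DataFrame({
    "Alternate Title": ["Data Scientist", "Registered Nurse"],
    "O*NET-SOC Code": ["15-2051.00", "29-1141.00"],
})

# Normalize titles so matching is case-insensitive.
postings["title_norm"] = postings["title"].str.lower()
onet_titles["title_norm"] = onet_titles["Alternate Title"].str.lower()

def classify(title_norm):
    """Attach an O*NET-SOC code via naive substring matching (illustrative only)."""
    for _, row in onet_titles.iterrows():
        if row["title_norm"] in title_norm:
            return row["O*NET-SOC Code"]
    return None

postings["soc_code"] = postings["title_norm"].map(classify)
print(postings[["title", "soc_code"]])
```

In the real pipeline this join would run over the full posting set (hence PySpark in the requirements), but the shape of the operation is the same.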
Generate a sample Plotly Express treemap:
python src/visualization/visualize.py
Run the Streamlit app:
streamlit run HOME.py
This project is licensed under the MIT License - see the LICENSE file for details.
- User @asaniczka for providing the scraped LinkedIn data
- O*NET Online for providing detailed occupational requirements
onet-linkedin-occupation-classifications/
│
├── data/
│ ├── external/ # O*NET Online data
│ ├── interim/ # Intermediate data processing files
│ ├── processed/ # Processed data ready for streamlit app
│ └── raw/ # Raw LinkedIn data
├── docs/ # Documentation files and project notes
├── reports/ # Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures/ # Generated graphics and figures to be used in reporting
├── src/ # Source code for use in this project
│ ├── __init__.py # Makes src a Python module
│ ├── data/ # Scripts to download or generate data
│ ├── features/ # Scripts to turn raw data into features for modeling
│ ├── models/ # Scripts to train models and then use trained models to make predictions
│ └── visualization/ # Scripts to create exploratory and results-oriented visualizations
├── LICENSE # The full license description
├── Makefile # Makefile with commands like `make data` or `make train`
├── README.md # The top-level README for developers using this project
├── requirements.txt # The requirements file for reproducing the analysis environment
├── setup.py # Makes project pip installable (pip install -e .) so src can be imported
├── test_environment.py # Test python environment is set-up correctly
└── tox.ini # tox file with settings for running tox; see tox.readthedocs.io