Graduation thesis project. Title: Explainable AI in Job Recommendation System
Four research questions:
- RQ1. How well do State-Of-The-Art (SOTA) algorithms perform on job recommendations?
- RQ2. Which features contribute mainly to the ranking results?
- RQ3. How to evaluate the explanations generated by different XAI techniques?
- RQ4. How to make explanations digestible for lay users?
Data: Kaggle's CareerBuilder 2012
Requirements: important Python libraries
- sklearn, pandas, numpy
- myfm
- interpretml
- SHAP
- LIME
Apart from the public libraries, please import all modules in the utils folder when generating recommendations.
- Final report: link
- High-resolution figures in the report: link
- Summary analysis spreadsheets used in discussion: link
Raw data can be obtained directly from Kaggle's website or from folder data_raw.
4.1: Data pre-processing and Feature Engineering: folder
NOTE: Large datasets have been compressed. Please extract them to their original format (e.g., .csv / .tsv) before running the notebooks.
- 4.1.1 Data cleaning
- 4.1.2 Data augmentation: Negative sampling for interaction data - link
- 4.1.3 Feature Engineering: TF-IDF for both jobs and user history - link
- 4.1.4 Feature Engineering: Generate location matching features
- 4.1.5 Feature Engineering: Transform text features
LDA for jobs - link, LDA for user history - link
- 4.1.6 Feature Engineering: Discretizing user profile features: link
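The negative-sampling step (4.1.2) can be sketched as follows. This is a minimal illustration, not the notebook's exact code: it assumes an interaction table with `UserID`/`JobID` columns, labels observed applications 1, and draws jobs the user never applied to as label-0 negatives.

```python
import numpy as np
import pandas as pd

def negative_sample(interactions: pd.DataFrame, ratio: int = 1, seed: int = 42) -> pd.DataFrame:
    """For each user, keep observed (UserID, JobID) applications as positives
    (label 1) and sample `ratio` unseen jobs per positive as negatives (label 0)."""
    rng = np.random.default_rng(seed)
    all_jobs = interactions["JobID"].unique()
    applied = interactions.groupby("UserID")["JobID"].apply(set)

    rows = []
    for user, pos_jobs in applied.items():
        n_neg = len(pos_jobs) * ratio
        # rejection-sample: draw extra candidates, discard jobs the user applied to
        candidates = rng.choice(all_jobs, size=n_neg * 3, replace=True)
        negs = [j for j in candidates if j not in pos_jobs][:n_neg]
        rows += [(user, j, 1) for j in pos_jobs]
        rows += [(user, j, 0) for j in negs]
    return pd.DataFrame(rows, columns=["UserID", "JobID", "label"])

# toy interaction data (illustrative only)
interactions = pd.DataFrame({"UserID": [1, 1, 2], "JobID": [10, 11, 10]})
sampled = negative_sample(interactions)
```

The actual pipeline may control the positive/negative ratio differently; the key invariant is that no sampled negative collides with a user's real application history.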
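For the TF-IDF features (4.1.3), a minimal sketch of the idea: fit one vocabulary over job texts and user histories so both live in the same vector space, then use cosine similarity as a user-job matching feature. The texts below are placeholders, not CareerBuilder data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

job_texts = ["python developer machine learning", "nurse hospital care"]
user_histories = ["applied python data engineer roles"]

# fit a single vocabulary over both corpora so the vectors are comparable
vec = TfidfVectorizer(stop_words="english")
vec.fit(job_texts + user_histories)
job_mat = vec.transform(job_texts)
user_mat = vec.transform(user_histories)

# cosine similarity between a user's history and every job description
sims = cosine_similarity(user_mat, job_mat)
```

The same pattern applies to the LDA features in 4.1.5, with topic distributions in place of TF-IDF vectors.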
4.2: Generating potential applications: folder
- 4.2.1 Potential application generation by random sampling with control on positive label
- 4.2.2 Potential application generation by unsupervised KNN models (2 variations: knn_lda, knn_tfidf)
You can re-train the models using notebooks or download the pre-trained models.
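The unsupervised KNN candidate generation (4.2.2) can be sketched with sklearn's `NearestNeighbors`: index the job vectors (TF-IDF or LDA, hence the knn_tfidf / knn_lda variations) and retrieve the k jobs closest to each user's profile vector as potential applications. The 2-D vectors here are toy stand-ins.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# toy dense vectors standing in for the job TF-IDF (or LDA) matrix
job_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
user_vecs = np.array([[1.0, 0.05]])  # one user's profile vector

# cosine distance matches the text-similarity setting
knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(job_vecs)
dist, idx = knn.kneighbors(user_vecs)
# idx[0] holds the indices of the 2 jobs most similar to this user
```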
- White-box and black-box models: 7 models link, pre-trained models
- Factorization Machine models: 4 models link. You can re-train the models using the notebook (~5-10 mins) or download the pre-trained models. The pickled pre-trained models are large (>10 GB/model) and need to be downloaded separately: GoogleDrive link
- Explainable Boosting Machine models: 3 EBM models and 3 DPEBM models link, with pre-trained models
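The Factorization Machine models above are trained with myfm; as a sketch of what an FM actually scores (not myfm's API), here is the standard prediction equation in numpy, using the O(kn) identity for the pairwise-interaction term. All parameters below are random toy values.

```python
import numpy as np

def fm_predict(X, w0, w, V):
    """Factorization Machine score:
    y = w0 + X.w + sum_{i<j} <v_i, v_j> x_i x_j,
    computed via 0.5 * sum_f [ (X V)_f^2 - (X^2)(V^2)_f ]."""
    linear = X @ w
    inter = 0.5 * np.sum((X @ V) ** 2 - (X ** 2) @ (V ** 2), axis=1)
    return w0 + linear + inter

rng = np.random.default_rng(0)
X = rng.random((4, 6))   # 4 examples, 6 one-hot/numeric features
w0, w = 0.1, rng.random(6)
V = rng.random((6, 3))   # rank-3 latent factor matrix
scores = fm_predict(X, w0, w, V)
```

The latent factors `V` are what let the FM generalize over sparse (UserID, JobID) crosses that never co-occur in training.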
Generate top-20 recommendations: 20 jobs per user
Output format: UserID, JobID, Y_pred, Y_prob, rank
(Y_pred: predicted label, Y_prob: probability of prediction, rank: ranking based on probability)
Each model has 2 potential sources of applications. Please import all modules in the utils folder when generating recommendations.
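Producing the output format above (UserID, JobID, Y_pred, Y_prob, rank) from raw model probabilities can be sketched in pandas; the probabilities below are made up, and the 0.5 threshold for Y_pred is an assumption.

```python
import pandas as pd

preds = pd.DataFrame({
    "UserID": [1, 1, 1, 2],
    "JobID":  [10, 11, 12, 10],
    "Y_prob": [0.9, 0.2, 0.7, 0.6],
})
# predicted label from the probability (0.5 threshold assumed)
preds["Y_pred"] = (preds["Y_prob"] >= 0.5).astype(int)
# rank jobs within each user by probability, highest first
preds["rank"] = (preds.groupby("UserID")["Y_prob"]
                      .rank(ascending=False, method="first")
                      .astype(int))
top20 = (preds[preds["rank"] <= 20]
         .sort_values(["UserID", "rank"])
         [["UserID", "JobID", "Y_pred", "Y_prob", "rank"]])
```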
- 4.5.2 Global explanation by model-specific approach: EBM models link
- 4.5.3 Global explanation by model-specific approach: DPEBM models link
- 4.5.4 Global self-explanation by white-box models and XGBoost link
- 4.5.5 KernelSHAP: Local feature importance link, output
- 4.5.6 LIME: Local feature importance link, output