MBIT thesis: Explainable AI in Job Recommendation Systems

Graduation thesis project.

Four research questions:

  • RQ1. How well do state-of-the-art (SOTA) algorithms perform on job recommendations?
  • RQ2. Which features contribute most to the ranking results?
  • RQ3. How can the explanations generated by different XAI techniques be evaluated?
  • RQ4. How can explanations be made digestible for lay users?

Data: Kaggle's CareerBuilder 2012

Requirements: important Python libraries

Apart from the public libraries, please import all modules in the utils folder to generate recommendations (a minimal sketch follows).
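A minimal sketch of wiring the utils folder into a notebook; the module names in the comments are hypothetical placeholders, not the actual files in utils:

```python
import sys
from pathlib import Path

# Make the repository's utils folder importable from a notebook;
# adjust the relative path to where the notebook lives.
sys.path.insert(0, str(Path("utils").resolve()))

# Hypothetical module names -- replace with the actual files in utils/:
# import feature_helpers
# import ranking_helpers
```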

Snapshot:

  • Final report: link
  • High-resolution figures in the report: link
  • Summary analysis spreadsheets used in discussion: link

Guidelines for navigating the repository and reproducing the results:

Raw data can be obtained directly from Kaggle's website or from the folder data_raw.

4.1: Data pre-processing and Feature Engineering: folder

NOTE: Large datasets have been compressed. Please extract them to their original format (e.g., .csv / .tsv) before running the notebooks.
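A small helper for this extraction step, assuming the archives are .zip files (adjust for the actual compression format used in the repo):

```python
import zipfile
from pathlib import Path

def extract_all(data_dir: str = "data_raw") -> None:
    """Extract every .zip archive in data_dir next to the archive itself."""
    for archive in Path(data_dir).glob("*.zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        print(f"Extracted {archive.name}")

extract_all()
```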

  • 4.1.1 Data cleaning
  • 4.1.2 Data augmentation: Negative sampling for interaction data - link (a sketch follows this list)
  • 4.1.3 Feature Engineering: TF-IDF for both jobs and user history - link
  • 4.1.4 Feature Engineering: Generate location-matching features
  • 4.1.5 Feature Engineering: Transform text features (LDA for jobs - link, LDA for user history - link)
  • 4.1.6 Feature Engineering: Discretizing user profile features: link
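For 4.1.2, a minimal sketch of negative sampling on interaction data: for each user, jobs they never applied to are sampled as negatives. The column names follow the output format in section 4.4; the 1:1 sampling ratio is an assumption, not necessarily the one used in the thesis.

```python
import numpy as np
import pandas as pd

def negative_sample(apps: pd.DataFrame, all_jobs: np.ndarray,
                    ratio: int = 1, seed: int = 42) -> pd.DataFrame:
    """For each user, sample `ratio` non-applied jobs per positive application."""
    rng = np.random.default_rng(seed)
    rows = []
    for user, group in apps.groupby("UserID"):
        applied = group["JobID"].to_numpy()
        candidates = np.setdiff1d(all_jobs, applied)  # jobs the user never applied to
        n = min(len(candidates), ratio * len(applied))
        if n == 0:
            continue
        for job in rng.choice(candidates, size=n, replace=False):
            rows.append((user, job, 0))  # label 0: sampled negative
    negatives = pd.DataFrame(rows, columns=["UserID", "JobID", "label"])
    positives = apps[["UserID", "JobID"]].assign(label=1)
    return pd.concat([positives, negatives], ignore_index=True)

# Toy usage: 3 applications across 2 users, 5 jobs in total.
apps = pd.DataFrame({"UserID": [1, 1, 2], "JobID": [10, 11, 12]})
print(negative_sample(apps, np.array([10, 11, 12, 13, 14])))
```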

4.2: Generating potential applications: folder

  • 4.2.1 Potential application generation by random sampling with control over the positive label
  • 4.2.2 Potential application generation by unsupervised KNN models (2 variations: knn_lda, knn_tfidf; a sketch follows this list)
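For 4.2.2, a minimal sketch of the unsupervised-KNN idea with scikit-learn: fit NearestNeighbors on job vectors and query each user's profile vector for candidate jobs. The random vectors and k = 20 are illustrative stand-ins for the repo's TF-IDF (knn_tfidf) and LDA (knn_lda) representations.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative dense vectors; in the repo these would be the
# TF-IDF (knn_tfidf) or LDA (knn_lda) representations.
rng = np.random.default_rng(0)
job_vectors = rng.random((1000, 50))   # one row per job
user_vectors = rng.random((200, 50))   # one row per user profile

knn = NearestNeighbors(n_neighbors=20, metric="cosine")
knn.fit(job_vectors)

# For each user, the indices of the 20 nearest jobs become
# that user's potential applications.
_, candidate_job_idx = knn.kneighbors(user_vectors)
print(candidate_job_idx.shape)  # (200, 20)
```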

4.3: Training ranking models:

You can re-train the models using notebooks or download the pre-trained models.

  • White-box and black-box models: 7 models link, pre-trained models
  • Factorization Machine models: 4 models link. You can re-train the models using the notebook (~5-10 mins) or download the pre-trained models. The pickled pre-trained models are large (>10 GB per model) and need to be downloaded separately: GoogleDrive link (a loading sketch follows this list)
  • Explainable Boosting Machine models: 3 EBM models and 3 DPEBM models link with pre-trained models
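A sketch of loading one of the pickled pre-trained models after downloading it; the file path is a placeholder:

```python
import pickle

# Placeholder path -- use the actual file downloaded from Google Drive.
with open("pretrained/fm_model.pkl", "rb") as f:
    model = pickle.load(f)

# The unpickled object exposes whatever API it was trained with,
# e.g. a scikit-learn-style predict_proba on the feature matrix.
```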

4.4: Finalizing ranking results and evaluating the JRS

Generate top-20 recommendations: 20 jobs per user.
Output format: UserID, JobID, Y_pred, Y_prob, rank
(Y_pred: predicted label, Y_prob: probability of the prediction, rank: ranking based on probability)
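A minimal pandas sketch of producing this output from per-pair scores; it assumes a DataFrame that already holds UserID, JobID, Y_pred, and Y_prob:

```python
import pandas as pd

def top_k_recommendations(scored: pd.DataFrame, k: int = 20) -> pd.DataFrame:
    """Rank jobs per user by predicted probability and keep the top k."""
    out = scored.copy()
    out["rank"] = (out.groupby("UserID")["Y_prob"]
                      .rank(method="first", ascending=False)
                      .astype(int))
    out = out[out["rank"] <= k]
    return out.sort_values(["UserID", "rank"])[
        ["UserID", "JobID", "Y_pred", "Y_prob", "rank"]]
```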

Each model has two potential sources of applications. Please import all modules in the utils folder to generate recommendations.

4.5: Explaining recommendations:

  • 4.5.2 Global explanation by model-specific approach: EBM models link
  • 4.5.3 Global explanation by model-specific approach: DPEBM models link
  • 4.5.4 Global self-explanation by white-box models and XGBoost link
  • 4.5.5 KernelSHAP: Local feature importance link, output (a sketch follows this list)
  • 4.5.6 LIME: Local feature importance link, output
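For 4.5.5, a minimal KernelSHAP sketch with the shap library; the toy model and background-sample size are illustrative, not the exact setup in the notebooks:

```python
import numpy as np
import shap
from sklearn.linear_model import LogisticRegression

# Toy stand-in for a trained ranking model and its feature matrix.
rng = np.random.default_rng(0)
X = rng.random((500, 10))
y = (X[:, 0] + X[:, 1] > 1).astype(int)
model = LogisticRegression().fit(X, y)

# KernelSHAP: model-agnostic local feature importance.
background = shap.sample(X, 50)                    # background distribution
explainer = shap.KernelExplainer(model.predict_proba, background)
shap_values = explainer.shap_values(X[:5])         # explain 5 instances
```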

4.6: Evaluating explanations

  • 4.6.1 Model fidelity rate - Global explanation link
  • 4.6.2 Feature importance fidelity rate - Local explanation link (an illustrative sketch follows this list)
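One illustrative way to compare an explainer's feature-importance ranking against a reference ranking is Spearman rank correlation; this is a generic sketch, not necessarily the fidelity-rate definition used in the thesis:

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative importance vectors for one instance:
# a reference (e.g. a white-box model's own coefficients) vs. explainer output.
reference_importance = np.array([0.40, 0.25, 0.20, 0.10, 0.05])
explainer_importance = np.array([0.35, 0.30, 0.15, 0.12, 0.08])

rho, _ = spearmanr(reference_importance, explainer_importance)
print(f"Rank agreement (Spearman rho): {rho:.2f}")
```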

4.7: Generating human-digestible explanations

  • 4.7.2 Use case, post-explanation: extract raw terms from TF-IDF features link (a sketch follows this list)
  • 4.7.3 Use case, post-explanation: LDA topic contribution visualization: link
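For 4.7.2, a minimal sketch of mapping a TF-IDF feature index from an explanation back to its raw term via scikit-learn's vectorizer; the documents are toy examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["data engineer python sql", "nurse hospital patient care"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)

# Map a column index from the explanation (e.g. the most important
# TF-IDF feature for a recommendation) back to its raw term.
terms = vec.get_feature_names_out()
top_idx = X[0].toarray().ravel().argmax()
print(terms[top_idx])  # raw term behind the top TF-IDF feature of doc 0
```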
