MBIT thesis: Explainable AI in Job Recommendation Systems

Graduation thesis project.

Four research questions:

  • RQ1. How well do state-of-the-art (SOTA) algorithms perform on job recommendations?
  • RQ2. Which features contribute most to the ranking results?
  • RQ3. How can the explanations generated by different XAI techniques be evaluated?
  • RQ4. How can explanations be made digestible for lay users?

Data: Kaggle's CareerBuilder 2012

Requirements: important Python libraries

Apart from the public libraries, please import all modules in the utils folder to generate recommendations (a minimal sketch follows).
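A minimal sketch of wiring the utils folder into a notebook; the module names in the comments are hypothetical placeholders, not the actual files in utils:

```python
import sys
from pathlib import Path

# Make the repository's utils folder importable from a notebook;
# adjust the relative path to where the notebook lives.
sys.path.insert(0, str(Path("utils").resolve()))

# Hypothetical module names -- replace with the actual files in utils/:
# import feature_helpers
# import ranking_helpers
```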

Snapshot:

  • Final report: link
  • High-resolution figures in the report: link
  • Summary analysis spreadsheets used in discussion: link

Guidelines for navigating the repository and reproducing the results:

Raw data can be obtained directly from Kaggle's website or from the folder data_raw.

4.1: Data pre-processing and Feature Engineering: folder

NOTE: Large datasets have been compressed. Please extract them to their original format (e.g., .csv / .tsv) before running the notebooks.
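A small helper for this extraction step, assuming the archives are .zip files (adjust for the actual compression format used in the repo):

```python
import zipfile
from pathlib import Path

def extract_all(data_dir: str = "data_raw") -> None:
    """Extract every .zip archive in data_dir next to the archive itself."""
    for archive in Path(data_dir).glob("*.zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        print(f"Extracted {archive.name}")

extract_all()
```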

  • 4.1.1 Data cleaning
  • 4.1.2 Data augmentation: Negative sampling for interaction data - link (a sketch follows this list)
  • 4.1.3 Feature Engineering: TF-IDF for both jobs and user history - link
  • 4.1.4 Feature Engineering: Generate location-matching features
  • 4.1.5 Feature Engineering: Transform text features (LDA for jobs - link, LDA for user history - link)
  • 4.1.6 Feature Engineering: Discretizing user profile features: link
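For 4.1.2, a minimal sketch of negative sampling on interaction data: for each user, jobs they never applied to are sampled as negatives. The column names follow the output format in section 4.4; the 1:1 sampling ratio is an assumption, not necessarily the one used in the thesis.

```python
import numpy as np
import pandas as pd

def negative_sample(apps: pd.DataFrame, all_jobs: np.ndarray,
                    ratio: int = 1, seed: int = 42) -> pd.DataFrame:
    """For each user, sample `ratio` non-applied jobs per positive application."""
    rng = np.random.default_rng(seed)
    rows = []
    for user, group in apps.groupby("UserID"):
        applied = group["JobID"].to_numpy()
        candidates = np.setdiff1d(all_jobs, applied)  # jobs the user never applied to
        n = min(len(candidates), ratio * len(applied))
        if n == 0:
            continue
        for job in rng.choice(candidates, size=n, replace=False):
            rows.append((user, job, 0))  # label 0: sampled negative
    negatives = pd.DataFrame(rows, columns=["UserID", "JobID", "label"])
    positives = apps[["UserID", "JobID"]].assign(label=1)
    return pd.concat([positives, negatives], ignore_index=True)

# Toy usage: 3 applications across 2 users, 5 jobs in total.
apps = pd.DataFrame({"UserID": [1, 1, 2], "JobID": [10, 11, 12]})
print(negative_sample(apps, np.array([10, 11, 12, 13, 14])))
```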

4.2: Generating potential applications: folder

  • 4.2.1 Potential application generation by random sampling with control over the positive label
  • 4.2.2 Potential application generation by unsupervised KNN models (2 variations: knn_lda, knn_tfidf; a sketch follows this list)
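For 4.2.2, a minimal sketch of the unsupervised-KNN idea with scikit-learn: fit NearestNeighbors on job vectors and query each user's profile vector for candidate jobs. The random vectors and k = 20 are illustrative stand-ins for the repo's TF-IDF (knn_tfidf) and LDA (knn_lda) representations.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative dense vectors; in the repo these would be the
# TF-IDF (knn_tfidf) or LDA (knn_lda) representations.
rng = np.random.default_rng(0)
job_vectors = rng.random((1000, 50))   # one row per job
user_vectors = rng.random((200, 50))   # one row per user profile

knn = NearestNeighbors(n_neighbors=20, metric="cosine")
knn.fit(job_vectors)

# For each user, the indices of the 20 nearest jobs become
# that user's potential applications.
_, candidate_job_idx = knn.kneighbors(user_vectors)
print(candidate_job_idx.shape)  # (200, 20)
```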

4.3: Training ranking models:

You can re-train the models using notebooks or download the pre-trained models.

  • White-box and black-box models: 7 models link, pre-trained models
  • Factorization Machine models: 4 models link. You can re-train the models using the notebook (~5-10 mins) or download the pre-trained models. The pickled pre-trained models are large (>10 GB per model) and need to be downloaded separately: GoogleDrive link (a loading sketch follows this list)
  • Explainable Boosting Machine models: 3 EBM models and 3 DPEBM models link with pre-trained models
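A sketch of loading one of the pickled pre-trained models after downloading it; the file path is a placeholder:

```python
import pickle

# Placeholder path -- use the actual file downloaded from Google Drive.
with open("pretrained/fm_model.pkl", "rb") as f:
    model = pickle.load(f)

# The unpickled object exposes whatever API it was trained with,
# e.g. a scikit-learn-style predict_proba on the feature matrix.
```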

4.4: Finalizing ranking results and evaluating the JRS

Generate top-20 recommendations: 20 jobs per user.
Output format: UserID, JobID, Y_pred, Y_prob, rank
(Y_pred: predicted label, Y_prob: probability of the prediction, rank: ranking based on probability)
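A minimal pandas sketch of producing this output from per-pair scores; it assumes a DataFrame that already holds UserID, JobID, Y_pred, and Y_prob:

```python
import pandas as pd

def top_k_recommendations(scored: pd.DataFrame, k: int = 20) -> pd.DataFrame:
    """Rank jobs per user by predicted probability and keep the top k."""
    out = scored.copy()
    out["rank"] = (out.groupby("UserID")["Y_prob"]
                      .rank(method="first", ascending=False)
                      .astype(int))
    out = out[out["rank"] <= k]
    return out.sort_values(["UserID", "rank"])[
        ["UserID", "JobID", "Y_pred", "Y_prob", "rank"]]
```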

Each model has two potential sources of applications. Please import all modules in the utils folder to generate recommendations.

4.5: Explaining recommendations:

  • 4.5.2 Global explanation by model-specific approach: EBM models link
  • 4.5.3 Global explanation by model-specific approach: DPEBM models link
  • 4.5.4 Global self-explanation by white-box models and XGBoost link
  • 4.5.5 KernelSHAP: Local feature importance link, output (a sketch follows this list)
  • 4.5.6 LIME: Local feature importance link, output
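For 4.5.5, a minimal KernelSHAP sketch with the shap library; the toy model and background-sample size are illustrative, not the exact setup in the notebooks:

```python
import numpy as np
import shap
from sklearn.linear_model import LogisticRegression

# Toy stand-in for a trained ranking model and its feature matrix.
rng = np.random.default_rng(0)
X = rng.random((500, 10))
y = (X[:, 0] + X[:, 1] > 1).astype(int)
model = LogisticRegression().fit(X, y)

# KernelSHAP: model-agnostic local feature importance.
background = shap.sample(X, 50)                    # background distribution
explainer = shap.KernelExplainer(model.predict_proba, background)
shap_values = explainer.shap_values(X[:5])         # explain 5 instances
```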

4.6: Evaluating explanations

  • 4.6.1 Model fidelity rate - Global explanation link
  • 4.6.2 Feature importance fidelity rate - Local explanation link (an illustrative sketch follows this list)
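One illustrative way to compare an explainer's feature-importance ranking against a reference ranking is Spearman rank correlation; this is a generic sketch, not necessarily the fidelity-rate definition used in the thesis:

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative importance vectors for one instance:
# a reference (e.g. a white-box model's own coefficients) vs. explainer output.
reference_importance = np.array([0.40, 0.25, 0.20, 0.10, 0.05])
explainer_importance = np.array([0.35, 0.30, 0.15, 0.12, 0.08])

rho, _ = spearmanr(reference_importance, explainer_importance)
print(f"Rank agreement (Spearman rho): {rho:.2f}")
```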

4.7: Generating human-digestible explanations

  • 4.7.2 Use case, post-explanation: extract raw terms from TF-IDF features link (a sketch follows this list)
  • 4.7.3 Use case, post-explanation: LDA topic contribution visualization: link
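For 4.7.2, a minimal sketch of mapping a TF-IDF feature index from an explanation back to its raw term via scikit-learn's vectorizer; the documents are toy examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["data engineer python sql", "nurse hospital patient care"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)

# Map a column index from the explanation (e.g. the most important
# TF-IDF feature for a recommendation) back to its raw term.
terms = vec.get_feature_names_out()
top_idx = X[0].toarray().ravel().argmax()
print(terms[top_idx])  # raw term behind the top TF-IDF feature of doc 0
```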
