varnan6/Autoencoder-Based-Credit-Card-Fraud-Detection

Autoencoder-Based Credit Card Fraud Detection


This project implements an unsupervised deep learning pipeline that uses an autoencoder to detect fraudulent transactions in credit card data. The workflow is modular and covers EDA, preprocessing, training with 5-fold cross-validation, anomaly detection based on reconstruction loss, and SHAP-based interpretability.

Project Structure


creditcard_autoencoder/
│   main.py
│   README.md
│
├─── /data
│       creditcard.csv
│
├─── /src
│       config.py
│       eda.py
│       evaluate.py
│       interpret.py
│       model.py
│       preprocess.py
│       train.py
│
└─── /notebooks
        exploration.ipynb

Features Used

From the original dataset, the following columns are selected and processed:

  • PCA-Transformed Features: V1 to V28
  • Original Features: Time, Amount
  • Target: Class (0 = Normal, 1 = Fraudulent)

Features are normalized, and the autoencoder is trained to reconstruct normal transactions. A transaction is flagged as fraud when its reconstruction error is unusually high.


Workflow

1. Data Preprocessing (preprocess.py)

  • Time and Amount are scaled with StandardScaler.
  • Data is split into train/test sets, stratified on Class.
  • Only normal transactions are used to train the autoencoder.
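
The steps above can be sketched as follows. This is a minimal illustration, not the contents of preprocess.py: the function name split_and_scale, the 80/20 split, the random seed, and the synthetic DataFrame are all assumptions. It also assumes the scaler is fitted on the training split only, which the README does not state.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_and_scale(df, test_size=0.2, seed=42):
    """Stratified split, then scale Time and Amount with training statistics."""
    X = df.drop(columns=["Class"])
    y = df["Class"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    X_train, X_test = X_train.copy(), X_test.copy()

    # Assumption: fit the scaler on the training split only, then reuse it
    # on the test split so no test statistics leak into training.
    scaler = StandardScaler()
    cols = ["Time", "Amount"]
    X_train[cols] = scaler.fit_transform(X_train[cols])
    X_test[cols] = scaler.transform(X_test[cols])

    # The autoencoder is trained on normal (Class 0) rows only.
    X_train_normal = X_train[y_train == 0]
    return X_train_normal, X_test, y_test, scaler


# Synthetic stand-in for creditcard.csv (random values, not real data).
rng = np.random.default_rng(0)
n = 1000
demo = pd.DataFrame(
    rng.normal(size=(n, 28)), columns=[f"V{i}" for i in range(1, 29)]
)
demo["Time"] = rng.uniform(0, 172800, size=n)
demo["Amount"] = rng.exponential(80.0, size=n)
demo["Class"] = (rng.random(n) < 0.05).astype(int)

X_train_normal, X_test, y_test, scaler = split_and_scale(demo)
print(X_train_normal.shape, X_test.shape)
```

The test split keeps both classes so that fraud detection can be measured later; only the training rows are filtered to Class 0.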

2. Exploratory Data Analysis (eda.py)

  • Class imbalance visualization.
  • Feature distribution histograms.
  • Correlation heatmaps.
  • PCA/t-SNE visualization of transaction clusters.

3. Autoencoder Model (model.py)

  • Trained to minimize reconstruction loss (MSE).
  • Tunable hyperparameters include the encoding dimension, batch size, and number of epochs.
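
To make "minimize reconstruction loss (MSE)" concrete, here is a deliberately small NumPy sketch of an autoencoder with one hidden layer. It is not the architecture in model.py, which the README does not describe: the class name TinyAutoencoder, the tanh activation, full-batch gradient descent, and the learning rate are all assumptions chosen to keep the example self-contained.

```python
import numpy as np


class TinyAutoencoder:
    """One-hidden-layer autoencoder trained by gradient descent on MSE."""

    def __init__(self, n_features, encoding_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n_features)
        self.W_enc = rng.normal(0.0, scale, size=(n_features, encoding_dim))
        self.b_enc = np.zeros(encoding_dim)
        self.W_dec = rng.normal(0.0, scale, size=(encoding_dim, n_features))
        self.b_dec = np.zeros(n_features)

    def encode(self, X):
        return np.tanh(X @ self.W_enc + self.b_enc)

    def reconstruct(self, X):
        return self.encode(X) @ self.W_dec + self.b_dec

    def fit(self, X, epochs=500, lr=0.1):
        losses = []
        for _ in range(epochs):
            hidden = self.encode(X)
            err = hidden @ self.W_dec + self.b_dec - X
            losses.append(float(np.mean(err ** 2)))

            # Gradient of the mean squared error w.r.t. the reconstruction.
            grad_out = 2.0 * err / err.size
            # Backpropagate through the linear decoder.
            grad_W_dec = hidden.T @ grad_out
            grad_b_dec = grad_out.sum(axis=0)
            # Backpropagate through tanh (derivative is 1 - tanh^2).
            grad_pre = (grad_out @ self.W_dec.T) * (1.0 - hidden ** 2)
            grad_W_enc = X.T @ grad_pre
            grad_b_enc = grad_pre.sum(axis=0)

            self.W_dec -= lr * grad_W_dec
            self.b_dec -= lr * grad_b_dec
            self.W_enc -= lr * grad_W_enc
            self.b_enc -= lr * grad_b_enc
        return losses


# Synthetic, standardized data with low-rank structure (not real transactions).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 4))
X = latent @ rng.normal(size=(4, 30)) + 0.1 * rng.normal(size=(500, 30))
X = (X - X.mean(axis=0)) / X.std(axis=0)

model = TinyAutoencoder(n_features=30, encoding_dim=8)
losses = model.fit(X)
print(f"MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The 30 input features match V1 to V28 plus Time and Amount. The encoding dimension (8 here) is the bottleneck width, one of the hyperparameters listed above.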

4. Training with Cross-Validation (train.py)

  • Uses 5-fold Stratified K-Fold cross-validation.
  • Trains on Class 0 (normal) data only.
  • The final model is selected and then evaluated on the full test set.
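
One way to combine stratified folds with normal-only training is sketched below. The README does not say how train.py does this, so the ordering is an assumption: folds are stratified on the labelled data, and fraud rows are then dropped from each training part while the validation part keeps both classes. The helper name normal_only_folds and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def normal_only_folds(X, y, n_splits=5, seed=42):
    """Yield (train_idx_normal, val_idx) pairs for each stratified fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in skf.split(X, y):
        # Keep only Class 0 rows in the training part of the fold.
        train_idx_normal = train_idx[y[train_idx] == 0]
        yield train_idx_normal, val_idx


# Synthetic data: 500 rows, 30 features, 5% labelled as fraud.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 30))
y = np.zeros(500, dtype=int)
y[:25] = 1

folds = list(normal_only_folds(X, y))
for i, (train_idx, val_idx) in enumerate(folds):
    n_fraud = int(y[val_idx].sum())
    print(f"fold {i}: train normal={len(train_idx)}, "
          f"val={len(val_idx)}, val fraud={n_fraud}")
```

Stratification keeps the fraud rate of each validation fold close to the overall rate, so per-fold metrics are comparable.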

5. Evaluation (evaluate.py)

  • Computes the reconstruction error for each test transaction.
  • Flags a transaction as fraud when its error exceeds a threshold.
  • Metrics: Confusion Matrix, Precision, Recall, F1, AUC
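
The scoring and metric steps can be sketched with scikit-learn as follows. The function names and the toy error values are hypothetical, and the threshold is passed in as a parameter because the README does not state how it is chosen.

```python
import numpy as np
from sklearn.metrics import (
    confusion_matrix,
    precision_recall_fscore_support,
    roc_auc_score,
)


def reconstruction_error(X, X_hat):
    """Per-row mean squared error between inputs and reconstructions."""
    return np.mean((X - X_hat) ** 2, axis=1)


def evaluate_scores(errors, y_true, threshold):
    """Threshold the anomaly scores and compute the listed metrics."""
    y_pred = (errors > threshold).astype(int)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    return {
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
        # AUC uses the raw errors, so it does not depend on the threshold.
        "auc": float(roc_auc_score(y_true, errors)),
    }


# Toy scores for six transactions (made-up numbers, not project results).
y_true = np.array([0, 0, 0, 1, 0, 1])
errors = np.array([0.1, 0.2, 0.6, 0.9, 0.3, 0.4])
report = evaluate_scores(errors, y_true, threshold=0.5)
print(report)
```

A common choice for the threshold is a high percentile of the reconstruction errors on normal training data, for example np.percentile(train_errors, 95). Precision, recall, and F1 change with the threshold, while AUC summarizes ranking quality across all thresholds.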

6. Explainability (interpret.py)

  • Uses SHAP to identify which features contribute most to reconstruction error.
  • Helps interpret model behavior for both normal and fraud predictions.
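
SHAP approximates Shapley values, which split a prediction (here, the reconstruction error) among the input features. The sketch below computes exact Shapley values by enumerating feature orderings, which is only feasible for a handful of features. It does not use the shap library and is not the code in interpret.py. The toy reconstructor, the all-zeros baseline, and the function names are assumptions for illustration.

```python
from itertools import permutations

import numpy as np


def toy_reconstruct(x):
    # Hypothetical stand-in for an autoencoder: it can only reproduce
    # the mean of the features, so rows with one extreme value fit badly.
    return np.full_like(x, x.mean())


def recon_error(x):
    return float(np.mean((x - toy_reconstruct(x)) ** 2))


def exact_shapley(score_fn, x, baseline):
    """Average each feature's marginal effect over all feature orderings."""
    d = len(x)
    phi = np.zeros(d)
    orders = list(permutations(range(d)))
    for order in orders:
        z = baseline.copy()
        prev = score_fn(z)
        for j in order:
            z[j] = x[j]  # switch feature j from baseline to its real value
            cur = score_fn(z)
            phi[j] += cur - prev
            prev = cur
    return phi / len(orders)


x = np.array([0.1, -0.2, 0.0, 3.0])  # last feature is the outlier
baseline = np.zeros(4)               # assumed "typical" transaction
phi = exact_shapley(recon_error, x, baseline)
print("attributions:", np.round(phi, 4))
print("most influential feature index:", int(np.argmax(phi)))
```

The attributions sum to the difference between the error of x and the error of the baseline, which is the property that makes Shapley-style explanations easy to read. With 30 features, enumerating orderings is intractable, which is why sampling-based explainers such as those in SHAP are used in practice.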

Requirements

Installing dependencies:

pip install -r requirements.txt

Running the project

python main.py

The pipeline will:

  1. Load and explore the data
  2. Preprocess features and scale them
  3. Train the autoencoder with K-Fold Cross Validation
  4. Evaluate on test set with anomaly scoring
  5. Generate interpretability plots with SHAP

Current Output

Results:

Best parameters:


Dataset

The dataset is the Credit Card Fraud Detection dataset from the Machine Learning Group - ULB, available on Kaggle.


Future improvements

  • Compare the autoencoder with Isolation Forest and One-Class SVM
  • Hyperparameter tuning via KerasTuner
  • Deploy API for fraud scoring
  • Real-time stream integration with Kafka or Spark
  • Web dashboard for fraud alerts

License

This project is licensed under the MIT License.
