This project implements an unsupervised deep learning pipeline using an autoencoder to detect fraudulent transactions in credit card data. It follows a robust ML workflow with modular design, EDA, preprocessing, 5-fold CV training, anomaly detection using reconstruction loss, and SHAP-based interpretability.
creditcard_autoencoder/
│   main.py
│   README.md
│
├─── /data
│       creditcard.csv
│
├─── /src
│       config.py
│       eda.py
│       evaluate.py
│       interpret.py
│       model.py
│       preprocess.py
│       train.py
│
└─── /notebooks
        exploration.ipynb

From the original dataset, the following columns are selected and processed:
- PCA-Transformed Features: V1 to V28
- Original Features: Time, Amount
- Target: Class (0 = Normal, 1 = Fraudulent)
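
The column selection listed here can be sketched roughly as follows. This is a minimal illustration that assumes the CSV has been loaded into a pandas DataFrame; the helper name `select_columns` and the constant names are hypothetical and not taken from the project's source.

```python
from typing import Tuple

import pandas as pd

# Column names as listed in this README: V1..V28 (PCA components), Time, Amount, Class.
PCA_FEATURES = [f"V{i}" for i in range(1, 29)]
ORIGINAL_FEATURES = ["Time", "Amount"]
TARGET = "Class"


def select_columns(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.Series]:
    """Split a raw transactions frame into a feature matrix and a label vector.

    Hypothetical helper for illustration; pandas raises KeyError if a column is missing.
    """
    features = df[PCA_FEATURES + ORIGINAL_FEATURES]
    labels = df[TARGET].astype(int)
    return features, labels
```

With the layout shown in the project tree, a call might look like `X, y = select_columns(pd.read_csv("data/creditcard.csv"))`.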
Transactions are normalized and fraud is detected by analyzing reconstruction error using an unsupervised autoencoder.
- Time and Amount are scaled with StandardScaler.
- Data is split into train/test with stratification on Class.
- Only normal transactions are used to train the autoencoder.
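
The preprocessing steps (scaling, stratified split, normal-only training set) might look like the sketch below. It assumes scikit-learn and pandas. The function name `preprocess`, the 80/20 split, and the seed are illustrative assumptions, and the README does not say whether the scaler is fitted before or after the split, so fitting on the training split only is also an assumption.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Scale Time/Amount, split with stratification, keep only normal rows for training.

    test_size and seed are illustrative assumptions, not values from the project.
    """
    X = df.drop(columns=["Class"]).astype(float)
    y = df["Class"]

    # Stratify on Class so the rare fraud cases appear in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    X_train, X_test = X_train.copy(), X_test.copy()

    # Fit the scaler on the training split only, then apply it to both splits
    # (assumed ordering; this keeps test-set statistics out of training).
    scaler = StandardScaler()
    cols = ["Time", "Amount"]
    X_train[cols] = scaler.fit_transform(X_train[cols])
    X_test[cols] = scaler.transform(X_test[cols])

    # The autoencoder only sees normal (Class 0) transactions during training.
    X_train_normal = X_train[y_train == 0]
    return X_train_normal, X_test, y_test, scaler
```

The test split keeps both classes, since the frauds are needed there to measure how well the reconstruction error separates them from normal transactions.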
- Class imbalance visualization.
- Feature distribution histograms.
- Correlation heatmaps.
- PCA/t-SNE visualization of transaction clusters.
- Trained to minimize reconstruction loss (MSE).
- Tuned using parameters like encoding dimension, batch size, epochs, etc.
- Used 5-fold Stratified K-Fold Cross Validation.
- Trains on Class 0 (normal) data only.
- Final model selected and evaluated on the full test set.
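
A rough sketch of the cross-validated training loop follows. To stay runnable without a deep learning framework, it swaps in scikit-learn's `MLPRegressor`, trained to reproduce its own input, as a stand-in for the project's autoencoder; the real `model.py` is not shown in this README and presumably differs. The function names and default hyperparameters are hypothetical. Because the training data contains only Class 0, stratification has nothing to balance, so the sketch uses plain `KFold`.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor


def build_autoencoder(encoding_dim: int, epochs: int, batch_size: int, seed: int = 0):
    """Single-bottleneck autoencoder stand-in: an MLP fitted with target == input."""
    return MLPRegressor(
        hidden_layer_sizes=(encoding_dim,),  # the bottleneck (encoding) layer
        activation="relu",
        solver="adam",
        batch_size=batch_size,
        max_iter=epochs,  # with the adam solver, max_iter is the number of epochs
        random_state=seed,
    )


def cross_validate(X_normal, encoding_dim=8, epochs=50, batch_size=64, n_splits=5):
    """Return the validation reconstruction MSE for each fold."""
    X_normal = np.asarray(X_normal, dtype=float)
    fold_mse = []
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, val_idx in kfold.split(X_normal):
        model = build_autoencoder(encoding_dim, epochs, batch_size)
        # MLPRegressor minimizes squared error, matching the MSE reconstruction loss.
        model.fit(X_normal[train_idx], X_normal[train_idx])
        reconstructed = model.predict(X_normal[val_idx])
        fold_mse.append(float(np.mean((X_normal[val_idx] - reconstructed) ** 2)))
    return fold_mse
```

Comparing the mean of `fold_mse` across candidate settings (encoding dimension, batch size, epochs) is one simple way to pick the final configuration before refitting on all normal training data.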
- Predicts reconstruction error for test transactions.
- Flags frauds based on a threshold.
- Metrics: Confusion Matrix, Precision, Recall, F1, AUC
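
The scoring and thresholding step can be illustrated as below. This is a sketch assuming scikit-learn's metrics module; the function names are hypothetical, and the README does not state how the threshold is chosen, so the threshold is left as a parameter.

```python
import numpy as np
from sklearn.metrics import (
    confusion_matrix,
    precision_recall_fscore_support,
    roc_auc_score,
)


def reconstruction_error(X, X_reconstructed):
    """Per-transaction mean squared error between input and reconstruction."""
    diff = np.asarray(X, dtype=float) - np.asarray(X_reconstructed, dtype=float)
    return np.mean(diff ** 2, axis=1)


def evaluate(errors, y_true, threshold):
    """Flag transactions whose error exceeds the threshold and score the result."""
    errors = np.asarray(errors, dtype=float)
    y_pred = (errors > threshold).astype(int)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    return {
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=[0, 1]),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # AUC ranks transactions by raw error, so it does not depend on the threshold.
        "auc": roc_auc_score(y_true, errors),
    }
```

One common heuristic, offered here only as an assumption, is to set the threshold to a high percentile of the errors on normal training data, for example `np.percentile(train_errors, 99)`.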
- Uses SHAP to identify which features contribute most to reconstruction error.
- Helps interpret model behavior for both normal and fraud predictions.
pip install -r requirements.txt
python main.py

The pipeline will:
- Load and explore the data
- Preprocess features and scale them
- Train the autoencoder with K-Fold Cross Validation
- Evaluate on test set with anomaly scoring
- Generate interpretability plots with SHAP
The dataset is the Credit Card Fraud Detection dataset published on Kaggle by the Machine Learning Group - ULB.
- Compare the autoencoder with Isolation Forest and One-Class SVM
- Hyperparameter tuning via KerasTuner
- Deploy API for fraud scoring
- Real-time stream integration with Kafka or Spark
- Web dashboard for fraud alerts
This project is licensed under the MIT License.


