Autoencoder-Based Credit Card Fraud Detection

This project implements an unsupervised deep learning pipeline that uses an autoencoder to detect fraudulent transactions in credit card data. The workflow is modular and covers EDA, preprocessing, training with 5-fold cross-validation, anomaly detection based on reconstruction loss, and SHAP-based interpretability.

Project Structure

creditcard_autoencoder/
│   main.py
│   README.md
│
├─── /data
│       creditcard.csv
│
├─── /src
│       config.py
│       eda.py
│       evaluate.py
│       interpret.py
│       model.py
│       preprocess.py
│       train.py
│
└─── /notebooks
        exploration.ipynb

Features Used

From the original dataset, the following columns are selected and processed:

PCA-Transformed Features: V1 to V28
Original Features: Time, Amount
Target: Class (0 = Normal, 1 = Fraudulent)

Transactions are normalized and fraud is detected by analyzing reconstruction error using an unsupervised autoencoder.

Workflow

1. Data Preprocessing (`preprocess.py`)

Time and Amount are scaled with StandardScaler.
Data is split into train/test sets with stratification on Class.
Only normal transactions are used to train the autoencoder.
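
The three steps above could look roughly like the sketch below. It is not the code in `preprocess.py`: the synthetic array stands in for `creditcard.csv`, the column order (Time first, Amount last), the 80/20 split, and fitting the scaler on the training rows only are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for creditcard.csv; assumed column order: Time, V1..V28, Amount.
n_rows = 2000
X = rng.normal(size=(n_rows, 30))
X[:, 0] = rng.uniform(0, 172_800, size=n_rows)   # Time, in seconds
X[:, 29] = rng.exponential(80.0, size=n_rows)    # Amount
y = (rng.random(n_rows) < 0.02).astype(int)      # ~2% fraud (illustrative rate only)

# Stratifying on the label keeps the fraud ratio similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale only Time and Amount; V1..V28 are already PCA outputs.
# Assumption: the scaler is fit on training rows to avoid leaking test statistics.
scaled_cols = [0, 29]
scaler = StandardScaler().fit(X_train[:, scaled_cols])
X_train[:, scaled_cols] = scaler.transform(X_train[:, scaled_cols])
X_test[:, scaled_cols] = scaler.transform(X_test[:, scaled_cols])

# The autoencoder is trained on normal (Class 0) transactions only.
X_train_normal = X_train[y_train == 0]
```

The test set keeps both classes, since fraud rows are needed there to measure detection quality.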

2. Exploratory Data Analysis (`eda.py`)

Class imbalance visualization.
Feature distribution histograms.
Correlation heatmaps.
PCA/t-SNE visualization of transaction clusters.

3. Autoencoder Model (`model.py`)

Trained to minimize reconstruction loss (MSE).
Tuned using parameters like encoding dimension, batch size, epochs, etc.
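
As a minimal sketch of the idea, the block below trains a network to reproduce its own input and scores each row by its reconstruction MSE. It deliberately swaps in scikit-learn's `MLPRegressor` as a stand-in so the example stays small; the actual architecture in `model.py` is not shown here, and the layer sizes, default values, and helper names (`build_autoencoder`, `reconstruction_error`) are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def build_autoencoder(encoding_dim=8, batch_size=64, epochs=50, seed=0):
    """Symmetric 16 -> encoding_dim -> 16 network trained with squared-error loss."""
    return MLPRegressor(
        hidden_layer_sizes=(16, encoding_dim, 16),
        activation="relu",
        solver="adam",
        batch_size=batch_size,
        max_iter=epochs,      # with the adam solver, max_iter is the number of epochs
        random_state=seed,
    )


def reconstruction_error(model, X):
    """Per-row mean squared error between the input and its reconstruction."""
    return np.mean((X - model.predict(X)) ** 2, axis=1)


rng = np.random.default_rng(0)
X_normal = rng.normal(size=(500, 30))    # placeholder for scaled Class 0 rows

autoencoder = build_autoencoder()
autoencoder.fit(X_normal, X_normal)      # the target is the input itself
errors = reconstruction_error(autoencoder, X_normal)
```

With few epochs scikit-learn may emit a convergence warning; that is expected for a sketch like this and does not stop training.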

4. Training with Cross-Validation (`train.py`)

Uses 5-fold Stratified K-Fold Cross Validation.
Trains on Class 0 (normal) data only.
Final model selected and evaluated on the full test set.
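
One way to combine stratified folds with normal-only training is sketched below. The data is synthetic, `MLPRegressor` again stands in for the project's autoencoder, and scoring each fold by ROC AUC of the reconstruction error is an assumption; `train.py` may select the final model differently.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))
y = (rng.random(1000) < 0.05).astype(int)
X[y == 1] += 3.0   # shift the synthetic frauds so they differ from normal rows

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_aucs = []

for train_idx, val_idx in skf.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_val, y_val = X[val_idx], y[val_idx]

    # Fit on the normal rows of the training folds only.
    X_tr_normal = X_tr[y_tr == 0]
    model = MLPRegressor(hidden_layer_sizes=(16, 8, 16), max_iter=30, random_state=0)
    model.fit(X_tr_normal, X_tr_normal)

    # Validate on the held-out fold, which still contains both classes.
    val_error = np.mean((X_val - model.predict(X_val)) ** 2, axis=1)
    fold_aucs.append(roc_auc_score(y_val, val_error))

mean_auc = float(np.mean(fold_aucs))
```

Stratification matters here because fraud rows are rare: without it, a validation fold could end up with too few frauds to score.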

5. Evaluation (`evaluate.py`)

Predicts reconstruction error for test transactions.
Flags frauds based on a threshold.
Metrics: Confusion Matrix, Precision, Recall, F1, AUC
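
The thresholding step can be illustrated as follows. The error values are synthetic placeholders, and the threshold rule (the 99th percentile of reconstruction errors on normal training data) is an assumption; the rule used in `evaluate.py` is not stated above.

```python
import numpy as np
from sklearn.metrics import (
    confusion_matrix,
    precision_recall_fscore_support,
    roc_auc_score,
)

rng = np.random.default_rng(0)

# Placeholder reconstruction errors: frauds tend to reconstruct worse.
train_normal_error = rng.gamma(2.0, 0.5, size=5000)
test_error = np.concatenate([
    rng.gamma(2.0, 0.5, size=980),   # normal test transactions
    rng.gamma(6.0, 1.0, size=20),    # fraudulent test transactions
])
y_test = np.concatenate([np.zeros(980, dtype=int), np.ones(20, dtype=int)])

# Assumed rule: flag anything above the 99th percentile of normal training error.
threshold = np.percentile(train_normal_error, 99)
y_pred = (test_error > threshold).astype(int)

cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", zero_division=0
)
auc = roc_auc_score(y_test, test_error)   # AUC uses the raw score, not the labels

print(cm)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
```

Raising the threshold trades recall for precision, so precision, recall, and F1 depend on the chosen cutoff while AUC does not.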

6. Explainability (`interpret.py`)

Uses SHAP to identify which features contribute most to reconstruction error.
Helps interpret model behavior for both normal and fraud predictions.

Requirements

Installing dependencies:

pip install -r requirements.txt

Running the project

python main.py

The pipeline will:

Load and explore the data
Preprocess features and scale them
Train the autoencoder with K-Fold Cross Validation
Evaluate on test set with anomaly scoring
Generate interpretability plots with SHAP

Current Output

Results:

Best parameters:

Dataset

The dataset is the Credit Card Fraud Detection dataset published on Kaggle by the Machine Learning Group - ULB.

Future improvements

Compare the autoencoder with Isolation Forest and One-Class SVM
Hyperparameter tuning via KerasTuner
Deploy API for fraud scoring
Real-time stream integration with Kafka or Spark
Web dashboard for fraud alerts

License

This project is licensed under the MIT License.

varnan6/Autoencoder-Based-Credit-Card-Fraud-Detection
