This repository contains code to detect fraudulent credit card transactions in a Kaggle dataset using machine learning.

ShayanHodai/fraud-detection

Fraud Detection

Description

This repository contains code to analyze credit card transactions and predict whether transactions are fraudulent using machine learning algorithms. The machine learning workflow includes data collection and exploration, data processing, feature correlation analysis, automated processing using pipelines, model building, performance evaluation through cross-validation, and fine-tuning the best-performing model based on precision, recall, and F1 score metrics.

The dataset used in this project is sourced from Kaggle: Credit Card Fraud Detection Dataset.

Installation

To work with the code, clone the repository:

git clone https://github.com/ShayanHodai/fraud-detection.git

Dataset

The dataset is highly imbalanced: fraudulent transactions make up less than 1% of the total (about 0.17%, or 492 of 284,807 transactions, according to the dataset's Kaggle page).
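As a quick illustration of the imbalance, the sketch below computes the share of each class from a list of labels. The label encoding (0 = normal, 1 = fraud) and the counts are assumptions taken from the dataset's Kaggle description, not from this repository's code.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of samples that belong to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Assumed counts: 284,807 transactions in total, 492 of them fraudulent.
labels = [0] * (284_807 - 492) + [1] * 492
shares = class_shares(labels)
print(f"fraud share: {shares[1]:.2%}")
```

With a split this skewed, a model that always predicts "normal" would be right almost every time, which is why accuracy alone is not a useful metric here.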

Data Processing

Feature histograms: most features are centered around 0.

Scaling

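  • The Time feature is scaled using StandardScaler, which standardizes it to zero mean and unit variance. Note that this does not bound the values to the range 0 to 1; a MinMaxScaler would be needed for that.
  • The Amount feature is scaled using RobustScaler, which centers on the median and scales by the interquartile range, so it is less sensitive to outliers.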
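The two scalers behave quite differently, which a small sketch can make concrete. The values below are hypothetical stand-ins for the Time and Amount columns, not data from the repository.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Hypothetical values standing in for the Time and Amount columns.
time = np.array([[0.0], [3600.0], [7200.0], [86400.0], [172800.0]])
amount = np.array([[1.0], [5.0], [20.0], [50.0], [25000.0]])  # one extreme outlier

# StandardScaler: subtract the mean, divide by the standard deviation.
time_scaled = StandardScaler().fit_transform(time)

# RobustScaler: subtract the median, divide by the interquartile range.
amount_scaled = RobustScaler().fit_transform(amount)

print("time mean/std:", time_scaled.mean(), time_scaled.std())
print("time min/max:", time_scaled.min(), time_scaled.max())
print("amount median:", np.median(amount_scaled))
```

The standardized Time values have mean 0 and standard deviation 1 but include negative numbers, and the median of the robust-scaled Amount values is 0 even though one outlier is far from the rest.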

Feature Selection

To address the class imbalance, random undersampling is applied: instances of the majority class (normal transactions) are randomly dropped until both classes are the same size. This lets the machine learning model learn patterns in both classes instead of being dominated by the majority class. After undersampling, the shape of the balanced dataset is (984, 31).
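Random undersampling can be sketched in a few lines of NumPy. The function name `random_undersample` and the toy data are hypothetical; the repository may implement this step differently, for example with pandas.

```python
import numpy as np


def random_undersample(X, y, seed=0):
    """Keep an equally sized random subset of every class.

    The subset size is the size of the smallest class, so all
    minority-class rows are kept and larger classes are cut down.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == cls), size=n_min, replace=False)
        for cls in classes
    ])
    rng.shuffle(keep)  # avoid leaving the rows grouped by class
    return X[keep], y[keep]


# Hypothetical toy data: 1000 rows, 30 features, 20 positive labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 30))
y = np.zeros(1000, dtype=int)
y[:20] = 1

X_bal, y_bal = random_undersample(X, y)
print(X_bal.shape, np.bincount(y_bal))
```

Applied to the real data, the same idea would keep every fraud row and an equal number of randomly chosen normal rows.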

Correlation of fraud/normal transactions with non-redundant features
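One way to inspect such correlations is to compute the Pearson correlation between each feature column and the binary class label, then rank features by absolute value. This is a minimal sketch on synthetic data; the helper `correlation_with_label` is hypothetical and not taken from the repository.

```python
import numpy as np


def correlation_with_label(X, y):
    """Pearson correlation of each feature column with the binary label."""
    return np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])


# Synthetic data: one feature rises with the label, one falls, one is noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = np.column_stack([
    y + rng.normal(scale=0.3, size=200),   # positively related to the label
    -y + rng.normal(scale=0.3, size=200),  # negatively related to the label
    rng.normal(size=200),                  # unrelated to the label
])

corr = correlation_with_label(X, y)
ranking = np.argsort(-np.abs(corr))  # most strongly correlated features first
print(corr.round(2), ranking)
```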

Machine Learning Models:

False positives (normal transactions flagged as fraud) and false negatives (missed fraud) carry different costs in this problem, so precision, recall, and F1-score are used as evaluation metrics instead of accuracy alone.
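For reference, the three metrics can be computed directly from the counts of true positives, false positives, and false negatives. The sketch below uses made-up labels and treats 1 as the fraud class.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: 4 frauds, 2 caught, 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Precision answers "how many flagged transactions were really fraud", recall answers "how much of the fraud was caught", and F1 is their harmonic mean.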

Logistic regression

KNN

SVM

Decision tree classifier

This model tends to overfit.
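The four classifiers above can be compared with cross-validation inside a scikit-learn pipeline. The sketch below is not the repository's code: it uses synthetic data from `make_classification`, default hyperparameters, and 5 folds, all of which are assumptions. Returning the training scores alongside the validation scores makes overfitting visible as a gap between the two.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the balanced dataset; flip_y adds label noise.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, flip_y=0.1, random_state=0
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "svm": SVC(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # Scaling lives inside the pipeline so it is re-fitted on each training fold.
    pipe = make_pipeline(StandardScaler(), model)
    cv = cross_validate(
        pipe, X, y, cv=5,
        scoring=("precision", "recall", "f1"),
        return_train_score=True,
    )
    results[name] = {
        "train_f1": cv["train_f1"].mean(),
        "test_f1": cv["test_f1"].mean(),
        "test_precision": cv["test_precision"].mean(),
        "test_recall": cv["test_recall"].mean(),
    }

for name, scores in results.items():
    print(f"{name:20s} train F1={scores['train_f1']:.3f}  CV F1={scores['test_f1']:.3f}")
```

An unpruned decision tree typically fits the training folds perfectly while scoring noticeably lower on the held-out folds, which is the overfitting pattern noted above. Limiting `max_depth` is one common way to reduce it.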

Model Evaluation

ROC curve:
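A ROC curve plots the true positive rate against the false positive rate as the decision threshold varies, and the area under it (AUC) summarizes the curve in one number. A minimal sketch with scikit-learn, using four made-up scores rather than the repository's model outputs:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up labels and predicted scores for four transactions.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# One (fpr, tpr) point per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)

print("FPR:", fpr)
print("TPR:", tpr)
print(f"AUC = {auc:.2f}")  # AUC = 0.75
```

Passing `fpr` and `tpr` to a plotting library such as matplotlib would draw the curve itself.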

Fine-Tuning

The best-performing model, logistic regression, is fine-tuned.
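One common way to fine-tune logistic regression is a grid search over the regularization strength `C`, scored by F1, followed by a final check on a held-out test set. The sketch below is an assumption about how this could look: the synthetic data, the grid values, the 80/20 split, and the choice of F1 as the selection metric are illustrative, not taken from the repository.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the balanced dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Hypothetical grid; make_pipeline names the step "logisticregression".
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(pipe, param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("best CV F1:", round(search.best_score_, 3))

# The test set is touched only once, after the search has finished.
print(classification_report(y_test, search.predict(X_test)))
```

Because `GridSearchCV` refits the best pipeline on the full training set by default, `search.predict` can be used directly for the test-set evaluation.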

Evaluation on Test Set

Contributing

Contributions to this project are welcome. To contribute, follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Make your changes and commit them (git commit -m 'Add new feature').
  4. Push to the branch (git push origin feature-branch).
  5. Create a new Pull Request.

Please ensure your code adheres to the project's coding standards and includes

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Contact

For any questions or suggestions, please contact: [email protected]
