This repository contains code to analyze credit card transactions and predict whether transactions are fraudulent using machine learning algorithms. The machine learning workflow includes data collection and exploration, data processing, feature correlation analysis, automated processing using pipelines, model building, performance evaluation through cross-validation, and fine-tuning the best-performing model based on precision, recall, and F1 score metrics.
The dataset used in this project is sourced from Kaggle: Credit Card Fraud Detection Dataset.
- Installation
- Dataset
- Data Processing
- Feature Selection
- Machine Learning Models
- Model Evaluation
- Fine-Tuning
- Evaluation on Test Set
- Contributing
- License
- Contact
To work with the code, clone the repository:
git clone https://github.com/ShayanHodai/fraud-detection.git
The dataset is highly imbalanced, with less than 1% of total transactions being fraudulent.
Feature histograms: most features are centered around 0.
The Time feature is scaled using StandardScaler, which standardizes it to zero mean and unit variance.
The Amount feature is scaled using RobustScaler, which is less sensitive to outliers.
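The difference between the two scalers can be seen in a minimal sketch. The data here is synthetic (a uniform stand-in for Time and a heavy-tailed stand-in for Amount), not the Kaggle dataset, and the variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two raw columns (assumption: not the real data).
time = rng.uniform(0, 172_800, size=1_000).reshape(-1, 1)
amount = rng.lognormal(mean=3.0, sigma=1.5, size=1_000).reshape(-1, 1)

# StandardScaler: subtract the mean, divide by the standard deviation.
scaled_time = StandardScaler().fit_transform(time)

# RobustScaler: subtract the median, divide by the interquartile range,
# so a few very large amounts do not dominate the scale.
scaled_amount = RobustScaler().fit_transform(amount)

print(f"time:   mean={scaled_time.mean():.3f}, std={scaled_time.std():.3f}")
print(f"amount: median={np.median(scaled_amount):.3f}")
```

After scaling, the Time stand-in has mean 0 and standard deviation 1, and the Amount stand-in has median 0. Neither result is bounded to a fixed interval.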
To address class imbalance and create a balanced dataset, random undersampling is applied to reduce the number of instances in the majority class. This ensures that the machine learning model can learn to recognize patterns in both classes more effectively. After undersampling, the shape of the balanced dataset is (984, 31).
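Random undersampling can be sketched with pandas as follows. This is an illustration on a small synthetic frame, assuming a 1:1 class ratio and a label column named Class; the repository's own implementation may differ. Under that 1:1 assumption, the reported 984 rows would correspond to 492 rows per class.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic imbalanced frame: 10,000 rows, 50 of them labelled fraud (Class == 1).
df = pd.DataFrame({
    "V1": rng.normal(size=10_000),
    "Amount": rng.lognormal(size=10_000),
    "Class": np.r_[np.ones(50, dtype=int), np.zeros(9_950, dtype=int)],
})

fraud = df[df["Class"] == 1]
# Draw as many normal rows as there are fraud rows, without replacement.
normal = df[df["Class"] == 0].sample(n=len(fraud), random_state=42)

# Concatenate, then shuffle so the two classes are interleaved.
balanced = (
    pd.concat([fraud, normal])
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)

print(balanced.shape)
print(balanced["Class"].value_counts().to_dict())
```

Undersampling discards most majority-class rows, so the balanced frame is small. That is the trade-off for giving both classes equal weight during training.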
Correlation of fraud/normal transactions with non-redundant features
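One way to compute such correlations is to take the Pearson correlation of each feature with the class label. The sketch below uses synthetic columns and hypothetical names (V1 to V3, Class). It does not reproduce the project's own feature selection.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
labels = rng.integers(0, 2, size=n)

df = pd.DataFrame({
    "V1": rng.normal(size=n) - 2.0 * labels,  # shifted down for the positive class
    "V2": rng.normal(size=n) + 2.0 * labels,  # shifted up for the positive class
    "V3": rng.normal(size=n),                 # unrelated noise
    "Class": labels,
})

# Pearson correlation of each feature with the label.
corr_with_class = df.corr()["Class"].drop("Class")

# Rank features by absolute correlation, strongest first.
order = corr_with_class.abs().sort_values(ascending=False).index
print(corr_with_class.reindex(order))
```

Features with a large absolute correlation separate the two classes most clearly, while values near zero suggest little linear relationship with the label.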
False positives and false negatives carry different costs in this problem, so precision, recall, and F1-score are used as evaluation metrics.
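For reference, the three metrics can be computed from true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The helper below is a hypothetical, dependency-free sketch with made-up labels, not code from this repository.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1) for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: 4 fraudulent and 6 normal transactions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: precision=0.667 recall=0.500 f1=0.571
```

Here two of the four frauds are caught (recall 0.5) and two of the three flagged transactions are truly fraudulent (precision about 0.667).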
The best-performing model, logistic regression, is then fine-tuned.
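A common way to fine-tune logistic regression is a cross-validated grid search over the regularization strength. The sketch below assumes scikit-learn's GridSearchCV, a hypothetical grid over C, F1 as the selection metric, and synthetic data. The README does not state the search method or hyperparameters actually used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data standing in for the balanced dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Hypothetical grid: smaller C means stronger L2 regularization.
param_grid = {"C": [0.01, 0.1, 1, 10]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1",  # select the candidate with the best mean F1 across folds
    cv=5,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best mean cross-validated F1: {search.best_score_:.3f}")
```

Setting scoring to "precision" or "recall" would select a model tuned toward whichever error type is more costly.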
Contributions to this project are welcome. To contribute, follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Make your changes and commit them (git commit -m 'Add new feature').
- Push to the branch (git push origin feature-branch).
- Create a new Pull Request.
Please ensure your code adheres to the project's coding standards and includes
This project is licensed under the MIT License. See the LICENSE file for more details.
For any questions or suggestions, please contact: [email protected]