
Built a logistic regression model using scikit-learn to predict fraudulent transactions, trained on a Kaggle dataset. Before training the model, I created visualizations of the data with the Seaborn library.


smazhuvan/fradulent-transaction-prediction


Fraudulent Transaction Prediction

Intro

The PwC Global Economic Crime Survey 2016 reports that approximately 36% of organizations experienced economic crime, so detecting credit card fraud remains an important problem. Fraud detection often comes down to outlier detection: scanning a dataset for anomalies that may indicate fraud. In the past, this was done by employees who checked transactions manually. With the rise of machine learning, artificial intelligence, deep learning and related fields, it has become feasible to automate this process and save much of the labor that goes into detecting credit card fraud. In this assignment, we use pandas, seaborn and scikit-learn to build a fraud detection classifier.

Context

It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items they did not purchase. The dataset contains transactions made with credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
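To see the imbalance concretely, the class distribution can be summarized with pandas. This is an illustrative sketch, not the repository's notebook code: the helper name `class_balance` is hypothetical, and because the CSV is not bundled here, a synthetic label Series with the same totals (492 frauds out of 284,807) stands in for the real 'Class' column.

```python
import pandas as pd


def class_balance(labels: pd.Series) -> pd.DataFrame:
    """Return the count and percentage of each class label (hypothetical helper)."""
    counts = labels.value_counts().sort_index()
    return pd.DataFrame({"count": counts, "percent": 100 * counts / counts.sum()})


# Synthetic stand-in for the 'Class' column with the reported totals:
# 284,807 transactions, of which 492 are frauds (label 1)
labels = pd.Series([0] * (284_807 - 492) + [1] * 492, name="Class")
print(class_balance(labels))
```

On the real data this would be called on the 'Class' column of the DataFrame loaded from the Kaggle CSV (the file name `creditcard.csv` is an assumption). The fraud share comes out at roughly 0.17%, in line with the figure above; seaborn's `countplot` on the same column gives the visual version.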

It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, the original features and further background information about the data are not provided. Features V1, V2, ..., V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable; it takes the value 1 in case of fraud and 0 otherwise.
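Since V1 to V28 are already PCA outputs while 'Time' and 'Amount' are on their raw scales (seconds and currency units), one common preprocessing step before logistic regression is to standardize just those two columns. The sketch below assumes that choice; it is not taken from the repository. `make_preprocessor` is a hypothetical helper, and the DataFrame is random data with the dataset's column names rather than the real transactions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Column names as described above: V1..V28 are PCA components, Time and Amount are raw
PCA_COLUMNS = [f"V{i}" for i in range(1, 29)]
RAW_COLUMNS = ["Time", "Amount"]


def make_preprocessor() -> ColumnTransformer:
    """Standardize Time and Amount, pass V1..V28 through unchanged, drop anything else (e.g. Class)."""
    return ColumnTransformer(
        [
            ("scale_raw", StandardScaler(), RAW_COLUMNS),
            ("pca", "passthrough", PCA_COLUMNS),
        ],
        remainder="drop",
    )


# Random stand-in data with the dataset's column layout (not the real transactions)
rng = np.random.default_rng(0)
n = 1_000
X = pd.DataFrame(rng.normal(size=(n, 28)), columns=PCA_COLUMNS)
X["Time"] = np.sort(rng.uniform(0, 172_800, size=n))  # two days, in seconds
X["Amount"] = rng.exponential(scale=90.0, size=n)     # skewed, non-negative amounts

Xt = make_preprocessor().fit_transform(X)
# Output columns: scaled Time, scaled Amount, then V1..V28
print(Xt.shape)
```

The preprocessor can be chained in front of the classifier with `make_pipeline(make_preprocessor(), LogisticRegression())`. For the cost-sensitive idea mentioned above, `LogisticRegression.fit` accepts a `sample_weight` argument (passed through such a pipeline as `logisticregression__sample_weight=...`), so one simple option, again an assumption rather than the author's method, is to weight each training example by its 'Amount'.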

Inspiration

Identify fraudulent credit card transactions. Given the class imbalance ratio, we recommend measuring performance with the Area Under the Precision-Recall Curve (AUPRC). Plain accuracy from the confusion matrix is not meaningful for unbalanced classification: a model that labels every transaction as legitimate already reaches about 99.8% accuracy. This project trains a logistic regression model with scikit-learn on this Kaggle dataset to predict fraudulent transactions.
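The point about accuracy can be checked with a small experiment: on data this unbalanced, a baseline that always predicts "not fraud" gets very high accuracy but an AUPRC equal to the fraud rate, while a logistic regression model should do much better on AUPRC. This is a sketch on synthetic data from `make_classification` (about 0.2% positives), not the Kaggle file; the sample size, split and random seeds are arbitrary choices. It uses scikit-learn's `average_precision_score`, which summarizes the precision-recall curve as a step-wise sum rather than by trapezoidal interpolation; this is one common way to report AUPRC.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Kaggle data: 30 features, about 0.2% positive ("fraud") labels
X, y = make_classification(
    n_samples=50_000, n_features=30, n_informative=10,
    weights=[0.998], flip_y=0, random_state=42,
)
# Stratify so the rare class appears in both splits in the same proportion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42,
)

# Baseline: predicts the majority class and scores every sample with the class prior
baseline = DummyClassifier(strategy="prior").fit(X_train, y_train)
model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

for name, clf in [("baseline", baseline), ("logistic regression", model)]:
    accuracy = accuracy_score(y_test, clf.predict(X_test))
    # AUPRC needs scores for the positive class, not hard 0/1 predictions
    auprc = average_precision_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"{name:>20}: accuracy={accuracy:.4f}  AUPRC={auprc:.4f}")
```

On the real data, `X` and `y` would be the feature columns and the 'Class' column of the Kaggle CSV. Both classifiers report accuracy above 99% here, so only the AUPRC column separates them.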

Dataset

https://www.kaggle.com/mlg-ulb/creditcardfraud
