Fraud Detection in Graph Neural Network

This repo is refactored from the model used in awslabs/sagemaker-graph-fraud-detection and is implemented with Deep Graph Library (DGL) and PyTorch. Unlike Amazon's implementation, it does not require SageMaker for training. It can be run directly on the free Google Colab or on a local machine.

Besides fraudulent transaction detection, the repo can also be applied to other scenarios based on heterogeneous graphs, such as game account theft or fraudulent online shopping orders.

Introduction

Many online businesses lose billions of dollars to fraud each year, but machine learning-based fraud detection models can help businesses predict which interactions or users are likely to be fraudulent in order to reduce losses.

This repo formulates fraud detection as a classification task on a heterogeneous interaction network. The machine learning model is a graph neural network (GNN) that learns latent representations of users or transactions, which can then be classified as fraudulent or not.

This repo constructs a heterogeneous graph from the transaction data provided in the IEEE-CIS Fraud Detection dataset. The following are used as node and edge features, respectively:

NODE: Number of associated cards, Number of associated addresses, Days between transactions, Match status (name, card, address, email, etc.), Vesta-engineered rich features, etc.
EDGE: Purchaser and recipient email domain, Product, Card information, Address, Device information, Network connection information (IP, ISP, proxy, etc.), Digital signature (UA/browser/OS/version, etc.)
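
As a rough illustration of how such a heterogeneous graph can be laid out, the sketch below turns a few made-up transactions into per-relation edge lists. The column names, the `has_<column>` relation names, and the `build_edge_lists` helper are hypothetical and are not taken from the notebooks. A dictionary keyed by (source type, relation, destination type) is the general shape that `dgl.heterograph` takes as input once the id lists are converted to tensors.

```python
from collections import defaultdict

# Hypothetical toy transactions; column names are illustrative only.
transactions = [
    {"TransactionID": 1001, "card": "c1", "addr": "a1", "email_domain": "gmail.com"},
    {"TransactionID": 1002, "card": "c1", "addr": "a2", "email_domain": "yahoo.com"},
    {"TransactionID": 1003, "card": "c2", "addr": "a2", "email_domain": "gmail.com"},
]
entity_columns = ["card", "addr", "email_domain"]


def build_edge_lists(rows, entity_cols, id_col="TransactionID"):
    """Return node-id maps and per-relation (src, dst) index lists."""
    node_ids = defaultdict(dict)            # node type -> {raw value: integer id}
    edges = defaultdict(lambda: ([], []))   # edge type -> (src ids, dst ids)

    def index_of(node_type, value):
        # Assign consecutive integer ids per node type, reusing known values.
        mapping = node_ids[node_type]
        return mapping.setdefault(value, len(mapping))

    for row in rows:
        t = index_of("transaction", row[id_col])
        for col in entity_cols:
            e = index_of(col, row[col])
            src, dst = edges[("transaction", f"has_{col}", col)]
            src.append(t)
            dst.append(e)
    return dict(node_ids), dict(edges)


node_ids, edges = build_edge_lists(transactions, entity_columns)
for etype, (src, dst) in edges.items():
    print(etype, list(zip(src, dst)))
```

Transactions that share a card, address, or email domain end up connected through the same entity node, which is what lets a GNN propagate information between related transactions.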

Usage

If you want to run the code locally rather than on Colab, skip the first two cells in each notebook.

1. Download dataset

First, download the dataset from Kaggle. This link provides some additional information about the dataset.

Then put all of the CSV files into the ./ieee-data/ folder.
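
A quick way to confirm the files are in place is sketched below. The file names are the ones published with the Kaggle IEEE-CIS Fraud Detection competition; whether the notebooks need all of them is an assumption, and the `missing_files` helper is hypothetical.

```python
from pathlib import Path

# File names as published in the Kaggle IEEE-CIS Fraud Detection competition.
# Assumption: the notebooks may not require every one of them.
EXPECTED_FILES = [
    "train_transaction.csv",
    "train_identity.csv",
    "test_transaction.csv",
    "test_identity.csv",
]


def missing_files(data_dir="./ieee-data/", expected=EXPECTED_FILES):
    """Return the expected CSV files that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in expected if not (folder / name).is_file()]


missing = missing_files()
if missing:
    print("Missing from ./ieee-data/:", ", ".join(missing))
else:
    print("All expected CSV files found.")
```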

2. Data preparation

Before feeding the data to the model, we need to pre-process it. Open 10_data_loader.ipynb and follow the instructions inside. The processed data will be saved into the ./data/ folder.

3. Training

Open 20_modeling.ipynb and follow the instructions inside. CPU training is recommended. Training on a GPU may require resolving additional environment issues.

4. After training

The trained models and related files will be saved into the ./model/ folder. You can also visualize the training process with 30_visual.ipynb. The related graphs and training records are saved in the ./output/ folder.

Results

The constructed heterogeneous graph contains a total of 726,345 Nodes and 19,518,802 Edges.

Because the data is highly imbalanced, we need to make a trade-off between recall and precision. Misclassifying non-fraud transactions as fraud seriously affects the user experience, so precision is the priority. After training, the precision is 0.86 and the ROC AUC is 0.92.

Confusion Matrix:

Prediction	Actual Positive	Actual Negative
Predicted Positive	1435	240
Predicted Negative	2629	113804
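
The precision figure and the cost of prioritizing it can be checked directly from the confusion matrix. The short calculation below uses only the four counts reported above, with "positive" read as fraud; the variable names are illustrative.

```python
# Counts copied from the confusion matrix above.
tp, fp = 1435, 240      # predicted positive: actual positive / actual negative
fn, tn = 2629, 113804   # predicted negative: actual positive / actual negative

precision = tp / (tp + fp)                      # flagged transactions that are fraud
recall = tp / (tp + fn)                         # fraud transactions that are flagged
false_positive_rate = fp / (fp + tn)            # non-fraud transactions wrongly flagged
positive_share = (tp + fn) / (tp + fp + fn + tn)  # share of positives in the evaluated set

print(f"precision           = {precision:.3f}")            # 0.857
print(f"recall              = {recall:.3f}")               # 0.353
print(f"false positive rate = {false_positive_rate:.4f}")  # 0.0021
print(f"positive share      = {positive_share:.3f}")       # 0.034
```

At this operating point roughly 0.2% of legitimate transactions are flagged, at the price of catching about 35% of the fraudulent ones.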
