In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. As a result of the ensuing trial, the emails and financial records of many former employees were made public.
The full project code is in the poi_id.py file, while the write-up is in the EnronFraudDetection_FINAL Jupyter notebook/HTML/markdown files.
This project seeks to classify former Enron employees as "Persons of Interest" (POIs), i.e. people the authorities should interview in the investigation, based on the dataset described above.
I will be using Python and its data analysis and machine learning libraries to accomplish this task. Python is a flexible, general-purpose programming language with libraries that make mining all kinds of data, including text data, easy and efficient.
Machine learning is well suited to this task: strong classification algorithms can find patterns in the data that a person reviewing it manually might miss, and they can do so far more quickly.
This project was completed as part of my Data Analyst Nanodegree from Udacity.
I hereby confirm that this submission is my work. I have cited above the origins of any parts of the submission that were taken from Websites, books, forums, blog posts, github repositories, etc.
The best results came from an AdaBoost classifier trained on 10 features, with a learning rate of 1 and 35 estimators:
AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=1,
n_estimators=35, random_state=None)
Accuracy: 0.88040 Precision: 0.56527 Recall: 0.44600 F1: 0.49860 F2: 0.46565
Total predictions: 15000 True positives: 892 False positives: 686 False negatives: 1108 True negatives: 12314
['deferral_payments', 'total_payments', 'loan_advances', 'restricted_stock_deferred', 'deferred_income', 'expenses', 'exercised_stock_options', 'long_term_incentive', 'from_this_person_to_poi', 'perc_from_poi']
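For reference, below is a minimal sketch of how this configuration could be set up with scikit-learn. It assumes the cleaned Enron data is available as a pandas DataFrame with the ten features listed above plus a boolean 'poi' label column; the file name `enron_data.csv` and the loading step are hypothetical, and the actual pipeline lives in poi_id.py.

```python
# Sketch only: assumes enron_data.csv holds the cleaned dataset with the
# features below and a boolean 'poi' column (file name is hypothetical).
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

features = ['deferral_payments', 'total_payments', 'loan_advances',
            'restricted_stock_deferred', 'deferred_income', 'expenses',
            'exercised_stock_options', 'long_term_incentive',
            'from_this_person_to_poi', 'perc_from_poi']

df = pd.read_csv("enron_data.csv")          # hypothetical loading step
X = df[features].fillna(0)                  # treat missing financial values as 0
y = df['poi']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Same hyperparameters as the final model: 35 estimators, learning rate 1
clf = AdaBoostClassifier(n_estimators=35, learning_rate=1)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))            # accuracy on the held-out split
```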
An AdaBoost classifier was built that was able to predict persons of interest in the Enron dataset, with 88% accuracy, 56.5% precision, and 44.6% recall.
We were able to predict whether an employee was a person of interest to an extent. The model's predictions were correct 88% of the time overall; when it flagged someone as a person of interest, it was right a bit more than half the time (56.5% precision), and it caught about 45% of the actual persons of interest (44.6% recall).
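To make the distinction between these metrics concrete, they can be recomputed directly from the true/false positive and negative counts reported above:

```python
# Recompute the reported metrics from the confusion-matrix counts above.
tp, fp, fn, tn = 892, 686, 1108, 12314

accuracy  = (tp + tn) / (tp + fp + fn + tn)  # 0.8804: overall correctness
precision = tp / (tp + fp)                   # 0.5653: of those flagged as POIs, how many really were
recall    = tp / (tp + fn)                   # 0.4460: of the real POIs, how many were flagged
f1 = 2 * precision * recall / (precision + recall)  # 0.4986: harmonic mean of the two

print(accuracy, precision, recall, f1)
```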