This repository has been archived by the owner on Apr 4, 2019. It is now read-only.

Using sklearn to predict if an individual earns above or below $50k from census information


jd-13/sklearn-census-earnings


This notebook uses US census information to predict whether an individual earns more or less than $50k per year.

The final model achieves an accuracy of 86.7% on the test data set.

The data used can be found here: https://www.kaggle.com/uciml/adult-census-income
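A minimal sketch of loading the CSV with pandas. It assumes the Kaggle file marks missing values with `?` and has an `income` column labelled `<=50K` / `>50K`; these are assumptions about the file layout, not code taken from this repository. The two-row sample is made up so the sketch runs without the download; with the real file you would pass its path (e.g. a hypothetical `"adult.csv"`) instead of the `StringIO` buffer.

```python
import io

import pandas as pd


def load_census(path_or_buffer):
    """Load census data into a feature frame X and a 0/1 target y (1 means >50K)."""
    # Assumption: missing values are written as '?'; skipinitialspace also handles ' ?'
    df = pd.read_csv(path_or_buffer, na_values="?", skipinitialspace=True)
    # Assumption: the target column is named 'income' with labels '<=50K' and '>50K'
    y = (df["income"].str.strip() == ">50K").astype(int)
    X = df.drop(columns="income")
    return X, y


# Made-up two-row sample in the assumed layout, so the sketch runs offline
sample = """age,workclass,education,hours.per.week,income
39,State-gov,Bachelors,40,<=50K
52,?,HS-grad,45,>50K
"""
X, y = load_census(io.StringIO(sample))
print(X.isna().sum().to_dict())
print(y.tolist())
```

Converting the target to 0/1 up front keeps the later accuracy and grid-search scoring straightforward.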

Overview

A rough outline of the steps taken is as follows:

  1. Visualize the data to identify trends and artifacts that may affect later steps
  2. Prepare the data (train/test split, scaling, one-hot encoding, etc.)
  3. Run a grid search over several models to identify the most promising candidates
  4. Visualize the performance of the models to better understand their shortcomings and areas for improvement
  5. Continue to tune the most promising models
  6. Evaluate the final model on the test set
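The preparation, grid-search and final test steps above can be sketched with scikit-learn's `Pipeline`, `ColumnTransformer` and `GridSearchCV`. This is a minimal sketch, not the notebook's actual code: the synthetic data, the column subset, the two candidate models and their parameter grids are all hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic stand-in for the census data so the sketch runs on its own
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "age": rng.integers(17, 90, n),
    "hours.per.week": rng.integers(1, 99, n),
    "workclass": rng.choice(["Private", "Self-emp", "State-gov"], n),
    "education": rng.choice(["HS-grad", "Bachelors", "Masters"], n),
})
# Synthetic label loosely tied to age and hours so there is something to learn
y = ((X["age"] > 40) & (X["hours.per.week"] > 45)).astype(int)

numeric = ["age", "hours.per.week"]
categorical = ["workclass", "education"]

# Prepare: hold out a stratified test set, then scale numeric and one-hot encode categorical columns
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Grid search: each hypothetical candidate model with its own parameter grid
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]}),
    "forest": (
        RandomForestClassifier(random_state=0),
        {"clf__n_estimators": [50, 100], "clf__max_depth": [None, 10]},
    ),
}
searches = {}
for name, (model, grid) in candidates.items():
    pipe = Pipeline([("prep", preprocess), ("clf", model)])
    # 5-fold cross-validation on the training data only
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    searches[name] = search
    print(f"{name}: best CV accuracy {search.best_score_:.3f} with {search.best_params_}")

# Final step: score only the best model, once, on the held-out test set
best_name = max(searches, key=lambda k: searches[k].best_score_)
test_accuracy = searches[best_name].score(X_test, y_test)
print(f"{best_name} test accuracy: {test_accuracy:.3f}")
```

Putting the scaler and encoder inside the pipeline means they are refit on each cross-validation training fold, so no statistics from the validation folds or the test set leak into preprocessing.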
