
House sales prices in King County

A project on exploratory data analysis.

Sebastian Thomas @ neue fische Bootcamp Data Science
(datascience at sebastianthomas dot de)

This was my first project at the neue fische Bootcamp Data Science. It was centered around exploratory data analysis techniques and simple predictive analysis using ordinary linear regression. After the bootcamp, the analysis was extended.

The instances in the data set represent house sales. The task is to describe the impact of the given features on house sale prices and to predict these prices with machine learning methods.
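As a minimal sketch of the predictive setup, the following fits an ordinary least squares linear regression with an intercept using NumPy. The feature matrix and prices are small hypothetical toy values, not the King County data, and the helper names `fit_ols` and `predict` are illustrative only; the project's own pipeline may be organized differently.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return OLS coefficients [intercept, w_1, ..., w_p] minimizing squared error."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply fitted OLS coefficients to feature rows."""
    return coef[0] + X @ coef[1:]


# Hypothetical toy data: living area (sqft) and a 0/1 waterfront flag.
X = np.array([[1000, 0], [1500, 0], [2000, 0], [2500, 1], [3000, 0]], dtype=float)
# Toy prices constructed exactly as 50,000 + 200 * sqft + 300,000 * waterfront.
y = 50_000 + 200 * X[:, 0] + 300_000 * X[:, 1]

coef = fit_ols(X, y)
# Because the toy prices are noise-free, the fit recovers 50000, 200 and 300000.
print(np.round(coef, 2))
```

On real sales data the fit is no longer exact, which is why the prediction error is reported separately.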

Results

We have the following key insights:

  • The distribution of house sale prices is right-skewed (left-modal), with a median of about 0.5 million US dollars.

[Figure: distribution of price]
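The shape of the price distribution can be checked numerically: for a right-skewed (left-modal) distribution the mean exceeds the median and the sample skewness is positive. A minimal NumPy sketch on hypothetical toy prices (not the project's data); the helper name `summarize` is illustrative.

```python
import numpy as np


def summarize(prices: np.ndarray) -> dict:
    """Median, mean and moment-based (Fisher-Pearson) sample skewness."""
    mean = prices.mean()
    std = prices.std()  # population standard deviation (ddof=0)
    skew = np.mean((prices - mean) ** 3) / std**3
    return {"median": float(np.median(prices)), "mean": float(mean), "skew": float(skew)}


# Hypothetical toy prices in US dollars: most sales cluster low, one is very expensive.
prices = np.array([300_000, 400_000, 450_000, 500_000, 550_000, 650_000, 2_500_000], dtype=float)
stats = summarize(prices)
print(stats)
```

The single expensive sale pulls the mean well above the median, which is one reason the median is the more representative summary for house prices.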

  • Location has a strong impact on house sale prices, as the median sale prices grouped by zip code show:

[Figure: median prices grouped by zip code]

The area with the highest house sale prices is Medina (zip code 98039), a city on the Eastside of the Seattle metropolitan area.

[Figure: map of zip codes]
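Ranking zip codes by median sale price, as in the zip code figures and the Medina observation above, amounts to a group-by followed by a median. A minimal pandas sketch, assuming columns named `zipcode` and `price` (treat these names as assumptions) and hypothetical toy rows; only the zip code 98039 comes from the text, the prices are invented for illustration.

```python
import pandas as pd

# Hypothetical toy sales; column names in the project's data may differ.
sales = pd.DataFrame({
    "zipcode": [98039, 98039, 98039, 98103, 98103, 98001, 98001],
    "price": [2_400_000, 1_900_000, 3_100_000, 650_000, 720_000, 300_000, 280_000],
})

# Median sale price per zip code, most expensive first.
median_by_zip = sales.groupby("zipcode")["price"].median().sort_values(ascending=False)
print(median_by_zip)

# Zip code with the highest median sale price.
top_zip = median_by_zip.idxmax()
print(top_zip)
```

The median is used rather than the mean so that a few very expensive sales do not dominate a zip code's value.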

  • There is a rough linear relationship between living area and house sale price.

[Figure: living space]
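A rough linear relationship can be quantified with a least-squares line and the Pearson correlation coefficient r; values of r close to 1 indicate a strong positive linear association. The sketch below uses NumPy's `polyfit` and `corrcoef` on hypothetical toy values for living area and price, not on the project's data.

```python
import numpy as np

# Hypothetical toy data: living area in square feet and sale price in US dollars.
sqft = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)
price = np.array([260_000, 340_000, 460_000, 540_000, 640_000], dtype=float)

# Degree-1 least-squares fit: price ≈ slope * sqft + intercept.
slope, intercept = np.polyfit(sqft, price, deg=1)

# Pearson correlation between living area and price.
r = np.corrcoef(sqft, price)[0, 1]

print(f"slope = {slope:.1f} $/sqft, intercept = {intercept:.0f} $, r = {r:.3f}")
```

The slope reads as the estimated price increase per additional square foot of living area.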

  • While house condition has a roughly linear relationship with the sale price, the quality of the interior (design and materials) has an exponential impact on it.

[Figure: condition and grade]
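An exponential relationship, price ≈ a · b^q for a quality score q, becomes linear after taking logarithms: log(price) = log(a) + q · log(b). A minimal sketch, assuming an integer quality score and hypothetical toy prices, fits a line to log(price) with NumPy and recovers the multiplicative factor b per quality step. This is one common way to check an exponential trend, not necessarily the method used in the project.

```python
import numpy as np

# Hypothetical toy data: interior quality score and sale price in US dollars.
# Prices are constructed to grow by a factor of 1.5 per quality step.
quality = np.array([5, 6, 7, 8, 9, 10], dtype=float)
price = 200_000 * 1.5 ** (quality - 5)

# Least-squares fit of log(price) = log_a + quality * log_b.
log_b, log_a = np.polyfit(quality, np.log(price), deg=1)
factor = np.exp(log_b)  # multiplicative price change per quality step
a = np.exp(log_a)       # implied price at quality 0 (an extrapolation)

print(f"price multiplies by about {factor:.2f} per quality step")
```

A linear relationship, by contrast, would show up as a roughly constant dollar increase per step when fitting price directly instead of log(price).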

  • The better the view, the higher the house sales price. Most properties don't have an extraordinary view.

[Figure: view]

  • For houses on a waterfront, the median sale price is about 1 million US dollars higher.

[Figure: waterfront]
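The waterfront effect is a difference of group medians. A minimal pandas sketch, assuming a 0/1 `waterfront` column alongside `price` (column names and toy values are hypothetical):

```python
import pandas as pd

# Hypothetical toy sales: waterfront is 1 for waterfront properties, 0 otherwise.
sales = pd.DataFrame({
    "waterfront": [0, 0, 0, 0, 1, 1, 1],
    "price": [400_000, 450_000, 500_000, 600_000, 1_300_000, 1_600_000, 2_000_000],
})

# Median price per group, then waterfront minus non-waterfront.
medians = sales.groupby("waterfront")["price"].median()
premium = medians.loc[1] - medians.loc[0]
print(f"median waterfront premium: ${premium:,.0f}")
```

A difference of medians describes the data; on its own it does not control for other features such as location or living area.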

  • The error of the predictive model is about 12% measured by mean absolute percentage error, or about $37,000 measured by median absolute error.
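Both error measures can be written out directly. Mean absolute percentage error averages |y - ŷ| / |y| over all sales, while median absolute error takes the median of |y - ŷ|, which makes it robust to a few badly predicted expensive houses. A NumPy sketch with hypothetical toy prices and predictions; scikit-learn provides ready-made equivalents as `mean_absolute_percentage_error` and `median_absolute_error` in `sklearn.metrics`.

```python
import numpy as np


def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute percentage error as a fraction (0.12 means 12%)."""
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))


def median_absolute_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Median of the absolute errors, in the unit of the target (US dollars)."""
    return float(np.median(np.abs(y_true - y_pred)))


# Hypothetical toy prices and model predictions in US dollars.
y_true = np.array([300_000, 500_000, 800_000, 1_000_000], dtype=float)
y_pred = np.array([330_000, 450_000, 840_000, 1_200_000], dtype=float)

print(f"MAPE: {mape(y_true, y_pred):.1%}")
print(f"median absolute error: ${median_absolute_error(y_true, y_pred):,.0f}")
```

In this toy example the single large miss of $200,000 raises the MAPE but leaves the median absolute error at $45,000.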

Content

Future work

  • try more regression algorithms
  • try more ensemble methods
  • try more feature selection methods
  • try artificial neural networks
