Data Science Course Project

Predict Asking Rents for NYC apartments

Team: Sagun Pandey, Ahsun Rasool, Phurpa Sherpa, Sanjay Gurung

Goal

The goal of this project is to apply the data handling and modeling skills taught in class to a real-world data set. Our task is to predict asking rents, and to answer several related modeling questions, for New York City apartments posted on StreetEasy, an online marketplace for New York City homes. Predictions will be judged on the mean squared error of our estimated rents for the provided test sets.
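As a reference for the evaluation metric, here is a minimal sketch of how mean squared error can be computed. The function name and the toy rent values are illustrative assumptions, not taken from the project notebooks or the course's grading script.

```python
import math


def mean_squared_error(observed, predicted):
    """Average of squared differences between observed and predicted rents."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Toy values for illustration only (not real listings).
observed_rents = [2500, 3200, 4100]
predicted_rents = [2600, 3000, 4400]

mse = mean_squared_error(observed_rents, predicted_rents)
rmse = math.sqrt(mse)  # same units as rent, often easier to interpret
print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}")
```

Because the errors are squared, a few badly mispredicted high-rent listings can dominate the score, which is worth keeping in mind when comparing models.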

Important: The datasets from NYC Open Data are large and exceed GitHub's 25 MB upload limit, so they are not included in this repository. To replicate the whole modeling process from the start, download them to your local machine and change the import path in the notebook. URLs to the datasets are provided in the notebook cells.
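One way to keep the import path in a single place is sketched below. This is an assumption about how the path could be organized, not how the notebooks actually do it: the `RENT_DATA_DIR` environment variable, the `data` default folder, and the `dataset_path` helper are all hypothetical names.

```python
import os
from pathlib import Path

# Hypothetical convention: point RENT_DATA_DIR at the folder holding the
# downloaded NYC Open Data files; fall back to a local "data" folder.
DATA_DIR = Path(os.environ.get("RENT_DATA_DIR", "data"))


def dataset_path(filename, data_dir=DATA_DIR):
    """Return the path to a downloaded dataset, failing early if it is missing."""
    path = Path(data_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Download the dataset using the URL in the "
            "notebook cell and place it in this folder."
        )
    return path
```

A notebook cell could then pass `dataset_path("<downloaded file>.csv")` to its CSV reader, so that changing machines only requires changing one setting.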

Data

The data sets for the project come from a random selection of homes posted for rent on StreetEasy during the summer of 2018. The training set is a sample of 12,000 homes posted in May, June, and July of 2018, with their asking rents and several details from their StreetEasy listings, including publicly posted bedroom count, bathroom count, descriptions, and select building and unit amenities. We are required to generate predictions on a random set of listings posted on StreetEasy during August 2018. One full test set, including observed rents, is provided with the project posting. We are required to submit predicted rents on two additional sets, test2 and test3, which do not include the observed rents.

We are expected to attach at least one additional data set to the set provided. The provided set includes several fields designed to facilitate attaching third-party data to the StreetEasy data set. Examples include the street address, latitude and longitude, and New York City BIN and BBL numbers. Additional data could come from the U.S. Census Bureau, NYC Open Data, the NYC Geoclient, or any number of other open sources.
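Attaching an external table by BBL could look like the sketch below, using pandas. The column names (`rental_id`, `bbl`, `rent`, `year_built`) and all values are assumptions made up for illustration; the real StreetEasy and third-party files may name and type these fields differently.

```python
import pandas as pd

# Toy stand-in for the StreetEasy listings (hypothetical columns and values).
listings = pd.DataFrame({
    "rental_id": [1, 2, 3],
    "bbl": [1000010001, 1000020002, 3000030003],
    "rent": [3200, 4100, 2500],
})

# Toy stand-in for a third-party building-level table keyed by BBL.
buildings = pd.DataFrame({
    "bbl": ["1000010001", "3000030003"],
    "year_built": [1931, 2005],
})

# Identifiers often arrive as integers in one file and strings in another;
# casting both sides to string avoids silent key mismatches.
listings["bbl"] = listings["bbl"].astype(str)
buildings["bbl"] = buildings["bbl"].astype(str)

# A left join keeps every listing. validate raises if the external table has
# duplicate BBLs, and indicator adds a _merge column flagging unmatched rows.
merged = listings.merge(
    buildings, on="bbl", how="left", validate="many_to_one", indicator=True
)

unmatched = int((merged["_merge"] == "left_only").sum())
print(f"{unmatched} of {len(merged)} listings have no match in the external table")
```

Checking the unmatched count after each join is a cheap way to notice key-format problems before the missing values reach the model.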

Deliverables

CSV with predictions against test2.csv
A 200-300 word explanation:
- Expected performance of the model in terms of mean squared error
- Key features driving the team’s modeling performance
A 200-300 word explanation:
- Intended strategy to improve the predictions for the final round
CSV with predictions against test3.csv

