Real Estate Data Analytics

This repository contains my project work for the CSCI-GA.2437 Big Data Application Development course at NYU.

Project Overview

This project analyzes a large NYC real estate dataset using Apache Spark, Scala, and Apache Zeppelin. The goal was to clean and transform the data, cluster properties into segments, and analyze spatial and temporal patterns to uncover market trends and neighborhood dynamics.

Tech Stack

HDFS
Apache Spark (MapReduce)
Apache Zeppelin
Scala

Structure

data/: small samples of the dataset for preview
figures/: visualizations created with Apache Zeppelin and Python pandas
code.ipynb: Zeppelin notebook for data exploration and modeling
slides.pdf: our presentation slides
report.pdf: our final report

Analytics

Utilized Apache Spark, Scala, and Zeppelin Notebook to process and analyze a large NYC real estate dataset, implementing scalable workflows for data cleaning and transformation.
Built a KMeans clustering pipeline with dimensionality reduction to segment properties into distinct clusters.
Calculated growth rates and analyzed spatial and temporal patterns to uncover insights into seasonal market behaviors and neighborhood dynamics.
Visualized results with Zeppelin, pandas, and Matplotlib: bar charts for sale prices, scatter plots for clustering, and pie charts for transaction volumes to highlight trends.
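
As a rough illustration of the clustering step, here is a minimal sketch of a Spark ML pipeline that scales features, reduces them with PCA, and runs KMeans. It is not the notebook's actual code: the column names (`sale_price`, `gross_square_feet`, `year_built`), the toy rows, and the choices of 2 principal components and 2 clusters are all assumptions made for the example.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.{PCA, StandardScaler, VectorAssembler}
import org.apache.spark.sql.SparkSession

// In Zeppelin a session already exists, so getOrCreate() simply returns it;
// the local master setting only matters when running this standalone.
val spark = SparkSession.builder()
  .appName("kmeans-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical toy rows standing in for the cleaned sales data.
val sales = Seq(
  (850000.0, 1200.0, 1995.0),
  (910000.0, 1350.0, 2001.0),
  (780000.0, 1100.0, 1988.0),
  (2400000.0, 3200.0, 2015.0),
  (2650000.0, 3500.0, 2018.0),
  (2250000.0, 3000.0, 2010.0)
).toDF("sale_price", "gross_square_feet", "year_built")

// Pack the numeric columns into a single feature vector.
val assembler = new VectorAssembler()
  .setInputCols(Array("sale_price", "gross_square_feet", "year_built"))
  .setOutputCol("raw_features")

// Standardize so that price does not dominate the distance metric.
val scaler = new StandardScaler()
  .setInputCol("raw_features")
  .setOutputCol("scaled_features")
  .setWithMean(true)
  .setWithStd(true)

// Dimensionality reduction: keep 2 principal components (assumed value).
val pca = new PCA()
  .setInputCol("scaled_features")
  .setOutputCol("pca_features")
  .setK(2)

// Cluster in the reduced space; k = 2 is an assumption for this toy data.
val kmeans = new KMeans()
  .setFeaturesCol("pca_features")
  .setPredictionCol("cluster")
  .setK(2)
  .setSeed(42L)

val pipeline = new Pipeline()
  .setStages(Array[PipelineStage](assembler, scaler, pca, kmeans))

val clustered = pipeline.fit(sales).transform(sales)
clustered.select("sale_price", "gross_square_feet", "cluster").show()
```

The two PCA components in `pca_features` are also convenient x/y coordinates for a scatter plot colored by `cluster`.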
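
For the growth-rate item above, one common definition is the period-over-period change, (current - previous) / previous. The sketch below applies it to a small in-memory series in plain Scala; the `MonthlyPrice` type, the `growthRates` helper, and the sample values are hypothetical, and the notebook may compute growth differently (for example with Spark window functions).

```scala
// Hypothetical record: one aggregated price per month, keyed as "yyyy-MM".
final case class MonthlyPrice(month: String, medianPrice: Double)

// Period-over-period growth: (current - previous) / previous.
// Months are sorted lexicographically, which is chronological for "yyyy-MM".
// Pairs whose previous value is zero are skipped to avoid division by zero.
def growthRates(series: Seq[MonthlyPrice]): Seq[(String, Double)] =
  series
    .sortBy(_.month)
    .sliding(2)
    .collect {
      case Seq(prev, curr) if prev.medianPrice != 0.0 =>
        curr.month -> (curr.medianPrice - prev.medianPrice) / prev.medianPrice
    }
    .toSeq

// Made-up sample values, for illustration only.
val prices = Seq(
  MonthlyPrice("2023-01", 800000.0),
  MonthlyPrice("2023-02", 840000.0),
  MonthlyPrice("2023-03", 798000.0)
)

val rates = growthRates(prices)
rates.foreach { case (month, rate) =>
  println(f"$month: ${rate * 100}%.1f%%")
}
```

With these sample values the series rises by 5% in February and falls by 5% in March; grouping such rates by calendar month is one way to look for seasonal patterns.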

Overview

Below are selected figures generated during the analysis.

Bar Chart: Sale Prices by Category

Bar Chart: Sale Prices by Neighborhood

Growth Analysis: Temporal Patterns

Scatter Plot: KMeans Clustering Results

guochenmeinian/CSCI-GA.2437
