In a past life I was a real estate broker. As a broker, I noticed that real estate price-prediction algorithms didn't seem to adequately account for the things that most affected my clients' purchasing behavior. My clients would spend much of their time discussing the number of trees on a block, the trash on the street, the number of abandoned houses, and the construction on a block. Real estate algorithms seemed to analyze price per square foot, adjacency to a school, or crime in a neighborhood. All good factors to consider, but not the gritty, on-the-ground analysis my clients needed. Especially in redeveloping neighborhoods in old cities, young buyers are looking to purchase potential.
I listened to a podcast about a year and a half ago about two data scientists who trained an algorithm to identify the year, make, and model of vehicles on the street via Google Street View. They were able to use this information to predict election results with striking accuracy.
Fundamentally, real estate is local and grounded in the physical space directly viewable to the human eye. In Philadelphia, a very provincial city, this is even more the case than in most cities. Anecdotally, I bought a house 4 blocks away from a friend; my house cost one-fifth as much. It is difficult for a big-picture algorithm to take into account the small details of the literal, on-the-ground built environment that buyers weigh when purchasing real estate.
Curb Appeal is my attempt to capture one small piece of the on-the-ground factors affecting real estate prices.
Driving Question: Does the number of trees on a block affect real estate prices?
Objective: Using a convolutional neural network, build an algorithm to count the number of trees on a block.
My pipeline moves in the following fashion:
FETCH -> LABEL -> RESIZE -> FEED
I wrote fetch_images.py to utilize the Google Street View API (GSV) and the GSV metadata API, performing operations in the following manner:
- Given a street name and a block range (e.g. 'N 26th St', 1300-1500 blocks), my script will calibrate its heading, then set the camera to take pictures at a 90-degree angle to the heading. The heading resets at the beginning of every block.
- Often GSV will have one picture for multiple addresses, or the same picture for many addresses at the end of the block. In fetch_images.py each picture is checked against the last picture to make sure it is not a repeat.
- Each original picture is saved as house number, street name, city, state, and zip, e.g. '1338_n_26th_st_philadelphia_pa_19121.jpg'.
- At the end of each series of fetches, the script writes the time, date, and number of fetches to a pandas DataFrame and reports this back to the user (GSV allows 25,000 free images a day).
- I selected 4 differing neighborhoods within Philadelphia. From my knowledge of the region, each neighborhood contains differing house architecture, street architecture, and number of trees, and they reflect vastly different average price points. I had my function fetch all pictures from each neighborhood.
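For illustration, here is a minimal sketch of such a fetch loop. The Street View image and metadata endpoints are the real Google APIs, but the byte-hash repeat check, the naming scheme, and the parameters are simplified assumptions, not the author's fetch_images.py:

```python
import hashlib
import requests

GSV_URL = "https://maps.googleapis.com/maps/api/streetview"
META_URL = "https://maps.googleapis.com/maps/api/streetview/metadata"
API_KEY = "YOUR_API_KEY"  # placeholder

def fetch_block(addresses, heading):
    """Fetch one image per address, skipping the repeats GSV serves
    for adjacent addresses. Addresses like '1338 N 26th St, Philadelphia, PA'."""
    last_hash = None
    for address in addresses:
        # The metadata call is free; skip addresses with no imagery.
        meta = requests.get(META_URL, params={"location": address,
                                              "key": API_KEY}).json()
        if meta.get("status") != "OK":
            continue
        # Camera turned 90 degrees from the direction of travel to face the houses.
        params = {"size": "600x600", "location": address,
                  "heading": (heading + 90) % 360, "key": API_KEY}
        img = requests.get(GSV_URL, params=params).content
        # A simple byte hash stands in for the script's repeat check.
        img_hash = hashlib.md5(img).hexdigest()
        if img_hash == last_hash:
            continue
        last_hash = img_hash
        fname = address.lower().replace(",", "").replace(" ", "_") + ".jpg"
        with open(fname, "wb") as f:
            f.write(img)
```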
After fetching, the process moves through label_pics.py, which heavily utilizes OpenCV. Many thanks to the excellent OpenCV tutorials at pyimagesearch.
The OpenCV build is specific: it can be installed via 'pip install opencv-contrib-python'.
One of the challenges of labeling is deciding what is a tree and what is not a tree.
Since I am trying to measure curb appeal, I decided on strict criteria:
- I want to count trees that are a part of the sidewalk architecture.
- Trees visible in the background do not count.
- Intersections prove especially problematic. I decided that trees must be on a block running parallel to the one the "car" is on. Stated another way, I am looking for trees that sit orthogonally adjacent to the path of the "car."
- A portion of the tree trunk, however small, must be visible for me to label it as a tree. Leaves alone, or branches alone, are not enough.
Again, labeling happens within label_pics.py using the Labeler class, in the following manner:
- I wrote my function to randomly choose a block in the neighborhood I specify, then fetch all pictures associated with that block, up to the number of pics I specify.
- The Labeler fetches a picture.
- To avoid over- or under-counting trees, and to provide more data to train on, the Labeler then splits the photo vertically and displays each half for labeling.
- I then label each split image as YES tree or NO tree.
- The Labeler then mirrors each photo, carrying over my assigned label, and saves the picture.
Labeler Review:
original image -> split into image1, image2 -> each split image displayed and assigned a label -> image1 and image2 mirrored with their labels -> all four images saved
From the original **one** image, **four** images are labeled and saved in the following manner:
address_pic1
address_pic1_flip
address_pic2
address_pic2_flip
- Every 100 pics, the Labeler saves the filenames and labels to a pandas DataFrame and writes a backup of the DataFrame. A sketch of the split-and-mirror step is below.
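A minimal sketch of that split-and-mirror step using OpenCV (the real logic lives in the Labeler class in label_pics.py; the file-naming scheme here is illustrative):

```python
import cv2

def split_and_mirror(path, label1, label2):
    """Split one street photo into halves, mirror each half, and save all
    four images, returning (filename, label) pairs for the DataFrame."""
    img = cv2.imread(path)                  # OpenCV loads the jpg as a numpy array
    h = img.shape[0]
    # Halve the height, matching the 600x600 -> 600x300 sizes in the resizing step.
    halves = [("pic1", img[: h // 2], label1),
              ("pic2", img[h // 2 :], label2)]
    stem = path.rsplit(".", 1)[0]
    records = []
    for name, half, label in halves:
        mirrored = cv2.flip(half, 1)        # 1 = flip around the vertical axis
        cv2.imwrite(f"{stem}_{name}.jpg", half)
        cv2.imwrite(f"{stem}_{name}_flip.jpg", mirrored)
        records += [(f"{stem}_{name}.jpg", label),
                    (f"{stem}_{name}_flip.jpg", label)]
    return records
```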
After labeling, the pictures are resized using the Resizer class in label_pics.py in the following manner:
- The folder of labeled pics is specified.
- Each picture is resized to a configurable size. In my particular case, images are 600x600 from GSV (the max resolution from GSV is 1000x1000). When split vertically they become 600x300, and when resized they become 100x50.
- The resized image is saved for record-keeping purposes.
- When reading an image, OpenCV converts it to a numpy array. The filename and array are saved to a pandas DataFrame.
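A minimal sketch of that resize-and-record step (the folder layout, output naming, and column names are assumptions; the author's Resizer class may differ):

```python
import cv2
import pandas as pd
from pathlib import Path

def resize_folder(folder, size=(100, 50)):
    """Resize every labeled picture and collect (filename, pixel array) rows."""
    rows = []
    for path in Path(folder).glob("*.jpg"):
        img = cv2.imread(str(path))          # numpy array, shape (H, W, 3)
        small = cv2.resize(img, size)        # size is (width, height) in OpenCV
        cv2.imwrite(str(path.with_name(path.stem + "_small.jpg")), small)
        rows.append({"filename": path.name, "array": small})
    return pd.DataFrame(rows)
```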
Neural Network Architecture:
I used a convolutional neural network built with Keras on a TensorFlow backend. My model is built in cnn.py with the following architecture:
- I trained my model on AWS using a p2.8xlarge instance. Because it is a GPU instance, my network took on average about 2 minutes to train; as a comparison, on my own MacBook Air it took about 45 minutes.
- I decided to keep my network simple for experimentation purposes: two convolution layers and one pooling layer.
- Each layer uses the hyperbolic tangent activation function, as this produced the best accuracy for my models.
- The model drops out 25% of the neurons before and after pooling to prevent overfitting.
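A minimal Keras sketch of a model matching that description; the filter counts, kernel sizes, and dense layer are assumptions, while the two tanh convolutions, single pooling layer, and 25% dropout on either side of pooling come from the list above:

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(50, 100, 3), n_classes=2):
    """Two tanh convolutions, one pooling layer, 25% dropout on each side."""
    model = models.Sequential([
        layers.Input(shape=input_shape),      # 100x50 images, height first
        layers.Conv2D(32, (3, 3), activation="tanh"),
        layers.Conv2D(32, (3, 3), activation="tanh"),
        layers.Dropout(0.25),                 # 25% dropout before pooling
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),                 # 25% dropout after pooling
        layers.Flatten(),
        layers.Dense(64, activation="tanh"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```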
Train the Network:
- My labeled DataFrame and my numpy-array DataFrame are joined on 'filename'.
- I split my data 80% train / 20% test. The 80% train data is then split 80%/20% into train/validation during training.
- The numpy arrays are scaled and formatted correctly to feed into Keras.
- The data passes through the CNN, and I am able to see the accuracy of the model against my validation set.
- After training, I test the model against my test data. If the accuracy of the model beats previous model accuracies, the model is saved to be used for the prediction stage. A sketch of this loop follows the list.
My data set contains approximately 15,000 images, so the network trained on ~9,500 images, validated on ~2,500 images, and tested on ~3,000 images.
The numbers are unequivocally disappointing.
But more important is what the data and the trained network are telling me, so I can learn and adjust the model moving forward.
I decided to dig in a little deeper:
Recall is not bad. I could give up some recall in order to gain precision.
What is going on with the false positives to drive the precision down?
There are many false positives. Let's see what these false positives are.
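A small sketch of how such false positives can be pulled out for inspection, assuming scikit-learn and an array of test-set filenames (`test_filenames` is hypothetical):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_pred = model.predict(X_test).argmax(axis=1)   # 0 = NO tree, 1 = TREE
y_true = y_test.argmax(axis=1)
print(classification_report(y_true, y_pred, target_names=["NO tree", "TREE"]))
print(confusion_matrix(y_true, y_pred))

# Collect the false positives (predicted TREE, labeled NO tree) for eyeballing.
fp_idx = np.where((y_pred == 1) & (y_true == 0))[0]
fp_files = test_filenames[fp_idx]               # hypothetical filename array
```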
Analysis:
- The model likes phone poles.
- The model is overfitting on trees that do not fit my criteria.
- The model's depth perception is off.
- The model likes green.
**Unbalanced data:** My data consists of ~15,000 images, of which ~2,250 are labeled 'TREE', approximately an 85%/15% imbalance. I knew that given the small number of images and the imbalanced nature of my data, predicting on the test set would be a challenge. I adjusted the class weights within my model to account for this imbalance; even so, balanced data would be best. A sketch of that weighting is below.
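A minimal sketch of that weighting, assuming scikit-learn's class-weight helper (the exact weights used in the project are not given):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

labels = y.argmax(axis=1)                     # 0 = NO tree, 1 = TREE
weights = compute_class_weight("balanced",
                               classes=np.unique(labels), y=labels)
class_weight = dict(enumerate(weights))       # roughly {0: 0.59, 1: 3.33} at 85/15

model.fit(X_train, y_train, validation_split=0.2,
          epochs=20, batch_size=32, class_weight=class_weight)
```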
**Depth perception:** Depth perception can be an issue for convolutional neural networks; an article from Stanford discusses CNNs' trouble with depth perception.
**Good criteria?** I chose very stringent criteria for my model. The goal is to use the model as a feature in real estate price predictions. With this in mind, does being able to see a tree down the block or in the distance affect an individual's purchase decision? If I constructed my model around that hypothesis, I'd have a more precise feature to pass into my real estate prediction model and be able to test its validity.
**Increase complexity of the neural network:** In my quest to provide a proof of concept, I chose to keep the neural network very simple and to live with the results. I could reasonably expect my network to improve with greater depth.
**Larger images:** My actual image size is only 100x50; training on larger images would give the network more detail to learn from.
Impetus for the project: estimating the demographic makeup of the US using GSV and machine learning.