nyctaxi

Prerequisites

Download the 2013 taxi data using this shell script.
- To download the 2015 taxi data (includes both yellow and green taxi data but lacks medallion and hack license info), use this one. To load in R, use this script.
[This R script](https://github.com/msr-ds3/nyctaxi/blob/master/exploratory_analysis/load_one_week.R) loads the CSVs, adds necessary and convenient columns (e.g. neighborhood names), and saves them as taxi_clean in one_week_taxi.Rdata. To use the dataframe, call load('one_week_taxi.Rdata').
This R script uses taxi_clean to create a dataframe called shifts_clean of drivers (hack_licenses) and their shifts (as measured by the cutoff analysis here), and a dataframe called taxi_clean_shifts with a shift number for each ride, and stores both in an Rdata file called shifts_clean.Rdata.
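As a rough illustration of cutoff-based shift assignment, the sketch below starts a new shift whenever the gap between one ride's dropoff and the same driver's next pickup exceeds a cutoff. The 6-hour cutoff, the toy data, the `assign_shifts` helper, and the `shift_num` column are all assumptions made for this example; the repository's script may define shifts differently.

```r
# Toy rides for two drivers. Column names follow the 2013 trip data;
# the values are made up for illustration.
rides <- data.frame(
  hack_license = c("A", "A", "A", "A", "B", "B"),
  pickup_datetime = as.POSIXct(c(
    "2013-07-01 08:00:00", "2013-07-01 08:30:00",
    "2013-07-01 18:00:00", "2013-07-01 18:20:00",
    "2013-07-01 09:00:00", "2013-07-01 09:45:00"), tz = "UTC"),
  dropoff_datetime = as.POSIXct(c(
    "2013-07-01 08:20:00", "2013-07-01 08:50:00",
    "2013-07-01 18:15:00", "2013-07-01 18:40:00",
    "2013-07-01 09:30:00", "2013-07-01 10:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

cutoff_hours <- 6  # hypothetical cutoff, not taken from the repository

# Rides must be in time order within each driver.
rides <- rides[order(rides$hack_license, rides$pickup_datetime), ]

# Hypothetical helper: number the shifts of ONE driver's time-ordered rides.
assign_shifts <- function(pickup, dropoff, cutoff_hours) {
  n <- length(pickup)
  # Idle time between each dropoff and the following pickup, in hours.
  gap <- as.numeric(difftime(pickup[-1], dropoff[-n], units = "hours"))
  # The first ride opens shift 1; every gap above the cutoff opens a new shift.
  cumsum(c(TRUE, gap > cutoff_hours))
}

# Apply the helper driver by driver.
rides$shift_num <- NA_integer_
for (idx in split(seq_len(nrow(rides)), rides$hack_license)) {
  rides$shift_num[idx] <- assign_shifts(
    rides$pickup_datetime[idx], rides$dropoff_datetime[idx], cutoff_hours
  )
}
```

Here driver A's long break between the 08:50 dropoff and the 18:00 pickup splits the day into two shifts, while driver B's rides all fall in one.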

#### NOTE: AS OF 7/26, YOU SHOULD MOVE ALL .RDATA FILES INTO THE RDATA FOLDER, AND SAVE ALL FUTURE RDATA FILES TO THAT FOLDER

## Descriptives

Cool figures, plots, and maps (output of some of the scripts below) are in this directory.
This script creates two functions: visualize_trips_by_shift, which plots the route of a taxicab driver over the course of a shift, and visualize_trips_by_day, which plots it over a day of the week.
- Usage: visualize_trips_by_shift(df, hacklicense, shift = NULL). df is the dataframe (usually taxi_clean, but sometimes a subset of it). hacklicense is the hack_license of the driver (usually chosen at random from df). shift is optional and takes a shift number; when omitted, all shifts are shown as a faceted plot. visualize_trips_by_day(df, hacklicense, day = NULL) works similarly, except that day takes a day of the week in the format "Mon", "Tue", etc.
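A possible calling pattern for the usage above is sketched here. `pick_random_driver` is a hypothetical helper written for this example, and the plotting calls are guarded so they only run once the plotting script has been sourced and taxi_clean has been loaded; `shift = 1` assumes the chosen driver has a shift with that number.

```r
# Hypothetical helper (not part of the repository): pick one driver at random.
pick_random_driver <- function(df) {
  licenses <- unique(df$hack_license)
  licenses[sample.int(length(licenses), 1)]
}

# The calls below only run once the plotting script has been sourced and
# taxi_clean has been loaded, e.g. via load('one_week_taxi.Rdata').
if (exists("taxi_clean") &&
    exists("visualize_trips_by_shift") &&
    exists("visualize_trips_by_day")) {
  driver <- pick_random_driver(taxi_clean)
  visualize_trips_by_shift(taxi_clean, driver)             # all shifts, faceted
  visualize_trips_by_shift(taxi_clean, driver, shift = 1)  # one shift (assumes it exists)
  visualize_trips_by_day(taxi_clean, driver, day = "Mon")  # Mondays only
}
```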

Trip-based

Stats for one week of taxi rides by day of week, hour of day, pickup location, and dropoff location are computed by this R script.
Trip-based descriptive plots (distributions of distance, time, fare, etc.) can be found here.
Neighborhood popularity plots (in R) are here.
Interactive popularity heatmaps by neighborhood can be created using this script.
Non-interactive ggmap popularity heatmaps can be created using the functions here.
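To give a feel for the per-day and per-hour statistics mentioned above, here is a minimal base R sketch on made-up data. The column names `pickup_datetime`, `trip_distance`, and `fare_amount` are assumed to match the raw trip and fare files; `day_of_week`, `hour`, and `stats` are names chosen for this example and may not match the repository's script.

```r
# Toy trips; the values are invented for illustration.
trips <- data.frame(
  pickup_datetime = as.POSIXct(c(
    "2013-07-01 08:05:00", "2013-07-01 08:40:00",
    "2013-07-01 17:10:00", "2013-07-02 08:15:00"), tz = "UTC"),
  trip_distance = c(1.2, 3.4, 2.0, 5.0),
  fare_amount = c(6.5, 12.0, 9.0, 18.5)
)

# Derive day of week and hour of day without depending on the locale.
lt <- as.POSIXlt(trips$pickup_datetime, tz = "UTC")
trips$day_of_week <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")[lt$wday + 1]
trips$hour <- lt$hour

# Mean distance and fare for each (day of week, hour) combination.
stats <- aggregate(
  cbind(trip_distance, fare_amount) ~ day_of_week + hour,
  data = trips,
  FUN = mean
)
print(stats)
```

The same pattern extends to pickup or dropoff location by adding the corresponding column to the right-hand side of the formula.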

Driver-based

Driver-based descriptive plots (distributions of distance, time, fare, etc., by number of drivers) are here.
Visualize shifts, and rides within them, for n random drivers by calling the visualize_rides_and_shifts() function created by this R script.

Shift-based

Some plots using shift intervals are [here](https://github.com/msr-ds3/nyctaxi/blob/master/exploratory_analysis/plots_with_shift_interval.R).

Predicting Efficiency

Predicting shift efficiency

Features to be included in the design matrix for the shifts prediction task are listed in this markdown file.
The design matrix can be created and saved as an Rdata file using the script here.
Descriptive plots for both regression and classification, for each individual feature, are here.
Some models and efficiency predictions are here.

Predicting driver efficiency

Future work: features to be included in the design matrix.

Analyzing flow

Visualizing flow over the day.
Analysis of carpooling possibilities is here.
Plots from the carpooling analysis.
Probabilities of lat/lng destinations given a source neighborhood and an hour of day.
A more in-depth look at carpool savings is at this link.
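One way to estimate destination probabilities given a source neighborhood and hour is to bin dropoff coordinates into grid cells and take relative frequencies within each (neighborhood, hour) group. The sketch below does this on made-up data; the two-decimal rounding used as the grid, and the column names `pickup_neighborhood`, `pickup_hour`, `cell_lat`, and `cell_lng`, are assumptions for this example rather than the repository's actual method.

```r
# Toy trips; the values are invented for illustration.
trips <- data.frame(
  pickup_neighborhood = c("Chelsea", "Chelsea", "Chelsea", "Chelsea", "SoHo", "SoHo"),
  pickup_hour = c(8, 8, 8, 8, 8, 8),
  dropoff_latitude = c(40.7512, 40.7508, 40.7611, 40.7493, 40.7229, 40.7232),
  dropoff_longitude = c(-73.9903, -73.9897, -73.9812, -73.9911, -73.9987, -73.9991),
  stringsAsFactors = FALSE
)

# Assumed grid: round coordinates to two decimal places.
trips$cell_lat <- round(trips$dropoff_latitude, 2)
trips$cell_lng <- round(trips$dropoff_longitude, 2)

# Count trips per (source neighborhood, hour, destination cell).
trips$n <- 1
counts <- aggregate(
  n ~ pickup_neighborhood + pickup_hour + cell_lat + cell_lng,
  data = trips,
  FUN = sum
)

# Divide by the total for each (source neighborhood, hour) to get
# P(destination cell | source neighborhood, hour).
totals <- ave(counts$n, counts$pickup_neighborhood, counts$pickup_hour, FUN = sum)
counts$prob <- counts$n / totals
print(counts)
```

In this toy data, three of the four Chelsea trips end in the same cell, so that cell gets probability 0.75 and the other gets 0.25.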

Shiny apps

A shiny app to visualize NYC taxi flow as a heatmap can be found here.
A shiny app (inspired by Todd Schneider's post) to visualize average trip times from neighborhood to neighborhood.
An app to see popular neighborhood destinations, and unusual neighborhoods.
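The neighborhood-to-neighborhood trip times behind such an app could be computed roughly as follows. This is a sketch on made-up data: `trip_time_in_secs` is assumed to match the raw trip data, while `pickup_neighborhood` and `dropoff_neighborhood` are assumed names for the neighborhood columns and may differ from those in taxi_clean.

```r
# Toy trips; the values are invented for illustration.
trips <- data.frame(
  pickup_neighborhood = c("Chelsea", "Chelsea", "SoHo", "SoHo", "Chelsea"),
  dropoff_neighborhood = c("SoHo", "SoHo", "Chelsea", "SoHo", "Chelsea"),
  trip_time_in_secs = c(600, 900, 720, 300, 240),
  stringsAsFactors = FALSE
)

# Matrix of mean trip time in minutes: rows are pickup neighborhoods,
# columns are dropoff neighborhoods.
avg_minutes <- with(trips, tapply(
  trip_time_in_secs,
  list(pickup = pickup_neighborhood, dropoff = dropoff_neighborhood),
  mean
)) / 60
print(avg_minutes)
```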
