Home
-
Next, we load the datasets for April, May, June, July, August and September into R. Because we want to analyze them as one dataset, we bind them together into a single data frame.
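A minimal sketch of the loading step, assuming each month sits in its own CSV file with identical columns. The helper `load_trips`, the file names and the demo rows are all hypothetical; the demo writes two tiny files to a temporary directory so that it runs without the real data.

```r
# Read every monthly CSV and stack the results into one data frame.
# check.names = FALSE keeps the "Date/Time" header as it is instead of
# letting read.csv rename it to "Date.Time".
load_trips <- function(paths) {
  monthly <- lapply(paths, read.csv, check.names = FALSE,
                    stringsAsFactors = FALSE)
  do.call(rbind, monthly)
}

# Hypothetical stand-ins for the real monthly files (made-up rows).
demo_dir <- tempfile("uber_demo")
dir.create(demo_dir)
apr <- data.frame(`Date/Time` = c("4/1/2014 0:11:00", "4/1/2014 0:17:00"),
                  Base = c("B02512", "B02512"),
                  check.names = FALSE, stringsAsFactors = FALSE)
may <- data.frame(`Date/Time` = "5/1/2014 0:02:00",
                  Base = "B02598",
                  check.names = FALSE, stringsAsFactors = FALSE)
write.csv(apr, file.path(demo_dir, "apr.csv"), row.names = FALSE)
write.csv(may, file.path(demo_dir, "may.csv"), row.names = FALSE)

trips <- load_trips(file.path(demo_dir, c("apr.csv", "may.csv")))
str(trips)  # Date/Time is still a character column at this point
```

`rbind` requires every file to share the same column names, which is why the sketch assumes identical layouts across months.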
-
A peek at the structure of our dataset shows that the Date/Time column is stored as character. We need to convert it to a proper date-time format before doing anything else with the data. We then use lubridate to create new columns for month, day, weekday and hour. We express the month and weekday columns as categorical variables, since each has a small, fixed set of values. The day of the month is stored as an integer from 1 to 30 or 31, depending on the month.
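The conversion could look roughly like this. It is a sketch under assumptions: the demo rows are made up, and the month/day/year hour:minute:second layout (and therefore the choice of `mdy_hms`) is a guess about how the raw column is written; a different layout would need a different lubridate parser.

```r
library(lubridate)

# Made-up rows; the month/day/year layout is an assumption.
trips <- data.frame(
  `Date/Time` = c("4/1/2014 0:11:00", "9/13/2014 17:45:00"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Parse the character column into POSIXct date-times.
trips$`Date/Time` <- mdy_hms(trips$`Date/Time`)

# Derive the calendar columns used in the rest of the analysis.
trips$month   <- month(trips$`Date/Time`, label = TRUE)  # ordered factor
trips$day     <- day(trips$`Date/Time`)                  # day of the month
trips$weekday <- wday(trips$`Date/Time`, label = TRUE)   # ordered factor
trips$hour    <- hour(trips$`Date/Time`)                 # hour, 0 to 23

str(trips)
```

With `label = TRUE`, `month()` and `wday()` return ordered factors, so later tables and charts follow calendar order rather than alphabetical order.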
-
For our first task, we visualize the number of trips per month to see which months had the most and the fewest trips.
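One way to get the monthly totals is to count rows per month, since every row is one trip. The sketch below uses base R and made-up data; the column name `month` and the counts are assumptions for illustration, not the real figures.

```r
# Made-up month labels standing in for the month column of the real data.
month_levels <- c("Apr", "May", "Jun", "Jul", "Aug", "Sep")
trips <- data.frame(
  month = factor(rep(month_levels, times = c(1, 2, 3, 4, 5, 6)),
                 levels = month_levels, ordered = TRUE)
)

# One row per trip, so counting rows per month gives trips per month.
trips_per_month <- as.data.frame(table(month = trips$month),
                                 responseName = "trips")
print(trips_per_month)

# Bar chart of the counts, in calendar order.
barplot(trips_per_month$trips,
        names.arg = as.character(trips_per_month$month),
        xlab = "Month", ylab = "Trips", main = "Trips per month")

# Months with the most and the fewest trips.
busiest  <- as.character(trips_per_month$month[which.max(trips_per_month$trips)])
quietest <- as.character(trips_per_month$month[which.min(trips_per_month$trips)])
```

The same pattern works for any single grouping column, for example the weekday or the hour.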
-
As we can see, the total number of rides increased gradually from April to September. April had the lowest number of rides, and September the highest, surpassing 1 million rides. Now that we know which month had the most rides, wouldn’t we want to see on which days of the week people booked the most and the fewest rides? Do people book more trips on, say, Friday than on any other day of the week? Or is it Monday? Enough suspense! Let’s find out!
-
Interestingly, the weekday with the highest number of rides was Thursday. Friday comes in a close second, followed by Wednesday. An interesting pattern in the visual above is that the total number of rides per day peaks from Wednesday to Friday, falls off steeply on Saturday, then recovers slowly to peak again on Thursday. Now it is time for some deeper insights. Does our data have anything else to tell us? We have found the month and the day of the week with the highest number of trips, but what about the hour of the day in which most rides happen? Will that reveal something else?
-
The visual above uncovers some even more interesting insights. To begin with, if we compare two windows, 07:00 to 09:00 and 16:00 to 18:00, we can presume that more New Yorkers use Uber for their evening commute from work than for their morning commute to work. 17:00 recorded more trips than any other hour, and we can see a steady decline in trips from that hour to 02:00, the hour of the day with the lowest number of rides, 45,865. Even that quietest hour is busy, which shows us that in New York, Uber is heavily used at every hour of the day. Next, we compare weekend trips against working-day trips to check for any variation in the pattern.
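The comparison of the two commute windows can be sketched as follows. The hours are made up and the column name `hour` is an assumption; only the 07:00 to 09:00 and 16:00 to 18:00 windows come from the text.

```r
# Made-up pickup hours (0 to 23), one value per trip.
trips <- data.frame(hour = c(2, 7, 8, 8, 9, 16, 17, 17, 17, 18, 18, 23))

# Count trips for every hour, including hours with no trips at all.
hourly <- table(factor(trips$hour, levels = 0:23))

# Hour with the most trips.
peak_hour <- as.integer(names(hourly)[which.max(hourly)])

# Total trips inside each commute window.
morning <- sum(trips$hour %in% 7:9)
evening <- sum(trips$hour %in% 16:18)

cat("Peak hour:", peak_hour, "\n")
cat("Morning window:", morning, "Evening window:", evening, "\n")
```

Wrapping the hours in `factor(..., levels = 0:23)` keeps empty hours in the table, so a quiet hour shows up as zero instead of silently disappearing.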
-
Hmm... interesting. The visuals above show some differences in ride patterns for weekends. First off, the number of rides on weekends is lower than on weekdays, and the number of rides per hour peaks earlier on weekends than on working days. On working days, rides peak at 17:00, the busiest hour for rides in New York, then decrease steadily until about 02:00, increase again until 07:00, drop until 10:00, and then climb steadily back to the 17:00 peak. Weekends look quite different. There is a relatively high and sustained rate of rides from 16:00 through to 00:00, followed by a steady decrease from 00:00 to 05:00 and then a consistent increase until 16:00 again.
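A sketch of how the weekend and working-day split could be computed, using base R only. The pickup times are made up, and the names `pickup` and `day_type` are hypothetical.

```r
# Made-up pickup times; 2014-09-13 was a Saturday, 2014-09-14 a Sunday
# and 2014-09-15 a Monday.
pickup <- as.POSIXct(c("2014-09-13 01:10:00", "2014-09-13 16:30:00",
                       "2014-09-14 16:45:00", "2014-09-15 08:05:00",
                       "2014-09-15 17:20:00", "2014-09-15 17:55:00"),
                     tz = "UTC")

trips <- data.frame(pickup = pickup)
trips$hour <- as.POSIXlt(trips$pickup)$hour

# POSIXlt numbers weekdays from 0 (Sunday) to 6 (Saturday).
trips$day_type <- ifelse(as.POSIXlt(trips$pickup)$wday %in% c(0, 6),
                         "weekend", "weekday")

# Rows are hours, columns are day types, cells are trip counts.
by_hour <- table(hour = factor(trips$hour, levels = 0:23),
                 day_type = trips$day_type)

# Peak hour for each day type.
peak <- apply(by_hour, 2, function(counts) {
  as.integer(names(counts)[which.max(counts)])
})
print(peak)
```

Because weekends cover two days and working days cover five, comparing average rides per day rather than raw totals may give a fairer picture; the sketch above only counts raw totals.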
-
Base B02617 recorded the highest total, with 1.45 million rides. Base B02512, in comparison, had the lowest, with only 205,673 rides. The total for Base B02617 was about seven times that of Base B02512.
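The per-base totals and the "seven times" comparison can be sketched like this. The trip rows are made up and the column name `base` is an assumption; the two totals in the ratio are the figures quoted above, with 1.45 million being a rounded value.

```r
# Made-up base codes, one per trip.
trips <- data.frame(
  base = c("B02617", "B02617", "B02617", "B02512", "B02598", "B02598"),
  stringsAsFactors = FALSE
)

# Trips per base, largest first.
per_base <- sort(table(trips$base), decreasing = TRUE)
top    <- names(per_base)[1]
bottom <- names(per_base)[length(per_base)]
print(per_base)

# Ratio of the two totals quoted in the text (1.45 million is rounded).
ratio <- 1.45e6 / 205673
round(ratio, 2)
```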
-
As displayed here, September has the highest number of rides of any month for every weekday. It is also interesting to note that July, despite having fewer total rides than August, ranks second for total rides on Tuesdays, Wednesdays and Thursdays. April ranks lowest for all days, while May has the third-highest number of trips on Fridays.
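Ranking the months within each weekday, as described above, could be done with a cross-tabulation. This is a sketch on made-up data; the column names `month` and `weekday` and the reduced set of levels are assumptions made to keep the example short.

```r
# Made-up trips; labels are typed in directly to keep the sketch short.
trips <- data.frame(
  month   = factor(c("Apr", "Jul", "Jul", "Aug", "Sep", "Sep", "Sep",
                     "Aug", "Aug", "Jul"),
                   levels = c("Apr", "Jul", "Aug", "Sep")),
  weekday = factor(c("Tue", "Tue", "Tue", "Tue", "Tue", "Tue", "Tue",
                     "Sat", "Sat", "Sat"),
                   levels = c("Tue", "Sat"))
)

# Trip counts with months as rows and weekdays as columns.
counts <- xtabs(~ month + weekday, data = trips)
print(counts)

# Rank the months within each weekday; rank 1 means the most trips.
# Negating the counts makes rank() order from largest to smallest.
ranks <- apply(-counts, 2, rank, ties.method = "min")
print(ranks)
```

Reading down one column of `ranks` gives the ordering of months for that weekday, which is how a statement like "July ranks second on Tuesdays" can be checked.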
-
Base B02617, the base with the highest number of total rides, ranked first for July, August and September, with 310,160, 355,803 and 377,695 rides respectively. Its rides increased gradually from April to September. Base B02598 recorded the highest number of rides for May and June, with 260,549 and 242,975 respectively. Base B02682 recorded the highest number of rides for April, then dropped off in May and again in June. Base B02764 recorded the lowest ride totals for April, May and June but experienced a surge in September to 178,333 rides, three times its next-highest monthly total.
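Finding the leading base for each month can be sketched by totalling rides per month and base, then keeping the largest total in each month. The rows below are made up, and the helper column `n` is hypothetical.

```r
# Made-up trips, one row per ride.
trips <- data.frame(
  month = factor(c("May", "May", "May", "Jul", "Jul", "Jul", "Jul"),
                 levels = c("May", "Jul")),
  base  = c("B02598", "B02598", "B02617", "B02617", "B02617",
            "B02598", "B02682"),
  stringsAsFactors = FALSE
)

# Give every ride a weight of 1, then sum the weights per month and base.
trips$n <- 1L
totals <- aggregate(n ~ month + base, data = trips, FUN = sum)

# Sort by month, then by descending total, and keep the first row per month.
totals  <- totals[order(totals$month, -totals$n), ]
leaders <- totals[!duplicated(totals$month), ]
print(leaders)
```

Ties are broken by row order in this sketch, so two bases with equal totals in a month would need an explicit rule.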