The dataset records trips made with Ford GoBike's bike-sharing service in the greater San Francisco Bay Area. Each record is an individual ride described by 16 features, such as duration, ride start/end time, start and end station id/name, start/end station coordinates, bike id, user type, member birth year, member age, gender, and bike share for all trips. For this analysis I selected a one-year period, from 2018-05 until 2019-04, which contains 2,161,106 bike trips.
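The period selection described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names `start_time` and `duration_sec` and the helper `select_period` are assumptions, and the tiny table is made-up example data.

```python
import pandas as pd


def select_period(trips, start="2018-05-01", end="2019-05-01"):
    """Keep trips whose start time falls in the half-open interval [start, end).

    `start_time` is an assumed column name; adjust it to the actual data.
    """
    start_time = pd.to_datetime(trips["start_time"])
    mask = (start_time >= pd.Timestamp(start)) & (start_time < pd.Timestamp(end))
    return trips.loc[mask].copy()


# Made-up rows around both boundaries of the analysis period.
trips = pd.DataFrame(
    {
        "start_time": [
            "2018-04-30 23:59:00",  # just before the period
            "2018-05-01 00:00:00",  # first moment of the period
            "2019-04-30 23:59:59",  # last second of the period
            "2019-05-01 00:00:00",  # just after the period
        ],
        "duration_sec": [600, 720, 480, 900],
    }
)

selected = select_period(trips)
print(len(selected))
```

Using a half-open interval keeps every trip that started in April 2019 without having to spell out the last timestamp of the month.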
Exploration of the time variables showed that riders used bikes mostly for travelling to and from work on workdays. The most interesting discovery is that bikes are used most during spring and autumn. Riders are mostly young men between 25 and 35 who are locals and have Ford GoBike subscriptions. Surprisingly, men use bikes about 3 times more than women. Trips are usually short and ridden at low speed.

Subscribers use bikes mostly before and after working hours, while customers ride during the day. Customers are usually tourists who use bikes consistently throughout the week, whereas subscribers are usually locals who ride more on weekdays and less on weekends. However, customers and subscribers show similar usage over the year. The most popular months are March, April and October; the least popular are November and December.

Younger users take longer trips, cover longer distances and ride at higher speed. Customers are more likely to ride longer and have the highest speed, while subscribers ride for a shorter time at lower speed. Female users are more likely to take longer trips and have the lowest speed; male users take shorter trips and have the highest speed.

Average trip duration is higher on weekend days and in the summer months. The highest average trip duration is 17.7 minutes, in July and on Saturdays. By hour of day, average trip duration is highest at night: the highest average is 29.7 minutes, at 3 a.m. on Thursdays. Comparing the distributions of trips by user type shows that customers are younger than subscribers, and that the gender gap is larger among subscribers. The relationship between duration and distance differs clearly by user gender and user type.
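The distance, speed and time-of-week comparisons above rely on variables derived from the raw records. The sketch below shows one way they could be computed, assuming the column names `start_time`, `duration_sec`, `start_station_latitude`, `start_station_longitude`, `end_station_latitude` and `end_station_longitude`. The helper names and the two example trips are hypothetical. The haversine formula gives the straight-line distance between stations, so it underestimates the distance actually ridden, and a trip that ends where it started gets a distance of zero.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def add_trip_features(trips):
    """Add duration in minutes, station-to-station distance, speed, weekday and hour."""
    out = trips.copy()
    start = pd.to_datetime(out["start_time"])
    out["duration_min"] = out["duration_sec"] / 60
    out["distance_km"] = haversine_km(
        out["start_station_latitude"],
        out["start_station_longitude"],
        out["end_station_latitude"],
        out["end_station_longitude"],
    )
    out["speed_kmh"] = out["distance_km"] / (out["duration_sec"] / 3600)
    out["weekday"] = start.dt.day_name()
    out["hour"] = start.dt.hour
    return out


# Two made-up trips: one on a Saturday, one on a Thursday.
trips = pd.DataFrame(
    {
        "start_time": ["2018-07-07 10:00:00", "2018-07-05 08:00:00"],
        "duration_sec": [1200, 600],
        "start_station_latitude": [37.77, 37.78],
        "start_station_longitude": [-122.42, -122.40],
        "end_station_latitude": [37.79, 37.78],
        "end_station_longitude": [-122.40, -122.41],
    }
)

features = add_trip_features(trips)
mean_by_weekday = features.groupby("weekday")["duration_min"].mean()
print(mean_by_weekday)
```

The same `groupby` pattern applies to `hour`, month, user type or gender to obtain the other averages discussed above.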
Subscribers made about 1.8 million trips, or 88.7% of all trips. The two plots show that most bike users are subscribers and only a small slice are casual customers. Local people are probably the most likely to be subscribers, while tourists are customers.

Plotting user gender and age range together shows trip frequency, with the darkest color marking the most frequent combinations. The plot clearly shows that most riders are male and between 28 and 32 years old.

The next three plots show trip counts by user gender against trip duration, distance and speed. They show that females are more likely to take longer trips and have the lowest speed, while male users take shorter trips and have the highest speed.

From the figures above we can build the final distribution of trip frequency by time variables. Most riders used bikes at 8:00 and 17:00, on workdays, and in March and April. The plot of average trip duration shows very clearly that users take longer trips on weekends and shorter trips on workdays. The highest average trip duration was 17.7 minutes, in July. The average trip duration also drops in the winter months.
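A share such as the subscriber percentage above can be derived from a simple count per user type. The snippet below is only a sketch: it assumes a `user_type` column with the values `Subscriber` and `Customer`, the helper `user_type_share` is hypothetical, and the ten rows are made-up data rather than the real trip counts.

```python
import pandas as pd


def user_type_share(trips):
    """Return trip counts and percentage share per user type."""
    counts = trips["user_type"].value_counts()
    share_pct = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"trips": counts, "share_pct": share_pct})


# Made-up sample: 8 subscriber trips and 2 customer trips.
trips = pd.DataFrame({"user_type": ["Subscriber"] * 8 + ["Customer"] * 2})

summary = user_type_share(trips)
print(summary)
```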
Citi Bike 2017 Analysis - https://towardsdatascience.com/citi-bike-2017-analysis-efd298e6c22c

Fast Haversine Approximation (Python/Pandas) - https://stackoverflow.com/questions/29545704/fast-haversine-approximation-python-pandas

Rotating axis labels in matplotlib and seaborn - https://drawingfromdata.com/seaborn/matplotlib/visualization/2020/11/01/rotate-axis-labels-matplotlib-seaborn.html