Skip to content

A interactive data visualization tool for Chicago Divvy Dataset

Notifications You must be signed in to change notification settings

duynguyen2001/Divvy-data-exploration-tool

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

16 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Web-Based Visualisation System for Divvy Data Analysis

🌎 Webpage Link

Our Webpage, hosted with GitHub Pages.

Link to the repository hosting the source code.

🧾 Logistics

Group Members

Khanh Duy Nguyen ([email protected]) Raka Primardika ([email protected])

Organisation

University of Illinois at Chicago, CS 424: Visualisation and Visual Analytics, Project 3

Project description can be found on this direct link, or from the course website.

🏞️ Background

We are analysing data from Divvy bicycle rideshare service in Chicago, IL, USA. Both of us are regular uses of the service, which adds to our familiarity and interest on working with their data.

πŸ™‹β€β™‚οΈ Domain Questions

Knowing that we are analysing Divvy data, we constructed this set of domain questions as our lines of inquiry. This was done before we took a close look at the dataset attributes. We did this so that we had a clear picture on what we are generally interested in finding out from the data.

  • Q01: Do paid members use more e-bikes than non-paid users?
  • Q02: At which times of the day do UIC students use Divvy bikes the most?
  • Q03: Do people ride faster and farther in the afternoon compared to morning time?

These domain questions were driven from our personal experiences using Divvy bikes as UIC students.

Since Divvy increased their annual membership fee (see article here) and are transitioning to electric bikes, we were interested in finding out whether people who are members use more electric bikes than casual members (hypothesis), taking advantage of the reduced prices. This was our main thought process behind Q01.

As UIC students, we were also interested in how other students use Divvy bikes for their commutes. First of all, we wanted to know what time of day they use bikes the most. We hypothesise that this would be afternoon time, around 16:00. Since we also think that students here use Divvy mainly to commute to and from home, we think that during those afternoon times, most people who use Divvy travel longer distances and as a result, longer commute times on bikes. This was our main idea for Q02 and Q03.

After looking at the results of the previous questions in Project 2, we wanted to add another line of inquiry:

  • Q04: Do members and casual riders ride in different areas?

We believe that since casual riders have been shown to use a higher proportion of electric bikes than members (Project 2 results), they would most likely be concentrated in areas farther from downtown. We believe that Q04 is a good opportunity to create a spatial view visualisation.

πŸ“Š Dataset Description

The dataset we are handling spans from 24/10/2022 through 30/10/2022, which is only one week of data. This is ~85000 rows. We believe that even though this is little, it's still sufficient to gather meaningful knowledge from the visualisations created from the data. We wanted to at least make sure we get all the days in the week, so that we can compare between the days.

Note: Data was preprocessed using python. Code for this can be found under the python/ folder. Python spits out data in the form of a .csv file. We further converted it to a .json file using a CSV to JSON online converter.

The dataset contains various information: start and end time, start and end coordinates, member types, bike types, station IDs and ride IDs. We are most interested in the spatiotemporal data, combining them with the bike and member type data to produce meaningful plots.

Transformations we did was simply to add two columns, duration and distance of trips. More details can be found in the transformations section.

🀨 Data Questions

  • Q05: How do e-bike and normal bike ridership ratios differ for each membership type?
  • Q06: Given a region in Chicago, how does the number of all Divvy rides change for each hour of the day?
  • Q07: Given a time of the day, how do distance and duration of trips compare for both types of bikes?

To answer Q01, we believe the appropriate data question is Q05, a simple ratio comparison between member vs casual riders and electric vs normal bike counts.

We decided to extend Q02 to explore any region of Chicago, not just UIC. This would be a good opportunity to practice creating an interactive visualisation where the viewer gets to choose which part of Chicago they want to explore. For a given region that the viewer selects, we can make a plot showing the times of day where Divvy is most popular.

For Q07, our idea behind that was to simply compare trip distances and durations for a given time of day. The time of day attribute can be made interactive so that the viewer can choose it. We also decided to colour code the bike types as well just to see if there's a significant difference between e-bike and normal bike distances and durations.

Additional data question for this project:

  • Q08: Given the type of member and the type of bike, how are Divvy rides concentrated in Chicago?

We thought it would be a good opportunity to implement a select-type interaction where the user can choose which member/bike type they're interested in finding out the answer for Q04.

Transformations

The attributes distance and duration mentioned in Q07 aren't available immediately from the raw dataset, but we can derive them from the attributes that are already there: start_lat, start_lng, end_lat, end_lng, started_at and ended_at. We create the new columns for these attributes.

Calculating duration was pretty simple. It was just a matter of finding the difference between started_at and ended_at and then converting from milliseconds to seconds.

Calculating distance was a bit trickier. We eventually made use of a function we were taught in our data structures class (CS 251 at UIC), where we had a project that dealt with distances. This function returns the straight line distance (or displacement, in physics terms) in statute metres, given the start and end coordinates, which of course we already have.

function distBetween2Points(lat1, lon1, lat2, lon2) {
  //
  // Reference: http://www8.nau.edu/cvm/latlon_formula.html
  //
  const PI = Math.PI;
  const earth_rad_miles = 3963.1;  // statute miles
  const earth_rad_metres = 1609.34 * earth_rad_miles;  // statute metres

  var lat1_rad = lat1 * PI / 180.0;
  var lon1_rad = lon1 * PI / 180.0;
  var lat2_rad = lat2 * PI / 180.0;
  var lon2_rad = lon2 * PI / 180.0;

  const dist = earth_rad_metres * Math.acos(
    (Math.cos(lat1_rad) * Math.cos(lon1_rad) * Math.cos(lat2_rad) * Math.cos(lon2_rad))
    +
    (Math.cos(lat1_rad) * Math.sin(lon1_rad) * Math.cos(lat2_rad) * Math.sin(lon2_rad))
    +
    (Math.sin(lat1_rad) * Math.sin(lat2_rad))
  );
  
  return dist;
}

πŸ—’οΈ Initial Plans and Sketches

Sketches can be found in the sketches/ folder.

In our initial discussions, we wanted to have several modes of web pages, separated by the types of interaction.

Sketch Description
This is our initial concept for our webpage. We are to switch between plots depending on the mode, selected by using the buttons on top of the page. In all modes, the full map will be shown as the background. Overlays of the map will change depending on the mode.

Upon discussing, we eventually aggred to have three modes: brush mode, slider mode and select mode. This is our initial plan. Depending on our progress, this may change.

Brush Mode Slider Mode Select Mode
Brush mode is our linked view line plot from P2, where the interaction is from brushing points on the map. Slider mode is our single view scatterplot from P2, where the interaction is from using a slider. In this project, we propose to add a new linked flow map. Select mode allows interaction of selecting parts of the pie chart. The selected data is represented in the new heat map and the new bar chart.
map scatter plot (spatial, linked, interact), line chart (linked) slider (interact), scatter plot (linked), flow map (spatial, linked) pie chart (interact, linked), heat map (spatial, linked), bar chart (linked)

πŸ‘·πŸ»β€β™‚οΈ Initial Work

For this project, we are deciding to use React to build our website.

In this repository, all the code is located in the ReactJS/cs424-project3/ folder. ReactJS/ is the root directory for our webpage, it's a copy of the repository we used for GitHub Pages.

πŸ“Έ Screenshots

In this description, we provide brief overviews of our finalised modes.

Home Slider Mode
This is the home page, shown upon loading the website. Slider mode is a simple single view scatterplot, showing distribution of ride distances and durations given a window of time (1 hour). The slider here is used to move the window of time in 15 minute intervals.
Select Mode Brush mode
Select mode is a static display of our pie chart showing the proportion of ridership for different bike and member types. This plot has checkboxes as well. Paired with a heatmap layer which shows ridership density, the checkboxes filter the data that is to be displayed on the map. Brush mode allows the viewer to select an area of the map, using a rectangle or a polygon tool. Data from the points within the shape drawn is reflected in the line plot, which represents the ridership per hour of day.

πŸ•΅οΈ Findings

Select Mode

The most interesting takeaway from this data is the fact that for casual members, electric bike ridership has surpassed 50% of all rides this October. A massive increase compared to last September's 40% ratio (Project 2 results). As for the density heatmap, we can see that members occupy a larger area of Chicago, which contradicts our hypothesis of Q04. As for the different bike types, electric bikes do seem to occupy a larger area, but not by as much difference as the differenct member types.

Slider Mode

Similar to our findings from Project 2, we see that there are a lot more rides that happen during the afternoon times as opposed to morning times, thus naturally creating more rides that go farther and for longer. A speed separation between the two types of bikes can also be seen.

Brush mode

Similar to Project 2, we see that most of the rides that happen in the UIC area happen at around afternoon time. Difference is that this time, most of the rides are concentrated around 4pm rather than 2pm (Project 2 results). We're not quite sure why that is because at that point, Daylight Savings wouldn't have happened yet.

βœ… Task Completion

Task 0

We set up our web page using GitHub Pages.

Task 1

We have our web page done, made from scratch using React.

Task 2

We used all three of our visualisations from Project 2. Two interactive ones include brush mode where we can brush over a map and changes are reflected in the line plot and the scatterplot in slider mode, which is a single view interactive plot.

Task 3

Our new multiple linked view is the select mode. We introduce a new interaction method of selecting check boxes, and the data is reflected on the heatmap. The selection is based on our existing pie chart, making it a multiple linked view.

Task 4

Our new spatial view plot is the heatmap in select mode, also part of the previous task. This was implemented using Leaflet.

Task 5

  1. This README.md is the markdown documentation.
  2. Source code is located in the ReactJS/ folder.
  3. Screenshots of the web page can be found in the πŸ“Έ screenshots section or in the images/ folder.
  4. Web page is linked here.
  5. Presentation was done on Tuesday 29 November 2022 in class.

References

About

A interactive data visualization tool for Chicago Divvy Dataset

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published