Scatter Plot

A scatter plot is a visualization that displays the relationship between two attributes of the data. In Plotly, line charts, bubble charts and scatter plots are all drawn with the same module, go.Scatter(). In this folder, we go over how to create scatter plots with Python and Plotly.

Files

The following scripts are used in this chapter:

  • SimpleScatterplot.py
  • NumColourScatterplot.py
  • CateColourScatterplot.py
  • Bubblechart.py

Packages Needed

This chapter requires the following packages for the scripts used:

  • Pandas
  • Plotly

Data Used

This chapter may use the following data from the Data folder:

  • tips.csv

Syntax

Data

Data is a list of go.Scatter() objects. Each go.Scatter() represents one category of points and/or a line.

If the data list has only one go.Scatter(), the chart shows a single line or a single collection of points (for lines, refer to the Line Chart folder).

go.Scatter() has the following parameters:

  • x: Attribute on the x-axis
  • y: Value on the y-axis
  • marker_color: Colour of the data points; accepts a single colour, an array of RGB values/colour keywords, or a numeric column
  • marker: In-depth settings for the data points, including
    • size: Numeric value (or array of values) for the data point size
    • color: Same as marker_color
    • colorscale: Colour scale used when color is a numeric array
    • showscale (True/False): Show the colour bar for the colorscale. It has no effect when color is passed as RGB values/colour keywords
    • autocolorscale (True/False): Automatically choose a colorscale when color is a numeric array and no value is set for colorscale
  • mode: How the data is displayed
    • markers: Display data points only
    • lines: Display lines only
    • lines+markers: Display lines with data points
  • text: Text label for each data point, shown on hover (and on the chart itself when mode includes text)
  • textfont (Dictionary): Text label settings
  • hoverinfo: What information is displayed when the user hovers over a data point. Use any combination of the following joined with + (for example x+y+text), or all, none or skip:
    • x
    • y
    • text
    • name

Layout

Generic layout parameters suggested for use:

  • title (Dictionary): Chart title and fonts
    • text: Chart title to be displayed
    • x: Horizontal position of the title, from 0 to 1
    • y: Vertical position of the title, from 0 to 1
  • xaxis (Dictionary): x-axis settings
    • tickmode: How the ticks are placed (for example auto, linear or array)
    • tickangle: Angle in degrees by which the tick labels rotate (-: anticlockwise, +: clockwise)
    • categoryorder: Sort order of the categories on the x-axis
      • category ascending: Sort by category name in ascending order
      • category descending: Sort by category name in descending order
      • total ascending: Sort by total value in ascending order
      • total descending: Sort by total value in descending order
      • min ascending/min descending: Sort by minimum value
      • max ascending/max descending: Sort by maximum value
      • sum ascending/sum descending: Sort by summed value
      • mean ascending/mean descending: Sort by average value
      • median ascending/median descending: Sort by median value
      • array: Follow the sorting order defined in categoryarray
    • dtick: Interval between ticks; determined automatically by default
    • categoryarray: Defines the sorting order when categoryorder is array
    • type: Axis type (linear, log, date, category, multicategory); inferred from the data by default
  • yaxis (Dictionary): y-axis settings
    • tickmode: How the ticks are placed (for example auto, linear or array)
    • tickangle: Angle in degrees by which the tick labels rotate (-: anticlockwise, +: clockwise)
    • dtick: Interval between ticks; determined automatically by default
    • type: Axis type (linear, log, date, category, multicategory); inferred from the data by default


Scatter Plot Exclusive parameters:

  • marker_color
  • marker

Examples

Example 1 - Simple Scatterplot

import plotly.graph_objects as go

# df is assumed to be a pandas DataFrame loaded from tips.csv
# Data
data = []
data.append(go.Scatter(x=df['grand_total'], y=df['tips'],
                       mode='markers'))
# Layout
layout = {'title': {'text': 'Everybody\'s Tipping Distribution', 'x': 0.5}}

Note: You must set mode='markers'; otherwise the data points will be connected by lines.

Example 2 - Coloured Scatterplot (With a 3rd Numeric Dimension)

# Data
data = []
data.append(go.Scatter(x=df['grand_total'], y=df['tips'],
                       marker_color=df['wait_mins'],
                       mode='markers'))
# Layout
layout = {'title': {'text': 'Everybody\'s Tipping Distribution', 'x': 0.5}}

Example 3 - Coloured Scatterplot (With a 3rd Categorical Dimension)

# Map each meal type to a colour keyword
color_scheme = {'Lunch': 'red', 'Dinner': 'blue', 'Coffee': 'brown'}

meal_type_color = [color_scheme[meal] for meal in df['meal_type'].tolist()]

# Data
data = []
data.append(go.Scatter(x=df['grand_total'], y=df['tips'],
                       marker_color=meal_type_color,
                       mode='markers'))
# Layout
layout = {'title': {'text': 'Everybody\'s Tipping Distribution', 'x': 0.5}}

Note 1: marker_color accepts either numeric values or RGB values/colour keywords.
Note 2: However, categorical labels must first be converted to RGB values/colour keywords.

Example 4 - Coloured Scatterplot (With a 3rd Categorical Dimension) with Legend

color_scheme = {'Lunch': 'red', 'Dinner': 'blue', 'Coffee': 'brown'}

# Data: one go.Scatter() per meal type, so each gets its own legend entry
data = []
for meal in df['meal_type'].unique():
    df_temp = df[df['meal_type'] == meal]
    data.append(go.Scatter(x=df_temp['grand_total'], y=df_temp['tips'],
                           marker_color=color_scheme[meal],
                           name=meal,
                           mode='markers'))
# Layout
layout = {'title': {'text': 'Everybody\'s Tipping Distribution', 'x': 0.5}}

Note: name does not accept array values, only a string. To show a legend of the categories, you have to pass multiple go.Scatter() objects so that the data points of the same category are grouped together.

Example 5.0 - Advanced Marker Configuration (Bubble Chart)

# Map each meal type to a colour keyword
color_scheme = {'Lunch': 'red', 'Dinner': 'blue', 'Coffee': 'brown'}

meal_type_color = [color_scheme[meal] for meal in df['meal_type'].tolist()]

# Data
data = []
data.append(go.Scatter(x=df['grand_total'], y=df['tips'],
                       marker={
                           'size': df['wait_mins'],
                           'color': meal_type_color
                       },
                       mode='markers'))
# Layout
layout = {'title': {'text': 'Everybody\'s Tipping Distribution', 'x': 0.5}}

The simplest way to create a bubble chart is to pass marker a dictionary with arrays for size and/or color. However, this approach does not generate a legend.

Example 5.1 - Advanced Marker Configuration (Bubble Chart) with Legend

color_scheme = {'Lunch': 'red', 'Dinner': 'blue', 'Coffee': 'brown'}

# Data: one go.Scatter() per meal type, so each gets its own legend entry
data = []
for meal in df['meal_type'].unique():
    df_temp = df[df['meal_type'] == meal]
    data.append(go.Scatter(x=df_temp['grand_total'], y=df_temp['tips'],
                           marker={
                               'size': df_temp['wait_mins'],
                               'color': color_scheme[meal]
                           },
                           name=meal,
                           mode='markers'))
# Layout
layout = {'title': {'text': 'Everybody\'s Tipping Distribution', 'x': 0.5},
          'legend': {'itemsizing': 'constant'}}

If you wish to put a legend on the bubble chart, you need to partition the data so that each categorical label gets its own go.Scatter(). Also, set the legend's itemsizing to constant so that the bubbles in the legend do not vary in size.

Reference

Plotly Documentation Scatter Plot