Box Plots

Box plots display summary statistics of a data set: the median, the quartiles, sometimes the mean, and any outliers. This folder covers how to create box plots with Python and Plotly.
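
As a rough illustration of the statistics a box plot summarises, the sketch below computes them with Pandas. The salary values are made up, and the 1.5 × IQR rule for flagging outliers is a common convention assumed here; the numbers Plotly draws may differ slightly depending on the quartile method it uses.

```python
import pandas as pd

# Made-up values, for illustration only
salaries = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])

# Quartiles and median (Pandas interpolates linearly by default)
q1 = salaries.quantile(0.25)
median = salaries.median()
q3 = salaries.quantile(0.75)
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)].tolist()

print(q1, median, q3, salaries.mean(), outliers)
```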

Files

The following scripts are used in this chapter:

  • Example_box.py

Packages Needed

This chapter requires the following packages for the scripts used:

  • Pandas
  • Plotly

Data Used

This chapter may use the following data from the Data folder:

  • salary.csv

Syntax

Data

Data is a list of go.Box() objects, each representing one attribute. If the list contains only one go.Box(), only one box is drawn.

go.Box() has the following parameters:

  • x: Values on the x-axis (horizontal box plot; if used, leave y unset)
  • y: Values on the y-axis (vertical box plot; if used, leave x unset)
  • name: Attribute label; each go.Box() is drawn as one box and labelled with its name on the category axis (the x-axis for a vertical box plot)
  • marker_color: Box colour (a colour name or an RGB value, passed as a string)
  • quartilemethod: Method used to calculate the quartiles; takes only 'linear', 'inclusive', or 'exclusive'
  • boxmean: Whether the mean is shown on the box; takes True, False, or 'sd' (mean plus standard deviation), default False
  • boxpoints: Which data points are plotted; takes only 'all', 'outliers', 'suspectedoutliers', or False
  • jitter: Amount of random spread applied to the plotted points so that they do not overlap, between 0 (points in a straight line) and 1 (spread across the full box width)
  • pointpos: Position of the points relative to the box, between -2 and 2 (0 places them over the centre of the box)

Layout

Generic layout parameters suggested for use:

  • title (Dictionary): Chart title and fonts
    • text: Chart title to be displayed
    • x: text location on x-dimension, from 0-1
    • y: text location on y-dimension, from 0-1
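  • xaxis (Dictionary): x-axis settings
    • tickmode: How ticks are placed ('auto', 'linear', or 'array')
  • yaxis (Dictionary): y-axis settings
    • tickmode: How ticks are placed ('auto', 'linear', or 'array')
  • boxmode: Use 'group' to place boxes from different traces side by side at the same position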


Box plot exclusive parameters (set on go.Box(), except boxmode, which is set in the layout):
  • boxmean: Whether the mean is shown on the box
  • boxpoints: Which data points are plotted; takes only 'all', 'outliers', 'suspectedoutliers', or False
  • jitter: Amount of random spread applied to the plotted points, between 0 and 1
  • pointpos: Position of the points relative to the box, between -2 and 2
  • boxmode: Use 'group' to place boxes from different traces side by side at the same position

Syntax Difference with Bar Chart

One difference between a box plot and a bar chart is the syntax used for attributes and values. In go.Box(), name is used for the attribute, while either x or y is used for the values. No error is thrown if both x and y are passed, but Plotly then treats one of them as box positions rather than values, so the result is usually not the intended box plot. For a plain box plot, pass the values through only one of the two: use y and name for a vertical box plot, or x and name for a horizontal box plot. Passing both is only useful for grouped box plots, where the second array holds the group labels.

Example - Simple Box Plot

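import pandas as pd
import plotly.graph_objects as go

# Load the data (adjust the path to where salary.csv is stored)
df = pd.read_csv('salary.csv')

# Prepare data: one go.Box() per school
data = []
for school in df['school'].unique():
    df_temp = df[df['school'] == school]
    data.append(go.Box(y=df_temp['salary'], name=school, boxmean=True))

# Layout
layout = dict(title={'text': 'Alumni Salary across Schools', 'x': 0.5},
              xaxis=dict(tickmode='linear'))
# Build and display the figure
fig = go.Figure(data=data, layout=layout)
fig.show()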

Reference

Plotly Documentation Box Plot