How were those two columns calculated? - Likert data #3101

Andrzej-Andrzej · 2023-07-10T11:30:41Z

Hi All,

I would like to ask how two columns in the example below were calculated.
These are Likert data.
I have a survey involving Likert data to analyse, and I am learning Python. I would like to use Altair to recreate this plot for my own data.
My question is: how were the two columns percentage_start and percentage_end calculated in this dataframe?
https://altair-viz.github.io/gallery/diverging_stacked_bar_chart.html

Thank you for your help.
Regards,
Andrzej

jonmmease (Maintainer) · 2023-07-11T14:14:39Z

Hi @Andrzej-Andrzej, I'm not sure where this data originally came from, but here's an example of computing the percentage, percentage_start and percentage_end columns using pandas groupby and apply.

import pandas as pd
import altair as alt

source = pd.DataFrame([
      {
        "question": "Question 1",
        "type": "Strongly disagree",
        "value": 24,
      },
      {
        "question": "Question 1",
        "type": "Disagree",
        "value": 294,
      },
      {
        "question": "Question 1",
        "type": "Neither agree nor disagree",
        "value": 594,
      },
      {
        "question": "Question 1",
        "type": "Agree",
        "value": 1927,
      },
      {
        "question": "Question 1",
        "type": "Strongly agree",
        "value": 376,
      },
      {
        "question": "Question 2",
        "type": "Strongly disagree",
        "value": 2,
      },
      {
        "question": "Question 2",
        "type": "Disagree",
        "value": 2,
      },
      {
        "question": "Question 2",
        "type": "Neither agree nor disagree",
        "value": 0,
      },
      {
        "question": "Question 2",
        "type": "Agree",
        "value": 7,
      },
      {
        "question": "Question 2",
        "type": "Strongly agree",
        "value": 11,
      },
      {
        "question": "Question 3",
        "type": "Strongly disagree",
        "value": 2,
      },
      {
        "question": "Question 3",
        "type": "Disagree",
        "value": 0,
      },
      {
        "question": "Question 3",
        "type": "Neither agree nor disagree",
        "value": 2,
      },
      {
        "question": "Question 3",
        "type": "Agree",
        "value": 4,
      },
      {
        "question": "Question 3",
        "type": "Strongly agree",
        "value": 2,
      },

      {
        "question": "Question 4",
        "type": "Strongly disagree",
        "value": 0,
      },
      {
        "question": "Question 4",
        "type": "Disagree",
        "value": 2,
      },
      {
        "question": "Question 4",
        "type": "Neither agree nor disagree",
        "value": 1,
      },
      {
        "question": "Question 4",
        "type": "Agree",
        "value": 7,
      },
      {
        "question": "Question 4",
        "type": "Strongly agree",
        "value": 6,
      },

      {
        "question": "Question 5",
        "type": "Strongly disagree",
        "value": 0,
      },
      {
        "question": "Question 5",
        "type": "Disagree",
        "value": 1,
      },
      {
        "question": "Question 5",
        "type": "Neither agree nor disagree",
        "value": 3,
      },
      {
        "question": "Question 5",
        "type": "Agree",
        "value": 16,
      },
      {
        "question": "Question 5",
        "type": "Strongly agree",
        "value": 4,
      },

      {
        "question": "Question 6",
        "type": "Strongly disagree",
        "value": 1,
      },
      {
        "question": "Question 6",
        "type": "Disagree",
        "value": 1,
      },
      {
        "question": "Question 6",
        "type": "Neither agree nor disagree",
        "value": 2,
      },
      {
        "question": "Question 6",
        "type": "Agree",
        "value": 9,
      },
      {
        "question": "Question 6",
        "type": "Strongly agree",
        "value": 3,
      },

      {
        "question": "Question 7",
        "type": "Strongly disagree",
        "value": 0,
      },
      {
        "question": "Question 7",
        "type": "Disagree",
        "value": 0,
      },
      {
        "question": "Question 7",
        "type": "Neither agree nor disagree",
        "value": 1,
      },
      {
        "question": "Question 7",
        "type": "Agree",
        "value": 4,
      },
      {
        "question": "Question 7",
        "type": "Strongly agree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Strongly disagree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Disagree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Neither agree nor disagree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Agree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Strongly agree",
        "value": 2,
      }
])

# Add type_code that we can sort by
source["type_code"] = source.type.map({
    "Strongly disagree": -2, 
    "Disagree": -1, 
    "Neither agree nor disagree": 0,
    "Agree": 1,
    "Strongly agree": 2
})
source

def compute_percentages(df):
    # Set type_code as index and sort
    df = df.set_index("type_code").sort_index()
    
    # Compute percentage of value within each question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc

    # Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
    df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)
    
    # Compute percentage start by subtracting percent
    df["percentage_start"] = df["percentage_end"] - perc

    return df

source = (
    source
    .groupby("question", group_keys=True)
    .apply(compute_percentages)
    .reset_index(drop=True)
)

# Make chart
color_scale = alt.Scale(
    domain=[
        "Strongly disagree",
        "Disagree",
        "Neither agree nor disagree",
        "Agree",
        "Strongly agree"
    ],
    range=["#c30d24", "#f3a583", "#cccccc", "#94c6da", "#1770ab"]
)

y_axis = alt.Axis(
    title='Question',
    offset=5,
    ticks=False,
    minExtent=60,
    domain=False
)

alt.Chart(source).mark_bar().encode(
    x='percentage_start:Q',
    x2='percentage_end:Q',
    y=alt.Y('question:N').axis(y_axis),
    color=alt.Color('type:N').title('Response').scale(color_scale),
)
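
To make the arithmetic concrete, here is a minimal sketch that applies the same centering rule to Question 1 alone, using its counts from the table above. The helper name `diverging_bounds` is hypothetical (it is not part of pandas or Altair), and the sketch assumes all five response types are present.

```python
import pandas as pd

# Counts for "Question 1", ordered from "Strongly disagree" to "Strongly agree".
# The index uses the same type_code mapping as above.
counts = pd.Series([24, 294, 594, 1927, 376], index=[-2, -1, 0, 1, 2])


def diverging_bounds(counts):
    """Hypothetical helper: start and end of each bar segment, in percent."""
    perc = counts / counts.sum() * 100
    # Shift everything left by the two "disagree" shares plus half of the
    # neutral share, so the middle of the neutral segment lands on 0.
    offset = perc.loc[-2] + perc.loc[-1] + perc.loc[0] / 2
    end = perc.cumsum() - offset
    start = end - perc
    return pd.DataFrame(
        {"percentage": perc, "percentage_start": start, "percentage_end": end}
    )


bounds = diverging_bounds(counts)
print(bounds.round(2))
```

Two properties are worth checking in the printed table: the neutral row (type_code 0) has a start and end that are equal in size with opposite signs, and each segment starts exactly where the previous one ends, so the five segments together span 100 percentage points.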

2 replies

Andrzej-Andrzej Jul 11, 2023
Author

Thank you very much indeed, Jon.
If I wanted to split that function into three independent but consecutive functions that add the columns percentage, percentage_end, and percentage_start in turn, updating the "source" dataframe at each step, how would I do it?

def compute_percentages(df):
    # Set type_code as index and sort
    df = df.set_index("type_code").sort_index()
    
    # Compute percentage of value within each question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc

    # Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
    df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)
    
    # Compute percentage start by subtracting percent
    df["percentage_start"] = df["percentage_end"] - perc

    return df

source = (
    source
    .groupby("question", group_keys=True)
    .apply(compute_percentages)
    .reset_index(drop=True)
)

nayan2167 Jul 19, 2023

Hi @Andrzej-Andrzej, I have tried to write the code you asked for; please review it.
Let me know if it needs further changes.

# source = pd.DataFrame(...)

source["type_code"] = source.type.map({
    "Strongly disagree": -2, 
    "Disagree": -1, 
    "Neither agree nor disagree": 0,
    "Agree": 1,
    "Strongly agree": 2
})

source = source.set_index("type_code").sort_index()
source.head()

# Compute percentage of value within each question group
def compute_percentage(df):
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc
    return df
source = source.groupby("question", group_keys=True).apply(compute_percentage).reset_index(drop=True).round(1)

# Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
def compute_percentage_end(df):
    perc = df["percentage"]
    # Rows are sorted by type_code within each question, so the first two rows
    # are the disagree responses and the third row is the neutral one
    df["percentage_end"] = perc.cumsum() - (df[:2]["percentage"].sum() + df[2:3]["percentage"].sum() / 2)
    return df
source = source.groupby("question", group_keys=True).apply(compute_percentage_end).reset_index(drop=True).round(1)

# Compute percentage start by subtracting percent
source["percentage_start"] = source["percentage_end"] - source["percentage"]

# Make chart
color_scale = alt.Scale(
    domain=[
        "Strongly disagree",
        "Disagree",
        "Neither agree nor disagree",
        "Agree",
        "Strongly agree"
    ],
    range=["#c30d24", "#f3a583", "#cccccc", "#94c6da", "#1770ab"]
)

y_axis = alt.Axis(
    title='Question',
    offset=5,
    ticks=False,
    minExtent=60,
    domain=False
)

alt.Chart(source).mark_bar().encode(
    x='percentage_start:Q',
    x2='percentage_end:Q',
    y=alt.Y('question:N').axis(y_axis),
    color=alt.Color('type:N').title('Response').scale(color_scale),
)
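
As an alternative, the three steps can be sketched as separate functions that each take the dataframe and return it with one more column. This is only a sketch, not the method used for the gallery data: it uses `groupby(...).transform` and a grouped `cumsum` instead of `apply`, so the "question" column stays in place, and the function names `add_percentage`, `add_percentage_end` and `add_percentage_start` are hypothetical. It assumes every question has all five response types, and it uses a two-question subset of the data above to stay short.

```python
import pandas as pd

TYPES = [
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
]

# Small stand-in for the full `source` table: Questions 1 and 3 from above.
source = pd.DataFrame({
    "question": ["Question 1"] * 5 + ["Question 3"] * 5,
    "type": TYPES * 2,
    "value": [24, 294, 594, 1927, 376, 2, 0, 2, 4, 2],
})

# Preparation: add a sortable code and order the rows within each question.
source["type_code"] = source["type"].map(dict(zip(TYPES, [-2, -1, 0, 1, 2])))
source = source.sort_values(["question", "type_code"]).reset_index(drop=True)


def add_percentage(df):
    """Step 1: share of each response within its question, in percent."""
    df = df.copy()
    totals = df.groupby("question")["value"].transform("sum")
    df["percentage"] = df["value"] / totals * 100
    return df


def add_percentage_end(df):
    """Step 2: right edge of each segment, centered on the neutral response."""
    df = df.copy()
    cumulative = df.groupby("question")["percentage"].cumsum()
    # Weight 1 for the disagree responses, 0.5 for neutral, 0 for agree.
    weight = df["type_code"].map({-2: 1.0, -1: 1.0, 0: 0.5, 1: 0.0, 2: 0.0})
    offset = (df["percentage"] * weight).groupby(df["question"]).transform("sum")
    df["percentage_end"] = cumulative - offset
    return df


def add_percentage_start(df):
    """Step 3: left edge is the right edge minus the segment width."""
    df = df.copy()
    df["percentage_start"] = df["percentage_end"] - df["percentage"]
    return df


source = add_percentage(source)
source = add_percentage_end(source)
source = add_percentage_start(source)
print(source.round(1))
```

For Question 3 (counts 2, 0, 2, 4, 2) the shares are 20, 0, 20, 40 and 20 percent, so the offset is 20 + 0 + 10 = 30 and the segment ends come out as -10, -10, 10, 50 and 70. Rounding is left to the final `print` so that rounding errors do not accumulate in the cumulative sum.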
