How were those two columns calculated? - Likert data #3101
Andrzej-Andrzej asked this question in Q&A
-
Hi @Andrzej-Andrzej, I'm not sure where this data originally came from, but here's an example of computing the percentage_start and percentage_end columns:
import pandas as pd
import altair as alt
source = pd.DataFrame([
    {"question": "Question 1", "type": "Strongly disagree", "value": 24},
    {"question": "Question 1", "type": "Disagree", "value": 294},
    {"question": "Question 1", "type": "Neither agree nor disagree", "value": 594},
    {"question": "Question 1", "type": "Agree", "value": 1927},
    {"question": "Question 1", "type": "Strongly agree", "value": 376},
    {"question": "Question 2", "type": "Strongly disagree", "value": 2},
    {"question": "Question 2", "type": "Disagree", "value": 2},
    {"question": "Question 2", "type": "Neither agree nor disagree", "value": 0},
    {"question": "Question 2", "type": "Agree", "value": 7},
    {"question": "Question 2", "type": "Strongly agree", "value": 11},
    {"question": "Question 3", "type": "Strongly disagree", "value": 2},
    {"question": "Question 3", "type": "Disagree", "value": 0},
    {"question": "Question 3", "type": "Neither agree nor disagree", "value": 2},
    {"question": "Question 3", "type": "Agree", "value": 4},
    {"question": "Question 3", "type": "Strongly agree", "value": 2},
    {"question": "Question 4", "type": "Strongly disagree", "value": 0},
    {"question": "Question 4", "type": "Disagree", "value": 2},
    {"question": "Question 4", "type": "Neither agree nor disagree", "value": 1},
    {"question": "Question 4", "type": "Agree", "value": 7},
    {"question": "Question 4", "type": "Strongly agree", "value": 6},
    {"question": "Question 5", "type": "Strongly disagree", "value": 0},
    {"question": "Question 5", "type": "Disagree", "value": 1},
    {"question": "Question 5", "type": "Neither agree nor disagree", "value": 3},
    {"question": "Question 5", "type": "Agree", "value": 16},
    {"question": "Question 5", "type": "Strongly agree", "value": 4},
    {"question": "Question 6", "type": "Strongly disagree", "value": 1},
    {"question": "Question 6", "type": "Disagree", "value": 1},
    {"question": "Question 6", "type": "Neither agree nor disagree", "value": 2},
    {"question": "Question 6", "type": "Agree", "value": 9},
    {"question": "Question 6", "type": "Strongly agree", "value": 3},
    {"question": "Question 7", "type": "Strongly disagree", "value": 0},
    {"question": "Question 7", "type": "Disagree", "value": 0},
    {"question": "Question 7", "type": "Neither agree nor disagree", "value": 1},
    {"question": "Question 7", "type": "Agree", "value": 4},
    {"question": "Question 7", "type": "Strongly agree", "value": 0},
    {"question": "Question 8", "type": "Strongly disagree", "value": 0},
    {"question": "Question 8", "type": "Disagree", "value": 0},
    {"question": "Question 8", "type": "Neither agree nor disagree", "value": 0},
    {"question": "Question 8", "type": "Agree", "value": 0},
    {"question": "Question 8", "type": "Strongly agree", "value": 2},
])
# Add type_code that we can sort by
source["type_code"] = source.type.map({
    "Strongly disagree": -2,
    "Disagree": -1,
    "Neither agree nor disagree": 0,
    "Agree": 1,
    "Strongly agree": 2
})
source
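def compute_percentages(df):
    # Set type_code as index and sort, so rows run from
    # "Strongly disagree" (-2) to "Strongly agree" (2)
    df = df.set_index("type_code").sort_index()
    # Compute percentage of value within the question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc
    # Compute percentage end, centered on "Neither agree nor disagree" (type_code 0).
    # The offset is everything left of center: both "disagree" bars plus half of the
    # neutral bar. .loc makes it explicit that -2, -1 and 0 are index labels
    # (type codes), not positions.
    df["percentage_end"] = perc.cumsum() - (perc.loc[-2] + perc.loc[-1] + perc.loc[0] / 2)
    # Compute percentage start by subtracting the bar's own percentage from its end
    df["percentage_start"] = df["percentage_end"] - perc
    return df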
source = (
    source
    .groupby("question", group_keys=True)
    .apply(compute_percentages)
    .reset_index(drop=True)
)
# Make chart
color_scale = alt.Scale(
    domain=[
        "Strongly disagree",
        "Disagree",
        "Neither agree nor disagree",
        "Agree",
        "Strongly agree"
    ],
    range=["#c30d24", "#f3a583", "#cccccc", "#94c6da", "#1770ab"]
)
y_axis = alt.Axis(
    title='Question',
    offset=5,
    ticks=False,
    minExtent=60,
    domain=False
)
alt.Chart(source).mark_bar().encode(
    x='percentage_start:Q',
    x2='percentage_end:Q',
    y=alt.Y('question:N').axis(y_axis),
    color=alt.Color('type:N').title('Response').scale(color_scale),
)
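To see the arithmetic on its own, without pandas or Altair, here is a minimal sketch in plain Python for Question 1 only. It uses the five counts from the data above; the variable names (counts, offset and so on) are only for illustration and are not part of the gallery example.

```python
from itertools import accumulate

# Counts for "Question 1" from the data above, ordered from
# "Strongly disagree" to "Strongly agree".
counts = [24, 294, 594, 1927, 376]

# Each answer's share of all responses to this question, in percent.
total = sum(counts)
percentages = [100 * c / total for c in counts]

# Everything left of center: both "disagree" bars plus half of the neutral bar.
# Subtracting it puts the middle of the neutral bar at 0.
offset = percentages[0] + percentages[1] + percentages[2] / 2

# Each bar ends at the running total of percentages, shifted by the offset,
# and starts one bar-width before its end.
percentage_end = [running - offset for running in accumulate(percentages)]
percentage_start = [end - p for end, p in zip(percentage_end, percentages)]

for start, end in zip(percentage_start, percentage_end):
    print(f"{start:7.2f} -> {end:7.2f}")
```

For Question 1 the bars span roughly -19.1 to 80.9, each bar starts where the previous one ends, and the neutral bar is split evenly around 0.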
-
Hi All,
I would like to ask how those two columns were calculated.
These are Likert data.
I have a survey involving Likert data to analyse, and I am learning Python. I would like to use Altair to recreate that plot for my data.
My question is: how were the two columns percentage_start and percentage_end calculated in this dataframe?
https://altair-viz.github.io/gallery/diverging_stacked_bar_chart.html
Thank you for your help.
Regards,
Andrzej