The purpose of this exercise is to calculate how many people need to be shown the new assets before we can check if the results are a significant improvement.

charlstown/SampleSizeCalculation

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Sample size determination in Python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The purpose of this exercise is to calculate how many people need to be shown the new assets before we can check if the results are a significant improvement.\n",
    "\n",
    "Nosh Mish Mosh wants to run an experiment to see if we can convince more people to purchase meal plans if we use a more artisanal-looking vegetable selection. We’ve photographed these modern meals with blush tomatoes and graffiti eggplants, but aren’t sure if this strategy will sell enough units to benefit from establishing a business relationship with a new provider.\n",
    "\n",
    "Before running this experiment, we need to know how many people have to see the new assets, since we don’t want more customers than necessary seeing food that we may not end up offering. To determine that number, we first need the quantities defined below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The basics in sample size determination for A/B Tests\n",
    "- Baseline conversion rate: the approximate percentage of visitors who currently convert (here, who purchase a meal plan) before any change is made.\n",
    "- Statistical significance (alpha): the probability of rejecting the null hypothesis when the null hypothesis is actually true.\n",
    "- Minimum detectable effect: the smallest difference (lift) between the A and B samples that we want to be able to detect.\n",
    "- Confidence level: how confident we need to be in the result, equal to 1 minus the significance level."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-25T19:40:16.619751Z",
     "start_time": "2020-02-25T19:40:15.154006Z"
    }
   },
   "outputs": [],
   "source": [
    "# Libraries\n",
    "import noshmishmosh\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from scipy import stats\n",
    "import math\n",
    "import seaborn as sns\n",
    "from matplotlib import pyplot\n",
    "\n",
    "# Functions\n",
    "def tolist(tag):\n",
    "    out = [i[tag] for i in visits]\n",
    "    return out"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 00. Generating the dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-25T19:40:16.640737Z",
     "start_time": "2020-02-25T19:40:16.622747Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "     ids            name  clickedthrough  purchased  moneyspent\n",
      "0  83421    Michael Todd            True      False         0.0\n",
      "1  46042  Brianna Harmon            True      False         0.0\n",
      "2  23766    Mario Arnold           False      False         0.0\n",
      "3  20859      Paul Quinn           False      False         0.0\n",
      "4  57771    Jerome Moore            True      False         0.0\n"
     ]
    }
   ],
   "source": [
    "visits = noshmishmosh.customer_visits\n",
    "\n",
    "df_visits = pd.DataFrame({'ids': tolist('id'),\n",
    "              'name': tolist('name'),\n",
    "              'clickedthrough': tolist('clickedthrough'),\n",
    "              'purchased': tolist('purchased'),\n",
    "              'moneyspent': tolist('moneyspent')\n",
    "             })\n",
    "\n",
    "\n",
    "print(df_visits.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 01. Calculating the baseline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-25T19:40:16.653729Z",
     "start_time": "2020-02-25T19:40:16.644734Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of visitors that purchased: 93\n",
      "Number of total visitors: 500\n",
      "The baseline is: 18.6 %\n"
     ]
    }
   ],
   "source": [
    "paying_visitors = df_visits[df_visits.purchased == True].ids.count()\n",
    "print('Number of visitors that purchased: {}'.format(paying_visitors))\n",
    "\n",
    "total_visitors = df_visits.ids.count()\n",
    "print('Number of total visitors: {}'.format(total_visitors))\n",
    "\n",
    "baseline = paying_visitors/total_visitors\n",
    "print('The baseline is: {} %'.format(baseline*100))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 02. Minimum Detectable Effect\n",
    "We’d like to know for sure that we’ll be pulling in at least $1240 more every week. To figure out how many more customers we need, we’ll have to investigate the average revenue generated from a given sale."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-25T19:40:16.663722Z",
     "start_time": "2020-02-25T19:40:16.656727Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "These are the first 5 payments sample: [39.01, 10.16, 36.88, 23.41, 33.49]\n",
      "The average payment is: 27.0 $\n",
      "We need 46 payments to pull in the revenue\n"
     ]
    }
   ],
   "source": [
    "revenue = 1240\n",
    "\n",
    "payments = noshmishmosh.money_spent\n",
    "print('These are the first 5 payments sample: {}'.format(payments[:5]))\n",
    "\n",
    "mean_payments = round(np.mean(payments))\n",
    "print('The average payment is: {} $'.format(mean_payments))\n",
    "\n",
    "n_payments = np.ceil(revenue/mean_payments)\n",
    "print('We need {} payments to pull in the revenue'.format(int(n_payments)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 03. Calculating the lift percentage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now find the percent lift required. What percentage increase is needed to pull in the revenue?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-25T19:40:16.670717Z",
     "start_time": "2020-02-25T19:40:16.665721Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The lift required is: 9.2%\n"
     ]
    }
   ],
   "source": [
    "lift = n_payments/total_visitors\n",
    "print('The lift required is: {}%'.format(lift*100))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "In order to find our minimum detectable effect, we need to express the percentage-point increase (`lift`) as a percentage of the baseline conversion rate (`baseline`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-25T19:40:16.677713Z",
     "start_time": "2020-02-25T19:40:16.672716Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The minimum detectable effect is: 50.0%\n"
     ]
    }
   ],
   "source": [
    "# Express the percentage-point lift relative to the baseline, as a percentage\n",
    "minimum_detectable_effect = np.ceil(lift / baseline * 100)\n",
    "print('The minimum detectable effect is: {}%'.format(minimum_detectable_effect))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 04. Overview of the two proportion Z test\n",
    "The two-sample Z test for proportions determines whether a population proportion p1 is equal to another population proportion p2. In our example, p1 and p2 are the proportions of visitors who purchase before and after the marketing change, and we want to see whether there was a statistically significant increase in p2 over p1.\n",
    "\n",
    "\\begin{equation}\n",
    "Z = \\frac{p_2 - p_1}{\\sqrt{p^*(1 - p^*)\\left(\\frac{1}{n_1} + \\frac{1}{n_2}\\right)}}\n",
    "\\end{equation}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{equation}\n",
    "p^* = \\frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}\n",
    "\\end{equation}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Where p* is the pooled proportion of 'successes' across both samples. In this example, a success is a visitor who makes a purchase."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ultimately, we want to make sure we’re able to detect a difference between p1 and p2 when it exists. So, let’s assume we know the “true” difference that exists between p1 and p2. Then, we can look at sample size requirements for various confidence levels and absolute levels of p1.\n",
    "\n",
    "We need a way of figuring out Z, so we can determine whether a given sample size provides statistically significant results, so let’s define a function that returns the Z value given p1, p2, n1, and n2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-25T19:40:16.684708Z",
     "start_time": "2020-02-25T19:40:16.679712Z"
    }
   },
   "outputs": [],
   "source": [
    "# Test that both populations have the same proportion.\n",
    "def z_calc(p1, p2, n1, n2):\n",
    "    p_star = (p1*n1 + p2*n2) / (n1 + n2)\n",
    "    return (p2 - p1) / math.sqrt(p_star*(1 - p_star)*((1.0 / n1) + (1.0 / n2)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, we can define a function that returns the sample size required, given p1 (the before probability), p_diff (i.e. p2-p1), and alpha (the significance level, i.e. 1 minus the confidence level, against which the p-value is compared). For simplicity, we’ll just assume that n1 = n2. If you know in advance that n1 will be about a quarter of the size of n2, then it’s trivial to incorporate this into the function. However, you typically don’t know this in advance and in our scenario an equal sample assumption seems reasonable.\n",
    "\n",
    "The function is fairly simplistic: it counts n up from 1 until n is large enough that the probability of observing a Z statistic that large (i.e. the one-sided p-value) is less than alpha, at which point we would reject the null hypothesis that p1 = p2. The function uses the normal distribution available from the scipy library to calculate the p-value and compare it to alpha. Note that the result is the smallest n per group at which an observed difference of exactly p_diff is significant; the calculation does not take statistical power into account."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-25T19:40:16.692704Z",
     "start_time": "2020-02-25T19:40:16.687707Z"
    }
   },
   "outputs": [],
   "source": [
    "# Sample calculator\n",
    "def sample_required(p1, p_diff, alpha):\n",
    "    n = 1\n",
    "    while True:\n",
    "        z = z_calc(p1, p1+p_diff, n1=n, n2=n)\n",
    "        p = 1 - stats.norm.cdf(z)\n",
    "        if p < alpha:\n",
    "            break\n",
    "        n += 1\n",
    "    return n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 04. Calculating the sample size\n",
    "These functions provide the main tools we need to determine the minimum sample size required. In this example, we want to detect a difference of 9.2 percentage points at a 95% confidence level, with a baseline p1 of 18.6%. We can calculate the sample size needed in this case, and then plot how the sample size changes with the baseline (the initial probability)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-25T19:45:12.949699Z",
     "start_time": "2020-02-25T19:45:12.925714Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The final sample size is calculated with a baseline of 18.6% and a lift of 9.2%. \n",
      "\n",
      "For this example, Nosh Mish Mosh needs to show the new pictures to 114 people (with a control group of the same size) to be able to detect an improvement\n"
     ]
    }
   ],
   "source": [
    "sample_size = sample_required(baseline, lift, .05)\n",
    "print('The final sample size is calculated with a baseline of {}% and a lift of {}%.'.format(baseline*100, lift*100), '\\n')\n",
    "print('For this example, Nosh Mish Mosh needs to show the new pictures to {} people (with a control group of the same size) to be able to detect an improvement'.format(sample_size))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 05. Plotting the minimum sample size needed by the initial probability"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-25T19:40:17.964127Z",
     "start_time": "2020-02-25T19:40:16.719287Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   Probability Difference  Sample Size to Detect Difference Confidence Level  \\\n",
      "0                   0.092                                29              95%   \n",
      "1                   0.092                                34              95%   \n",
      "2                   0.092                                40              95%   \n",
      "3                   0.092                                45              95%   \n",
      "4                   0.092                                51              95%   \n",
      "\n",
      "   Initial Probability  \n",
      "0                  0.0  \n",
      "1                  1.0  \n",
      "2                  2.0  \n",
      "3                  3.0  \n",
      "4                  4.0  \n"
     ]
    }
   ],
   "source": [
    "baseline_range = [i*.01 for i in range(96)]\n",
    "\n",
    "data = []\n",
    "for bsl in baseline_range:\n",
    "    record = {}\n",
    "    record['Probability Difference'] = lift\n",
    "    record['Sample Size to Detect Difference'] = sample_required(p1=bsl,\n",
    "                                                                p_diff=lift,\n",
    "                                                                alpha=.05)\n",
    "    record['Confidence Level'] = '95%'\n",
    "    record['Initial Probability'] = bsl * 100\n",
    "    data.append(record)\n",
    "\n",
    "df = pd.DataFrame(data)\n",
    "print(df.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-25T19:55:09.337700Z",
     "start_time": "2020-02-25T19:55:08.378283Z"
    },
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 648x648 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "fig, ax = pyplot.subplots(figsize=(9, 9))\n",
    "sns.set(style='darkgrid')\n",
    "\n",
    "plot = sns.pointplot(x='Initial Probability',\n",
    "            y='Sample Size to Detect Difference',\n",
    "            hue='Confidence Level', ax = ax,\n",
    "            data=df)\n",
    "\n",
    "# Label every fifth tick; build exactly one label per plotted baseline value\n",
    "labels = [str(i) if i % 5 == 0 else '' for i in range(len(df))]\n",
    "plot.set_xticklabels(labels=labels);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we see in the figure, the largest sample size is required when the initial probability (the baseline, p1) is close to 50%. This means that it is harder to detect a difference when the baseline is near 50%, because the variance p(1-p) of a proportion is largest there."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 06. Plotting the minimum sample size required by the lift expected"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-25T19:59:19.000572Z",
     "start_time": "2020-02-25T19:59:18.743730Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "18.6\n",
      "    Lift  Sample required Confidence Level\n",
      "0      5              361              95%\n",
      "1      6              255              95%\n",
      "2      7              191              95%\n",
      "3      8              148              95%\n",
      "4      9              119              95%\n",
      "..   ...              ...              ...\n",
      "70    75                3              95%\n",
      "71    76                3              95%\n",
      "72    77                3              95%\n",
      "73    78                3              95%\n",
      "74    79                3              95%\n",
      "\n",
      "[75 rows x 3 columns]\n"
     ]
    }
   ],
   "source": [
    "print(baseline*100)\n",
    "lift_range = range(5, 80)\n",
    "\n",
    "\n",
    "samples_lift = [sample_required(baseline, lift/100, .05) for lift in lift_range]\n",
    "dic = {'Lift': lift_range, 'Sample required': samples_lift, 'Confidence Level': '95%'} \n",
    "    \n",
    "df = pd.DataFrame(dic)\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-02-25T20:00:27.258521Z",
     "start_time": "2020-02-25T20:00:26.463011Z"
    },
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 648x648 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "fig, ax = pyplot.subplots(figsize=(9, 9))\n",
    "sns.set(style='darkgrid')\n",
    "\n",
    "plot = sns.pointplot(x='Lift',\n",
    "            y='Sample required',\n",
    "            hue='Confidence Level', ax = ax,\n",
    "            data=df)\n",
    "\n",
    "# Label every fifth tick; build exactly one label per plotted lift value\n",
    "labels = [str(i) if i % 5 == 0 else '' for i in df['Lift']]\n",
    "plot.set_xticklabels(labels=labels);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we see in this figure, the required sample size grows steeply as the lift we want to detect gets smaller (roughly in proportion to the inverse square of the lift), while bigger lifts can be detected easily with a much smaller sample."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conclusion\n",
    "The example shows how Python can be a very useful tool for performing “back of the envelope” calculations, such as estimates of required sample sizes for tests where this determination is not straightforward. These calculations can save you a lot of time and money, especially when you’re thinking about collecting your own data for a research project."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
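
The notebook's sample size calculation can also be reproduced outside Jupyter. The following is a minimal sketch, not the author's code: it assumes the figures printed in the notebook's outputs (93 purchases out of 500 visits, a rounded mean payment of $27, a $1240 revenue target), because the `noshmishmosh` data module is not included here. It swaps `scipy.stats.norm` for the standard library's `statistics.NormalDist` so that it runs without third-party packages, and `sample_required_closed_form` is a hypothetical helper, not part of the notebook, that solves the Z formula for n directly instead of counting up.

```python
import math
from statistics import NormalDist

# Figures copied from the notebook's printed outputs (assumed, not recomputed from raw data).
PAYING_VISITORS = 93
TOTAL_VISITORS = 500
MEAN_PAYMENT = 27.0      # rounded average payment, in dollars
TARGET_REVENUE = 1240    # extra weekly revenue wanted, in dollars
ALPHA = 0.05             # one-sided significance level (95% confidence)


def z_calc(p1, p2, n1, n2):
    """Two-proportion Z statistic using the pooled proportion p*."""
    p_star = (p1 * n1 + p2 * n2) / (n1 + n2)
    return (p2 - p1) / math.sqrt(p_star * (1 - p_star) * (1 / n1 + 1 / n2))


def sample_required_search(p1, p_diff, alpha):
    """Smallest equal group size n whose one-sided p-value is below alpha."""
    n = 1
    while 1 - NormalDist().cdf(z_calc(p1, p1 + p_diff, n, n)) >= alpha:
        n += 1
    return n


def sample_required_closed_form(p1, p_diff, alpha):
    """Hypothetical helper: solve z(n) = z_crit for n, assuming n1 == n2."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    p_star = p1 + p_diff / 2  # pooled proportion when both groups have size n
    n = z_crit ** 2 * 2 * p_star * (1 - p_star) / p_diff ** 2
    return math.ceil(n)


baseline = PAYING_VISITORS / TOTAL_VISITORS                 # conversion rate today
extra_payments = math.ceil(TARGET_REVENUE / MEAN_PAYMENT)   # extra sales needed
lift = extra_payments / TOTAL_VISITORS                      # percentage-point lift
relative_lift = lift / baseline                             # lift relative to baseline

n_search = sample_required_search(baseline, lift, ALPHA)
n_closed = sample_required_closed_form(baseline, lift, ALPHA)

print(f"baseline={baseline:.3f}, lift={lift:.3f}, relative lift={relative_lift:.1%}")
print(f"search: {n_search} per group, closed form: {n_closed} per group")
```

With these inputs both approaches give 114 people per group, which matches the sample size reported in the notebook. The closed form avoids the loop, which matters mainly when the lift is small and the search would otherwise take many iterations.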
