update vignette
mattansb committed Dec 11, 2023
1 parent 16289db commit 87e9f26
Showing 2 changed files with 69 additions and 37 deletions.
1 change: 1 addition & 0 deletions _pkgdown.yml
@@ -124,6 +124,7 @@ navbar:
href: https://easystats.github.io/parameters/articles/standardize_parameters_effsize.html
- text: "Correlation Vignettes"
href: https://easystats.github.io/correlation/articles/index.html
- text: -------
- text: "Confidence Intervals"
href: reference/effectsize_CIs.html
- text: "Statistical Power"
105 changes: 68 additions & 37 deletions vignettes/statistical_power.Rmd
@@ -31,22 +31,26 @@ set.seed(123)

# Overview

In this vignette, we focus on statistical power and the role of the `effectsize` easystats package in power analysis. As such, we are interested in accomplishing several things with this vignette:
In this vignette, we focus on statistical power and the role of the `effectsize` _easystats_ package in power analysis.
As such, we are interested in accomplishing several things with this vignette:

1. Reviewing statistical power and its value in a research task
2. Demonstrating the role of the `effectsize` package in the context of exploring statistical power
3. Highlighting the ease of calculating and understanding of statistical power via the easystats ecosystem, and the `effectsize` package specifically
3. Highlighting the ease of calculating and understanding statistical power via the _easystats_ ecosystem, and the `effectsize` package specifically
4. Encouraging wider adoption of power analysis in applied research

*Disclaimer:* This vignette is an initial look at power analysis via Easystats. There's much more we could do, so please give us a feedback about what features would you like to see in Easystats to make power analysis easier.
*Disclaimer:* This vignette is an initial look at power analysis via _easystats_.
There's much more we could do, so please give us feedback about what features you would like to see in _easystats_ to make power analysis easier.

## What is statistical power and power analysis?

Statistical power allows for the ability to check whether an effect observed from a statistical test actually exists, or that the null hypothesis really can be rejected (or not). Power involves many related concepts including, but not limited to, sample size estimation, significance threshold evaluation, and of course, the *effect size*.
Statistical power gives us a way to check whether an effect observed in a statistical test actually exists, that is, whether the null hypothesis really can be rejected (or not).
Power involves many related concepts including, but not limited to, sample size, significance thresholds, and, of course, the *effect size*.
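One intuitive way to see power is by simulation: draw many samples under an assumed true effect, and count how often the test rejects the null. A minimal sketch (the function name `sim_power` and the assumed values, a true standardized difference of 0.5 with 30 observations per group, are purely illustrative):

```{r, eval = FALSE}
# Monte Carlo sketch of power: simulate many studies in which a true
# effect exists, and count how often the t-test rejects the null.
set.seed(1)
sim_power <- function(n = 30, d = 0.5, nsim = 1000, alpha = 0.05) {
  p_values <- replicate(nsim, {
    x <- rnorm(n)           # group 1: true mean 0
    y <- rnorm(n, mean = d) # group 2: true mean d (the assumed effect)
    t.test(x, y)$p.value
  })
  mean(p_values < alpha)    # proportion of (correct) rejections
}
sim_power()
```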

## What is `effectsize`?

The goal of the `effectsize` package is to provide utilities to work with indices of effect size and standardized parameters, allowing computation and conversion of indices such as Cohen’s d, r, odds-ratios, among many others. Please explore the breadth of effect size operations included in the package by visiting the [package docs](https://easystats.github.io/effectsize/reference/index.html).
The goal of the `effectsize` package is to provide utilities to work with indices of effect size and standardized parameters, allowing computation and conversion of indices such as Cohen’s d, r, odds-ratios, among many others.
Please explore the breadth of effect size operations included in the package by visiting the [package docs](https://easystats.github.io/effectsize/reference/index.html).

## Putting the Pieces Together: Hypothesis Testing

@@ -60,17 +64,24 @@ Let's take a closer look at the key ingredients involved in statistical power b

4. *Statistical power*: This brings us to statistical power, which can be thought of in many ways, such as the probability that we are *correctly* observing an effect or group difference, or that we are correctly rejecting the null hypothesis, and so on (see, e.g., [@cohen1988], [@greene2000] for more). But regardless of the interpretation, all of these point to a common idea: *our ability to trust the result we get from the hypothesis test*, regardless of the test.

Let's put these pieces together with a simple example. Say we find a "statistically significant" ($p < 0.05$) difference between two group means from a two-sample t-test. In this case, we might be tempted to stop and conclude that the signal is sufficiently strong to conclude that the groups are different from each other. But our test could be incorrect for a variety of reasons. Recall, that the p-value is a *probability*, meaning in part that we could be erroneously rejecting the null hypothesis, or that an insignificant result is insignificant due to a small sample size, and so on.
Let's put these pieces together with a simple example.
Say we find a "statistically significant" ($p < 0.05$) difference between two group means from a two-sample t-test.
In this case, we might be tempted to stop and conclude that the signal is sufficiently strong to conclude that the groups are different from each other.
But our test could be incorrect for a variety of reasons.
Recall that the p-value is a *probability*, meaning in part that we could be erroneously rejecting the null hypothesis, or that a non-significant result may simply be due to a small sample size, and so on.

> This is where statistical power comes in.
Statistical power helps us go the next step and more thoroughly assess the probability that the "significant" result we observed is indeed significant, or detect a cause of an insignificant result (e.g., sample size). In general, *before* beginning a broader analysis, it is a good idea to check for statistical power to ensure that you can trust the results you get from your test(s) downstream, and that your inferences are reliable.
Statistical power helps us go a step further and more thoroughly assess the probability that the "significant" result we observed reflects a real effect, or detect a cause of a non-significant result (e.g., a small sample size).
In general, *before* beginning a broader analysis, it is a good idea to check for statistical power to ensure that you can trust the results you get from your test(s) downstream, and that your inferences are reliable.
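Such a pre-study check is usually an *a priori* power analysis: fix the effect size you hope to detect and the power you want, then solve for the required sample size. A sketch with the `pwr` package (the values here are conventional placeholders, not recommendations):

```{r, eval = FALSE}
library(pwr)
# A priori power analysis: how many observations per group are needed
# to detect a medium effect (d = 0.5) with 80% power at alpha = 0.05?
pwr.t.test(
  d = 0.5,
  sig.level = 0.05,
  power = 0.80,
  type = "two.sample",
  alternative = "two.sided"
)
```

Here the required `n` comes out at roughly 64 observations per group.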

So this is where we focus in this vignette, and pay special attention to the ease and role of effect size calculation via the `effectsize` package from easystats. The following section walks through a simple applied example to ensure 1) the concepts surrounding and involved in power are clear and digestible, and 2) that the role and value of the `effectsize` package are likewise clear and digestible. Understanding both of these realities will allow for more complex extensions and applications to a wide array of research problems and questions.
So this is where we focus in this vignette, and pay special attention to the ease and role of effect size calculation via the `effectsize` package from _easystats_.
The following section walks through a simple applied example to ensure 1) the concepts surrounding and involved in power are clear and digestible, and 2) that the role and value of the `effectsize` package are likewise clear and digestible.
Understanding both of these realities will allow for more complex extensions and applications to a wide array of research problems and questions.

# An Applied Example
# Example: Comparing Means of Independent Samples

In addition to relying on the easystats `effectsize` package for effect size calculation, we will also leverage the simple, but excellent `pwr` package for the following implementation of power analysis [@champley2017].
In addition to relying on the _easystats_ `effectsize` package for effect size calculation, we will also leverage the simple, but excellent `pwr` package for the following implementation of power analysis [@champley2017].

```{r}
library(pwr)
@@ -83,65 +94,77 @@ First, let's fit a simple two sample t-test using the mtcars data to explore mea
t <- t.test(mpg ~ am, data = mtcars)
```

There are many power tests supported by `pwr` for different contexts, and we encourage you to take a look and select the appropriate one for your application. For present purposes of calculating statistical power for our t-test, we will rely on the `pwr.t.test()` function. Here's the basic anatomy:
There are many power tests supported by `pwr` for different contexts, and we encourage you to take a look and select the appropriate one for your application.
For present purposes of calculating statistical power for our t-test, we will rely on the `pwr.t2n.test()` function.
Here's the basic anatomy:

```{r, eval = FALSE}
pwr.t.test(
pwr.t2n.test(
n1 = ..., n2 = ...,
d = ...,
n = ...,
sig.level = ...,
power = ...,
alternative = ...
)
```

But, before we can get to the power part, we need to collect a few ingredients first, as we can see above. The ingredients we need include:
But, before we can get to the power part, we need to collect a few ingredients first, as we can see above.
The ingredients we need include:

- `d`: effect size
- `n`: sample size (for each sample)
- `n1` and `n2`: sample size (for each sample)
- `sig.level`: significance threshold (e.g., `0.05`)
- `alternative`: direction of the t-test (e.g., `"greater"`, `"less"`, `"two.sided"`)

## Calculate Effect Size
(By omitting the `power` argument, we are implying that we want the function to estimate that value for us.)

All arguments in `pwr.t.test()` can be supplied by the researcher, except `d`, which requires calculation of an effect size. This is where the `effectsize` package comes in.
## Calculate Effect Size

Given the simplicity of this example and the prevalence of Cohen's $d$, we will rely on this effect size index here. We have three ways of easily calculating Cohen's $d$ via `effectsize`.
Given the simplicity of this example and the prevalence of Cohen's $d$, we will rely on this effect size index here.
We have three ways of easily calculating Cohen's $d$ via `effectsize`.
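For reference, for two independent samples Cohen's $d$ is the standardized mean difference,

$$
d = \frac{\bar{x}_1 - \bar{x}_2}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}},
$$

where $s_p$ is the pooled standard deviation of the two groups.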

### Approach 1: `effectsize()`

The first approach is the simplest. As previously hinted at, there is a vast literature on different effect size calculations for different applications. So, if you don't want to track down a specific one, or are unaware of options, you can simply pass the statistical test object to `effectsize()`, and either select the `type`, or leave it blank for "cohens_d", which is the default option.
The first approach is the simplest.
As previously hinted at, there is a vast literature on different effect size calculations for different applications.
So, if you don't want to track down a specific one, or are unaware of options, you can simply pass the statistical test object to `effectsize()`, and either select the `type`, or leave it blank for "cohens_d", which is the default option.

*Note*, when using the formula interface to `t.test()`, this method (currently) only gives an approximate effect size. So for this first simple approach, we update our test (`t_alt`) and then make a call to `effectsize()`.
*Note*, when using the formula interface to `t.test()`, this method (currently) only gives an approximate effect size.
So for this first simple approach, we update our test (`t_alt`) and then make a call to `effectsize()`.

```{r eval = FALSE}
t_alt <- t.test(mtcars$mpg[mtcars$am == 0], mtcars$mpg[mtcars$am == 1])
effectsize(t_alt, type = "cohens_d")
```

*Note*, users can easily store the value and/or CIs as you'd like via, e.g., `cohens_d <- effectsize(t, type = "cohens_d")[1]`.
*Note*, users can easily store the value and/or CIs as you'd like via, e.g., `cohens_d <- effectsize(t, type = "cohens_d")[[1]]`.

### Approach 2: `cohens_d()`

Alternatively, if you knew the index one you wanted to use, you could simply call the associated function directly. For present purposes, we picked Cohen's $d$, so we would call `cohens_d()`. But there are many other indices supported by `effectsize`. For example, see [here](https://easystats.github.io/effectsize/reference/index.html#standardized-differences) for options for standardized differences. Or see [here](https://easystats.github.io/effectsize/reference/index.html#for-contingency-tables) for options for contingency tables. Or see [here](https://easystats.github.io/effectsize/reference/index.html#comparing-multiple-groups) for options for comparing multiple groups, and so on.
Alternatively, if you know which index you want to use, you can simply call the associated function directly. For present purposes, we picked Cohen's $d$, so we would call `cohens_d()`.
But there are many other indices supported by `effectsize`. For example, see [here](https://easystats.github.io/effectsize/reference/index.html#standardized-differences) for options for standardized differences. Or see [here](https://easystats.github.io/effectsize/reference/index.html#for-contingency-tables) for options for contingency tables. Or see [here](https://easystats.github.io/effectsize/reference/index.html#comparing-multiple-groups) for options for comparing multiple groups, and so on.

In our simple case here with a t-test, users are encouraged to use `effectsize()` when working with htest objects to ensure proper estimation. Therefore, with this second approach of using the "named" function, `cohens_d`, users should pass the data directly to the function instead of the htest object (e.g., `cohens_d(t)`).
In our simple case here with a t-test, users are encouraged to use `effectsize()` when working with `htest` objects to ensure proper estimation.
Therefore, with this second approach of using the "named" function, `cohens_d`, users should pass the data directly to the function instead of the `htest` object (e.g., `cohens_d(t)`).

```{r eval = FALSE}
cohens_d(mpg ~ am, data = mtcars)
```

### Approach 3: `t_to_d()`

When/if the test object's class is unclear, you may get a warning message like:
When the original underlying data is not available, you may get a warning message like:

> *Warning: ... Returning an approximate effect size using t_to_d()*
In these cases, the default behavior of `effectsize` is to make a back-up call to `t_to_d()` (or which ever conversion function is appropriate based on the input). This step makes the calculation from the t-test to Cohen's $d$. Given the prevalence of calculating effect sizes for different applications and the many effect size indices available for different contexts, we have anticipated this and baked in this conversion "fail safe" in the architecture of `effectsize` by detecting the input and making the appropriate conversion. There are many conversions available in the package. Take a look [here](https://easystats.github.io/effectsize/reference/index.html#effect-size-conversion).
In these cases, the default behavior of `effectsize` is to make a back-up call to `t_to_d()` (or whichever conversion function is appropriate based on the input).
This step makes the calculation from the t-test to Cohen's $d$.
Given the prevalence of calculating effect sizes for different applications and the many effect size indices available for different contexts, we have anticipated this and baked in this conversion "fail safe" in the architecture of `effectsize` by detecting the input and making the appropriate conversion.
There are many conversions available in the package.
Take a look [here](https://easystats.github.io/effectsize/reference/index.html#effect-size-conversion).
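For a between-subjects t-test with (roughly) equal group sizes, the standard $t$-to-$d$ conversion underlying this fail safe is

$$
d = \frac{2t}{\sqrt{\mathit{df}_{\text{error}}}}.
$$

This is an approximation, which is why computing $d$ from the raw data (as in Approach 2) is preferred when the data are available.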

So, for the third and final approach, and to double check that the conversion was correct, let's directly pass our t-test results to `t_to_d()`.
This conversion can also be done directly by the user with the `t_to_d()` function:

```{r}
t_to_d(
  t = t$statistic,
  df_error = t$parameter
)
```

*Note*, this approach will drop the warning as it is now explicit that we are converting from a t-test to Cohen's $d$.

## Statistical Power

Now we are ready to calculate the statistical power of our t-test given that we have collected the essential ingredients.

For the present application, the effect size obtained from `t_to_d()` (or any of the three approaches previously described) can be passed to the first argument, `d`. This value can either be from a previously-stored effect size, or can be called directly as shown below.

In line with prior caveats, since `t_to_d()` is only an approximate effect size, best practice would be to properly compute Cohen's d as suggested previously in approach 2 when the raw data are available.
For the present application, the effect size obtained from `cohens_d()` (or any of the three approaches previously described) can be passed to the `d` argument.

```{r}
pwr.t.test(
d = t_to_d(t = t$statistic, df_error = t$parameter)$d,
n = table(mtcars$am),
(result <- cohens_d(mpg ~ am, data = mtcars))
(Ns <- table(mtcars$am))
pwr.t2n.test(
n1 = Ns[1], n2 = Ns[2],
d = result[["Cohens_d"]],
sig.level = 0.05,
alternative = "two.sided"
)
```
@@ -174,4 +195,14 @@ The results tell us that we are sufficiently powered, with a very high power for

Notice, though, if you were to change the group sample sizes to something very small, say `n1 = 2, n2 = 2`, then you would get a much lower power, suggesting that your sample size is too small to detect any reliable signal or to be able to trust your results.
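A self-contained sketch of that small-sample scenario (the name `d_small` is illustrative):

```{r, eval = FALSE}
library(pwr)
library(effectsize)

# Same observed effect size as before, but only 2 observations per group:
d_small <- cohens_d(mpg ~ am, data = mtcars)[["Cohens_d"]]
pwr.t2n.test(
  n1 = 2, n2 = 2,
  d = d_small,
  sig.level = 0.05,
  alternative = "two.sided"
)
# power comes out far below the conventional 0.80 target
```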

# Example: Contingency Table

<!-- TODO -->
_To be added._

# Example: ANOVA (and Model Comparisons)

<!-- TODO -->
_To be added._

# References
