Commit

eds

graemeblair committed Aug 22, 2023
1 parent 60a612f commit 9e537a2

Showing 10 changed files with 46 additions and 43 deletions.
4 changes: 2 additions & 2 deletions _quarto.yml

@@ -27,7 +27,7 @@ book:
- declaration-diagnosis-redesign/declaring-designs.qmd
- declaration-diagnosis-redesign/specifying-model.qmd
- declaration-diagnosis-redesign/defining-inquiry.qmd
- declaration-diagnosis-redesign/crafting-research-strategy.qmd
- declaration-diagnosis-redesign/crafting-data-strategy.qmd
- declaration-diagnosis-redesign/choosing-answer-strategy.qmd
- declaration-diagnosis-redesign/diagnosing-designs.qmd
- declaration-diagnosis-redesign/redesigning.qmd

@@ -72,7 +72,7 @@ book:
- declaration-diagnosis-redesign/declaring-designs.qmd
- declaration-diagnosis-redesign/specifying-model.qmd
- declaration-diagnosis-redesign/defining-inquiry.qmd
- declaration-diagnosis-redesign/crafting-research-strategy.qmd
- declaration-diagnosis-redesign/crafting-data-strategy.qmd
- declaration-diagnosis-redesign/choosing-answer-strategy.qmd
- declaration-diagnosis-redesign/redesigning.qmd
- declaration-diagnosis-redesign/diagnosing-designs.qmd
47 changes: 25 additions & 22 deletions declaration-diagnosis-redesign/choosing-answer-strategy.qmd

@@ -104,7 +104,7 @@ But that doesn't tell us much about how confident we should be in this answer. T

::: {#def-ch9num1 .declaration}

Italian village design
Italian village design.

```{r, file = "scripts_declarations/declaration_9.1.R"}
```

@@ -212,27 +212,30 @@ What does characterize our uncertainty about a significance test? The Type I and
### Bayesian formalizations

#### Answer

Bayesian answer strategies sometimes target the same inquiries as classical approaches, but rather than seeking a point estimate, they try to generate rational beliefs over possible values of the estimand. Rather than trying to provide a single best guess for the average age in a village, a Bayesian answer strategy would try to figure out how likely different answers are given the data. To do so, they need to know how likely different age distributions are *before* seeing the data---the priors---and the likelihood of different types of data for each possible age distribution. A Bayesian who knows anything about Italy would likely not be very impressed by the "15" answer given by the point estimator in @sec-ch9s2ss1 because, prior to seeing any samples, they would expect that the answer had to be bigger than this. Bayesians would chalk the answer "15" up to an unusual draw.

The Bayesian answer strategy specifies a prior distribution over the average age (here a normal distribution centered on 50 to reflect a prior that Italian villages skew older) as well as a lognormal distribution for ages. Here we retain the (median) posterior estimates for average age alongside a standard error based on the posterior variance. In the `.summary` argument we ask the tidier to exponentiate the coefficient estimate and standard error before returning them.

::: {#def-ch9num4 .declaration}
::: {#def-ch9num3 .declaration}

Italian village design a la Bayes
Italian village design a la Bayes.

```{r, file = "scripts_declarations/declaration_9.4.R"}
```{r, file = "scripts_declarations/declaration_9.3.R"}
```

:::
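
The included script is not reproduced in this diff. As a rough sketch of what a declaration in this spirit could look like (not the book's `declaration_9.3.R`: the sample size, lognormal parameters, prior scale, and the `tidy_posterior` helper below are all illustrative assumptions, and `rstanarm` is assumed for the Bayesian fit), one could write:

```{r}
#| eval: false
library(DeclareDesign)
library(rstanarm)

# Illustrative tidier: posterior median and sd of the intercept,
# exponentiated from the log scale back to the age scale
tidy_posterior <- function(fit) {
  draws <- as.data.frame(fit)[["(Intercept)"]]
  data.frame(
    term = "mean_age",
    estimate = exp(median(draws)),
    std.error = exp(sd(draws))
  )
}

declaration_bayes_sketch <-
  declare_model(
    N = 470,
    age = round(rlnorm(N, meanlog = log(45), sdlog = 0.5))  # lognormal ages (assumed parameters)
  ) +
  declare_inquiry(mean_age = mean(age)) +
  declare_sampling(S = complete_rs(N = N, n = 3)) +
  declare_estimator(
    log(age) ~ 1,
    .method = stan_glm,
    prior_intercept = normal(location = log(50), scale = 1),  # prior centered on an average age of 50
    refresh = 0,
    .summary = tidy_posterior,
    inquiry = "mean_age"
  )
```

The `.summary` function mirrors the description above: it takes the posterior median of the intercept on the log scale and exponentiates both it and the posterior standard deviation before returning them.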

::: {#lem-ch9num4}
::: {#lem-ch9num3}

Diagnosis of Italian village design a la Bayes.

We can then simulate this design in the same way and examine the distribution of estimates we might get.

```{r}
#| eval: false
diagnosis_9.4 <- diagnose_design(declaration_9.4)
diagnosis_9.3 <- diagnose_design(declaration_9.3)
```

What we see in @fig-ch9num3 is that using the same (poor) data strategy as before, a Bayesian answer strategy gets us a somewhat tighter distribution on our answer, but exhibits greater bias: the average estimate is higher than the estimand. We might accept higher bias for lower variance if overall, the root-mean-squared error is lower for the Bayesian approach. See @sec-ch10s4ss1 for a further discussion of RMSE. A major difference between the Bayesian and classical approaches is the handling of prior beliefs, which carry a lot of weight in the Bayesian estimation, but no weight in the classical approach.

@@ -280,21 +283,21 @@ answer_strategy(three_italian_citizens) |>
kable_styling()
```

::: {#def-ch9num5 .declaration}
::: {#def-ch9num4 .declaration}

Italian village declaration, varying the true mean age parameter.

```{r, file = "scripts_declarations/declaration_9.5.R"}
```{r, file = "scripts_declarations/declaration_9.4.R"}
```

:::

::: {#lem-ch9num5}
::: {#lem-ch9num4}

Diagnosing the Italian village design over many values of the true mean age parameter
Diagnosing the Italian village design over many values of the true mean age parameter.

```{r, eval=FALSE}
diagnosis_9.5 <- diagnose_designs(declaration_9.5)
diagnosis_9.4 <- diagnose_designs(declaration_9.4)
```

![Type II error rates of the Italian village design](/figures/figure-9-4){#fig-ch9num4}

@@ -399,9 +402,9 @@ As an illustration of the logic of the approach using `DeclareDesign`, we compar

First, we set up a design that resamples from @clingingsmith2009estimating's study of the effect of being randomly assigned to go on Hajj on the tolerance of foreigners.

:::{.definition #bootstrapping}
::: {#def-ch9num5 .declaration}

Bootstrapped standard errors
Bootstrapped standard errors.

```{r, file = "scripts_declarations/declaration_9.5.R"}
```

@@ -410,7 +413,7 @@

::: {#lem-ch9numbootstrap}

Bootstrap diagnosis
Bootstrap diagnosis.

The bootstrapped estimates are obtained by summarizing over multiple runs of the design:
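
The chunk that performs this summary is collapsed in this diff. As a generic sketch of the idea (the column names are assumed rather than taken from the book), the bootstrap standard error is simply the standard deviation of the estimate across simulated runs of the resampling design:

```{r}
#| eval: false
library(dplyr)

# Each run of the design resamples the data and re-estimates;
# the bootstrap SE is the spread of those estimates across runs
simulations_9.5 <- simulate_design(declaration_9.5, sims = 500)

simulations_9.5 |>
  summarize(
    boot_estimate = mean(estimate),
    boot_se = sd(estimate)
  )
```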

@@ -464,7 +467,7 @@ This idea can be summarized as "analyze as you randomize," a dictum attributed t

::: {#def-ch9num6 .declaration}

Restoring parallelism design
Restoring parallelism design.

```{r, file = "scripts_declarations/declaration_9.6.R"}
```
Expand All @@ -474,7 +477,7 @@ Restoring parallelism design

::: {#lem-ch9num6}

Restoring parallelism diagnosis
Restoring parallelism diagnosis.

```{r}
#| eval: false

@@ -492,7 +495,7 @@ This principle applies most clearly to the bias diagnosand, but it applies to ot

More generally, the principle to "design agnostically" implies that we should choose "agnostic" answer strategies, by which we mean answer strategies that produce good answers under a wide range of models. Selecting answer strategies that are robust to multiple models ensures that we get good answers not only when our model is spot on --- which is rare! --- but also under many possible circumstances.

Understanding whether the choices over answer strategies---logit or probit or OLS---depend on the model being a particular way is crucial to making a choice. For example, many people have been taught that whenever the outcome variable is binary, OLS is inappropriate and they must use a binary choice model like logit instead. When the inquiry is the probability of success for each unit and we use covariates to model them, how much better logit performs at estimating probabilities depends on the model. When probabilities are all close to 0.5, the two answer strategies both perform well. When the probabilities spread out from 0.5, OLS is less robust and logit beats it (@hellevik2009). In the same breath, however, we can consider these same two estimators in the context of a randomized experiment with a binary outcome. Here, OLS can be just as strong as logit, no matter what the distribution of the potential outcomes. In this setting, when designing agnostically, we find that both estimators are robust (see @sec-ch10s3ss1).
Understanding whether the choices over answer strategies---logit or probit or OLS---depend on the model being a particular way is crucial to making a choice. For example, many people have been taught that whenever the outcome variable is binary, OLS is inappropriate and they must use a binary choice model like logit instead. When the inquiry is the probability of success for each unit and we use covariates to model them, how much better logit performs at estimating probabilities depends on the model. When probabilities are all close to 0.5, the two answer strategies both perform well. When the probabilities spread out from 0.5, OLS is less robust and logit beats it [@hellevik2009]. In the same breath, however, we can consider these same two estimators in the context of a randomized experiment with a binary outcome. Here, OLS can be just as strong as logit, no matter what the distribution of the potential outcomes. In this setting, when designing agnostically, we find that both estimators are robust (see @sec-ch10s3ss1).

Designing agnostically has something in common with robustness checks: both share the motivation that we have fundamental uncertainty about the true model. A robustness check is an *alternative* answer strategy that changes some model assumption that the main answer strategy depends on. Presenting three estimates of the same parameter under different answer strategies (logit, probit, and OLS) and making a joint decision based on the set of estimates about whether the main analysis is "robust" is a procedure for assessing "model dependence" --- meaning, dependence on *statistical* models. But robustness checks are just answer strategies themselves, and we should declare them and diagnose them to understand whether they are good answer strategies. We want to understand the *properties* of the robustness check, e.g., under what models and how frequently does it correctly describe the main answer strategy as "robust."

@@ -535,18 +538,18 @@ observed_estimate |>

Here we declare the null model (indicated by `potential_outcomes(Y ~ 0 * Z + marked_register_2014)`) and add it to the data and answer strategies:

:::{.definition #randomizationinference}
::: {#def-ch9num7 .declaration}

Randomization inference under the sharp null
Randomization inference under the sharp null.

```{r, file = "scripts_declarations/declaration_9.3.R"}
```{r, file = "scripts_declarations/declaration_9.7.R"}
```

:::

::: {#lem-ch9num3}
::: {#lem-ch9num7}

Randomization inference "diagnosis"
Randomization inference "diagnosis".

:::
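
There is no standard diagnosand here, hence the quotes: the simulations themselves deliver the inference. As a sketch of the logic (the `estimate` columns below are assumed names), the randomization inference p-value is the share of estimates simulated under the sharp null that are at least as large in magnitude as the estimate from the observed data:

```{r}
#| eval: false
# Simulate the design under the sharp null of no effect for any unit,
# then compare the observed estimate to this null distribution
simulations_null <- simulate_design(declaration_9.7, sims = 1000)

p_value_ri <-
  mean(abs(simulations_null$estimate) >= abs(observed_estimate$estimate))
p_value_ri
```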

@@ -233,10 +233,10 @@ Thus far we have considered assignment strategies that allocate subjects to just
+-------------------------------------------------+--------------------------------------------------------------------------------------------+
| Design | Description and randomizr R code |
+=================================================+============================================================================================+
| Multi-arm random assignment (complete) | Fixed numbers of units are assigned to three or more conditions |
| Multi-arm random assignment (complete) | Fixed numbers of units are assigned to three or more conditions |
| | |
| | ``` |
| | complete_ra(N = 100, m_each = c(40, 30, 30)) |
| | complete_ra(N = 100, m_each = c(40, 30, 30)) |
| | ``` |
+-------------------------------------------------+--------------------------------------------------------------------------------------------+
| Factorial random assignment (complete) | Units are assigned to receive one treatment, the second treatment, neither, or both |

@@ -300,7 +300,7 @@ Descriptive inference is threatened whenever measurements differ from the quanti

Some measurement strategies exhibit little to no measurement error. It's easy enough to measure some plain matters of fact, like whether a country is a member of the European Union (though clerical errors could still crop up). In the social sciences, most measurement strategies are threatened by the possibility of measurement errors due to any number of biases (e.g., recall bias, observer bias, Hawthorne effects, demand effects, sensitivity bias, response substitution, among many others).

We often describe measurement error in two ways, measurement *validity*, and measurement *reliability*. Validity is the difference between the observed and latent outcome, $Y^{\mathrm obs} - Y^*$. Reliability is the consistency of the measurements we would obtain if we were to repeat the measurement many times, which we can operationalize as low variance of the measurements:, $\V(Y_1^{\mathrm obs}, Y_2^{\mathrm obs}, \ldots, Y_k^{\mathrm obs})$. We would of course like to always select valid, reliable measurement strategies. When no perfect measure is available, choices among alternative measurement strategies typically reduce to tradeoffs between their validity and reliability.
We often describe measurement error in two ways: measurement *validity* and measurement *reliability*. Validity is the difference between the observed and latent outcome, $Y^{\mathrm{obs}} - Y^*$. Reliability is the consistency of the measurements we would obtain if we were to repeat the measurement many times, which we can operationalize as low variance of the measurements, $\mathbb{V}(Y_1^{\mathrm{obs}}, Y_2^{\mathrm{obs}}, \ldots, Y_k^{\mathrm{obs}})$. We would of course like to always select valid, reliable measurement strategies. When no perfect measure is available, choices among alternative measurement strategies typically reduce to tradeoffs between their validity and reliability.
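
As a toy illustration of these two quantities (all numbers below are invented), we can simulate a latent outcome measured several times with a constant bias and random noise, and compute both directly:

```{r}
#| eval: false
# Latent outcome Y_star measured k = 5 times with bias 0.2 and noise sd 0.5
set.seed(94305)
Y_star <- rnorm(100)
measurements <- replicate(5, Y_star + 0.2 + rnorm(100, sd = 0.5))

head(measurements[, 1] - Y_star)   # validity: gap between one observed measure and Y*
head(apply(measurements, 1, var))  # reliability: variance across each unit's repeated measures
```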

To make these choices, we depend on methodological research whose main inquiries are the reliability and validity of particular measurement procedures. Sometimes measurement studies are presented as "validation" studies that compare a proposed measure to a "ground truth." But even "ground truths" must be measured, usually with an expensive or otherwise unfeasible approach (otherwise there would be no need for the alternative measurement). Further, neither measurement is known to be exactly $Y^*$, so ultimately validation studies are comparisons of multiple techniques, each with their own advantages and disadvantages. This fact does not make these studies useless, but rather underlines that they rely on our faith in ground truths.

@@ -335,7 +335,7 @@ Just as we can use data-adaptive methods to hone in on the most effective treatm

<!-- ### Robustness -->

@exm-designagnostically: `Design agnostically` focuses on models, encouraging us to consider plausible variations of the set of variables, their probability distributions, and the relationships between them. The principle has implications for the data and answer strategies also, in particular we should choose *D* and *A* such that we have good designs under a wide array of plausible models.
@exm-designagnostically: *Design agnostically* focuses on models, encouraging us to consider plausible variations of the set of variables, their probability distributions, and the relationships between them. The principle has implications for the data and answer strategies also, in particular we should choose *D* and *A* such that we have good designs under a wide array of plausible models.

In this section, we discuss four core threats to data strategies and ways to respond to them: noncompliance (failure to treat), attrition (failure to be included in the sample or provide measures), excludability violations (causal effects of random sampling, random assignment, or measurement on the latent outcome), and interference (the dependence of potential outcomes on whether other units are treated). These threats are often discussed in the context of experimental designs, but the core issues they raise are relevant for observational designs also. If serious, these threats may necessitate changes to the inquiry, the answer strategy, or the data strategy itself.
