thomas_exercise.Rmd

---
title: "Aid with Dignity"
author: "Psych 251"
date: "11/18/2022"
output: html_document
---

# Introduction

We're reproducing some data from Thomas et al. (2020), "Toward a science of delivering aid with dignity: Experimental evidence and local forecasts from Kenya." 

[Catherine is a student and former TA in the course.]

In particular, we'll focus on Study 1: Experimental Impacts of Aid Narratives on Recipients.

The idea is to reproduce the basic effects of treatment that they saw, in particular, people who got a cash transfer heard that it was from either 1)  a community empowerment organization, 2) a poverty alleviation organization, or 3) an individual empowerment organization. Com (condition 1) and Ind (condition 3) recipients were found to watch more business education videos, and reported greater self efficacy and mobility and less stigma.

Data from: [https://www.pnas.org/content/117/27/15546]()

Repository: [https://osf.io/v3cr4/]() 

This is an example of a pretty nicely organized repository that includes a readme, a great codebook, all code and data, etc. 

```{r}
library(tidyverse)
load("data/KenyaData.RData")

k1_df <- tibble(k1_df)
```

Key variables are `treat` (treatment condition), `vid.num`, and various psychological treatments. We'll focus on videos and self-efficacy.

Let's make the appropriate composite for self-efficacy, copied from their code:

```{r}
scale.means = function (df, ..., na.rm=FALSE) {
  vars = unlist(list(...))
  mean_vars = rowMeans(df[,vars], na.rm=na.rm)
  return(mean_vars)
}

for (var in c(k1_df$sel.con, k1_df$sel.pers, k1_df$sel.com, 
              k1_df$sel.prob, k1_df$sel.bett)) {
  var[var < 0] <- NA
}

k1_df$sel.score.avg <- scale.means(k1_df, "sel.con", "sel.pers", "sel.com", 
                                   "sel.prob", "sel.bett", na.rm = T)
k1_df$sel.score <- scale(k1_df$sel.con) + scale(k1_df$sel.pers) + 
  scale(k1_df$sel.com) + scale(k1_df$sel.prob) + scale(k1_df$sel.bett)
k1_df$sel.score.z <- scale(k1_df$sel.score)
```

# Descriptives

Always good to make histograms of the dependent variables (`sel.score.avg` and `vid.num`)! Use facets, fills, etc. to try and explore how these relate to treatment. 

```{r}
ggplot(k1_df, aes(x = ...)) + 
  geom_histogram() 

# try some faceting, e.g. facet_wrap(~treat)
```


# Reproduce main analysis

Let's try to reproduce Figures 1A and 1B from the paper, pasted below:
```{r}
include_graphics('data/thomas_exercise_fig.png')
```

First, we'll reproduce the behavioral result that `vid.num` is influenced by `treat`! (Figure 1A). Build out this simple stub code.  

```{r}
k1_summary <- k1_df |>
  group_by(treat) |>
  summarise(...) 

ggplot(k1_summary, ...) 
```

Next, repeat this code to make Figure 1B for `sel.score.avg`. 

```{r}
...
```

Note that it's actually tricky to truncate the y axis for a barplot with ggplot (you can do it but it takes work!).

# Show the distribution of data

Think about how you could show the distribution of data for the `sel.score.avg` figure in a way that helps with the interpretation. 

```{r}
...
```


# Exploratory analysis

Consider exploratory analysis of demographic variables and how they relate to the self-efficacy outcome variable (`sel.score.avg`). One way to do this is to create scatter plots of each outcome by the demographics and consider how this relationship is modulated by treatment condition.  

* `soc.fem` = gender
* `soc.age` = age
* `ses.unemployed` = employment
* `soc.sav` = savings > 1000ksh
* `soc.inc` = income

```{r}
...
```