<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gen_data
--------------------------------------------------------------------------------
This section is for generating a fake dataset to test out the code.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
```{r gen_data}
## Set seed so the fake dataset is the same each time
set.seed(50)
## Number of locations to select from
n <- 20
## Prefix
prefix <- "location "
## Suffix
suffix <- seq_len(n)
## Combine to create basic cluster selection dataset
clusters <- data.frame(location_name = paste0(prefix, suffix),
                       location_population = sample(1000:25000, n, replace = TRUE))
```
read_data
--------------------------------------------------------------------------------
This section is for importing your actual location and population data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
```{r read_data, warning = FALSE, message = FALSE}
### Read in location and population data ---------------------------------------------------------------
## Excel file ------------------------------------------------------------------
## Read in the location data sheet (uncomment to use your own data instead of gen_data)
# clusters <- rio::import(here::here("03 Sampling files", "cluster_data.xlsx"),
#                         na = ".")
```
identify_clusters
--------------------------------------------------------------------------------
This section is to specify or calculate the following:
- total population in the survey area
- the number of clusters for the survey
- the sampling interval, which is the total population divided by the number of clusters in the survey
- the random starting point
These figures are combined in a for loop to obtain the list of clusters to be surveyed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
```{r identify_clusters}
## Set seed to ensure the random start remains the same each time
set.seed(50)
## Calculate total population
total_pop <- sum(clusters$location_population, na.rm = TRUE)
## Calculate cumulative sum of the population
clusters$cum_sum <- cumsum(clusters$location_population)
## Specify the number of clusters
cluster_number <- 10
## Calculate sampling interval, rounded to the nearest whole number
sampling_interval <- round(total_pop / cluster_number, digits = 0)
## Select a random starting point between 1 and the sampling interval
random_start <- sample(1:sampling_interval, 1)
## Create empty columns to fill in the loop
clusters$number_clusters <- NA_integer_
clusters$cum_clusters <- NA_integer_
## This for loop will identify the locations to survey
for (i in seq_along(clusters$cum_sum)) {
  ## Selection points (random_start, random_start + sampling_interval, ...)
  ## at or below the cumulative population of location i; as.integer()
  ## truncates, which equals floor() here because the value is never negative
  points_so_far <- as.integer((clusters$cum_sum[i] - random_start) / sampling_interval + 1)
  ## Clusters already allocated to earlier locations
  previous <- if (i == 1) 0L else clusters$cum_clusters[i - 1]
  clusters$number_clusters[i] <- points_so_far - previous
  clusters$cum_clusters[i] <- points_so_far
}
## Locations selected for the survey (a location may receive more than one cluster)
clusters[clusters$number_clusters > 0, ]
```
pbkeating changed the title from "Add to the survey section with a component on selection of clusters with probability proportional to size" to "Add to the survey section a component on selection of clusters with probability proportional to size" on Jan 19, 2022
At MSF, we have an Excel tool that supports identification of clusters with probability proportional to size, but this can also be done in R.
This is a first attempt at doing it in R, with a sample dataset included for testing purposes.
I've validated this using two datasets: one previously used in MSF activities and one from this WHO document: https://www.who.int/tb/advisory_bodies/impact_measurement_taskforce/meetings/prevalence_survey/psws_probability_prop_size_bierrenbach.pdf
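When validating against further datasets, it may also be worth confirming that the total number of clusters selected equals the target. The sketch below reuses the loop's arithmetic in a hypothetical helper, `clusters_selected()`, applied to made-up populations (total 1010, 3 clusters), and tries every possible random start. It illustrates that rounding the sampling interval to the nearest whole number, as `round()` does in the identify_clusters chunk, can select one cluster too few when the interval happens to be rounded up. Rounding down (`floor()`, swapped in here for comparison) always reaches the target, since the last selection point is at most n × floor(total / n), which never exceeds the total population.

```r
## Hypothetical helper: total clusters the loop's formula selects for one random start
clusters_selected <- function(pop, sampling_interval, random_start) {
  cum_sum <- cumsum(pop)
  cum_points <- as.integer((cum_sum - random_start) / sampling_interval + 1)
  cum_points[length(cum_points)]
}

## Hypothetical populations, chosen so that total / n is not a whole number
pop <- c(300, 350, 360)                  # total 1010
n_clusters <- 3
si_round <- round(sum(pop) / n_clusters) # 337 (rounded up from 336.67)
si_floor <- floor(sum(pop) / n_clusters) # 336

## Total clusters selected for every possible random start
totals_round <- sapply(seq_len(si_round), function(rs) clusters_selected(pop, si_round, rs))
totals_floor <- sapply(seq_len(si_floor), function(rs) clusters_selected(pop, si_floor, rs))

table(totals_round)
table(totals_floor)
```

With the rounded-up interval, the single random start of 337 yields only 2 clusters; every start with the rounded-down interval yields 3.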