diff --git a/book/_build/.doctrees/environment.pickle b/book/_build/.doctrees/environment.pickle
index 26d8923..9ca0962 100644
Binary files a/book/_build/.doctrees/environment.pickle and b/book/_build/.doctrees/environment.pickle differ
diff --git a/book/_build/.doctrees/sampling.doctree b/book/_build/.doctrees/sampling.doctree
index 64bb8d2..63269e4 100644
Binary files a/book/_build/.doctrees/sampling.doctree and b/book/_build/.doctrees/sampling.doctree differ
diff --git a/book/_build/html/_sources/sampling.md b/book/_build/html/_sources/sampling.md
index 0cd4c93..3bcfa0d 100644
--- a/book/_build/html/_sources/sampling.md
+++ b/book/_build/html/_sources/sampling.md
@@ -24,6 +24,27 @@ One important type of bias is **selection bias**, which occurs when the sample s
 
 The pollster might have a smarter colleague who assembles a list of names that is truly representative of all voters. The next hurdle is getting responses. If a phone survey is used, there will be many who don't answer. This is a problem *if* those who don't respond are systematically different than those who do respond. In fact, old people are usually more available and willing to answer the phone, potentially leading to bias. A systematic difference between respondents and non-respondents is called **non-response bias**.
 
+The specific problem of an unrepresentative distribution of ages in a sample can be addressed by using **quota sampling**. In quota sampling, the sample is constructed to resemble the population of interest with respect to key characteristics. This can help, but it doesn't guarantee representativeness because not all important characteristics can be known or measured.
+
+**Example**: Imagine we want to estimate the public's satisfaction with the US Postal Service. Opinions differ by age and by curmudgeonliness. Perhaps older people like USPS more, but curmudgeons dislike USPS regardless of age *and* curmudgeons will not respond. It's easy to quota sample based on age, and a quota sample will result in an average satisfaction rating of 9 (the mean of 8 and 10, since only non-curmudgeons respond) based on {numref}`age-curmudg-table`. This is better than not using quota sampling and constructing a sample that overrepresents older non-curmudgeons. However, the true average rating is 7 (the mean of all four cells, assuming the cells are equally sized). Quota sampling can't give a good estimate of the actual population parameter because the age quotas are filled only by people who respond, which systematically excludes curmudgeons.
+
+```{list-table} Average USPS Satisfaction by Age and Curmudgeonliness
+:header-rows: 1
+:name: age-curmudg-table
+
+* -
+  - Bottom 50% Age
+  - Top 50% Age
+* - Bottom 50% Curmudgeon
+  - 8
+  - 10
+* - Top 50% Curmudgeon
+  - 5
+  - 5
+```
+
+
+### Non-response in the American Time Use Survey
 
 {cite}`abraham2006nonresponse` reports different response rates for the American Time Use Survey, partially reproduced in {numref}`atusresprates`. The differing response rates present a minefield for researchers.
 
diff --git a/book/_build/html/objects.inv b/book/_build/html/objects.inv
index da5457b..c17cf71 100644
Binary files a/book/_build/html/objects.inv and b/book/_build/html/objects.inv differ
diff --git a/book/_build/html/sampling.html b/book/_build/html/sampling.html
index bbff990..76edad2 100644
--- a/book/_build/html/sampling.html
+++ b/book/_build/html/sampling.html
@@ -417,7 +417,10 @@
The specific problem of an unrepresentative distribution of ages in a sample can be addressed by using quota sampling. In quota sampling, the sample is constructed to resemble the population of interest with respect to key characteristics. This can help, but it doesn’t guarantee representativeness because not all important characteristics can be known or measured.
+Example: Imagine we want to estimate the public’s satisfaction with the US Postal Service. Opinions differ by age and by curmudgeonliness. Perhaps older people like USPS more, but curmudgeons dislike USPS regardless of age, and curmudgeons will not respond. It’s easy to quota sample based on age, and a quota sample will result in an average satisfaction rating of 9 (the mean of 8 and 10, since only non-curmudgeons respond) based on Table 11. This is better than not using quota sampling and constructing a sample that overrepresents older non-curmudgeons. However, the true average rating is 7 (the mean of all four cells, assuming the cells are equally sized). Quota sampling can’t give a good estimate of the actual population parameter because the age quotas are filled only by people who respond, which systematically excludes curmudgeons.
+|  | Bottom 50% Age | Top 50% Age |
+|---|---|---|
+| Bottom 50% Curmudgeon | 8 | 10 |
+| Top 50% Curmudgeon | 5 | 5 |
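The arithmetic behind the USPS example can be checked with a short sketch. It is illustrative only and not part of the book's source. It assumes that age and curmudgeonliness are independent, so each of the four table cells holds 25% of the population, and that curmudgeons never respond. The names `ratings`, `population_share`, `responds`, `true_mean`, and `quota_mean` are hypothetical.

```python
# Cell means from the satisfaction table: (curmudgeon group, age group) -> rating.
ratings = {
    ("low_curmudgeon", "young"): 8,
    ("low_curmudgeon", "old"): 10,
    ("high_curmudgeon", "young"): 5,
    ("high_curmudgeon", "old"): 5,
}

# Assumption: the two 50/50 splits are independent, so every cell is 25% of the population.
population_share = {cell: 0.25 for cell in ratings}

# Assumption: people in the top half of curmudgeonliness never respond.
responds = {cell: cell[0] == "low_curmudgeon" for cell in ratings}


def true_mean(ratings, share):
    """Population mean: every cell weighted by its population share."""
    return sum(ratings[cell] * share[cell] for cell in ratings)


def quota_mean(ratings, share, responds, age_quota):
    """Age-quota estimate: average respondents within each age group,
    then weight each age group by its quota."""
    estimate = 0.0
    for age, quota in age_quota.items():
        cells = [c for c in ratings if c[1] == age and responds[c]]
        weight = sum(share[c] for c in cells)
        group_mean = sum(ratings[c] * share[c] for c in cells) / weight
        estimate += quota * group_mean
    return estimate


# Quotas that match the population's age split.
age_quota = {"young": 0.5, "old": 0.5}

print(true_mean(ratings, population_share))                        # 7.0
print(quota_mean(ratings, population_share, responds, age_quota))  # 9.0
```

Changing `age_quota` to something like `{"young": 0.3, "old": 0.7}` mimics a sample without quotas that overrepresents older people, which moves the estimate further from the true mean.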
[AMB06] reports different response rates for the American Time Use Survey, partially reproduced in Table 12. The differing response rates present a minefield for researchers.