Dataset updates #30

Open
24 tasks
briatte opened this issue Jan 30, 2021 · 0 comments


Closes #21, #22 and #23 (copied below), #27.

Update from 2023

Stop updating the data, really.

Detailed notes

  • QOG: qog2023 -- since QOG 2023 is out
    • freeze: qog2019
    • would require rewriting code and looking at less clear results… see code at end of section
    • only advantage would be lower codebook size → just downsample the 2019 one, it only loses the intra-doc links
    • note the codebook issue! QOG 2020: make sure GDP documentation has been corrected #27
    • Perhaps simply drop the eu_* variables
  • GSS: gss7221 -- since GSS has updated too
    • freeze: gss7616 (but see below)
    • not fun to keep only one year: keep one older year too
    • possibly break down single data into yearly ones? restrict to 1976 and 2016
  • ESS: ess2008 -- in order to continue using torture question?
    • freeze: ess0816, or ess2008 and ess2016 (different codebooks, so it's fine)
    • keep using Round 4 for both the torture example and the health services ones (results are not as clear-cut with Round 8)
    • keep Round 8 to cover e.g. climate change
    • problem: DTA file is too large -- divide, to avoid _merge problem
    • document the existence of ess2016, even though it is not used anywhere in the course do-files
  • WVS: wvs9904 -- keep old version for sharia law question
    • update to last version, check encoding
    • possibly also include a more recent wave? (raises same question as ess2016)
  • NHIS: update to a recent year (nhis202*), or to nhis1020?
    • check if sampling frame and variables have changed first
    • see below on how URL structure for fetching has changed

Note on QOG -- the 2023 release offers only the following as a replacement, which is not ideal:

// school life expectancy
sc wdi_fertility wef_lse, ms(i) mlab(ccodealp) || lfit wdi_fertility wef_lse, ///
	name(g1, replace)
// linear fit + SSA data points only, underpredicted
sc wdi_fertility wef_lse if ht_region == 4, ms(i) mlab(ccodealp) || ///
	lfit wdi_fertility wef_lse, ///
	name(g2, replace)
// all regions
forv i = 1/10 {
	sc wdi_fertility wef_lse if ht_region == `i', ms(i) mlab(ccodealp) || ///
	lfit wdi_fertility wef_lse, ///
	name("region`i'", replace)
}

The plan for 2021:

Additional things to consider:

Dataset names

I like the initial "acronym + year" convention, but it produces strange names for multiple-year survey datasets:

  • ess1214 (not used) and ess0816
  • wvs9904 (unavoidable)
  • nhis1017 (unavoidable, unless we use a single year, but that removes any demo of keep if year)
  • gss7616 (unavoidable, unless we separate the years)

Merged datasets

Is it still a good idea to do that for e.g. ESS? Probably not, esp. if we need to limit datasets to 2,048 variables for Stata/IC.

  • Keep NHIS with multiple years. Use it to demo keep if year.
  • Keep WVS with multiple years (country-dependent).
  • Break down GSS.
  • Break down ESS.
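
A minimal sketch of what "breaking down" a merged dataset could look like, together with a quick check of the variable count against the Stata/IC limit. It uses Stata's built-in uslifeexp data as a stand-in for GSS or ESS, since the course files are not assumed to be at hand. The years kept (1976 and 1996) and the demo file names are purely illustrative assumptions, not the actual course files.

```stata
// Stand-in data: built-in example dataset with a numeric year variable
sysuse uslifeexp, clear

// Number of variables currently in memory, to compare against the 2,048 limit
display "variables in memory: " c(k)

// Write one small file per year of interest (hypothetical file names)
foreach y in 1976 1996 {
	preserve
	keep if year == `y'
	save "demo`y'.dta", replace
	restore
}
```

With the real data, the same loop would run over the survey years to keep, and each yearly file could then be checked separately against the variable limit.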

Both WVS and ESS are used to demo keep if inlist(country, …), the other subset we want to show.
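
For reference, the two subsetting idioms can be sketched as follows. The example uses Stata's built-in datasets as stand-ins, with state playing the role of country. The datasets and the values kept are assumptions for illustration only.

```stata
// Stand-in 1: built-in uslifeexp has a numeric year variable
sysuse uslifeexp, clear
keep if year == 1976
// to keep several years instead: keep if inlist(year, 1976, 1996)

// Stand-in 2: built-in census has a string state variable,
// used here in place of country
sysuse census, clear
keep if inlist(state, "Alaska", "Hawaii", "Texas")
list state region
```

One thing to keep in mind when writing the country examples: inlist() accepts at most 10 arguments when they are strings, so longer country lists need several inlist() calls joined with |, or a numeric country code.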

Additional datasets

It would make a lot of sense to have more datasets for the students to use than those used in the do-files.

Currently, the do-files are selective anyway: we provide ESS 2016 (Round 8) but do not use the data, even though the dependent variable also exists in that round.

  • GSS has a single codebook, so bundling many years would duplicate the codebook in the ZIP archives. Not ideal.
  • ESS could be broken down to Rounds 4 (2008), 8 (2016) and 9 (2018).