
labordynamicsinstitute/SynUSpopulation

Synthetic population housing and person records for the United States

  • Author: William Sexton
  • Last Modified: 11/29/17

These data are meant to be representative of the 2012 US population.

Inputs

The synthetic population was generated from the 2010-2014 ACS PUMS housing and person files.

United States Department of Commerce. Bureau of the Census. (2017-03-06).
American Community Survey 2010-2014 ACS 5-Year PUMS File [Data set].
Ann Arbor, MI: Inter-university Consortium for Political and Social
Research [distributor]. http://doi.org/10.3886/E100486V1

Persistent URL: http://doi.org/10.3886/E100486V1

Funding support

This work is supported under Grant G-2015-13903 from the Alfred P. Sloan Foundation on "The Economics of Socially-Efficient Privacy and Confidentiality Management for Statistical Agencies" (PI: John M. Abowd).

Testing

Stress testing to determine whether these data can actually reproduce accurate statistics for 2012 is still underway.

Outputs

There is a set of housing files

  • repHus0.csv, repHus1.csv, ...

and a set of person files

  • repPus0.csv, repPus1.csv, ...

Files are split to be roughly equal in size. The files contain data for the entire country. Files are not split along any demographic characteristic. The person files and housing files must be concatenated to form a complete person file and a complete housing file, respectively.
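The concatenation step can be sketched as below. This is only an illustration, not a script shipped with the repository: the function names and the output file name are hypothetical, and it assumes that every part file begins with the same header row. Rows are streamed rather than loaded at once, since the combined files cover the entire country.

```python
import csv
import re
from pathlib import Path


def part_number(path):
    """Return the trailing integer in a name such as repHus12.csv."""
    match = re.search(r"(\d+)\.csv$", path.name)
    return int(match.group(1)) if match else -1


def concatenate_parts(directory, prefix, output_path):
    """Stream every <prefix><k>.csv part into one CSV with a single header.

    Assumption: each part starts with an identical header row.
    Returns the part file names in the order they were written.
    """
    # Sort numerically so that repHus10.csv comes after repHus2.csv.
    parts = sorted(Path(directory).glob(f"{prefix}[0-9]*.csv"), key=part_number)
    header = None
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for part in parts:
            with open(part, newline="") as src:
                reader = csv.reader(src)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip an empty part file
                if header is None:
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"{part.name} has a different header")
                writer.writerows(reader)
    return [p.name for p in parts]
```

For example, `concatenate_parts(".", "repHus", "housing_all.csv")` would build the complete housing file, and the same call with the prefix `"repPus"` would build the complete person file.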

To link person records to their housing records, merge the two files on 'id'. The variable is described below.
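A minimal sketch of the merge on 'id', assuming the rows have already been read as dictionaries (for instance with `csv.DictReader`). The function name is hypothetical. The sketch keeps the person value whenever a column name appears in both files, which is an arbitrary choice made for illustration, and it holds all housing rows in memory, which may not be practical for the full files.

```python
def merge_on_id(housing_rows, person_rows):
    """Yield one merged dict per person whose id also appears in housing.

    housing_rows and person_rows are iterables of dicts with an 'id' key.
    Columns present in both inputs keep the person value (an assumption).
    """
    # Index housing records by id; each id is expected to appear once.
    housing_by_id = {row["id"]: row for row in housing_rows}
    for person in person_rows:
        housing = housing_by_id.get(person["id"])
        if housing is None:
            continue  # no matching housing record: drop, as in an inner join
        merged = dict(person)
        for key, value in housing.items():
            merged.setdefault(key, value)
        yield merged
```

Several persons can share one id, since every member of a replicated household carries the same value, so the result has one row per person rather than one row per household.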

Data Dictionary

See the 2010-2014 ACS PUMS data dictionary. All variables from the ACS PUMS housing files are present in the synthetic housing files, and all variables from the ACS PUMS person files are present in the synthetic person files. Variables have not been modified in any way. In principle, weight variables such as the person weight no longer serve a purpose in the synthetic population.

Additional variables.

  • id: Both the synthetic housing and person files include this variable. It is meant as an extension/recode of the existing serialno variable. id has the form serialno.replicationno, where serialno is the original serialno value and replicationno is a nonnegative integer ranging from 0 to n-1, where n is the number of times the household/GQ with that serialno was replicated in the synthetic population.
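As an illustration of the id format described above, the sketch below splits an id into its two parts. The function name and the sample value are hypothetical. It assumes id is read as text: if it were parsed as a floating-point number, values such as replication 1 and replication 10 of the same serialno would become indistinguishable.

```python
def split_id(record_id):
    """Split 'serialno.replicationno' into (serialno, replicationno).

    The serialno is returned as a string and the replication number as an int.
    """
    serialno, dot, replication = record_id.rpartition(".")
    if not dot or not serialno or not replication.isdigit():
        raise ValueError(f"unexpected id format: {record_id!r}")
    return serialno, int(replication)


# Hypothetical id value, used only to show the call.
serialno, replication = split_id("2012000000015.10")
```

Grouping records by the returned serialno recovers all replicates of one original household/GQ.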