From 50ff9b52644a2a05d85cfd9e437f1b4c40fb19b5 Mon Sep 17 00:00:00 2001
From: dsmedia <63077097+dsmedia@users.noreply.github.com>
Date: Sun, 4 Aug 2024 19:57:21 -0400
Subject: [PATCH 1/2] docs: add description and source information for jobs.json

---
 SOURCES.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/SOURCES.md b/SOURCES.md
index 7c52985..59884e0 100644
--- a/SOURCES.md
+++ b/SOURCES.md
@@ -156,6 +156,15 @@ Data about engineers from https://www.bls.gov/oes/tables.htm. Hurricane data fro
 The state of Iowa has dramatically increased its production of renewable wind power in recent years. This file contains the annual net generation of electricity in the state by source in thousand megawatthours. The dataset was compiled by the [U.S. Energy Information Administration](https://www.eia.gov/beta/electricity/data/browser/#/topic/0?agg=2,0,1&fuel=vvg&geo=00000g&sec=g&linechart=ELEC.GEN.OTH-IA-99.A~ELEC.GEN.COW-IA-99.A~ELEC.GEN.PEL-IA-99.A~ELEC.GEN.PC-IA-99.A~ELEC.GEN.NG-IA-99.A~~ELEC.GEN.NUC-IA-99.A~ELEC.GEN.HYC-IA-99.A~ELEC.GEN.AOR-IA-99.A~ELEC.GEN.HPS-IA-99.A~&columnchart=ELEC.GEN.ALL-IA-99.A&map=ELEC.GEN.ALL-IA-99.A&freq=A&start=2001&end=2017&ctype=linechart&ltype=pin&tab=overview&maptype=0&rse=0&pin=) and downloaded on May 6, 2018. It is useful for illustrating stacked area charts.
 
 ## `jobs.json`
+Derived from U.S. census data on [occupations](https://usa.ipums.org/usa-action/variables/OCC1950#codes_section) by sex and year across decades between 1850 and 2000. The data currently lacks accompanying generation scripts or clear documentation of its provenance. However, comprehensive census data, including on occupation, is available from [IPUMS USA](https://usa.ipums.org/usa/), which "collects, preserves and harmonizes U.S. census microdata" from as early as 1790.)
+
+### Data Structure
+The dataset is structured as follows:
+- job: The occupation title
+- sex: Gender (men/women)
+- year: Census year
+- count: Number of individuals in the occupation
+- perc: Percentage of the workforce in the occupation
 
 
 ## `la-riots.csv`

From a00e3a3a50991f862c674033de6ea2bcaa16221d Mon Sep 17 00:00:00 2001
From: ds <63077097+dsmedia@users.noreply.github.com>
Date: Thu, 8 Aug 2024 21:41:14 -0400
Subject: [PATCH 2/2] docs: address comments on jobs.json SOURCES entry

---
 SOURCES.md | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/SOURCES.md b/SOURCES.md
index 59884e0..383de2a 100644
--- a/SOURCES.md
+++ b/SOURCES.md
@@ -156,16 +156,39 @@ Data about engineers from https://www.bls.gov/oes/tables.htm. Hurricane data fro
 The state of Iowa has dramatically increased its production of renewable wind power in recent years. This file contains the annual net generation of electricity in the state by source in thousand megawatthours. The dataset was compiled by the [U.S. Energy Information Administration](https://www.eia.gov/beta/electricity/data/browser/#/topic/0?agg=2,0,1&fuel=vvg&geo=00000g&sec=g&linechart=ELEC.GEN.OTH-IA-99.A~ELEC.GEN.COW-IA-99.A~ELEC.GEN.PEL-IA-99.A~ELEC.GEN.PC-IA-99.A~ELEC.GEN.NG-IA-99.A~~ELEC.GEN.NUC-IA-99.A~ELEC.GEN.HYC-IA-99.A~ELEC.GEN.AOR-IA-99.A~ELEC.GEN.HPS-IA-99.A~&columnchart=ELEC.GEN.ALL-IA-99.A&map=ELEC.GEN.ALL-IA-99.A&freq=A&start=2001&end=2017&ctype=linechart&ltype=pin&tab=overview&maptype=0&rse=0&pin=) and downloaded on May 6, 2018. It is useful for illustrating stacked area charts.
 
 ## `jobs.json`
-Derived from U.S. census data on [occupations](https://usa.ipums.org/usa-action/variables/OCC1950#codes_section) by sex and year across decades between 1850 and 2000. The data currently lacks accompanying generation scripts or clear documentation of its provenance. However, comprehensive census data, including on occupation, is available from [IPUMS USA](https://usa.ipums.org/usa/), which "collects, preserves and harmonizes U.S. census microdata" from as early as 1790.)
+U.S. census data on [occupations](https://usa.ipums.org/usa-action/variables/OCC1950#codes_section) by sex and year across decades between 1850 and 2000. The dataset was obtained from IPUMS USA, which "collects, preserves and harmonizes U.S. census microdata" from as early as 1790.
+
+Originally created for a 2006 data visualization project called *sense.us* by IBM Research (Jeff Heer, Martin Wattenberg and Fernanda Viégas), described [here](https://homes.cs.washington.edu/~jheer/files/bdata_ch12.pdf). The dataset is also referenced in this vega [example](https://vega.github.io/vega/examples/job-voyager/).
+
+### Notes on Data Origin
+Data is based on a tabulation of the [OCC1950](https://usa.ipums.org/usa-action/variables/OCC1950) variable by sex across IPUMS USA samples. The dataset appears to be derived from Version 6.0 (2015) of [IPUMS USA](https://usa.ipums.org/usa/), according to 2024 correspondence with the IPUMS Project. IPUMS has made improvements to occupation coding since version 6, particularly for 19th-century samples, which may result in discrepancies between this dataset and current IPUMS data. Details on data revisions are available [here](https://usa.ipums.org/usa-action/revisions).
 
 ### Data Structure
 The dataset is structured as follows:
 - job: The occupation title
-- sex: Gender (men/women)
+- sex: Sex (men/women)
 - year: Census year
 - count: Number of individuals in the occupation
 - perc: Percentage of the workforce in the occupation
 
+### Redistribution
+IPUMS USA confirmed in 2024 correspondence that hosting this dataset on vega-datasets is permissible, stating:
+
+>We're excited to hear that this dataset made its way to this repository and is being used by students for data visualization. We allow for these types of redistributions of summary data so long as the underlying microdata records are not shared.
+
+This dataset contains only summary statistics and does not include any underlying microdata records.
+
+### Usage Notes
+1. This dataset represents summary data. The underlying microdata records are not included.
+2. Users attempting to replicate or extend this data should use the [PERWT](https://usa.ipums.org/usa-action/variables/PERWT#description_section) (person weight) variable as an expansion factor when working with IPUMS USA extracts.
+3. Due to coding revisions, figures for earlier years (particularly 19th century) may not match current IPUMS USA data exactly.
+
+### Terms of Use and Citation
+When using this dataset, please refer to IPUMS USA [terms of use](https://usa.ipums.org/usa/terms.shtml). The organization requests use of the following citation for this json file:
+
+Steven Ruggles, Katie Genadek, Ronald Goeken, Josiah Grover, and Matthew Sobek. Integrated Public Use Microdata Series: Version 6.0. Minneapolis: University of Minnesota, 2015. http://doi.org/10.18128/D010.V6.0
+
+
 
 ## `la-riots.csv`
 More than 60 people lost their lives amid the looting and fires that ravaged Los Angeles for five days starting on April 29, 1992. This file contains metadata about each person, including the geographic coordinates of their death. It was compiled and published by the [Los Angeles Times Data Desk](http://spreadsheets.latimes.com/la-riots-deaths/).