docs: add description and source information for jobs.json #593

Merged (2 commits) on Aug 14, 2024

dsmedia (Contributor) commented Aug 4, 2024

No description provided.

domoritz (Member) left a comment

Thanks. This source makes sense but I wonder whether there is a more immediate source as well? Do we know who made this specific file?

Review thread on SOURCES.md (outdated, resolved)

dsmedia (Contributor, Author) commented Aug 6, 2024

To start with what we can be confident about: the data is almost certainly derived from US Census Bureau data, though historical data this specific isn't generally available directly on census.gov. It was also very likely aggregated by a domain expert from raw IPUMS USA data. But to your question, the immediate source of jobs.json is a bit of a mystery. I haven't been able to find these exact data points referenced elsewhere (e.g. in a widely cited academic paper). The file was originally uploaded by @arvind, and I can't find any documentation beyond the one line in this example. I've also emailed IPUMS to inquire.

domoritz (Member) commented Aug 6, 2024

Sometimes, if you look at which example uses this dataset, you can find a corresponding D3 example with an author and a source.

dsmedia (Contributor, Author) commented Aug 9, 2024

Updated with original source (vintage 2006!), permission from IPUMS, and additional context and links.

- Derived from U.S. census data on [occupations](https://usa.ipums.org/usa-action/variables/OCC1950#codes_section) by sex and year across decades between 1850 and 2000. The data currently lacks accompanying generation scripts or clear documentation of its provenance. However, comprehensive census data, including on occupation, is available from [IPUMS USA](https://usa.ipums.org/usa/), which "collects, preserves and harmonizes U.S. census microdata" from as early as 1790.)
+ U.S. census data on [occupations](https://usa.ipums.org/usa-action/variables/OCC1950#codes_section) by sex and year across decades between 1850 and 2000. The dataset was obtained from IPUMS USA, which "collects, preserves and harmonizes U.S. census microdata" from as early as 1790.
+
+ Originally created for a 2006 data visualization project called *sense.us* by IBM Research (Jeff Heer, Martin Wattenberg and Fernanda Viégas), described [here](https://homes.cs.washington.edu/~jheer/files/bdata_ch12.pdf). The dataset is also referenced in this vega [example](https://vega.github.io/vega/examples/job-voyager/).
domoritz (Member) commented with an alternative link (not captured here).

dsmedia (Contributor, Author) commented

I like that the link you posted is a more formal academic paper, but it barely discusses the dataset itself. The original link goes into detail about exploring the dataset ("Data" section, printed pages 186-188) and about the IPUMS-USA database, from which this JSON was generated. That made it seem like a good link for a dataset repo like this one. That said, I don't feel strongly either way, so feel free to swap it out if you prefer.

domoritz (Member) commented

I see. I went for the link from https://vega.github.io/vega/examples/job-voyager/ but your argument makes sense.

domoritz merged commit f518903 into vega:main on Aug 14, 2024
2 checks passed