docs: add description and source information for jobs.json #593
Thanks. This source makes sense but I wonder whether there is a more immediate source as well? Do we know who made this specific file?
To start with what we can be confident about: it's nearly certain the data is ultimately derived from US Census Bureau data, though historical data of this specificity isn't generally made available directly on census.gov. I'd also say it's very likely the data was aggregated by a domain expert from raw IPUMS USA data. But to your question, the immediate source of jobs.json is a bit of a mystery. I've not been able to find the exact data points referenced elsewhere (e.g. in a widely cited academic paper). The file was originally uploaded by @arvind. I'm not able to find any other documentation besides the one line in this example. I've also contacted IPUMS via email to inquire.
Sometimes if you look at what example this dataset is used in, you can find a corresponding D3 example with an author and a source.
Updated with original source (vintage 2006!), permission from IPUMS, and additional context and links.
Derived from U.S. census data on [occupations](https://usa.ipums.org/usa-action/variables/OCC1950#codes_section) by sex and year across decades between 1850 and 2000. The data currently lacks accompanying generation scripts or clear documentation of its provenance. However, comprehensive census data, including on occupation, is available from [IPUMS USA](https://usa.ipums.org/usa/), which "collects, preserves and harmonizes U.S. census microdata" from as early as 1790.
U.S. census data on [occupations](https://usa.ipums.org/usa-action/variables/OCC1950#codes_section) by sex and year across decades between 1850 and 2000. The dataset was obtained from IPUMS USA, which "collects, preserves and harmonizes U.S. census microdata" from as early as 1790.

Originally created for a 2006 data visualization project called *sense.us* by IBM Research (Jeff Heer, Martin Wattenberg and Fernanda Viégas), described [here](https://homes.cs.washington.edu/~jheer/files/bdata_ch12.pdf). The dataset is also referenced in this vega [example](https://vega.github.io/vega/examples/job-voyager/).
Is https://idl.cs.washington.edu/files/2009-Voyagers-CACM.pdf a better source?
I like that the link you posted is a more formal academic paper, but on the other hand it barely discusses the dataset itself. The original link goes into detail about the dataset exploration ("Data" section, printed pages 186-188), and also goes into depth about the IPUMS-USA database, from which this JSON was generated. This made it seem like a good link to have for a dataset repo like this one. That said, I don't feel strongly about one link over the other, so feel free to swap it out if you prefer.
I see. I went for the link from https://vega.github.io/vega/examples/job-voyager/ but your argument makes sense.