CIA World Factbook #201
Comments
@rgrp You're more than welcome to (re)use what you can, see /opendatajson/factbook.json for country profile datasets in JSON. If anyone is interested, there are some background and talk notes titled "factbook.json - Turn the World Factbook into Open (Structured) Data". About packaging - the factbook is more document-oriented (thus, "nested" JSON datasets that include everything, incl. inconsistencies and "known" errors/typos etc.). Adding a subset, however, would work great for (one or more) tabular data packages (in CSV). Keep up the great work on datapackage.json and friends. Cheers.
@geraldb awesome. Do you have any notes on the factbook structure and any scraping code to point to? (BTW: I remember scraping the factbook almost 10y ago in Python but, typically, can't locate my code now!)
@rgrp Sure. More than welcome. All code and scripts are public domain. The Ruby script (packaged as a gem) -> /factbook/factbook. All codes are in CSV in /factbook/data/codes.csv and (most) categories are mapped to attributes in factbook/data/attributes.yml. The "real world" auto-generated list is factbook/CATEGORIES.md, with a counter of how many profiles use each category. And if interested - there's a build script to automate fetching and generating the datasets - /yorobot/factbook. Again, everybody is welcome to (re)use whatever you can. All public domain (dedicated). Cheers.
@rufuspollock Thanks again for the interest in packaging the factbook datasets. Love the (tabular) data packages. As written before - this factbook repo and approach maps the original CIA factbook data sources (in HTML pages) 1:1, with minimal clean-up, to "document-oriented" datasets. One "country" page, one JSON dataset. An example is France, which includes Metropolitan France and its overseas territories in a single country document. Thus, as is, you cannot map it to tabular structured data without extra mapping. The good news: @iancoleman has written an alternative factbook parser [1] that includes many more clean-ups and mappings and, thus, might be way easier to use for packing up in tabular datasets. [1] https://github.com/iancoleman/cia_world_factbook_api#data Maybe @iancoleman can comment? By the way, great initiative / project. Always great to see alternatives / new factbook parsers / datasets / projects. Or maybe repost or open an issue / ticket at iancoleman's cia_world_factbook_api repo to get things started over there. Again, thanks for the update and interest. Keep it up. /cc @Mikanebu
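To make the "extra mapping" from one nested country document to tabular rows concrete, here is a minimal Python sketch. It is not the approach of either parser mentioned in this thread; the two sample documents, their keys, their values, and the country codes `aa` / `bb` are invented placeholders, and the dotted column naming is just one possible convention.

```python
import csv
import io

# Invented placeholder documents (assumption): real factbook profiles are
# far larger and their exact keys and nesting may differ.
profiles = {
    "aa": {"Geography": {"Area": {"total": "10 sq km"}, "Climate": "temperate"}},
    "bb": {"Geography": {"Area": {"total": "20 sq km"}}, "People": {"Population": "1000"}},
}


def flatten(doc, prefix=""):
    """Collapse nested dicts into one flat dict with dotted keys."""
    row = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, path))
        else:
            row[path] = value
    return row


def to_csv(docs):
    """Write one CSV row per document, keyed by its country code."""
    rows = [{"code": code, **flatten(doc)} for code, doc in sorted(docs.items())]
    # Use the union of all columns, since documents need not share categories.
    columns = ["code"] + sorted({k for r in rows for k in r} - {"code"})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(profiles))
```

A category missing from a document simply becomes an empty cell here. Lists, and documents that bundle several territories as in the France example, would need an explicit decision (for instance, one extra table) that this sketch does not attempt.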
@geraldb thanks for the great suggestions. @iancoleman - any thoughts? Also, do you have a schema for your data anywhere? Would it be possible to make a table schema (https://specs.frictionlessdata.io/table-schema/) for it?
For tabular data, have a quick look at https://iancoleman.github.io/explorer-cia-world-factbook/ which can create CSV output; it needs a bit of UX attention (e.g. a select-all-columns button, handling lists etc.) but let me know if this is along the lines you're looking for. There isn't a formal schema, but once the parser is a bit more mature this will happen. See iancoleman/cia_world_factbook_api#7 As for data being packagized, could you elaborate a bit more on that? I've somewhat bundled the data, see the 'data' section of the readme, but it sounds like you're going for something a bit more formal...?
@iancoleman I'm thinking about packaging (some of) the data as tabular data packages: http://frictionlessdata.io/data-packages/ https://specs.frictionlessdata.io/tabular-data-resource/ Especially adding a Table Schema https://specs.frictionlessdata.io/table-schema/
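For readers unfamiliar with the specs linked above, a rough Python sketch of what a descriptor for one CSV resource with a Table Schema could look like. The package name `factbook-sample`, the path `data/countries.csv`, and the three columns are hypothetical; nobody in this thread has published such a package, and a real schema would need to be worked out against the actual data.

```python
import json


def build_descriptor(name, csv_path, fields, primary_key):
    """Return a Tabular Data Package descriptor as a plain dict.

    fields is a list of (column name, Table Schema type) pairs.
    """
    return {
        "name": name,
        "profile": "tabular-data-package",
        "resources": [
            {
                "name": name,
                "path": csv_path,
                "profile": "tabular-data-resource",
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields],
                    "primaryKey": primary_key,
                },
            }
        ],
    }


# Hypothetical package name, path, and columns (assumptions for illustration).
descriptor = build_descriptor(
    name="factbook-sample",
    csv_path="data/countries.csv",
    fields=[("code", "string"), ("name", "string"), ("population", "integer")],
    primary_key="code",
)

if __name__ == "__main__":
    # The printed JSON is what would be saved as datapackage.json.
    print(json.dumps(descriptor, indent=2))
```

The descriptor is kept as a plain dict so it can be written with the standard library alone; validating it against the specs would need a separate tool.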
Thanks for the additional info. At this stage there's no plan for packaging, but it will happen at some point. I'll be tracking progress in iancoleman/cia_world_factbook_api#7, so if there's any further info you think may be beneficial, please post it in that issue.
CIA world factbook is a top candidate for being data packagized ...
/cc @geraldb - I see you've been doing some work around this recently