CIA World Factbook #201

Open
rufuspollock opened this issue Nov 20, 2016 · 9 comments

@rufuspollock
Member

CIA world factbook is a top candidate for being data packagized ...

/cc @geraldb - i see you've been doing some work around this recently

@geraldb

geraldb commented Nov 20, 2016

@rgrp You're more than welcome to (re)use what you can; see /opendatajson/factbook.json for country profile datasets in JSON. If anyone is interested, there are some background and talk notes titled "factbook.json - Turn the World Factbook into Open (Structured) Data".

About packaging - the factbook is more document-oriented (thus, "nested" JSON datasets to include everything incl. inconsistencies and "known" errors/typos etc.). Adding a subset, however, would work great for (one or more) tabular data packages (in CSV). Keep up the great work on datapackage.json and friends. Cheers.

@rufuspollock
Member Author

@geraldb awesome. Do you have any notes on the factbook structure and any scraping code to point to? (BTW: I remember scraping the factbook almost 10y ago in python but, typically, can't locate my code now!)

@geraldb

geraldb commented Nov 20, 2016

@rgrp Sure. More than welcome. All code and scripts are public domain. The ruby script (packaged as a gem) -> /factbook/factbook. All codes are in CSV in /factbook/data/codes.csv, and (most) categories are mapped to attributes in factbook/data/attributes.yml. The "real world" auto-generated category list is factbook/CATEGORIES.md, with a count of how many profiles use each category. And if interested, there's a build script to automate fetching and generating the datasets: /yorobot/factbook. Again, everybody is welcome to (re)use whatever they can. All public domain (dedicated). Cheers.

@rufuspollock
Member Author

@geraldb I'm quite interested in trying to (tabular) data package this. Do you have an SQL version of the data? That would be the easiest to convert to CSV.

Also, I looked briefly at the yorobot scripts but wasn't sure of the best place to start -- any tips?

/cc @Mikanebu

@geraldb

geraldb commented Sep 17, 2017

@rufuspollock Thanks again for the interest in packaging the factbook datasets. Love the (tabular) data packages.

As written before, this factbook repo and approach maps the original CIA factbook data sources (in HTML pages) 1:1, with minimal clean-up, to "document-oriented" datasets: one "country" page, one JSON dataset. An example is France, whose single country document includes Metropolitan France and its overseas territories. Thus, as is, you cannot map it to tabular structured data without extra mapping.
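
To illustrate the kind of extra mapping meant here, a minimal Python sketch follows. It flattens one nested, document-oriented profile into "long" rows (country, field, value) that can be written out as CSV. The sample profile, its keys and values, the `xx` country code, and the helper names `flatten`, `to_rows` and `rows_to_csv` are all made up for illustration; the real factbook.json documents are much larger and may be structured differently.

```python
import csv
import io

# Hypothetical, heavily simplified profile with placeholder values.
# This is NOT real factbook data and the real key names may differ.
SAMPLE_PROFILE = {
    "Geography": {"Area": {"total": {"text": "1,000 sq km"}}},
    "People and Society": {"Population": {"text": "12,345"}},
}


def flatten(node, path=()):
    """Yield (path, value) pairs for every non-dict leaf in a nested dict."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, path + (key,))
    else:
        yield path, node


def to_rows(country_code, profile):
    """Turn one nested profile into a list of flat row dicts."""
    return [
        {"country": country_code, "field": " / ".join(path), "value": value}
        for path, value in flatten(profile)
    ]


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["country", "field", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(to_rows("xx", SAMPLE_PROFILE)))
```

A long layout like this sidesteps the question of which categories become columns, at the cost of leaving every value as an uncleaned string. A combined document such as the France one would still need a separate decision about how to split it.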

The good news: @iancoleman has written an alternative factbook parser [1] that includes many more clean-ups and mappings and, thus, might be much easier to use for packaging up tabular datasets.

[1] https://github.com/iancoleman/cia_world_factbook_api#data

Maybe @iancoleman can comment? By the way, great initiative / project. Always great to see alternatives / new factbook parsers / datasets / projects.

Or maybe repost or open an issue / ticket on iancoleman's cia_world_factbook_api repo to get things started over there.

Again thanks for the update and interest. Keep it up.

/cc @Mikanebu

@rufuspollock
Member Author

@geraldb thanks for the great suggestions.

@iancoleman - any thoughts? Also, do you have a schema for your data anywhere? Would it be possible to make a table schema (https://specs.frictionlessdata.io/table-schema/) for it?

@iancoleman

For tabular data, have a quick look at https://iancoleman.github.io/explorer-cia-world-factbook/ which can create CSV output. It needs a bit of UX attention (e.g. a select-all-columns button, handling lists, etc.), but let me know if this is along the lines you're looking for.

There isn't a formal schema but once the parser is a bit more mature this will happen. See iancoleman/cia_world_factbook_api#7

As for data being packagized, could you elaborate a bit more on that? I've somewhat bundled the data, see the 'data' section of the readme but it sounds like you're going for something a bit more formal...?

@rufuspollock
Member Author

rufuspollock commented Sep 29, 2017

@iancoleman I'm thinking about packaging (some of the data) as tabular data packages:

http://frictionlessdata.io/data-packages/
http://frictionlessdata.io/guides/tabular-data-package/

https://specs.frictionlessdata.io/tabular-data-resource/

Especially adding a Table Schema https://specs.frictionlessdata.io/table-schema/
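
As a rough idea of what that could look like, here is a minimal Python sketch that builds a `datapackage.json` descriptor for a single CSV resource with a Table Schema attached. The package name, the CSV path, the field list and the helper `make_descriptor` are hypothetical placeholders, not an agreed layout for the factbook data; only the descriptor keys (`name`, `profile`, `resources`, `path`, `schema`, `fields`, `type`) come from the specs linked above.

```python
import json


def make_descriptor(name, csv_path, fields):
    """Build a minimal tabular data package descriptor as a plain dict.

    `fields` is a list of (field_name, table_schema_type) pairs.
    """
    return {
        "name": name,
        "profile": "tabular-data-package",
        "resources": [
            {
                "name": name,
                "path": csv_path,
                "profile": "tabular-data-resource",
                "schema": {
                    "fields": [
                        {"name": field_name, "type": field_type}
                        for field_name, field_type in fields
                    ]
                },
            }
        ],
    }


# Hypothetical example: a long-format CSV with three string columns.
descriptor = make_descriptor(
    "factbook-sample",
    "data/factbook-sample.csv",
    [("country", "string"), ("field", "string"), ("value", "string")],
)

if __name__ == "__main__":
    # The printed JSON is what would be saved as datapackage.json
    # next to the CSV file.
    print(json.dumps(descriptor, indent=2))
```

Typing every column as `string` is the conservative starting point; columns could be switched to `number` or `integer` once the values have been cleaned enough to parse.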

@iancoleman

Thanks for the additional info. At this stage no plan for packaging, but it will happen at some point. I'll be tracking progress in iancoleman/cia_world_factbook_api#7 so if there's any further info you think may be beneficial please post it in that issue.
