FoodData Central is an amazing service provided by the U.S. Department of Agriculture, hosting nutritional and agricultural information for massive amounts of branded and foundational foods. They provide a convenient REST API for querying foods and also allow us to download the raw data. The data is available for download as either several CSV files or a Microsoft Access database. The purpose of this repository is to continue the free and open source spirit that FoodData Central encourages by providing simple scripts which migrate the data to a Postgres database, as opposed to the proprietary Microsoft Access format. 🤢
Note that download.sh simply uses wget, unzip, and sed to put all the CSV files into the local filesystem. This can easily be done by hand, but the script is convenient.
chmod +x ./download.sh
./download.sh
On Ubuntu, the required commands can be installed easily.
apt-get install wget unzip sed
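Before running the script, it can help to confirm that these tools are actually on the PATH. The following is a minimal sketch, assuming a POSIX shell; the function name missing_commands is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Print the name of every given command that cannot be found on PATH,
# one per line. Prints nothing when all commands are available.
missing_commands() {
    for cmd in "$@"; do
        # command -v is the POSIX way to test whether a command exists
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Check the three tools that download.sh relies on
missing=$(missing_commands wget unzip sed)
if [ -n "$missing" ]; then
    echo "Missing commands:" $missing >&2
fi
```

If nothing is printed, all three tools were found and download.sh should be able to run.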
Once the CSV files are in the directory FoodData_Central, we can populate the Postgres database by running the PostgreSQL transactions in the file setup.sql.
psql -f setup.sql
Of course, this requires a Postgres client, which can be installed on Ubuntu like so.
apt-get install postgresql-12
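By default psql keeps going after a failed statement, which can leave a partly loaded database. The sketch below wraps the invocation so that it stops at the first error by setting psql's ON_ERROR_STOP variable. The wrapper name run_sql_file is hypothetical and not part of this repository; connection settings are assumed to come from the standard environment variables such as PGDATABASE and PGUSER.

```shell
#!/bin/sh
# Run a SQL file with psql and abort at the first failing statement.
# ON_ERROR_STOP=1 makes psql exit with a non-zero status on error
# instead of continuing with the remaining statements.
run_sql_file() {
    psql -v ON_ERROR_STOP=1 -f "$1"
}
```

With this in place, run_sql_file setup.sql loads the data and run_sql_file cleanup.sql removes it again, and a non-zero exit status signals that something went wrong.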
The cleanup.sql file will undo the work of setup.sql. Running psql -f cleanup.sql will remove all the data, as though it were never installed.
- The fndds_ingredient_nutrient_value.csv file is corrupted, so I needed to remove the malformed rows.
- The food_attribute table has rows without an fdc_id value, which seem to be useless logs for my use cases, so I drop said rows.
- The food_nutrient table references fdc_id values from 1104805 to 1104809 which are not present in food, so those rows were deleted.
- The food_nutrient table has a column food_nutrient_id which refers to one of nutrient.id or nutrient.nutrient_nbr, so shadow columns nutrient_id_nid and nutrient_id_nnbr are created in food_nutrient to satisfy the foreign keys.
- Many of the columns for input_food are useless and were thus dropped.
- The sub_sample_food table references fdc_id value 1104803 which is not present in food, so those rows were deleted.
- The sub_sample_result table references food_nutrient_id values from 13336250 to 13336266, which belong to rows that had to be removed because of the dangling fdc_id references (see the earlier caveat).
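Dangling references like the ones listed above can be spotted in the CSV files before loading them. The following is a rough sketch rather than what setup.sql does. The function name orphaned_keys is hypothetical, and it assumes simple CSV files with a header row and no commas inside fields.

```shell
#!/bin/sh
# List each value in column CHILD_COL of CHILD_CSV that never appears in
# column PARENT_COL of PARENT_CSV. Column numbers are 1-based.
# Assumption: a header row, comma separators, no commas inside fields.
orphaned_keys() {
    parent_csv=$1 parent_col=$2 child_csv=$3 child_col=$4
    awk -F',' -v pc="$parent_col" -v cc="$child_col" '
        FNR == 1 { next }                  # skip the header row of each file
        { gsub(/"/, "") }                  # drop the quotes around fields
        NR == FNR { seen[$pc] = 1; next }  # first file: remember parent keys
        # second file: report each unknown key once
        !($cc in seen) && !($cc in out) { out[$cc] = 1; print $cc }
    ' "$parent_csv" "$child_csv"
}
```

For example, orphaned_keys FoodData_Central/food.csv 1 FoodData_Central/food_nutrient.csv 2 would list the fdc_id values in food_nutrient.csv that are missing from food.csv, assuming fdc_id is the first column of food.csv and the second column of food_nutrient.csv. Check the header rows of your download, since those positions are an assumption here.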
To host a web service that offers a REST API similar to that provided by FoodData Central, consider cloning my Rust implementation (a work in progress).