Update FCT data version #279

Open
rbroth opened this issue Jan 21, 2022 · 15 comments

@rbroth
Collaborator

rbroth commented Jan 21, 2022

Add the newest data to the repository, which should hopefully fix a problem with matching food composition data to food consumption data.

@rbroth
Collaborator Author

rbroth commented Jan 21, 2022

Is the FCT metadata actually read from the .csv file? I've found hardcoded FCT metadata in test-data\V0.3.0.005__FCT_metadata.sql.

@rbroth
Collaborator Author

rbroth commented Jan 21, 2022

It looks like the FCT metadata is hardcoded. The FCT import script checks whether the metadata already exists in the db before loading FCT data.
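The check described above works as a guard: composition rows for an FCT are only loaded if that FCT's metadata row is already present. Below is a minimal sketch against an in-memory SQLite database. The tables `fct_source(name)` and `fct_row(fct_name, original_food_id)` and both function names are hypothetical stand-ins, not the real schema from the SQL file. This sketch raises an error when the metadata is missing; the real script may skip the file instead.

```python
import sqlite3
from typing import Iterable, Mapping


def fct_metadata_exists(conn: sqlite3.Connection, fct_name: str) -> bool:
    """True if a metadata row for this FCT is already in the db.

    Assumes a hypothetical table fct_source(name TEXT PRIMARY KEY).
    """
    row = conn.execute("SELECT 1 FROM fct_source WHERE name = ?", (fct_name,)).fetchone()
    return row is not None


def load_fct(conn: sqlite3.Connection, fct_name: str,
             rows: Iterable[Mapping[str, str]]) -> int:
    """Insert composition rows, but only if the FCT's metadata is present."""
    if not fct_metadata_exists(conn, fct_name):
        # The metadata is hardcoded in SQL, so a new FCT needs a new metadata row first.
        raise ValueError(f"no metadata row for {fct_name}")
    values = [(fct_name, r["original_food_id"]) for r in rows]
    conn.executemany("INSERT INTO fct_row (fct_name, original_food_id) VALUES (?, ?)", values)
    return len(values)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fct_source (name TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE fct_row (fct_name TEXT, original_food_id TEXT)")
    conn.execute("INSERT INTO fct_source (name) VALUES ('KENFCT')")
    print(load_fct(conn, "KENFCT", [{"original_food_id": "10006"}]))  # 1
```

One consequence of this design is that adding a new FCT means updating the SQL metadata as well as adding the CSV.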

@rbroth rbroth self-assigned this Jan 21, 2022
@rbroth
Collaborator Author

rbroth commented Jan 21, 2022

Currently stuck on the following error; putting it here to remind myself on Monday:

MAPS_KENFCT_v1.3.csv
{'\ufefforiginal_food_id': '10006', 'original_food_name': 'Mustard Seeds, Dry, Raw', 'food_genus_id': '1442.01', 'food_genus_confidence': 'h', 'fct_name': 'KENFCT', 'data_reference_original_id': 'KEN93-64,KEN93-286,IN17-H013, ZA10-3474', 'moisture_in_g': '8.2', 'energy_in_kcal': '522', 'energy_in_kj': '2160', 'totalprotein_in_g': '18.8', 'totalfats_in_g': '39.7', 'carbohydrates_in_g': '15.5', 'fibre_in_g': '13.7', 'ash_in_g': '4.2', 'ca_in_mg': '433', 'fe_in_mg': '16', 'mg_in_mg': '220', 'p_in_mg': '587', 'k_in_mg': '380', 'na_in_mg': '2', 'zn_in_mg': '3.9', 'se_in_mcg': '70', 'vitamina_in_rae_in_mcg': '3', 'thiamin_in_mg': '0.65', 'riboflavin_in_mg': '0.26', 'niacin_in_mg': '3.7', 'folate_in_mcg': '75', 'vitaminb12_in_mcg': '0', 'vitaminc_in_mg': '0', 'phyticacid_in_mg': '129'}
Traceback (most recent call last):
  File "D:\-Repos\bmgf-maps\db-test-data\MAPS_data_pipeline.py", line 9, in <module>
    import_fct_csvs('.//Food Composition Tables//CSV')
  File "D:\-Repos\bmgf-maps\db-test-data\FCT_Import.py", line 37, in import_fct_csvs
    raise(e)
  File "D:\-Repos\bmgf-maps\db-test-data\FCT_Import.py", line 21, in import_fct_csvs
    insert_food_genus(row)
  File "D:\-Repos\bmgf-maps\db-test-data\maps_etl\fct.py", line 62, in insert_food_genus
    'food_name': data['food_genus_description'],
KeyError: 'food_genus_description'
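The row dump above shows two problems:

- The first key is `'\ufefforiginal_food_id'`. This means the CSV starts with a UTF-8 byte-order mark, which was read as part of the header.
- The row has no `food_genus_description` column, but `insert_food_genus` indexes that column directly.

Below is a minimal sketch of a more defensive reader, assuming the CSVs are UTF-8. Opening the file with `encoding="utf-8-sig"` strips the BOM. Checking the header up front turns the KeyError into a message that names the file and the missing columns. `REQUIRED_COLUMNS`, `read_fct_rows` and `open_fct_csv` are hypothetical names, not part of the repo.

```python
import csv
import io
from typing import Iterable, TextIO

# Hypothetical list of columns insert_food_genus() reads; check maps_etl/fct.py for the real set.
REQUIRED_COLUMNS = ("original_food_id", "food_genus_id", "food_genus_description")


def read_fct_rows(f: TextIO, source: str,
                  required: Iterable[str] = REQUIRED_COLUMNS) -> csv.DictReader:
    """Return a DictReader over an FCT CSV, failing early if columns are missing."""
    reader = csv.DictReader(f)
    header = reader.fieldnames or []  # reading fieldnames consumes the header row
    missing = [c for c in required if c not in header]
    if missing:
        raise ValueError(f"{source}: missing column(s) {missing}; found {header}")
    return reader


def open_fct_csv(path: str) -> TextIO:
    # "utf-8-sig" drops a leading byte-order mark, so the first key is
    # 'original_food_id' instead of '\ufefforiginal_food_id'.
    return open(path, newline="", encoding="utf-8-sig")


if __name__ == "__main__":
    raw = "\ufefforiginal_food_id,food_genus_id\n10006,1442.01\n".encode("utf-8")
    f = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig", newline="")
    rows = read_fct_rows(f, "demo.csv", required=("original_food_id", "food_genus_id"))
    print(next(rows)["original_food_id"])  # 10006
```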

@rbroth
Collaborator Author

rbroth commented Jan 25, 2022

New data imported.

@rbroth rbroth closed this as completed Jan 25, 2022
@rbroth rbroth reopened this Jan 26, 2022
@rbroth
Collaborator Author

rbroth commented Jan 26, 2022

Moving from MAPS_MAFOODS_v1.4.csv to MAPS_MAFOODS_v1.5.csv breaks the household consumption data import.

@rbroth
Collaborator Author

rbroth commented Jan 26, 2022

Removing MAFOOD entirely also breaks the household consumption data import.

@rbroth
Collaborator Author

rbroth commented Jan 26, 2022

The problem is a food_genus_id: inserting the household consumption data causes a foreign key violation on the entries for 21113.01.01 (Pork). The same food_genus_id can be found in D:\-Repos\bmgf-maps\db-test-data\raw data\food-genus\MAPS_Dictionary_v2.5.csv:
194,2: "21113.01.01","pig meat, with bones, fresh, raw": "21113.01.01","pig meat, with bones, fresh, raw"
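Each bad id currently surfaces as a separate foreign key violation. All of them can be listed up front by taking a set difference between the consumption ids and the ids in the referenced table. Below is a minimal sketch, assuming both id columns have already been read into Python. `missing_genus_ids` is a hypothetical helper, and the ids in the demo are toy data.

```python
from collections import Counter
from typing import Iterable


def missing_genus_ids(consumption_ids: Iterable[str], known_ids: Iterable[str]) -> Counter:
    """Count consumption rows whose food_genus_id has no match in the referenced table.

    These are the rows that would fail the foreign key check on insert.
    """
    known = set(known_ids)
    return Counter(i for i in consumption_ids if i not in known)


if __name__ == "__main__":
    consumption = ["21113.01.01", "1442.01", "21113.01.01"]  # toy data
    fct = ["1442.01", "21113.02.01"]                         # toy data
    print(missing_genus_ids(consumption, fct))  # Counter({'21113.01.01': 2})
```

Running this before the insert would report every unmatched id, with row counts, in one pass.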

@rbroth
Collaborator Author

rbroth commented Jan 26, 2022

The change from MAPS_MAFOODS_v1.4.csv to MAPS_MAFOODS_v1.5.csv removed several entries, for example:
"Pork, meat, ~20% fat, raw, Sus Scrofa domesticus, (Nyama ya nkhumba)","21113.01.01","pig meat, with bones, fresh, raw"
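Removed entries like this one can be found systematically by diffing the genus id column of the two releases. Below is a minimal sketch using the standard `csv` module. It assumes both releases have a `food_genus_id` column, as the KENFCT row above does; the real MAFOODS header may differ. The helper names and demo rows are illustrative.

```python
import csv
import io
from typing import TextIO


def genus_ids(f: TextIO, column: str = "food_genus_id") -> set[str]:
    """Collect the non-empty genus ids from one FCT CSV."""
    return {row[column] for row in csv.DictReader(f) if row[column]}


def removed_genus_ids(old: TextIO, new: TextIO) -> set[str]:
    """Genus ids used in the old release that no longer appear in the new one."""
    return genus_ids(old) - genus_ids(new)


if __name__ == "__main__":
    old = io.StringIO("food_name,food_genus_id\nPork,21113.01.01\nMustard seeds,1442.01\n")
    new = io.StringIO("food_name,food_genus_id\nMustard seeds,1442.01\n")
    print(removed_genus_ids(old, new))  # {'21113.01.01'}
```

To run this on the real files, pass handles opened with `open(path, newline="", encoding="utf-8-sig")`. This only catches ids that disappear entirely. It does not catch an id that still exists but now maps to a different food.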

@LuciaSegovia

I am going to update the IHS4 version so that the matching works. However, updating a genus_id is something that is meant to happen, and it will eventually break the matching process again, so we need to think of a long-term solution. There are a few ideas:

  1. Finding a way to update all the files that depend on those genus_ids. This sounds ideal, but I haven't been able to make it work.
  2. Using version control. For example, IHS4_v1.2 only works with MAPS_MAFOODS_v1.4.
  3. Hoping that the fuzzy matching @TomCodd is developing will save the day.
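Idea 2 could be enforced at import time rather than by convention. Below is a minimal sketch. It assumes a hypothetical, hand-maintained table that records which FCT release each consumption dataset was matched against. The IHS4_v1.2 / MAPS_MAFOODS_v1.4 pair comes from the example above. `COMPATIBLE_FCT`, `release_name` and `check_compatible` are illustrative names.

```python
import re

# Hypothetical pin table: consumption dataset -> FCT release(s) its genus ids were matched against.
COMPATIBLE_FCT = {
    "IHS4_v1.2": {"MAPS_MAFOODS_v1.4"},
}


def release_name(filename: str) -> str:
    """Strip directory and .csv extension: 'CSV/MAPS_MAFOODS_v1.5.csv' -> 'MAPS_MAFOODS_v1.5'."""
    base = filename.replace("\\", "/").rsplit("/", 1)[-1]
    return re.sub(r"\.csv$", "", base)


def check_compatible(consumption: str, fct_file: str) -> None:
    """Raise before importing if this FCT release is not pinned for the consumption dataset."""
    fct = release_name(fct_file)
    allowed = COMPATIBLE_FCT.get(consumption, set())
    if fct not in allowed:
        raise RuntimeError(f"{consumption} is pinned to {sorted(allowed)}, got {fct}")


if __name__ == "__main__":
    check_compatible("IHS4_v1.2", "MAPS_MAFOODS_v1.4.csv")  # passes silently
    try:
        check_compatible("IHS4_v1.2", "MAPS_MAFOODS_v1.5.csv")
    except RuntimeError as err:
        print(err)
```

This does not prevent the mismatch from happening. It makes the pipeline fail with a clear message instead of a foreign key violation partway through the import.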

@rbroth
Collaborator Author

rbroth commented Jan 26, 2022

Can you explain why the id for that food item changed? When you change the id of a food item, can't you simply do a find-and-replace on the consumption datasets as well?

@rbroth
Collaborator Author

rbroth commented Jan 26, 2022

For the time being, I've replaced all instances of 21113.01.01 with 21113.02.01 in the IHS consumption data
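A plain-text find-and-replace could also match the id in other columns. Below is a minimal sketch that restricts the substitution to the genus id column and reports how many rows changed. It assumes the consumption CSV names that column `food_genus_id`; this is a hypothetical name, and the IHS4 file may use another. `remap_genus_ids` is illustrative, and the demo rows are toy data.

```python
import csv
import io
from typing import Mapping, TextIO


def remap_genus_ids(src: TextIO, dst: TextIO, mapping: Mapping[str, str],
                    column: str = "food_genus_id") -> int:
    """Copy a CSV, replacing exact matches in one column only; return rows changed."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
    writer.writeheader()
    changed = 0
    for row in reader:
        if row[column] in mapping:
            row[column] = mapping[row[column]]
            changed += 1
        writer.writerow(row)
    return changed


if __name__ == "__main__":
    src = io.StringIO("hhid,food_genus_id\n1,21113.01.01\n2,1442.01\n")  # toy data
    dst = io.StringIO()
    print(remap_genus_ids(src, dst, {"21113.01.01": "21113.02.01"}))  # 1
    print(dst.getvalue())
```

One possible step toward idea 1 above would be to keep a mapping like `{"21113.01.01": "21113.02.01"}` next to each FCT release, so that dependent datasets can be updated mechanically.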

@LuciaSegovia

I change it when I have new data or updated knowledge that better represents dietary intake in Malawi. I don't think a manual search-and-replace is a good long-term solution, so I will need to think of something...

@LuciaSegovia

> For the time being, I've replaced all instances of 21113.01.01 with 21113.02.01 in the IHS consumption data

Thanks! I'll look into the IHS4 data for any other potential issues :)

@rbroth
Collaborator Author

rbroth commented Jan 27, 2022

@TomCodd

@rbroth
Collaborator Author

rbroth commented Jan 28, 2022

The data has been updated and merged into the master branch of the data repo. However, due to a workaround for a bug, most of the data has been temporarily removed.
