---
title: "MAPS Food Dictionary: User Guidelines"
author: "Segovia de la Revilla, Lucia"
date: "28/06/2021 (last compiled on `r format(Sys.time(), '%d %B, %Y')`)"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Introduction
This document details the creation and use of the MAPS Food Dictionary (`MAPS_Dictionary-Protocol.R`), also referred to as the Food GENuS code.
Datasets that use, or are connected by, the GENuS code:

- MAPS_fct_mafood // MAPS_mafoods_vx.x
- MAPS_fct_wa // MAPS_wafct_vx.x
- MAPS_fct_ken // MAPS_kenfct_vx.x
- MAPS_hces_ihs4
- MAPS_hces_ihs5

## Food Dictionary data structure
[complete with information on ID_0 to ID_3, and rationale]
**Food Names 2**
[Elaborate this]
We realized that there are two group ids (ID_2) that are identical. Those ids were derived from SUA for crops and livestock and from FAO-Fishstat for fish and seafood, which is how the conflict arose. As a solution, we are going to add a line of code that changes the id values for fish. - FIXED
## Data sources
This section briefly describes the data sources in the MAPS tool in which the MAPS Food Dictionary is used.
## Food Composition Data
1) Food composition data are the "original" source of the "GENuS" code. Hence, `food_genus_code` should be unique per food item in each FCT. If two food items in one food composition table could be coded with the same `food_genus_code`, either a distinction should be made or one of the items should be chosen.
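
The uniqueness rule above can be checked mechanically. The following is a minimal sketch in base R, assuming the FCT is a data frame with a `food_genus_code` column; the item ids (`fdc_id`) are hypothetical and the duplicated code is introduced only for illustration.

```r
# Hypothetical FCT extract: item ids are illustrative, not taken from a real FCT.
fct <- data.frame(
  fdc_id = c("A1", "A2", "A3"),
  food_genus_code = c("118.01", "1701.01", "118.01"),
  stringsAsFactors = FALSE
)

# Flag every row whose food_genus_code appears more than once (NA codes are ignored).
codes <- fct$food_genus_code
is_dup <- !is.na(codes) &
  (duplicated(codes) | duplicated(codes, fromLast = TRUE))
duplicated_items <- fct[is_dup, ]

print(duplicated_items)
```

Any rows returned need a decision: either distinguish the items with different codes or keep only one of them.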
## Food Balance Sheet
Food Balance Sheets (FBS) provide underlying data for the MAPS tool. These data provide the food supply estimates that will be used to estimate micronutrient supplies in all countries in the sub-Saharan region. There have been some methodological updates to the FBS, so they are divided into old FBS and new FBS. Two workflows may therefore be needed in order to provide both datasets with GENuS codes.
We will start with the old FBS for two main reasons: 1) it provides data over a longer time span, and 2) we can assume stability and no further changes in that dataset.
Data source: http://www.fao.org/faostat/en/#data/FBSH

- Regions = Eastern, Middle, Southern and Western Africa > (list)
- Elements = Food supply quantity (kg/capita/yr)
- Items = Select All
- Years = Select All

In the first steps, the matches were done at regional level following the data in the study of Joy et al. However, because we need to provide data at country level, a disaggregation into countries was needed.
Hence, we downloaded the data from the FAOSTAT website for the countries in the Southern, Eastern, Western and Middle Africa regions. Then, we matched each country with its region (according to the UN description) using the raster package. Although we are using `NAME_FAO` to merge, six countries remain without a match due to differences in the name and/or spelling.
The codes for the MAPS matching were then supplied at country level, but assigned according to the region.
Once we have added the region, we need to create a `region` variable, with the values (E, W, M and S) that we will use for the merging.
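
As an illustration of the two steps above (attaching the region to each country, then deriving the one-letter `region` variable), here is a minimal base R sketch. The country rows, the `region_name` column and the spelling mismatch are hypothetical; only `NAME_FAO` and the E/W/M/S values come from the text.

```r
# Hypothetical FBS country list and region lookup keyed by NAME_FAO.
fbs <- data.frame(country = c("Malawi", "Ghana", "Cabo Verde"),
                  stringsAsFactors = FALSE)
regions <- data.frame(
  NAME_FAO = c("Malawi", "Ghana", "Cape Verde"),
  region_name = c("Eastern Africa", "Western Africa", "Western Africa"),
  stringsAsFactors = FALSE
)

# Left join: keep every FBS country, even when the spelling does not match.
merged <- merge(fbs, regions, by.x = "country", by.y = "NAME_FAO", all.x = TRUE)

# Countries left without a region need a manual fix of the name/spelling.
unmatched <- merged$country[is.na(merged$region_name)]

# One-letter region variable used later for merging.
region_codes <- c("Eastern Africa" = "E", "Western Africa" = "W",
                  "Middle Africa" = "M", "Southern Africa" = "S")
merged$region <- unname(region_codes[merged$region_name])

print(unmatched)
print(merged)
```

Listing the unmatched countries explicitly, rather than silently dropping them, makes the name and spelling differences visible before the GENuS codes are merged.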
The location is now sorted, so we need to add the GENuS code that will allow the MAPS tool to match food supply and food composition. To do so, we *manually* coded each food item, using the Joy et al. study as reference. We did this for the Energy matches and will carry those codes across the whole dataset. That original, manually coded template is called "Simplified-match-FBS-region_vxx.csv" and will serve as reference.
Because there was no "Middle" region in the original study, we need to create a middle region in fbs_ref. This will be done following the same procedure as in the FCT, by copying the "Western" region.
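
Copying the "Western" rows to create a "Middle" region could look like the sketch below. The structure of `fbs_ref` (columns `region`, `item_code`, `food_genus_code`) and the placeholder code values are assumptions made for illustration.

```r
# Hypothetical reference table; food_genus_code values are placeholders.
fbs_ref <- data.frame(
  region = c("E", "W", "S"),
  item_code = c(2531, 2531, 2531),
  food_genus_code = c("code_E", "code_W", "code_S"),
  stringsAsFactors = FALSE
)

# Duplicate the Western rows and relabel them as Middle.
middle <- fbs_ref[fbs_ref$region == "W", ]
middle$region <- "M"
fbs_ref <- rbind(fbs_ref, middle)
rownames(fbs_ref) <- NULL

print(fbs_ref)
```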
This generated issues due to differences in name spelling, so we decided to use the regional-fct to merge the GENuS code. Again, multiple issues were introduced by differences in spelling. We therefore decided to use the FAO and ID_1 codes for the merging, to avoid issues with different spellings.
We also identified some issues with the code merging. We need to determine whether they are due to the food selected for certain categories in the regional FCT or whether they arise at "Food Dictionary" level.
The items to be checked are:

|row|item_code|item|item in FCT|
|-|-|-|-|
|1|2586|oilcrops oil, other|oilcrops, other (2570) = fixed GENuS tag (confidence low)|
|2|2614|citrus, other|orange, mandarines (2611) = change item code (confidence low)|
|3|2781|fish, body oil|fish, liver oil (2782) = fixed GENuS tag (confidence low)|
|4|2764|marine fish, other|pelagic fish (2763) = change item code (confidence low)|
|5|2680|infant food|NA|
|6|2558|rape and mustardseed|NA|
|7|2562|palm kernels|NA|

Items one and three were fixed in the regional FCT and/or the Food Dictionary, leading to the release of regional-fct_v1.6 and Food-Dictionary_v2.6. The ID_1 values for those items are not as they should be, and they need to be amended accordingly. We will keep the `NA` items as they are until the FCT is updated, because they are not currently included in the FCT (hence they will not have a match). It is very important to highlight that matches need to be made based on item_code and NOT on item_name, because the names differ between versions!
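
The point about merging on item_code rather than on the item name can be seen in a small base R sketch. The data frames, the name spellings and the placeholder GENuS codes are hypothetical; the item codes are taken from the tables in this document.

```r
# Hypothetical FBS rows and regional FCT rows; names differ only in spelling/case.
fbs <- data.frame(
  item_code = c(2531, 2680, 2763),
  item = c("Potatoes and products", "Infant food", "Pelagic Fish"),
  stringsAsFactors = FALSE
)
fct <- data.frame(
  item_code = c(2531, 2763),
  item = c("potatoes and products", "pelagic fish"),
  food_genus_code = c("genus_A", "genus_B"),  # placeholder codes
  stringsAsFactors = FALSE
)

# Merging on the code: only the item missing from the FCT stays NA.
by_code <- merge(fbs, fct[, c("item_code", "food_genus_code")],
                 by = "item_code", all.x = TRUE)

# Merging on the name: every row fails because of the spelling differences.
by_name <- merge(fbs, fct[, c("item", "food_genus_code")],
                 by = "item", all.x = TRUE)

print(by_code)
print(sum(is.na(by_name$food_genus_code)))
```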
We are going to use the same script for the new FBS data. This approach gives us an idea of how similar or different the old and new FBS are in terms of structure and the food item codes used.
Data source: http://www.fao.org/faostat/en/#data/FBS

- Regions = Eastern, Middle, Southern and Western Africa > (list)
- Elements = Food supply quantity (kg/capita/yr)
- Items = Select All
- Years = Select All

Step 1: Allocating countries to their respective regions did not need any adjustment compared with the old FBS (FBSH).
Step 2: We identified nine food items without a match in the regional FCT. The newly identified items are in bold. Rice and groundnuts were expected to differ because the definition of both items changed relative to the FBSH: in the new version, rice is reported as paddy equivalents (previously husked equivalents) and groundnuts are reported in shell equivalents (previously shelled equivalents). Similarly, *alcohol, non-food* and *miscellaneous* are newly introduced and have no data in the FBSH. We will use the same approach as for the FBSH to resolve the mismatches between the regional FCT and the FBS. For the time being, we will change the new rice and groundnut codes to the old ones. This is likely to change once we account for edible portion and further data cleaning steps (e.g., including SUA data instead of FBS). Those changes will be listed in a "new releases" pipeline, as was done for the regional FCT. For those analyses, see the fbs project.

|item_code|item|item in FCT|
|-|-|-|
|**2807**|**rice and products**|rice and products (2805) = change item code (confidence low)|
|**2552**|**groundnuts**|groundnuts (shelled eq) and products (2556) = change item code (confidence low)|
|2558|rape and mustardseed|NA|
|2614|citrus, other|orange, mandarines (2611) = change item code (confidence low)|
|2764|marine fish, other|pelagic fish (2763) = change item code (confidence low)|
|2680|infant food|NA|
|**2899**|**miscellaneous**|NA|
|2562|palm kernels|NA|
|**2659**|**alcohol, non-food**|NA|

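
The "change item code" actions listed in the table can be applied with a simple lookup. This is a minimal sketch, assuming the FBS item codes are held in a numeric vector; the vector name and the sample codes are illustrative, while the four code pairs are the ones given in the table.

```r
# Old-for-new code substitutions taken from the table above.
recode_map <- c("2807" = 2805, "2552" = 2556, "2614" = 2611, "2764" = 2763)

# Illustrative item codes: two to recode, one without a match, one unaffected.
item_code <- c(2807, 2552, 2558, 2531)

# Replace codes found in the lookup; leave all others unchanged.
key <- as.character(item_code)
recoded <- ifelse(key %in% names(recode_map), recode_map[key], item_code)

print(recoded)
```

Items marked NA in the table (for example 2558) are deliberately left untouched, so they remain unmatched until the FCT is updated.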
```{r fbs}
```
## Household Consumption and Expenditure Survey
### 1) IHS4
We **manually** matched the IHS4 food items to a FoodEx2 code and recorded the rationale for each choice. However, the confidence in the matches is still missing. Hence, we created a template to add the confidence for the ihs4-to-GENuS matches, and then filled in the confidence values **manually** in Excel.
There are some mismatches between the "old FoodEx2" and the GENuS codes that need to be resolved.
The last items are iterations of the original ihs4 food items, and hence they are going to be excluded from the original ihs4 standard item list.
We created a template to be attached to the FCT, so that we can supply a GENuS code for the items used in the IHS4 food matching.
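
A confidence template of the kind described above could be produced as in the sketch below. The column names, the item rows, the placeholder codes and the allowed confidence levels are all assumptions; the actual template may differ.

```r
# Hypothetical ihs4-to-GENuS matches; item rows and codes are placeholders.
ihs4_matches <- data.frame(
  item_code = c(101, 102),
  item = c("item one", "item two"),
  food_genus_code = c("genus_A", "genus_B"),
  stringsAsFactors = FALSE
)

# Add an empty confidence column to be filled in manually (e.g. in Excel).
template <- ihs4_matches
template$confidence <- NA_character_
template_path <- file.path(tempdir(), "ihs4_confidence_template.csv")
write.csv(template, template_path, row.names = FALSE)

# After manual filling, check that only the expected levels were used.
# The levels below are an assumption, not a documented MAPS convention.
allowed <- c("low", "medium", "high")
check_confidence <- function(x) all(is.na(x) | x %in% allowed)

print(check_confidence(template$confidence))
```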
```{r ihs4, echo=FALSE}
```
# User manual:
## Adding new items and codes
Below is one example of the code (now deprecated) that was used to generate sequential numbering for the items in the dictionary.
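```{r new codes, echo=FALSE}
library(stringr)
# Deprecated: next sequential ID_3 from the parent ID_2 and the latest ID_3.
# Only the last digit is incremented, so suffixes ending in 9 do not roll over.
id3_new <- function(id2, id3) {
  ifelse(is.na(id3) | id3 == "", paste0(id2, ".01"),
         paste0(str_extract(id3,
                            "[[:alnum:]]{2,5}\\.\\d{1,2}\\.\\d{1}|[[:alnum:]]{2,5}\\.\\d{1}"),
                as.numeric(str_extract(id3, "[[:digit:]]$")) + 1))
}
```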
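
For comparison, the sketch below shows one possible way to generate the next sequential code in base R. It is not the project's current method; the function name `next_id3` and its arguments are hypothetical. It takes the parent code and the existing child codes, and zero-pads the suffix so that codes roll over from `.09` to `.10`.

```r
# Hypothetical helper: next ID_3 under a parent code, given existing ID_3 codes.
next_id3 <- function(id2, existing_id3 = character(0)) {
  prefix <- paste0(id2, ".")
  # Keep only codes that sit directly under this parent.
  children <- existing_id3[!is.na(existing_id3) & startsWith(existing_id3, prefix)]
  suffix <- substring(children, nchar(prefix) + 1)
  suffix <- suffix[grepl("^[0-9]+$", suffix)]
  last <- if (length(suffix) == 0) 0L else max(as.integer(suffix))
  sprintf("%s.%02d", id2, last + 1L)
}

print(next_id3("23161", c("23161.01")))
print(next_id3("1290.9", c("1290.9.03")))
print(next_id3("1530"))
```

Taking the maximum existing suffix, rather than the last digit of a single code, avoids reusing a number when codes were added out of order.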
## Development issues that should be tackled
#### Sharing updates
1) Keeping folders, files, and dependencies up to date in a systematic, reproducible and efficient way (TO BE DONE), including:
1.1) When updating the `food_genus_code`, how to update it so that all affected datasets are correctly and promptly updated, and no dataset that depends on the GENuS code is left behind.
1.2) Updating all R projects/folders that use the dictionary. We wrote a piece of code that finds the Dictionary_vx.x (latest version) and saves a copy of the new version in the same location. We are still thinking about improvements to this solution, e.g., implementing a QC step before saving, and including git saving and committing.
1.3) Ensuring that every folder in which the Food Dictionary is used always loads the most recent version.
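
Finding the most recent dictionary file (points 1.2 and 1.3) might be sketched as follows. This is not the project's existing code: the function name, the `.csv` extension and the exact file name pattern are assumptions. Versions are compared with `numeric_version()` so that v2.10 sorts after v2.6, which plain string sorting would get wrong.

```r
# Assumption: files are named "<anything>Dictionary_v<major>.<minor>.csv".
latest_dictionary <- function(dir = ".") {
  version_regex <- "Dictionary_v([0-9]+(\\.[0-9]+)+)\\.csv$"
  files <- list.files(dir, pattern = version_regex, full.names = TRUE)
  if (length(files) == 0) stop("No dictionary file found in: ", dir)
  # Extract the version string from each file name and compare numerically.
  versions <- numeric_version(sub(paste0("^.*", version_regex), "\\1", files))
  files[versions == max(versions)][1]
}

# Demonstration with empty files in a temporary folder.
demo_dir <- file.path(tempdir(), "dictionary_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("Food-Dictionary_v1.9.csv",
                                  "Food-Dictionary_v2.6.csv",
                                  "Food-Dictionary_v2.10.csv")))
latest <- latest_dictionary(demo_dir)
print(basename(latest))
```

A QC step, as mentioned in 1.2, could be added between locating the file and loading or copying it.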
#### Food groups and categories
##### Issues
1) Food categories (ID_1) that may be misclassified: 2775|aquatic plants is currently under "Fruits and Vegetables". However, according to FAOSTAT, it is under "Aquatic foods" and/or "Animal product". An option for the future would be to list it under "OT", or to generate a new food category for fisheries and aquatic products.
##### Potential new categories (ID_0)
1) Fats and oils (FO): oils in particular are currently under "Other foods". However, it may be relevant to generate a specific category in the future, for example because oils are frequently used as fortification vehicles.
2) Aquatic products (AQ), or fish and aquatic products (FA)
##### Compatibility with other food groups and categories
1) FoodEx2
2) FAO GIFT
```{r tracking-issue-2, echo=FALSE}
```
# Food description
Fortification is often misreported in the food item name and description. For instance, in kenfct the item "Chapati, white" (15019) used fortified flour, although this is not mentioned in the item name. We can check this by looking at the recipes and by checking the VITA_RAE content (in comparison with "Chapati, brown" (15020)). Hence, for "wheat flour and products" we would need to generate separate entries for fortified and unfortified items and allocate the dictionary code that applies. We should double-check the quantities of the fortifiable nutrients. A higher value could also be due to other ingredients rather than fortification. For example, if ghee or other dairy products are used, vit. A (RAE) would be higher than if only plant-based products are used.
ISSUE with Dictionary --> SUA and Fish share the same code!!

|ID_0|FoodName_0|ID_1|FoodName_1|ID_2|FoodName_2|
|-|-|-|-|-|-|
|RT|Roots and Tubers|2531|potatoes and products|1510|potatoes|
|RT|Roots and Tubers|2533|sweet potatoes and products|1530|sweet potatoes|
|AP|Animal Products|2782|fish, liver oil|1510|freshwater fish, liver oil|
|AP|Animal Products|2763|pelagic fish|1530|pelagic fish, frozen, fillet|

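
A conflict of this kind can be detected by counting how many different ID_1 parents each ID_2 value has. The sketch below rebuilds the four rows shown above as a data frame; the object names are illustrative.

```r
# The four dictionary rows shown above.
dictionary <- data.frame(
  ID_0 = c("RT", "RT", "AP", "AP"),
  ID_1 = c(2531, 2533, 2782, 2763),
  ID_2 = c("1510", "1530", "1510", "1530"),
  FoodName_2 = c("potatoes", "sweet potatoes",
                 "freshwater fish, liver oil", "pelagic fish, frozen, fillet"),
  stringsAsFactors = FALSE
)

# Count the distinct ID_1 parents behind each ID_2 value.
n_parents <- tapply(dictionary$ID_1, dictionary$ID_2,
                    function(x) length(unique(x)))

# ID_2 values shared by more than one ID_1 are conflicts.
shared_id2 <- names(n_parents)[n_parents > 1]
conflicts <- dictionary[dictionary$ID_2 %in% shared_id2, ]

print(shared_id2)
print(conflicts)
```

Running a check like this before each dictionary release would catch a shared code before it reaches the datasets that depend on it.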
# To do things:
# Adding values
## Notes:
In some instances, dictionary values were not available and/or compiled (they were there as part of the legacy); these may be removed in the next update. This is because there are food items for which food composition data are not available (e.g., wheat grain refined) or for which one food item is a better match than a potential value available, e.g., beef meat w/o bones from Western Africa rather than beef meat w/ bones from the USA.
# Generic food composition values

- 118.01: millet, dried, raw (average millet)
- 1701.01: beans, dried, raw (should be an average of beans (i.e., generic food item))
- 1212.01: cabbage, head, fresh, raw (average of cabbages)
- 1341.01: apple, raw
- 21121.01: chicken meat, fresh, raw
- 21151.01: beef offal, fresh, raw

## Milk and products (dairy)
Notes: any "whole fat" item would be under raw milk.
# Change log
2023/06/16:

- Fix: mandazi was missing (F0022.08).

2023/04/25:

- Changed "rice grain, local, red" from 23161.01 to 23161.02.

2023/04/18:

- Changed "1290.9.03" (coriander leaves, fresh, raw) to "1654.01" (anise, badian, coriander, cumin, caraway, fennel and juniper berries, raw).

2023/02/15:

- Changed "Vegetable fat" from ID_2 (F1243) to ID_2 (34550) because the former referred to animal fats, while the latter refers to any modified oils and can include animal and/or vegetable fats, as per the description: "animal or vegetable fats and oils and their fractions, chemically modified, except those hydrogenated, inter-esterified, re-esterified or elaidinized; inedible mixtures or preparations of animal or vegetable fats or oils."