Category classification dataset (2019-09-16)
Category classification datasets.
One dataset per major language was built, along with a multilingual (xx) dataset. Only the products that met the following requirements were kept:
- non-empty categories_tags field
For the language-specific datasets, the following requirements must also be met:
- lang field is set to the input language
- product_name_{lang} is not empty
For the multilingual dataset:
- product_name is not empty
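
The requirements above can be read as a single filter predicate. The following is a minimal sketch, not the script that generated the datasets: it assumes each product is available as a dictionary keyed by the field names mentioned above, and the function name keep_product is hypothetical.

```python
from typing import Any, Mapping

MULTILINGUAL = "xx"  # code the text uses for the multilingual dataset


def keep_product(product: Mapping[str, Any], lang: str) -> bool:
    """Return True if `product` meets the requirements for dataset `lang`.

    Hypothetical helper: the dict-based product representation is an assumption.
    """
    # Every dataset requires a non-empty categories_tags field.
    if not product.get("categories_tags"):
        return False
    if lang == MULTILINGUAL:
        # Multilingual dataset: the generic product_name must be non-empty.
        return bool(product.get("product_name"))
    # Language-specific dataset: lang must match the input language
    # and product_name_{lang} must be non-empty.
    return product.get("lang") == lang and bool(product.get(f"product_name_{lang}"))
```

For example, a product with lang set to fr and a non-empty product_name_fr would be kept for the fr dataset, but would be left out of the xx dataset if its generic product_name is empty.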
In each dataset, the following fields can be found:
- code: product barcode
- product_name: the language-specific product name (or value of product_name for the multilingual dataset)
- categories_tags
- ingredient_tags
- known_ingredient_tags: tags of ingredients found in the taxonomy
- ingredients_text: the language-specific ingredient text (or value of ingredients_text for the multilingual dataset)
- lang
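
To make the field list concrete, here is a rough sketch of how one dataset record could be assembled from a source product. Several details are assumptions rather than facts from this description: the source keys ingredients_tags and ingredients_text_{lang}, the representation of the ingredient taxonomy as a set of tags, and the helper name build_record.

```python
from typing import Any, Dict, Mapping, Set


def build_record(
    product: Mapping[str, Any], lang: str, ingredient_taxonomy: Set[str]
) -> Dict[str, Any]:
    """Assemble one dataset record (hypothetical helper, assumed source keys)."""
    # Language-specific datasets read the suffixed fields; "xx" reads the generic ones.
    suffix = "" if lang == "xx" else f"_{lang}"
    # Assumption: the source product stores its ingredient tags under "ingredients_tags".
    ingredient_tags = list(product.get("ingredients_tags", []))
    return {
        "code": product.get("code"),
        "product_name": product.get(f"product_name{suffix}", ""),
        "categories_tags": list(product.get("categories_tags", [])),
        "ingredient_tags": ingredient_tags,
        # Keep only the tags that exist in the ingredient taxonomy.
        "known_ingredient_tags": [t for t in ingredient_tags if t in ingredient_taxonomy],
        "ingredients_text": product.get(f"ingredients_text{suffix}", ""),
        "lang": product.get("lang"),
    }
```

Under these assumptions, known_ingredient_tags is always a subset of ingredient_tags, in the same order.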
The ingredient and category taxonomies used during dataset generation are also provided.