Category classification dataset (2019-09-16)

Category classification datasets.
One dataset per major language was built, along with a multilingual (xx) dataset. Only products that met the following requirement were kept:

  • non-empty categories_tags field

For the language-specific datasets, the following requirements must also be met:

  • lang field is set to the input language
  • product_name_{lang} is not empty

For the multilingual dataset:

  • product_name is not empty
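
The filtering rules above can be sketched as a single predicate. This is only an illustration, not the script used to generate the datasets: it assumes each product is a plain dict keyed by the field names mentioned above, and the function name keep_product and the convention of passing lang=None for the multilingual (xx) dataset are hypothetical.

```python
from typing import Any, Dict, Optional


def keep_product(product: Dict[str, Any], lang: Optional[str] = None) -> bool:
    """Return True if the product passes the filters described above.

    lang=None stands for the multilingual (xx) dataset (an assumption
    made for this sketch).
    """
    # Requirement shared by all datasets: non-empty categories_tags field.
    if not product.get("categories_tags"):
        return False

    if lang is None:
        # Multilingual dataset: product_name must not be empty.
        return bool(product.get("product_name"))

    # Language-specific dataset: the lang field must match the input
    # language and product_name_{lang} must not be empty.
    if product.get("lang") != lang:
        return False
    return bool(product.get(f"product_name_{lang}"))
```

A call such as keep_product(product, "fr") would apply the French dataset rules, while keep_product(product) would apply the multilingual ones.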

In each dataset, the following fields can be found:

  • code: product barcode
  • product_name: the language-specific product name (or the value of product_name for the multilingual dataset)
  • categories_tags
  • ingredient_tags
  • known_ingredient_tags: tags of ingredients found in the taxonomy
  • ingredients_text: the language-specific ingredient text (or the value of ingredients_text for the multilingual dataset)
  • lang

The ingredient and category taxonomies used during dataset generation are also provided.
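
As a rough illustration of the record layout listed above, the fields could be typed as follows. The Python types are assumptions (the release notes do not state them), and DatasetRecord and known_tags are hypothetical names. The helper shows one plausible reading of known_ingredient_tags: the subset of ingredient_tags that appears in the ingredient taxonomy, here represented as a set of tag identifiers.

```python
from typing import Iterable, List, Set, TypedDict


class DatasetRecord(TypedDict):
    """One dataset entry; field types are assumed, not documented."""

    code: str                         # product barcode
    product_name: str
    categories_tags: List[str]
    ingredient_tags: List[str]
    known_ingredient_tags: List[str]  # ingredient tags found in the taxonomy
    ingredients_text: str
    lang: str


def known_tags(ingredient_tags: Iterable[str], taxonomy_ids: Set[str]) -> List[str]:
    """Keep only the tags present in the taxonomy, preserving input order."""
    return [tag for tag in ingredient_tags if tag in taxonomy_ids]
```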