docs: images url computation (#9670)
* docs: images url computation

* docs: links to tutorial on images
alexgarel authored Jan 24, 2024
1 parent 2f08865 commit 3deb4e0
Showing 2 changed files with 100 additions and 154 deletions.
248 changes: 95 additions & 153 deletions docs/api/how-to-download-images.md
@@ -22,126 +22,38 @@ about how to download images from AWS dataset.

## Download from Open Food Facts server

All images can be found on
[https://images.openfoodfacts.org/images/products/](https://static.openfoodfacts.org/images/products/).
All images are hosted under the
[https://images.openfoodfacts.org/images/products/](https://static.openfoodfacts.org/images/products/) folder,
but you have to build the right URL from the product information.

### Computing single product image folder

Images of a product are stored in a single directory. The path of this
directory can be inferred easily from the product barcode. If the product
barcode length is lower or equal to 8 (ex: "22222222"), the directory path is
simply the barcode: all images can be found on
`https://images.openfoodfacts.org/images/products/{barcode}`.
directory can be inferred easily from the product barcode.
There are two cases:

1. If the product barcode is 8 characters long or shorter (e.g. "22222222"), the directory path is
simply the barcode: `https://images.openfoodfacts.org/images/products/{barcode}`.

2. Otherwise, we split the first 9 digits of the barcode into three groups of three digits to get the first three folder names, and use the remaining digits as the last folder name[^split-regexp].
For example, the barcode `3435660768163` is split as `343/566/076/8163`, so the product images will be in `https://images.openfoodfacts.org/images/products/343/566/076/8163`.

[^split-regexp]: The following regex can be used to split the barcode into subfolders: `r"^(...)(...)(...)(.*)$"`
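
The two cases above can be sketched in Python as follows. This is a minimal illustration rather than the server's own code: `product_folder` and `BASE_URL` are names chosen for this example, and the regex is the one given in the footnote.

```python
import re

# Base URL taken from the text above; the constant and function names are illustrative.
BASE_URL = "https://images.openfoodfacts.org/images/products"


def product_folder(barcode: str) -> str:
    """Return the folder path of a product, relative to BASE_URL."""
    if len(barcode) <= 8:
        # Short barcodes are used as the folder name directly.
        return barcode
    # Three groups of three characters, then the rest of the barcode.
    match = re.match(r"^(...)(...)(...)(.*)$", barcode)
    return "/".join(match.groups())


print(f"{BASE_URL}/{product_folder('3435660768163')}")
# prints https://images.openfoodfacts.org/images/products/343/566/076/8163
```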

Otherwise, the following regex is used to split the barcode into subfolders:
`r"^(...)(...)(...)(.*)$"`. For example, the barcode `3435660768163` is split as
follows: `343/566/076/8163`, and all images of the products can be found on
[https://images.openfoodfacts.org/images/products/343/566/076/8163](https://images.openfoodfacts.org/images/products/343/566/076/8163).
### Computing single image file name

To get the image file names, we have to use the database dump or the API. All
images information are stored in the `images` field. For product
[3168930010883](https://world.openfoodfacts.org/api/v0/product/3168930010883.json),
we have:
The steps above give the folder name; now we need the filename of a particular image inside that folder.

#### Understanding images data

To get the image file names, we have to use the database dump or the API.
All image information is stored in the `images` field.

For example, for product [3168930010883](https://world.openfoodfacts.org/api/v0/product/3168930010883.json),
we have (data trimmed):

```json
{
"4": {
"uploader": "openfoodfacts-contributors",
"uploaded_t": 1548685211,
"sizes": {
"400": {
"h": 400,
"w": 300
},
"100": {
"w": 75,
"h": 100
},
"full": {
"h": 3174,
"w": 2380
}
}
},
"3": {
"uploader": "openfoodfacts-contributors",
"uploaded_t": 1537002125,
"sizes": {
"full": {
"h": 3302,
"w": 2476
},
"100": {
"h": 100,
"w": 75
},
"400": {
"w": 300,
"h": 400
}
}
},
"ingredients_fr": {
"rev": "7",
"orientation": "0",
"ocr": 1,
"imgid": "2",
"y2": null,
"white_magic": "0",
"angle": null,
"x1": null,
"x2": null,
"geometry": "0x0-0-0",
"normalize": "0",
"y1": null,
"sizes": {
"100": {
"h": 100,
"w": 75
},
"400": {
"w": 300,
"h": 400
},
"200": {
"w": 150,
"h": 200
},
"full": {
"h": 1200,
"w": 900
}
}
},
"nutrition_fr": {
"sizes": {
"200": {
"h": 200,
"w": 150
},
"full": {
"w": 2476,
"h": 3302
},
"100": {
"w": 75,
"h": 100
},
"400": {
"w": 300,
"h": 400
}
},
"y1": "-1",
"normalize": null,
"x2": "-1",
"geometry": "0x0--8--8",
"x1": "-1",
"angle": 0,
"imgid": "3",
"white_magic": null,
"y2": "-1",
"ocr": 1,
"orientation": "0",
"rev": "11"
},
"1": {
"sizes": {
"full": {
@@ -160,24 +72,6 @@ we have:
"uploader": "kiliweb",
"uploaded_t": "1527184614"
},
"2": {
"sizes": {
"100": {
"h": 100,
"w": 75
},
"400": {
"h": 400,
"w": 300
},
"full": {
"h": 1200,
"w": 900
}
},
"uploader": "kiliweb",
"uploaded_t": "1527184615"
},
"front_fr": {
"x1": null,
"angle": null,
@@ -213,31 +107,79 @@ we have:

The keys of the map are the keys of the images. These keys can be:

- digits: the image is the raw image sent by the contributor (full resolution).
- selected images: `front_{lang}`, `nutrition_{lang}` and
`ingredients_{lang}`, selected as front, nutrition and ingredients images
respectively for `lang`. Here, `lang` is a 2-letter ISO 639-1 language code
(fr, en, es,\...).
- digits: the image is the *raw image* sent by the contributor (full resolution).
- selected images:
    * `front_{lang}` corresponds to the front product image in the language with code `lang`
    * `ingredients_{lang}` corresponds to the ingredients image in the language with code `lang`
    * `nutrition_{lang}` is the same but for nutrition data
    * `packaging_{lang}` is the same but for packaging logos

Each image is available in different resolutions: `100`, `200`, `400` or
`full`, each corresponding to image height (`full` means not resized). The
available resolutions can be found in the `sizes` subfield.
`lang` is a 2-letter ISO 639-1 language code (fr, en, es, …).

Selected images have additional fields:
Each image is available in different resolutions:
`100`, `200`, `400` or `full`, each corresponding to image height (`full` means not resized).
The available resolutions can be found in the `sizes` subfield.
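
As a small sketch of reading the `sizes` subfield, the hypothetical helper below lists the resolutions available for one entry of the `images` map. The function name and the sample entry are illustrative; the sample only mirrors the shape of the data shown above.

```python
def available_resolutions(image_entry: dict) -> list:
    """List the resolution keys of one `images` entry, smallest first, `full` last."""
    sizes = image_entry.get("sizes", {})
    # Numeric keys such as "100" or "400" are image heights in pixels.
    numeric = sorted((key for key in sizes if key.isdigit()), key=int)
    return numeric + (["full"] if "full" in sizes else [])


# Sample entry shaped like the trimmed data above (values are illustrative).
entry = {
    "sizes": {
        "full": {"h": 3302, "w": 2476},
        "100": {"h": 100, "w": 75},
        "400": {"h": 400, "w": 300},
    }
}
print(available_resolutions(entry))  # prints ['100', '400', 'full']
```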

#### Filename for a raw image

For a raw image (one stored under a numeric key in the `images` field),
the filename is easy to compute:
* take the image digit + `.jpg` for full resolution
* take the image digit + `.` + resolution + `.jpg` for a lower resolution

For our example above, the filename for image `"1"`
* in resolution 400px is `1.400.jpg`
* in full resolution, it is `1.jpg`

So, adding the folder part, the final URLs for our example are:
* https://images.openfoodfacts.org/images/products/316/893/001/0883/1.jpg for the full image
* https://images.openfoodfacts.org/images/products/316/893/001/0883/1.400.jpg for the 400px version
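
The raw image rule can be sketched as a short function. `raw_image_filename` is a name made up for this illustration; it only covers raw images, that is, numeric keys.

```python
def raw_image_filename(image_id: str, resolution: str = "full") -> str:
    """Build the filename of a raw image (numeric key in the `images` field)."""
    if resolution == "full":
        # Full resolution has no resolution part in the filename.
        return f"{image_id}.jpg"
    return f"{image_id}.{resolution}.jpg"


folder = "https://images.openfoodfacts.org/images/products/316/893/001/0883"
print(f"{folder}/{raw_image_filename('1')}")
print(f"{folder}/{raw_image_filename('1', '400')}")
```

Run as written, this prints the two URLs listed above for image `"1"`.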

#### Filename for a selected image

In the structure, selected images have additional fields:

- `rev` (as revision) indicates the revision number of the image to use (each
time a new image is selected, cropped or rotated, a new image with an
incremented rev is generated).
- `imgid`, the image ID of the raw image used to generate the selected image.
- `angle`, `x1`, `x2`, `y1`, `y2`: rotation angle and cropping coordinates.

For selected images, the file name is the image key followed by the revision
number and the resolution: `front_fr.1.400.jpg`. For raw images, the file name
is either the image ID (`1.jpg`) or the image ID followed by the resolution
(`1.100.jpg`).

To get the full URL, simply concatenate the product directory path and the
image name. Examples:

- [https://images.openfoodfacts.org/images/products/343/566/076/8163/1.jpg](https://images.openfoodfacts.org/images/products/343/566/076/8163/1.jpg)
- [https://images.openfoodfacts.org/images/products/343/566/076/8163/1.400.jpg](https://images.openfoodfacts.org/images/products/343/566/076/8163/1.400.jpg)
- `angle`, `x1`, `x2`, `y1`, `y2`: rotation angle and cropping coordinates (these make it possible to regenerate the selected image from the raw image).

For selected images, the filename is the image key followed by the revision number and the resolution: `<image_name>.<rev>.<resolution>.jpg`.
The resolution must always be specified, but you can use the `full` keyword to get the full resolution image.

In our example above, the filename for the front image in French (`front_fr` key) is:
* `front_fr.4.400.jpg` for 400 px version
* `front_fr.4.full.jpg` for full resolution version

So, adding the folder part, the final URLs for our example are:
* https://images.openfoodfacts.org/images/products/316/893/001/0883/front_fr.4.full.jpg for the full image
* https://images.openfoodfacts.org/images/products/316/893/001/0883/front_fr.4.400.jpg for the 400px version
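
The same rule for selected images, as a minimal sketch. `selected_image_filename` is a hypothetical helper, and the `images` sample is reduced to the single `rev` field, with the value `"4"` inferred from the example filenames above.

```python
def selected_image_filename(images: dict, image_key: str, resolution: str = "full") -> str:
    """Build the filename of a selected image such as `front_fr`."""
    # The revision number comes from the `rev` field of the selected image.
    rev = images[image_key]["rev"]
    # Unlike raw images, the resolution is always part of the filename.
    return f"{image_key}.{rev}.{resolution}.jpg"


# Trimmed sample of the `images` field (assumed shape, `rev` only).
images = {"front_fr": {"rev": "4"}}
print(selected_image_filename(images, "front_fr", "400"))  # prints front_fr.4.400.jpg
print(selected_image_filename(images, "front_fr"))         # prints front_fr.4.full.jpg
```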

## A Python snippet

If we have the product data in a dict, Python code to build the URL would look something like:

```python
import re


def get_image_url(product_data, image_name, resolution="full"):
    if image_name not in product_data["images"]:
        return None
    base_url = "https://images.openfoodfacts.org/images/products"
    # get product folder name
    folder_name = product_data["code"]
    if len(folder_name) > 8:
        folder_name = re.sub(r'(...)(...)(...)(.*)', r'\1/\2/\3/\4', folder_name)
    # get filename
    if re.match(r"^\d+$", image_name):  # only digits
        # raw image
        resolution_suffix = "" if resolution == "full" else f".{resolution}"
        filename = f"{image_name}{resolution_suffix}.jpg"
    else:
        # selected image
        rev = product_data["images"][image_name]["rev"]
        filename = f"{image_name}.{rev}.{resolution}.jpg"
    # join things together
    return f"{base_url}/{folder_name}/{filename}"
```
6 changes: 5 additions & 1 deletion docs/api/ref/schemas/product_images.yaml
@@ -5,6 +5,10 @@ description: |
Images ensure the reliability of Open Food Facts data.
It provides a primary source and proof of all the structured data.
You may therefore want to display it along the structured information.
See also tutorials about images:
* [Getting images](https://openfoodfacts.github.io/openfoodfacts-server/api/how-to-download-images/)
* [Uploading images](https://openfoodfacts.github.io/openfoodfacts-server/api/tutorial-uploading-photo-to-a-product/)
properties:
image_front_small_url:
type: string
@@ -85,4 +89,4 @@ properties:
description: |
See property `front` to get the real type of those objects
(Put this way because of a [bug in rapidoc](https://github.com/rapi-doc/RapiDoc/issues/880))
type: string
type: string
