docs: images url computation (#9670)

* docs: images url computation
* docs: links to tutorial on images
openfoodfacts · Jan 24, 2024 · 3deb4e0
1 parent 2f08865
commit 3deb4e0

Showing 2 changed files with 100 additions and 154 deletions.
diff --git a/docs/api/how-to-download-images.md b/docs/api/how-to-download-images.md
@@ -22,126 +22,38 @@ about how to download images from AWS dataset.
 
 ## Download from Open Food Facts server
 
-All images can be found on
-[https://images.openfoodfacts.org/images/products/](https://static.openfoodfacts.org/images/products/).
+All images are hosted under the
+[https://images.openfoodfacts.org/images/products/](https://static.openfoodfacts.org/images/products/) folder,
+but you have to build the right URL from the product information.
+
+### Computing single product image folder
+
 Images of a product are stored in a single directory. The path of this
-directory can be inferred easily from the product barcode. If the product
-barcode length is lower or equal to 8 (ex: "22222222"), the directory path is
-simply the barcode: all images can be found on
-`https://images.openfoodfacts.org/images/products/{barcode}`.
+directory can be inferred easily from the product barcode.
+There are two cases:
+
+1. If the barcode is 8 characters long or shorter (e.g. "22222222"), the directory path is
+simply the barcode: `https://images.openfoodfacts.org/images/products/{barcode}`.
+
+2. Otherwise, we split the first 9 digits of the barcode into groups of three digits to get the first three folder names, and use the remaining digits as the last folder name^[split-regexp].
+   For example, the barcode `3435660768163` is split as `343/566/076/8163`, so the images of this product are in `https://images.openfoodfacts.org/images/products/343/566/076/8163`.
+
+^[split-regexp]: The following regex can be used to split the barcode into subfolders: `r"^(...)(...)(...)(.*)$"`
 
-Otherwise, the following regex is used to split the barcode into subfolders:
-`r"^(...)(...)(...)(.*)$"`. For example, the barcode `3435660768163` is split as
-follows: `343/566/076/8163`, and all images of the products can be found on
-[https://images.openfoodfacts.org/images/products/343/566/076/8163](https://images.openfoodfacts.org/images/products/343/566/076/8163).
+### Computing single image file name
 
-To get the image file names, we have to use the database dump or the API. All
-images information are stored in the `images` field. For product
-[3168930010883](https://world.openfoodfacts.org/api/v0/product/3168930010883.json),
-we have:
+The previous section gives the folder name; now we need the filename of a particular image inside that folder.
+
+#### Understanding images data
+
+To get the image file names, we have to use the database dump or the API.
+All image information is stored in the `images` field.
+
+For example, for product [3168930010883](https://world.openfoodfacts.org/api/v0/product/3168930010883.json),
+we have (data trimmed for brevity):
 
 ```json
     {
-      "4": {
-        "uploader": "openfoodfacts-contributors",
-        "uploaded_t": 1548685211,
-        "sizes": {
-          "400": {
-            "h": 400,
-            "w": 300
-          },
-          "100": {
-            "w": 75,
-            "h": 100
-          },
-          "full": {
-            "h": 3174,
-            "w": 2380
-          }
-        }
-      },
-      "3": {
-        "uploader": "openfoodfacts-contributors",
-        "uploaded_t": 1537002125,
-        "sizes": {
-          "full": {
-            "h": 3302,
-            "w": 2476
-          },
-          "100": {
-            "h": 100,
-            "w": 75
-          },
-          "400": {
-            "w": 300,
-            "h": 400
-          }
-        }
-      },
-      "ingredients_fr": {
-        "rev": "7",
-        "orientation": "0",
-        "ocr": 1,
-        "imgid": "2",
-        "y2": null,
-        "white_magic": "0",
-        "angle": null,
-        "x1": null,
-        "x2": null,
-        "geometry": "0x0-0-0",
-        "normalize": "0",
-        "y1": null,
-        "sizes": {
-          "100": {
-            "h": 100,
-            "w": 75
-          },
-          "400": {
-            "w": 300,
-            "h": 400
-          },
-          "200": {
-            "w": 150,
-            "h": 200
-          },
-          "full": {
-            "h": 1200,
-            "w": 900
-          }
-        }
-      },
-      "nutrition_fr": {
-        "sizes": {
-          "200": {
-            "h": 200,
-            "w": 150
-          },
-          "full": {
-            "w": 2476,
-            "h": 3302
-          },
-          "100": {
-            "w": 75,
-            "h": 100
-          },
-          "400": {
-            "w": 300,
-            "h": 400
-          }
-        },
-        "y1": "-1",
-        "normalize": null,
-        "x2": "-1",
-        "geometry": "0x0--8--8",
-        "x1": "-1",
-        "angle": 0,
-        "imgid": "3",
-        "white_magic": null,
-        "y2": "-1",
-        "ocr": 1,
-        "orientation": "0",
-        "rev": "11"
-      },
       "1": {
         "sizes": {
           "full": {
@@ -160,24 +72,6 @@ we have:
         "uploader": "kiliweb",
         "uploaded_t": "1527184614"
       },
-      "2": {
-        "sizes": {
-          "100": {
-            "h": 100,
-            "w": 75
-          },
-          "400": {
-            "h": 400,
-            "w": 300
-          },
-          "full": {
-            "h": 1200,
-            "w": 900
-          }
-        },
-        "uploader": "kiliweb",
-        "uploaded_t": "1527184615"
-      },
       "front_fr": {
         "x1": null,
         "angle": null,
@@ -213,31 +107,79 @@ we have:
 
 The keys of the map are the keys of the images. These keys can be:
 
--   digits: the image is the raw image sent by the contributor (full resolution).
--   selected images: `front_{lang}`, `nutrition_{lang}` and
-    `ingredients_{lang}`, selected as front, nutrition and ingredients images
-    respectively for `lang`. Here, `lang` is a 2-letter ISO 639-1 language code
-    (fr, en, es,\...).
+-   digits: the image is the *raw image* sent by the contributor (full resolution).
+-   selected images:
+    * `front_{lang}` corresponds to the front image of the product in the language with code `lang`
+    * `ingredients_{lang}` corresponds to the ingredients image in the language with code `lang`
+    * `nutrition_{lang}` is the same, but for the nutrition data image
+    * `packaging_{lang}` is the same, but for the packaging logos image
 
-Each image is available in different resolutions: `100`, `200`, `400` or
-`full`, each corresponding to image height (`full` means not resized). The
-available resolutions can be found in the `sizes` subfield.
+    `lang` is a 2-letter ISO 639-1 language code (fr, en, es, …).
 
-Selected images have additional fields:
+Each image is available in different resolutions: 
+`100`, `200`, `400` or `full`, each corresponding to image height (`full` means not resized).
+The available resolutions can be found in the `sizes` subfield.
+
+#### Filename for a raw image
+
+For a raw image (one stored under a numeric key in the `images` field),
+the filename is easy to compute:
+* the image number followed by `.jpg` for the full resolution
+* the image number + `.` + resolution + `.jpg` for a lower resolution
+
+For our example above, the filename for image `"1"`
+* in resolution 400px is `1.400.jpg`
+* in full resolution, it is `1.jpg`
+
+So, adding the folder part, the final URLs for our example are:
+* https://images.openfoodfacts.org/images/products/316/893/001/0883/1.jpg for the full image
+* https://images.openfoodfacts.org/images/products/316/893/001/0883/1.400.jpg for the 400px version
+
+#### Filename for a selected image
+
+In the structure, selected images have additional fields:
 
 -   `rev` (as revision) indicates the revision number of the image to use (each
     time a new image is selected, cropped or rotated, a new image with an
     incremented rev is generated).
 -   `imgid`, the image ID of the raw image used to generate the selected image.
--   `angle`, `x1`, `x2`, `y1`, `y2`: rotation angle and cropping coordinates.
-
-For selected images, the file name is the image key followed by the revision
-number and the resolution: `front_fr.1.400.jpg`. For raw images, the file name
-is either the image ID (`1.jpg`) or the image ID followed by the resolution
-(`1.100.jpg`).
-
-To get the full URL, simply concatenate the product directory path and the
-image name. Examples:
-
-- [https://images.openfoodfacts.org/images/products/343/566/076/8163/1.jpg](https://images.openfoodfacts.org/images/products/343/566/076/8163/1.jpg)
-- [https://images.openfoodfacts.org/images/products/343/566/076/8163/1.400.jpg](https://images.openfoodfacts.org/images/products/343/566/076/8163/1.400.jpg)
+-   `angle`, `x1`, `x2`, `y1`, `y2`: rotation angle and cropping coordinates (these make it possible to regenerate the selected image from the raw image).
+
+For selected images, the filename is the image key followed by the revision number and the resolution: `<image_name>.<rev>.<resolution>.jpg`.
+The resolution must always be specified, but you can use the `full` keyword to get the full resolution image.
+
+In our example above, the filename for the front image in French (`front_fr` key) is:
+* `front_fr.4.400.jpg` for the 400 px version
+* `front_fr.4.full.jpg` for the full resolution version
+
+So, adding the folder part, the final URLs for our example are:
+* https://images.openfoodfacts.org/images/products/316/893/001/0883/front_fr.4.full.jpg for the full image
+* https://images.openfoodfacts.org/images/products/316/893/001/0883/front_fr.4.400.jpg for the 400px version
+
+## A Python snippet
+
+If we have the product data in a dict (`product_data`), Python code computing the URL would look something like this:
+
+```python
+import re
+
+
+def get_image_url(product_data, image_name, resolution="full"):
+    """Return the URL of a product image, or None if the image is unknown."""
+    if image_name not in product_data["images"]:
+        return None
+    base_url = "https://images.openfoodfacts.org/images/products"
+    # get product folder name
+    folder_name = product_data["code"]
+    if len(folder_name) > 8:
+        folder_name = re.sub(r"^(...)(...)(...)(.*)$", r"\1/\2/\3/\4", folder_name)
+    # get filename
+    if re.match(r"^\d+$", image_name):  # only digits
+        # raw image
+        resolution_suffix = "" if resolution == "full" else f".{resolution}"
+        filename = f"{image_name}{resolution_suffix}.jpg"
+    else:
+        # selected image
+        rev = product_data["images"][image_name]["rev"]
+        filename = f"{image_name}.{rev}.{resolution}.jpg"
+    # join things together
+    return f"{base_url}/{folder_name}/{filename}"
+```
+        
+        
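As a quick sanity check of the rules documented in this change, the following minimal sketch recomputes the example folders and filenames quoted in the tutorial. The helper names `product_folder` and `image_filename` are hypothetical (they are not part of the commit), and the revision `4` for `front_fr` is taken from the tutorial's example rather than fetched from the API.

```python
import re

BASE_URL = "https://images.openfoodfacts.org/images/products"


def product_folder(barcode: str) -> str:
    """Hypothetical helper: folder path of a product, relative to BASE_URL."""
    if len(barcode) <= 8:
        # Short barcodes are used as-is.
        return barcode
    # Longer barcodes: three groups of three characters, then the remainder.
    return re.sub(r"^(...)(...)(...)(.*)$", r"\1/\2/\3/\4", barcode)


def image_filename(image_key: str, resolution: str = "full", rev=None) -> str:
    """Hypothetical helper: filename of a raw or selected image."""
    if image_key.isdigit():
        # Raw image: no resolution suffix for the full-size file.
        suffix = "" if resolution == "full" else f".{resolution}"
        return f"{image_key}{suffix}.jpg"
    if rev is None:
        # Selected images always need the revision from the `images` field.
        raise ValueError("rev is required for selected images")
    # Selected image: revision and resolution are always part of the name.
    return f"{image_key}.{rev}.{resolution}.jpg"


if __name__ == "__main__":
    folder = product_folder("3168930010883")
    print(f"{BASE_URL}/{folder}/{image_filename('1', '400')}")
    print(f"{BASE_URL}/{folder}/{image_filename('front_fr', '400', rev=4)}")
```

Splitting the folder and filename steps mirrors the two sections of the tutorial, so each rule can be checked on its own against the examples given there.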
diff --git a/docs/api/ref/schemas/product_images.yaml b/docs/api/ref/schemas/product_images.yaml
@@ -5,6 +5,10 @@ description: |
   Images ensure the reliability of Open Food Facts data.
   It provides a primary source and proof of all the structured data.
   You may therefore want to display it along the structured information.
+
+  See also tutorials about images:
+  * [Getting images](https://openfoodfacts.github.io/openfoodfacts-server/api/how-to-download-images/)
+  * [Uploading images](https://openfoodfacts.github.io/openfoodfacts-server/api/tutorial-uploading-photo-to-a-product/)
 properties:
   image_front_small_url:
     type: string
@@ -85,4 +89,4 @@ properties:
         description: |
           See property `front` to get the real type of those objects
           (Put this way because of a [bug in rapidoc](https://github.com/rapi-doc/RapiDoc/issues/880))
-        type: string
+        type: string