Commit

[skip test] Add Example for VisionEncoderDecoder
DevinTDHa committed Sep 25, 2023
1 parent ea63e1b commit 3d77309
Showing 1 changed file with 265 additions and 0 deletions.
@@ -0,0 +1,265 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/annotation/image/VisionEncoderDecoderForImageCaptioning.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## VisionEncoderDecoderForImageCaptioning Annotator"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebok we are going to generate captions for images using spark-nlp. It uses the vision transformer ViT to encode the images and then GPT2 to generate tokens. This model is rather heavy so make sure you have enough RAM and possible use an accelerator such as a GPU."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Downloading Images"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!wget -q https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/images/images.zip"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"shutil.unpack_archive(\"images.zip\", \"images\", \"zip\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start Spark Session"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sparknlp\n",
"from sparknlp.base import *\n",
"from sparknlp.annotator import *\n",
"from pyspark.sql import SparkSession\n",
"\n",
"spark = sparknlp.start()"
]
},
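{
"cell_type": "markdown",
"metadata": {},
"source": [
"The captioning model is fairly heavy. If a GPU is available, the session can instead be started with GPU support and more driver memory. The cell below is only a sketch using the `gpu` and `memory` options of `sparknlp.start`; the memory value is an arbitrary example, so adjust or skip it if the default session above works for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional alternative: start the session with GPU support and more driver memory.\n",
"# The memory value is only an example; size it to your machine.\n",
"# spark = sparknlp.start(gpu=True, memory=\"16G\")"
]
},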
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_df = spark.read.format(\"image\").option(\"dropInvalid\", value = True).load(path=\"images/images/\")"
]
},
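{
"cell_type": "markdown",
"metadata": {},
"source": [
"Spark's `image` data source loads each file into a single struct column named `image` (origin, dimensions and raw pixel data), which is the column the `ImageAssembler` below reads from. A quick look at the schema:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_df.printSchema()"
]
},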
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pipeline with VisionEncoderDecoderForImageCaptioning"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"image_assembler = ImageAssembler() \\\n",
" .setInputCol(\"image\") \\\n",
" .setOutputCol(\"image_assembler\")\n",
"\n",
"image_captioning = VisionEncoderDecoderForImageCaptioning \\\n",
" .pretrained() \\\n",
" .setInputCols([\"image_assembler\"]) \\\n",
" .setOutputCol(\"caption\")\n",
"\n",
"pipeline = Pipeline(stages=[\n",
" image_assembler,\n",
" image_captioning,\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"+-----------------+---------------------------------------------------------+\n",
"|image_name |result |\n",
"+-----------------+---------------------------------------------------------+\n",
"|palace.JPEG |[a large room filled with furniture and a large window] |\n",
"|egyptian_cat.jpeg|[a cat laying on a couch next to another cat] |\n",
"|hippopotamus.JPEG|[a brown bear in a body of water] |\n",
"|hen.JPEG |[a flock of chickens standing next to each other] |\n",
"|ostrich.JPEG |[a large bird standing on top of a lush green field] |\n",
"|junco.JPEG |[a small bird standing on a wet ground] |\n",
"|bluetick.jpg |[a small dog standing on a wooden floor] |\n",
"|chihuahua.jpg |[a small brown dog wearing a blue sweater] |\n",
"|tractor.JPEG |[a man is standing in a field with a tractor] |\n",
"|ox.JPEG |[a large brown cow standing on top of a lush green field]|\n",
"+-----------------+---------------------------------------------------------+\n",
"\n"
]
}
],
"source": [
"model = pipeline.fit(data_df)\n",
"image_df = model.transform(data_df)\n",
"image_df \\\n",
" .selectExpr(\"reverse(split(image.origin, '/'))[0] as image_name\", \"caption.result\") \\\n",
" .show(truncate = False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Light Pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To use the annotator in a light pipeline, we need to use the new method `fullAnnotateImage`, which can receive 3 kinds of input:\n",
"1. A path to a single image\n",
"2. A path to a list of images"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['image_assembler', 'caption'])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"light_pipeline = LightPipeline(model)\n",
"annotations_result = light_pipeline.fullAnnotateImage(\"images/images/hippopotamus.JPEG\")\n",
"annotations_result[0].keys()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To process a list of images, we just pass a list of images."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['image_assembler', 'caption'])"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"images = [\"images/images/bluetick.jpg\", \"images/images/palace.JPEG\", \"images/images/hen.JPEG\"]\n",
"annotations_result = light_pipeline.fullAnnotateImage(images)\n",
"annotations_result[0].keys()"
]
},
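{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each key maps to a list of `Annotation` objects, and the caption text itself is stored in their `result` field. A minimal sketch to print just the caption strings (the full annotations, including metadata, are shown in the next cell):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for result in annotations_result:\n",
"    print([anno.result for anno in result['caption']])"
]
},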
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Annotation(document, 0, 37, a small dog standing on a wooden floor, Map(nChannels -> 3, image -> 0, height -> 500, origin -> images/images/bluetick.jpg, mode -> 16, width -> 333), [])]\n",
"[Annotation(document, 0, 52, a large room filled with furniture and a large window, Map(nChannels -> 3, image -> 0, height -> 334, origin -> images/images/palace.JPEG, mode -> 16, width -> 500), [])]\n",
"[Annotation(document, 0, 46, a flock of chickens standing next to each other, Map(nChannels -> 3, image -> 0, height -> 375, origin -> images/images/hen.JPEG, mode -> 16, width -> 500), [])]\n"
]
}
],
"source": [
"for result in annotations_result:\n",
" print(result['caption'])"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
