💡 You must have completed the setup before attempting this demo.
In this demonstration, we will use Azure Computer Vision to detect the type of object an image represents.
First, we will use the Computer Vision online web-form to upload an image and observe the results.
Then, we will use the Computer Vision API to collect the same information programmatically, using curl.
The problem that motivates this talk is that the Shop by Photo tool on the Tailwind Traders website isn't correctly identifying products. It's useful to run the ONNX Deployment section at this point to set the scene.
Let's try using computer vision on a picture of a hardware product. If we can identify a product that Tailwind Traders sells by name, we can search for that name in the catalog for the "Shop by Photo" app.
- Visit the Computer Vision webpage at https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/
- Scroll down to the "Analyze an Image" section. It looks like this:
- Click the "Browse" button, and choose "man in hardhat.jpg" from the "test images" folder in "CV Training Images".
- After a moment, the analysis of your image will appear in the right pane. It looks like this:
| Feature name | Value |
| --- | --- |
| Objects | [ { "rectangle": { "x": 138, "y": 27, "w": 746, "h": 471 }, "object": "headwear", "confidence": 0.616 }, { "rectangle": { "x": 52, "y": 33, "w": 910, "h": 951 }, "object": "person", "confidence": 0.802 } ] |
| Tags | [ { "name": "man", "confidence": 0.999212 }, { "name": "headdress", "confidence": 0.99731946 }, { "name": "person", "confidence": 0.995057464 }, { "name": "clothing", "confidence": 0.991814733 }, { "name": "wearing", "confidence": 0.9827137 }, { "name": "hat", "confidence": 0.9691986 }, { "name": "helmet", "confidence": 0.9227209 }, { "name": "headgear", "confidence": 0.840476155 }, { "name": "personal protective equipment", "confidence": 0.8358513 }, { "name": "looking", "confidence": 0.832229853 }, { "name": "hard hat", "confidence": 0.8004248 }, { "name": "human face", "confidence": 0.785058737 }, { "name": "green", "confidence": 0.774940848 }, { "name": "fashion accessory", "confidence": 0.706475437 } ] |
| Description | { "tags": [ "man", "headdress", "person", "clothing", "wearing", "hat", "helmet", "looking", "green", "jacket", "shirt", "standing", "head", "suit", "glasses", "yellow", "white", "large", "phone", "holding" ], "captions": [ { "text": "a man wearing a helmet", "confidence": 0.8976638 } ] } |
| Image format | "Jpeg" |
| Image dimensions | 1000 x 1000 |
| Clip art type | 0 |
| Line drawing type | 0 |
| Black and white | false |
| Adult content | false |
| Adult score | 0.0126242451 |
| Racy | false |
| Racy score | 0.0156497136 |
| Categories | [ { "name": "people_", "score": 0.69140625 } ] |
| Faces | [ { "age": 37, "gender": "Male", "faceRectangle": { "top": 419, "left": 363, "width": 398, "height": 398 } } ] |
| Dominant color background | "White" |
| Dominant color foreground | "White" |
| Accent color | #90A526 |
(Note: the above analysis may change in the future, as the Computer Vision model is updated regularly.)
Note that in the first "Objects" result, two objects, "headwear" and "person", are detected, and their locations in the image are given. The object we want to detect is classified as "headwear", but for our application we need a more specific classification: "hard hat". However, "hard hat" is not one of the object types that Computer Vision currently detects. (We'll address this problem with Custom Vision, later.) Also note that a confidence score is given for each object classification.
The second "Tags" result gives a list of labels associated with the entire image. The tag with the highest confidence (listed first) is "man", which doesn't help us much. The second tag, "headdress", is not exactly what we are looking for either.
The other responses are also interesting, though we won't focus on them in this demo. Take a look at what's included:
- A caption for the photo ("a man wearing a helmet") in the Description field.
- Image features (is it black and white? a line drawing?)
- Details of any faces detected in the image (identified as a 37-year-old male in this case)
- A score for the content of the image: is it "Adult" or "Racy"?
- Color analysis for the image: the dominant foreground, accent, and background colors.
We're really only interested in the "Tags" field for our purposes, so we'll find out how to extract that programmatically in the next section.
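As a preview, once you have an analysis response saved as JSON (we'll fetch one with curl below), a one-liner can pull out just the tags. This is a minimal sketch, assuming jq is installed and the response has been saved to a hypothetical file named response.json:

```bash
# Sketch: list the tag names from a saved Analyze response.
# response.json is a hypothetical filename; jq must be installed.
jq -r '.tags[].name' response.json

# Or keep the confidence score alongside each tag name:
jq -r '.tags[] | "\(.name)\t\(.confidence)"' response.json
```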
You can control Computer Vision programmatically using its REST API. You can do this from just about any language or application that has access to the web, but we will use curl, a common command-line application for making requests to URLs and collecting their output. The curl application comes pre-installed on most Linux distributions and in recent versions of Windows 10 (1803 and later).
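For a sense of what such a call looks like, here is a minimal sketch of a request to the Analyze endpoint. It is not the exact command from vision_demo.sh: the key, region, and image URL are placeholders you would replace with your own values.

```bash
# Minimal sketch of an Analyze call. <your-key>, <your-region>, and the
# image URL are placeholders, not values from vision_demo.sh.
curl -s -X POST \
  -H "Ocp-Apim-Subscription-Key: <your-key>" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/man-in-hardhat.jpg"}' \
  "https://<your-region>.api.cognitive.microsoft.com/vision/v2.0/analyze?visualFeatures=Tags,Objects,Description"
```

Piping this command's output into the jq one-liner shown earlier leaves you with just the tags.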
Run the commands in the file vision_demo.sh. You can use a local Azure CLI or Azure Cloud Shell, but you must use bash as the shell.
The commands in this script will:
- Log into your Azure subscription (this step is unnecessary if using Cloud Shell)
- Create an Azure Resource Group
- Create a Cognitive Services key. (Note: this is an omnibus key that we will also use for Custom Vision, later.)
- Find the key
- Use curl to analyze two images
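Sketched out in Azure CLI terms, the first few of those steps look something like the following. This is illustrative only, not the literal contents of vision_demo.sh; the resource names and location are made up for the example:

```bash
# Illustrative sketch; names and location are placeholders,
# not the values used in vision_demo.sh.
az login                                  # unnecessary in Cloud Shell

az group create \
  --name tailwind-demo-rg \
  --location westus2

# --kind CognitiveServices creates the multi-service ("omnibus") resource.
az cognitiveservices account create \
  --name tailwind-demo-cs \
  --resource-group tailwind-demo-rg \
  --kind CognitiveServices \
  --sku S0 \
  --location westus2 \
  --yes
```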
In the script vision_demo.sh, run the section "Create a Key" to programmatically create a Cognitive Services key using the Azure Command Line Interface. (If you prefer, you can create keys interactively in the Azure Portal.)
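Retrieving the key afterwards might look something like this; again a sketch, using the same placeholder resource names as above:

```bash
# Sketch: fetch key1 for the account created above (placeholder names).
az cognitiveservices account keys list \
  --name tailwind-demo-cs \
  --resource-group tailwind-demo-rg \
  --query key1 -o tsv
```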