AWS and Computer Vision

GluonCV

  • GluonCV: a deep learning toolkit for computer vision
  • runs on the Apache MXNet engine
  • created and maintained by AWS
  • Java, Maven, Linux, CPU:
  • GluonCV was also created to close the gap between model experimentation and practical model deployment
  • GluonCV implements models for image classification, ...
  • these models are accessible in the model zoo.
  • these models are pre-trained on publicly available datasets with millions of images.
  • what is a model?
    • simply defined: a function with an input and an output
    • in CV, a model takes an image as input and generates a prediction as output
    • the terms network and model are interchangeable
    • different tasks may need different models; the choice depends on the required prediction accuracy and the available compute resources
  • GluonCV covers the following computer vision tasks:
    • image classification
    • object detection (ex. YOLO: Real-Time Object Detection)
    • semantic / instance segmentation
    • pose estimation

Apache MXNet

  • Apache MXNet is a deep learning framework
  • with MXNet you can build and train neural network models
  • it supports different languages: Java, Python, R, C++, ...
  • MXNet has a lot of pre-trained models in its model zoo
  • MXNet supports ONNX: https://onnx.ai/supported-tools.html

AWS: ML Stack

  • AI Services: contain high-level APIs for vision, speech, language, ...
    • Amazon Rekognition
  • ML Services
    • Amazon SageMaker
  • ML Frameworks & Infrastructure:
    • DL Frameworks: TensorFlow, MXNet, PyTorch, ...
    • Infrastructure: EC2, Deep Learning Containers, IoT Greengrass
      • AWS Deep Learning AMI

Amazon Rekognition

  • for image and video analysis: object, scene and activity detection
  • provides a simple API

Amazon SageMaker

  • the SageMaker service is composed of many services: label, build & train, tune, compile, deploy
  • Amazon SageMaker has a labeling service for supervised learning. ex. to train a dog classifier, a human first has to label each training image with:
    • a flag: yes it is a dog
    • pixel coordinates of the dog in the image
  • it provides Jupyter notebooks with preinstalled Conda environments for Apache MXNet, TensorFlow, PyTorch, Chainer and the non-deep-learning frameworks scikit-learn and Spark ML.
  • in the Jupyter notebook you can write your own ML models, train them, ...
  • or you can use built-in algorithms: K-Means, K-Nearest Neighbors (k-NN), BlazingText, ...
  • further algorithms by third parties are listed in the AWS Marketplace
  • you can train a model locally on the Amazon SageMaker notebook instance or use model training jobs (the needed infrastructure / instances are created on demand)
  • the trained model is stored in S3
  • model optimization jobs: compile the trained model into an executable
  • deployment
    • Amazon SageMaker endpoint: HTTP request
    • AWS IoT Greengrass: for deployment on edge devices; see also Amazon SageMaker Neo
  • the workflow can be controlled with the AWS CLI
  • or with the SDKs, e.g. in Python: import sagemaker

AWS Deep Learning AMI

  • Deep Learning Amazon Machine Images: DLAMI
  • an AMI is a template to create a virtual machine (instance) in EC2 (Amazon Elastic Compute Cloud)
  • an AMI includes the OS and any additional software / dependencies
  • it's like purchasing a computer: it comes with an OS and additional programs
  • the DLAMI is available for different OSes (Ubuntu, Amazon Linux, Windows) with preinstalled DL frameworks (MXNet, TensorFlow, ...) and NVIDIA CUDA drivers
  • if you create a DLAMI instance in EC2, you can access it with SSH (private/public key) through the instance's public DNS: ssh -i "private-key.pem" <user>@<public-dns>. do not forget to stop the instance when you are done (for cost reasons), and terminate it when you no longer need it, because a stopped instance still incurs storage charges.

AWS Deep Learning Containers

  • these containers provide another way to set up a deep learning environment on AWS with optimized, prepackaged container images
  • Amazon provides a Docker container registry: Amazon Elastic Container Registry (ECR)
  • Amazon Elastic Container Service (Amazon ECS) does the container orchestration.
  • So what are Deep Learning Containers? These are Docker container images pre-installed with deep learning frameworks
  • you can deploy the containers on:
    • ECS
    • Amazon Elastic Kubernetes Service: Amazon EKS
    • EC2 with DLAMI: connect to the instance (with SSH), then docker run
    • on your own machine: Docker and the AWS CLI have to be installed

Module 3: GluonCV Start

  • understand how to use pre-trained models for computer vision tasks
  • as previously described, you can use GluonCV on a local machine (by setting up an environment) or use a pre-installed environment in an Amazon SageMaker instance or on Amazon Elastic Compute Cloud (EC2) with the AWS Deep Learning AMI (DLAMI)

Setup Virtual Environment

Example for Ubuntu. To work in a clean, independent environment we use a virtual environment, then install MXNet and GluonCV.

  1. create virtual environment: only once
python3 -m venv gluoncv
  2. activate virtual environment
cd gluoncv
source bin/activate
  3. deactivate virtual environment
deactivate
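
To confirm from inside Python that the virtual environment is really active, a minimal sketch using only the standard library could look like this. The function name in_virtualenv is a hypothetical helper, not part of MXNet or GluonCV.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

After source bin/activate the prefix should point into the gluoncv directory created above.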

Install MXNet and GluonCV

pip install mxnet
pip install gluoncv

# CPU build optimized with Intel MKL-DNN
# pip install mxnet-mkl

# GPU build for CUDA 10.1
# pip install mxnet-cu101
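
A quick way to check that both packages landed in the active environment, without actually importing them, is sketched below. It relies only on the standard library; installed_packages is a hypothetical helper name chosen for this example.

```python
import importlib.util


def installed_packages(names):
    """Map each package name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in installed_packages(["mxnet", "gluoncv"]).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If one of the packages is reported as missing, the usual cause is that pip install ran outside the activated virtual environment.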

Image Classification with a pre-trained model

  • the objective is to assign the image to one class from a list of predetermined classes
  • when making a prediction, the model assigns a probability to each of these classes
  • ex. we have a mountain image and our classes are: mountain, beach, forest
    • the model now gives a probability for each class: mountain 80%, beach 0%, forest 20%
  • GluonCV models are pre-trained on publicly available datasets
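
How raw model outputs become per-class probabilities can be sketched with the softmax function. This is an assumption for illustration: the notes do not say how the probabilities are computed, and the scores below are invented, not taken from a real model.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(classes, probabilities):
    """Return the (class, probability) pair with the highest probability."""
    best = max(range(len(classes)), key=lambda i: probabilities[i])
    return classes[best], probabilities[best]


classes = ["mountain", "beach", "forest"]
raw_scores = [4.0, -2.0, 2.6]  # hypothetical model outputs, chosen for illustration
probabilities = softmax(raw_scores)

for name, p in zip(classes, probabilities):
    print(f"{name}: {p:.1%}")
print("prediction:", top_class(classes, probabilities)[0])
```

The class with the highest probability is reported as the prediction; the remaining probabilities show how confident the model is in the alternatives.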

Datasets

  • Datasets for Image Classification
  • CIFAR-10 (Canadian Institute for Advanced Research): used for over 10 years in computer vision research. it includes 10 basic classes (cars, cats, dogs, ...) and 60,000 images. it's a small dataset with low-resolution images (32x32).
  • ImageNet is another image classification dataset, released by researchers at Princeton University in 2009, with 14 million images and 22,000 classes; it is referred to as ImageNet22k. models are pre-trained on a subset of it (ImageNet1k: 1,000,000 images and 1,000 classes).

Models

  • Neural network models for image classification
  • there are many different model architectures (classification model architectures)
    • ResNet is popular
    • ResNet has different variants: ResNet18, ResNet50
    • MobileNet is good for mobile phones
    • or: VGG, SqueezeNet, DenseNet, AlexNet, DarkNet, Inception, ...
  • how to decide which model to take? accuracy, memory consumption, ...
  • as mentioned above, GluonCV implements these models based on the research papers. The GluonCV models are optimized compared to other implementations. The GluonCV version of ResNet-152 is called ResNet-152D.

Code Examples

image-classification.py

Object Detection with a pre-trained model

  • understand the content of an image
  • ex. medical image analysis, self-driving cars, ...
  • locate objects in an image with a box (called a bounding box)
  • additionally, each located object is classified from a list of predetermined classes
  • the model also gives a probability for each class
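
The output of a detector can be pictured as a list of (class, probability, bounding box) entries. The sketch below filters such a list by a score threshold. The Detection type, the box layout (x_min, y_min, x_max, y_max) and all values are assumptions made for illustration, not the GluonCV output format.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    label: str    # predicted class name
    score: float  # probability assigned to that class, between 0 and 1
    box: tuple    # bounding box as (x_min, y_min, x_max, y_max) in pixels


def keep_confident(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep detections whose score reaches the threshold, most confident first."""
    kept = [d for d in detections if d.score >= threshold]
    return sorted(kept, key=lambda d: d.score, reverse=True)


def box_area(box: tuple) -> float:
    """Area of an (x_min, y_min, x_max, y_max) box in square pixels."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


# Hypothetical detections, invented for illustration.
detections = [
    Detection("dog", 0.92, (40, 60, 200, 220)),
    Detection("bicycle", 0.35, (10, 10, 50, 50)),
    Detection("person", 0.71, (210, 30, 300, 240)),
]

for d in keep_confident(detections):
    print(f"{d.label}: score={d.score:.2f}, box={d.box}, area={box_area(d.box):.0f} px^2")
```

Raising the threshold trades missed objects for fewer false detections, so the right value depends on the application.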

Datasets

  • Pascal VOC (Visual Object Classes)
    • 20 object classes
    • 2007 version: about 10,000 images and 24,500 annotated objects
    • 2012 version: about 11,500 images and 27,500 annotated objects
  • COCO (Common Objects in Context)
    • 2017
    • 123,000 images
    • 886,000 objects
    • 80 object classes: the Pascal VOC classes are included. In addition, the following categories are covered more extensively: sports, food, household objects (table, tv, computer, ...)

Models

  • Object detection model architectures:
    • Faster R-CNN: with ResNet
      • Faster R-CNN extends ResNet with additional components
      • the network output also needs the coordinates of the bounding boxes, ...
      • ResNet is called the base network or backbone
    • SSD: with VGG, ResNet or MobileNet
    • YOLO: with Darknet or MobileNet

Code Examples

object-detection.py

Image Segmentation with a pre-trained model

  • see code documentation: image-segmentation.py

Neural Network Essentials

  • the pre-trained models are built from components called neural networks, especially convolutional neural networks

Fully Connected Network

  • a fundamental network type is the fully connected network
  • it is called fully connected because every input is connected to every output
  • it's a general-purpose network that makes no assumptions about the input data
  • the network starts from zero: it has no prior knowledge about images and has to learn the relationships between the pixels from scratch

source: aws training

  • example of a fully connected network with 4 inputs and 3 fully connected layers
  • the 4 inputs can correspond to 4 pixels (if we have a 2x2 image)
  • we have 3 outputs (with predictions) because of 3 classes
  • an image is composed of pixels, and each pixel of an RGB image has three values that encode the intensity of the red, green and blue colors.
  • a more simplified example: a grayscale image
    • each pixel represents the intensity: 0 is black and 1 is white
    • all pixels are flattened for the first layer: one pixel per input
  • all connections are weighted
    • the weighted sum of the inputs is passed to an activation function, which produces the output
    • a non-linear activation function like the sigmoid function gives the network the ability to learn non-linear relationships
    • therefore the training objective is to find the best weights
  • these connection weights are also called network parameters
    • real-world models have millions of network parameters or even more
  • pre-trained models already come with good network parameters that have been learned
    • we download a file with these values and create a model based on it
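
The forward pass described above can be sketched in plain Python. This is a simplified sketch under stated assumptions: it uses a single fully connected layer instead of three, and the weights and biases are invented values, not learned parameters.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def flatten(image):
    """Turn a 2D grid of pixel intensities into a flat list of inputs."""
    return [pixel for row in image for pixel in row]


def dense(inputs, weights, biases):
    """Fully connected layer: every output sees every input."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(weighted_sum))
    return outputs


# 2x2 grayscale image: 0 is black, 1 is white.
image = [[0.0, 1.0],
         [1.0, 0.0]]

# Hypothetical parameters for 3 output neurons with 4 inputs each.
# In a pre-trained model these values would be loaded from a file.
weights = [[0.5, -0.2, 0.1, 0.4],
           [-0.3, 0.8, 0.6, -0.1],
           [0.2, 0.2, -0.5, 0.3]]
biases = [0.0, 0.1, -0.1]

inputs = flatten(image)
outputs = dense(inputs, weights, biases)
print("inputs:", inputs)
print("outputs:", [round(o, 3) for o in outputs])
print("number of parameters:", sum(len(w) for w in weights) + len(biases))
```

Even this tiny layer has 15 parameters (12 weights and 3 biases), which shows how quickly the parameter count grows for real image sizes.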

Convolutional networks

  • used in computer vision tasks
  • Convolutional neural networks learn local patterns from small neighborhoods of pixels, unlike fully connected networks, which learn patterns across all pixels of the image.
  • the two most important operations: the convolution operation and the max-pooling operation
  • a component of the convolution operation is the kernel or filter
  • the input passes through this kernel and generates an output, which is used to learn patterns or extract features
  • with this operation you can achieve: edge feature extraction, blurring, or sharpening of an image
  • with deep learning, the best-suited kernel values are learned as parameters
  • max pooling reduces the dimensions by taking the max value within a defined window: ex. a 2x2 window (4 input values) produces 1 output value
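
Both operations can be sketched in plain Python. The kernel below is hand-written for illustration (in a trained network its values would be learned), and, as is common in deep learning, the kernel is applied without flipping, with no padding and a stride of 1.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1, kernel not flipped)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply the kernel with the patch under it and sum the products.
            row.append(sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(k_rows)
                for j in range(k_cols)
            ))
        output.append(row)
    return output


def max_pool2d(feature_map, size=2):
    """Keep the largest value of each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ))
        pooled.append(row)
    return pooled


# 5x5 grayscale image: dark left part, bright right part.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written kernel that responds to vertical edges (dark to bright).
vertical_edge = [[-1, 1],
                 [-1, 1]]

features = convolve2d(image, vertical_edge)
pooled = max_pool2d(features, size=2)
print("feature map:", features)
print("after 2x2 max pooling:", pooled)
```

The feature map is non-zero only where the image changes from dark to bright, and pooling halves each dimension while keeping that strongest response.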

what is a feature?

  • the number of features is equal to the number of nodes in the input layer
  • if we want to classify man or woman, then their attributes (height, hair length, ...) are the features
  • if we want to classify images (cat / dog), then our features are the inputs (the pixels). the following layers can then do feature extraction like cat ears vs dog ears, ...

Module 4: Gluon Fundamentals

  • Understand the mathematics behind NDArray
  • Understand when to use different Gluon blocks including convolution, dense layers, and pooling
  • Compose Gluon blocks into complete models
  • Understand the difference between metrics and loss

N-dimensional arrays

  • ndarrays are also called tensors
  • vectors and matrices generalized to N dimensions
  • used in deep learning to represent input, output, ...
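
The idea of dimensions and shape can be illustrated with plain nested Python lists. This is only a conceptual sketch: it does not use MXNet's NDArray, and shape is a hypothetical helper written for this example.

```python
def shape(array):
    """Return the shape of a regular nested list as a tuple of dimension sizes."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        if not array:
            break
        array = array[0]
    return tuple(dims)


scalar = 3.5                     # 0 dimensions
vector = [1.0, 2.0, 3.0]         # 1 dimension
matrix = [[1, 2, 3], [4, 5, 6]]  # 2 dimensions: 2 rows, 3 columns
# 3 dimensions: a 2x2 RGB image stored as height x width x channels.
image = [[[255, 0, 0], [0, 255, 0]],
         [[0, 0, 255], [255, 255, 255]]]

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("image", image)]:
    print(f"{name}: shape={shape(value)}, dimensions={len(shape(value))}")
```

A batch of such images would add a fourth dimension in front, which is why deep learning inputs are usually described by their shape.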

source: aws training

NDArray

  • NDArray: MXNet's primary tool for storing and transforming numerical data
  • examples: ndarray.py, ndarray-operations.py

Gluon Blocks

  • how to create a neural network using the Gluon API of MXNet
    • examples: gluon-blocks.py, gluon-blocks-init.py
  • create a sequential block to compose a sequence of layers into a neural network
    • examples: gluon-blocks-sequential.py, gluon-blocks-custom.py
  • visualize Gluon models (blocks) to understand them better
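
The idea behind a sequential block, calling layers one after another, can be sketched without MXNet. The classes below are hypothetical stand-ins written in plain Python, not the Gluon API, and the parameters are invented values.

```python
class Sequential:
    """Minimal stand-in for a sequential container: calls its layers in order."""

    def __init__(self):
        self.layers = []

    def add(self, *layers):
        self.layers.extend(layers)

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


class Dense:
    """Fully connected layer without activation: weighted sums plus biases."""

    def __init__(self, weights, biases):
        self.weights = weights
        self.biases = biases

    def __call__(self, inputs):
        return [
            sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(self.weights, self.biases)
        ]


def relu(values):
    """Replace negative values with zero."""
    return [max(0.0, v) for v in values]


# Hypothetical parameters: 3 inputs -> 2 hidden units -> 1 output.
net = Sequential()
net.add(
    Dense([[1.0, -1.0, 0.5], [-0.5, 0.5, 1.0]], [0.0, 0.0]),
    relu,
    Dense([[2.0, 1.0]], [0.5]),
)

print(net([1.0, 2.0, 3.0]))
```

The output of each layer becomes the input of the next, so the whole network can be called like a single function.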

Sources, Links