Udacity Azure ML Engineer Project #2 - Deploying Models & Pipelines - Q4 2021

Requirements

  • Open Azure ML Studio and log in.
  • Create a compute instance on which you will run the notebooks and scripts.
  • For the project & scripts to run, make sure the folder "Project-Solution-Cousseau" is uploaded.

Overview

  • Model deployment (details in Figure 1)

    • First, a public CSV file with Bank Marketing data is picked up and a dataset is created in Azure.
    • Then an experiment is created to find the best model.
    • After registering this model, it is deployed and can be consumed via a test Python script.
    • In parallel, the swagger.json file generated at deployment is used to display the API in a Swagger page, with the Swagger instance running in a Docker container.
    • Also, a benchmark script calls Apache Bench to test the response time and performance of the deployed model.
  • Pipeline publishing (Figure 2)

    • In the second part, we reuse the same compute cluster to run a similar experiment and capture the best model.
    • We then create a pipeline from this model and publish it.
    • Via the Jupyter notebook, we test and consume the endpoint for this pipeline.

Architectural Diagram

Figure 1

Figure 2

Key Steps

  • Step #1: Environment set-up => not necessary, as I made use of the Udacity Lab with pre-installed tools

    • The only set-up / preparation activity was to create a compute instance (in order to run the Notebook, see Step #2)
  • Step #2: Create and run an AutoML experiment => achieved using the Azure SDK for Python, see Notebook project_2_udacity_Cousseau.ipynb

    • The first step is to retrieve the current workspace
        from azureml.core import Workspace, Experiment, Dataset
        #ws = Workspace.get(name="udacity-project")
        ws = Workspace.from_config()  
        # using the current workspace (Lab)  SOURCE: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py
        exp = Experiment(workspace=ws, name="udacity-project_2_Cousseau")
        print('Workspace name: ' + ws.name, 
              'Azure region: ' + ws.location, 
              'Subscription id: ' + ws.subscription_id, 
              'Resource group: ' + ws.resource_group, sep = '\n')
        run = exp.start_logging()
    • Then I created a compute cluster following the given instructions (Standard_DS12_V2, minimum 1 node)
          from azureml.core.compute import ComputeTarget, AmlCompute
          from azureml.core.compute_target import ComputeTargetException

          cluster_name = "myCluster"
          try:
              cluster = ComputeTarget(workspace=ws, name=cluster_name)
              print("Cluster already created")
          except ComputeTargetException:
              compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS12_V2", min_nodes=1, max_nodes=6)
              cluster = ComputeTarget.create(ws, cluster_name, compute_config)  # creates the actual cluster

    image

    • Once the cluster is up and running, we load the dataset from the provided URL

    image
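The dataset-loading cell can be sketched as follows; the blob URL is the one used in the course material and the dataset name "BankMarketing" is illustrative, so adjust both to your own set-up:

```python
# Sketch: load the Bank Marketing CSV as a TabularDataset and register it.
from azureml.core import Dataset, Workspace

ws = Workspace.from_config()
url = ("https://automlsamplenotebookdata.blob.core.windows.net/"
       "automl-sample-notebook-data/bankmarketing_train.csv")
dataset = Dataset.Tabular.from_delimited_files(path=url)
dataset = dataset.register(workspace=ws, name="BankMarketing",
                           create_new_version=True)
```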

  • From there on, we set up a new AutoML experiment with the required constraints (Classification, Explain best Model, Exit after 1h, max concurrency of 5) image

  • Once the run completes, we have a look at the results and display the best run for checking

  • Before we jump to Step #3, we need to register the run as a Model.

image image
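Put together, the AutoML configuration, best-run retrieval and model registration above look roughly like this; the dataset name, label column "y", model name and output path are assumptions to adapt to your run:

```python
# Sketch of the AutoML experiment with the required constraints.
from azureml.core import Workspace, Experiment, Dataset
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name="BankMarketing")

automl_config = AutoMLConfig(
    task="classification",              # Classification
    training_data=dataset,
    label_column_name="y",              # target column of the Bank Marketing set
    compute_target="myCluster",
    experiment_timeout_minutes=60,      # exit after 1h
    max_concurrent_iterations=5,        # max concurrency of 5
    model_explainability=True,          # explain best model
    primary_metric="accuracy",
)

experiment = Experiment(ws, "udacity-project_2_Cousseau")
remote_run = experiment.submit(automl_config, show_output=True)

# Once the run completes, fetch the best child run and register its model.
best_run, fitted_model = remote_run.get_output()
model = best_run.register_model(model_name="best-automl-model",
                                model_path="outputs/model.pkl")
```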

  • Step #3: Deploy the Best Model => achieved using the Azure SDK for Python, see Notebook xxx

    • First we need to define the inference configuration
                  # Define inference configuration.
                  # score.py needs to be located in the same directory as this
                  # notebook; otherwise, update the source_directory variable.

                  from azureml.core import Environment
                  from azureml.core.model import InferenceConfig

                  env = Environment(name="Project 2 Udacity")
                  my_inference_config = InferenceConfig(
                      environment=env,
                      source_directory="./",
                      entry_script="./score.py",
                  )
    • And then we can deploy it to ACI
              # Deploy to ACI.
              from azureml.core.model import Model
              from azureml.core.webservice import AciWebservice

              deployment_config = AciWebservice.deploy_configuration(
                  cpu_cores=0.5, memory_gb=1, auth_enabled=True
              )

              service = Model.deploy(
                  ws,
                  "myservice",
                  [model],
                  my_inference_config,
                  deployment_config,
                  overwrite=True,
              )
              service.wait_for_deployment(show_output=True)

              print(service.get_logs())

    image

  • Step #4: Enable Logging

image

image
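Enabling Application Insights can also be done from the notebook; a minimal sketch, assuming the service name "myservice" used in the deployment step:

```python
# Sketch: turn on Application Insights for the deployed service and pull its logs.
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(workspace=ws, name="myservice")
service.update(enable_app_insights=True)   # enable logging to Application Insights
print(service.get_logs())                  # dump the container logs
```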

  • Step #5: Swagger Documentation

    • Download swagger.json from the model just deployed and save it in the local swagger folder
    • Start Git Bash from this folder and run swagger.sh image
    • Once done, run serve.py

    image

    • Finally, we can display the Swagger documentation for the deployed model:

    image image
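For reference, serve.py boils down to a small static file server for the swagger folder. This is a sketch of that idea, not the course-provided script verbatim; the port and the CORS header are assumptions:

```python
# Minimal static server so the Swagger UI container can fetch swagger.json.
import http.server
import socketserver

PORT = 8000  # assumed port; match whatever swagger.sh points the UI at

class Handler(http.server.SimpleHTTPRequestHandler):
    def end_headers(self):
        # Allow cross-origin requests from the Swagger UI running in Docker.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

if __name__ == "__main__":
    with socketserver.TCPServer(("", PORT), Handler) as httpd:
        print(f"Serving on port {PORT}")
        httpd.serve_forever()
```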

  • Step #6: Consume Model Endpoints

    • From the Consume tab of the deployed model we retrieve both the scoring URI & the key image

    • Then we make sure to add those to the endpoint.py script before running it

    image
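What endpoint.py does can be sketched with the standard library alone; the URI, key and feature fields below are placeholders to replace with the real values from the Consume tab and the full Bank Marketing feature set:

```python
# Sketch: POST a sample observation to the scoring endpoint with the auth key.
import json
import urllib.request

scoring_uri = "http://<your-aci-endpoint>/score"  # from the Consume tab
key = "<your-primary-key>"                        # from the Consume tab

# Illustrative fields only -- the real model expects every Bank Marketing column.
payload = {"data": [{"age": 35, "job": "technician", "marital": "married"}]}

req = urllib.request.Request(
    scoring_uri,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": f"Bearer {key}"},
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment once URI and key are real
# print(response.read().decode())
```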

  • Optional - Benchmark the endpoint

    • First we check that the Apache Bench CLI tool (ab) is installed image

    • We then update the URI and key in endpoint.py and run the script

    • Last, we run benchmark.sh and check the results image image

  • Step #7: Create, Publish and Consume a Pipeline

From this step on, the project switches to a second notebook, provided by Udacity, named "aml-pipelines-with-automated-machine-learning-step.ipynb"

  • Changes made to original file:
    • folder name & experiment name
    • compute cluster name to match the existing one
    • dataset name
    • update the AutoML settings & config to match the previous experiment

  • Once the workspace, cluster, dataset and model have been either retrieved or created, we start by creating a pipeline and running it: image image

image image image image

  • After downloading and examining the output results, we retrieve the best model and test it: image

  • Last, we publish the Pipeline and test it: image image

  • We can see here the Published Pipeline Overview showing the REST endpoint and an ACTIVE status. image
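The publish-and-consume steps follow the pattern of the aml-pipelines notebook; a sketch, where the pipeline name, the experiment name, and the `pipeline_run` variable (the completed PipelineRun from the notebook) are assumptions:

```python
# Sketch: publish the completed pipeline run and trigger it via its REST endpoint.
import requests
from azureml.core.authentication import InteractiveLoginAuthentication

published_pipeline = pipeline_run.publish_pipeline(
    name="Bankmarketing Train",
    description="Training bankmarketing pipeline",
    version="1.0",
)

auth_header = InteractiveLoginAuthentication().get_authentication_header()
response = requests.post(
    published_pipeline.endpoint,     # the REST endpoint shown as ACTIVE
    headers=auth_header,
    json={"ExperimentName": "pipeline-rest-endpoint"},
)
print(response.json().get("Id"))     # id of the newly triggered run
```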

Screen Recording

Standout Suggestions

Quickly after understanding what was asked for this project, I decided to try to deliver it using mostly the Azure Python SDK rather than the GUI of Azure ML Studio.

This has proven more challenging but more rewarding, and helped me better understand some topics.

Also, as I want this project to be a go-back-to resource whenever I need it, I tried to document my work and each step extensively.

This can also be seen in the extensive "bonus" video I recorded to trace back every single step.

In terms of improvement, I believe there is first an investigation to be made into the struggle to register a model and then deploy it with ready-generated Swagger info. Also, I believe the entry script and environment settings of some sort are keeping the deployment from running properly with the score.py script; for the purpose of this project, a simple dummy script works. The issue lies in referencing the registered model (in order to call its predict function) from the score.py script.

Sources tried:
  • https://knowledge.udacity.com/questions/414299
  • https://knowledge.udacity.com/questions/419852
  • https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=python#registermodel
  • https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-advanced-entry-script#load-registered-models

The model itself could also be improved: we have an acceptable first result with an accuracy of ~0.918, but by fine-tuning the algorithms used we could perhaps reach an even better result. Some additional preparation steps could also help (e.g. normalization).

SOURCES: https://docs.microsoft.com/en-us/azure/architecture/data-science-process/prepare-data
