- Open Azure ML Studio and log in.
- Create a compute instance on which you will run your notebooks and scripts.
- In order for the project and scripts to run, please make sure the folder "Project-Solution-Cousseau" is uploaded.
-
Model deployment (details in Figure 1)
- First, the workflow picks up a public CSV file with the Bankmarketing data and registers it as a dataset in Azure.
- Then an experiment is created to find the best model.
- After registering this model, it is deployed and can be consumed via a test Python script.
- In parallel, the swagger.json file generated at deployment is used to visualize the API in a Swagger page; the Swagger UI instance runs in a Docker container.
- Also, a benchmark script uses Apache Benchmark to test the response time and performance of the deployed model.
-
Pipeline publishing (Figure 2)
- In the second part, we reuse the same compute cluster to run a similar experiment again and capture the best model.
- We then create a pipeline from this model and publish it.
- Via the Jupyter notebook, we test and consume the endpoint for this pipeline.
-
Step #1: Environment set up => not necessary as I made use of the Udacity Lab with pre-installed tools
- The only set-up / preparation activity was to create a compute instance (in order to run the notebook, see Step #2).
-
Step #2: Create and run AutoML Experiment => achieved using the Azure SDK for Python, see notebook project_2_udacity_Cousseau.ipynb
- The first step is to retrieve the current workspace:
```python
from azureml.core import Workspace, Experiment, Dataset

# ws = Workspace.get(name="udacity-project")
ws = Workspace.from_config()  # using the current workspace (Lab)
# SOURCE: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py

exp = Experiment(workspace=ws, name="udacity-project_2_Cousseau")

print('Workspace name: ' + ws.name,
      'Azure region: ' + ws.location,
      'Subscription id: ' + ws.subscription_id,
      'Resource group: ' + ws.resource_group, sep='\n')

run = exp.start_logging()
```
- Then I created a compute cluster with the instructions given (Standard_DS12_V2, minimum 1 node)
```python
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "myCluster"
try:
    cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print("Cluster already created")
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS12_V2",
                                                           min_nodes=1, max_nodes=6)
    cluster = ComputeTarget.create(ws, cluster_name, compute_config)  # creates the actual cluster
```
- Once the cluster is up and running, we then load the dataset from the URL provided.
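A minimal sketch of that step, assuming the Bankmarketing CSV URL from the lab and an arbitrary dataset name (adjust both to the values actually used):

```python
from azureml.core import Dataset

# URL of the Bankmarketing training CSV (assumed; replace with the URL provided in the lab)
data_url = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv"

# Create a TabularDataset from the public CSV and register it in the workspace
dataset = Dataset.Tabular.from_delimited_files(path=data_url)
dataset = dataset.register(workspace=ws,
                           name="BankMarketing Dataset",
                           description="Bankmarketing data for the AutoML experiment")
```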
-
From there on, we set up a new AutoML experiment with the required constraints (classification task, explain best model, exit after 1 hour, max concurrency of 5).
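A possible configuration for that experiment is sketched below; the label column name ("y") and the primary metric are assumptions:

```python
from azureml.train.automl import AutoMLConfig

# AutoML settings matching the required constraints
automl_settings = {
    "experiment_timeout_minutes": 60,   # exit after 1 hour
    "max_concurrent_iterations": 5,     # max concurrency set to 5
    "primary_metric": "accuracy",       # assumed primary metric
}

automl_config = AutoMLConfig(
    task="classification",              # classification task
    compute_target=cluster,
    training_data=dataset,
    label_column_name="y",              # assumed target column of the Bankmarketing dataset
    model_explainability=True,          # explain best model
    **automl_settings,
)

# Submit the AutoML run to the experiment created earlier
remote_run = exp.submit(automl_config, show_output=True)
```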
-
Once the experiment has run, we have a look at the results and display the best run for checking.
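For reference, a short sketch of how the best run and fitted model can be retrieved from the completed AutoML run:

```python
# Retrieve the best child run and its fitted model once the AutoML run has completed
best_run, fitted_model = remote_run.get_output()

print(best_run)
print(best_run.get_metrics())
```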
-
Before we jump to Step #3, we need to register the best run as a model.
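A sketch of that registration step; the model name and path are assumptions (AutoML runs typically store the serialized model under outputs/):

```python
# Register the best run as a model in the workspace
model = best_run.register_model(
    model_name="bankmarketing_automl_model",   # assumed model name
    model_path="outputs/model.pkl",            # assumed path of the serialized model in the run outputs
)
print(model.name, model.version)
```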
-
Step #3: Deploy the Best Model => achieved using the Azure SDK for Python, see notebook project_2_udacity_Cousseau.ipynb
- First, we need to define the inference configuration:
```python
# Define inference configuration
# score.py needs to be located in the same directory as this notebook.
# Otherwise, update the source_directory variable.
from azureml.core import Environment
from azureml.core.model import InferenceConfig

env = Environment(name="Project 2 Udacity")

my_inference_config = InferenceConfig(
    environment=env,
    source_directory="./",
    entry_script="./score.py",
)
```
- And then we can deploy it to ACI:
```python
# Deploy to ACI
from azureml.core.webservice import AciWebservice

deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=0.5,
    memory_gb=1,
    auth_enabled=True,
)

service = model.deploy(
    ws,
    "myservice",
    [model],
    my_inference_config,
    deployment_config,
    overwrite=True,
)
service.wait_for_deployment(show_output=True)

print(service.get_logs())
```
-
Step #4: Enable Logging
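This step enables Application Insights on the deployed web service and retrieves its logs. A minimal sketch, assuming the service name "myservice" from the deployment above:

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(workspace=ws, name="myservice")

# Enable Application Insights on the deployed web service
service.update(enable_app_insights=True)

# Print the service logs
for line in service.get_logs().split("\n"):
    print(line)
```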
-
Step #5: Swagger Documentation
- Download swagger.json from the model just deployed and save it in the local swagger folder.
- Start Git Bash from this folder and run swagger.sh.
- Once done, run serve.py.
- Finally, we can display the Swagger documentation for the deployed model:
-
Step #6: Consume Model Endpoints
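A minimal sketch of consuming the scoring endpoint with an authenticated POST request, assuming the service object from Step #3 (the URI and key can also be copied from the endpoint page in ML Studio); the payload structure is an assumption, with one record per entry in "data":

```python
import json
import requests

# Scoring URI and authentication key of the deployed service
scoring_uri = service.scoring_uri
key = service.get_keys()[0]

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",
}

# Fill in one or more records matching the Bankmarketing feature schema
sample_record = {}  # placeholder: column -> value pairs
payload = json.dumps({"data": [sample_record]})

response = requests.post(scoring_uri, data=payload, headers=headers)
print(response.json())
```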
-
Optional: Benchmark the endpoint
-
Step #7: Create, Publish and Consume a Pipeline
From this step on, the project switches to a second notebook, provided by Udacity, named "aml-pipelines-with-automated-machine-learning-step.ipynb".
- Changes made to original file:
- folder name & experiment name
- compute cluster name to match the existing one
- dataset name
- AutoML settings & config to match the previous experiment
- Once the workspace, cluster, dataset and model have been either retrieved or created, we start by creating a pipeline and running it:
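A sketch of that pipeline, following the structure of the Udacity-provided notebook; the step name, output names and description are assumptions:

```python
from azureml.pipeline.core import Pipeline, PipelineData, TrainingOutput
from azureml.pipeline.steps import AutoMLStep

ds = ws.get_default_datastore()

# Outputs of the AutoML step: metrics and the best model
metrics_data = PipelineData(name="metrics_data", datastore=ds,
                            pipeline_output_name="metrics_output",
                            training_output=TrainingOutput(type="Metrics"))
model_data = PipelineData(name="model_data", datastore=ds,
                          pipeline_output_name="best_model_output",
                          training_output=TrainingOutput(type="Model"))

# Single-step pipeline wrapping the AutoML configuration from the previous experiment
automl_step = AutoMLStep(name="automl_module",
                         automl_config=automl_config,
                         outputs=[metrics_data, model_data],
                         allow_reuse=True)

pipeline = Pipeline(workspace=ws,
                    steps=[automl_step],
                    description="pipeline_with_automlstep")

pipeline_run = exp.submit(pipeline)
pipeline_run.wait_for_completion(show_output=True)
```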
-
After downloading the result outputs and examining them, we retrieve the best model and test it:
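A sketch of retrieving and testing that model; the output name follows the pipeline sketch above, and the label column "y" is assumed:

```python
import joblib

# Download the best model artifact produced by the pipeline run
best_model_output = pipeline_run.get_pipeline_output("best_model_output")
best_model_output.download(".", show_progress=True)

# Load the serialized model from the downloaded path and score a few rows as a sanity check
best_model = joblib.load(best_model_output.path_on_datastore)

df = dataset.to_pandas_dataframe()
X_sample = df.drop(columns=["y"]).head(10)
print(best_model.predict(X_sample))
```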
-
Last, we publish the pipeline and test it:
- We can see here the Published Pipeline Overview showing the REST endpoint and a status of ACTIVE.
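A sketch of publishing the pipeline and triggering it through its REST endpoint; the pipeline name, version and the experiment name in the payload are assumptions:

```python
import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Publish the pipeline from the completed pipeline run
published_pipeline = pipeline_run.publish_pipeline(
    name="Bankmarketing Train",
    description="Training bankmarketing pipeline",
    version="1.0",
)

# Retrieve an authentication header and call the REST endpoint to trigger a new run
auth_header = InteractiveLoginAuthentication().get_authentication_header()
response = requests.post(published_pipeline.endpoint,
                         headers=auth_header,
                         json={"ExperimentName": "pipeline-rest-endpoint"})
print("Submitted run id:", response.json().get("Id"))
```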
- The 5 min Pipeline screencast can be accessed here: https://youtu.be/JBdv4biEUS8
- A full version of the 1st part of the project regarding the Model can be viewed here: https://youtu.be/2ScCChpkOxg
- Quickly after understanding what was asked for this project, I decided to try to deliver it using mostly the Azure Python SDK rather than the GUI of Azure ML Studio.
This has proven more challenging but more rewarding, and it helped me better understand some topics.
Also, as I want this project to be a go-back-to resource whenever I need it, I tried to document my work and each step extensively.
This can also be seen in the extensive "bonus" video I recorded to track back every single step.
- In terms of improvement, I believe there is first an investigation to be made into the struggle to register a model and then deploy it with ready-generated swagger info. Also, I believe the entry script and environment settings of some sort are keeping it from running properly when deploying with the score.py script; for the purpose of this project, a simple dummy script works. The issue lies in the ability to reference a model (in order to call the predict function) from the score.py script.
- Sources tried: https://knowledge.udacity.com/questions/414299 https://knowledge.udacity.com/questions/419852 https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=python#registermodel https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-advanced-entry-script#load-registered-models
The model itself could also be improved: we do have acceptable first results with an accuracy of ~0.918, but by fine-tuning the algorithm used we could maybe reach an even better result. Also, some additional preparation steps could help (e.g. normalization).
SOURCES: https://docs.microsoft.com/en-us/azure/architecture/data-science-process/prepare-data