- Open Azure ML Studio and log in.
- Create a compute instance on which you will run your notebooks and scripts.
- In order for the project and scripts to run, please make sure the folder "Project-Solution-Cousseau" is uploaded.
-
Model deployment (details in Figure 1)
- First, the workflow picks up a public CSV file with the Bankmarketing data and registers it as a dataset in Azure.
- Then an experiment is created to find the best model.
- After registering this model, it is deployed and can be consumed via a test Python script.
- In parallel, the swagger.json file generated at deployment is used to visualize the API in a Swagger page; the Swagger UI instance runs in a Docker container.
- Also, a benchmark script uses Apache Benchmark to test the response time and performance of the deployed model.
-
Pipeline publishing (Figure 2)
- In the second part, we reuse the same compute cluster to run a similar experiment again and capture the best model.
- We then create a pipeline from this model and publish it.
- Via the Jupyter notebook, we test and consume the endpoint for this pipeline.
-
Step #1: Environment set up => not necessary as I made use of the Udacity Lab with pre-installed tools
- The only set-up / preparation activity was to create a compute instance (in order to run the notebook, see Step #2).
-
Step #2: Create and run AutoML Experiment => achieved using the Azure SDK for Python, see notebook project_2_udacity_Cousseau.ipynb
- The first step is to retrieve the current workspace:
```python
from azureml.core import Workspace, Experiment, Dataset

# ws = Workspace.get(name="udacity-project")
ws = Workspace.from_config()  # using the current workspace (Lab)
# SOURCE: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py

exp = Experiment(workspace=ws, name="udacity-project_2_Cousseau")

print('Workspace name: ' + ws.name,
      'Azure region: ' + ws.location,
      'Subscription id: ' + ws.subscription_id,
      'Resource group: ' + ws.resource_group, sep='\n')

run = exp.start_logging()
```
- Then I created a compute cluster with the instructions given (Standard_DS12_V2, minimum 1 node)
```python
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "myCluster"
try:
    cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print("Cluster already created")
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS12_V2",
                                                           min_nodes=1, max_nodes=6)
    cluster = ComputeTarget.create(ws, cluster_name, compute_config)  # creates the actual cluster
```
- Once the cluster is up and running, we then load the dataset from the URL provided.
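A minimal sketch of that step, assuming the Bankmarketing CSV URL from the lab and an arbitrary dataset name (adjust both to the values actually used):

```python
from azureml.core import Dataset

# URL of the Bankmarketing training CSV (assumed; replace with the URL provided in the lab)
data_url = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv"

# Create a TabularDataset from the public CSV and register it in the workspace
dataset = Dataset.Tabular.from_delimited_files(path=data_url)
dataset = dataset.register(workspace=ws,
                           name="BankMarketing Dataset",
                           description="Bankmarketing data for the AutoML experiment")
```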
-
From there on, we set up a new AutoML experiment with the required constraints (classification task, explain best model, exit after 1 hour, max concurrency of 5).
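A possible configuration for that experiment is sketched below; the label column name ("y") and the primary metric are assumptions:

```python
from azureml.train.automl import AutoMLConfig

# AutoML settings matching the required constraints
automl_settings = {
    "experiment_timeout_minutes": 60,   # exit after 1 hour
    "max_concurrent_iterations": 5,     # max concurrency set to 5
    "primary_metric": "accuracy",       # assumed primary metric
}

automl_config = AutoMLConfig(
    task="classification",              # classification task
    compute_target=cluster,
    training_data=dataset,
    label_column_name="y",              # assumed target column of the Bankmarketing dataset
    model_explainability=True,          # explain best model
    **automl_settings,
)

# Submit the AutoML run to the experiment created earlier
remote_run = exp.submit(automl_config, show_output=True)
```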
-
Once the experiment has run, we have a look at the results and display the best run for checking.
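For reference, a short sketch of how the best run and fitted model can be retrieved from the completed AutoML run:

```python
# Retrieve the best child run and its fitted model once the AutoML run has completed
best_run, fitted_model = remote_run.get_output()

print(best_run)
print(best_run.get_metrics())
```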
-
Before we jump to Step #3, we need to register the best run as a model.
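A sketch of that registration step; the model name and path are assumptions (AutoML runs typically store the serialized model under outputs/):

```python
# Register the best run as a model in the workspace
model = best_run.register_model(
    model_name="bankmarketing_automl_model",   # assumed model name
    model_path="outputs/model.pkl",            # assumed path of the serialized model in the run outputs
)
print(model.name, model.version)
```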
-
Step #3: Deploy the Best Model => achieved using the Azure SDK for Python, see notebook project_2_udacity_Cousseau.ipynb
- First, we need to define the inference configuration:
```python
# Define inference configuration
# score.py needs to be located in the same directory as this notebook.
# Otherwise, update the source_directory variable.
from azureml.core import Environment
from azureml.core.model import InferenceConfig

env = Environment(name="Project 2 Udacity")

my_inference_config = InferenceConfig(
    environment=env,
    source_directory="./",
    entry_script="./score.py",
)
```
- And then we can deploy it to ACI:
```python
# Deploy to ACI
from azureml.core.webservice import AciWebservice

deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=0.5,
    memory_gb=1,
    auth_enabled=True,
)

service = model.deploy(
    ws,
    "myservice",
    [model],
    my_inference_config,
    deployment_config,
    overwrite=True,
)
service.wait_for_deployment(show_output=True)

print(service.get_logs())
```
-
Step #4: Enable Logging
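This step enables Application Insights on the deployed web service and retrieves its logs. A minimal sketch, assuming the service name "myservice" from the deployment above:

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(workspace=ws, name="myservice")

# Enable Application Insights on the deployed web service
service.update(enable_app_insights=True)

# Print the service logs
for line in service.get_logs().split("\n"):
    print(line)
```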
-
Step #5: Swagger Documentation
- Download swagger.json from the model just deployed and save it in the local swagger folder.
- Start Git Bash from this folder and run swagger.sh.
- Once done, run serve.py.
- Finally, we can display the Swagger documentation for the deployed model:
-
Step #6: Consume Model Endpoints
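A minimal sketch of consuming the scoring endpoint with an authenticated POST request, assuming the service object from Step #3 (the URI and key can also be copied from the endpoint page in ML Studio); the payload structure is an assumption, with one record per entry in "data":

```python
import json
import requests

# Scoring URI and authentication key of the deployed service
scoring_uri = service.scoring_uri
key = service.get_keys()[0]

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",
}

# Fill in one or more records matching the Bankmarketing feature schema
sample_record = {}  # placeholder: column -> value pairs
payload = json.dumps({"data": [sample_record]})

response = requests.post(scoring_uri, data=payload, headers=headers)
print(response.json())
```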
-
Optional: Benchmark the endpoint
-
Step #7: Create, Publish and Consume a Pipeline
From this step on, the project switches to a second notebook, provided by Udacity, named "aml-pipelines-with-automated-machine-learning-step.ipynb".
- Changes made to original file:
- folder name & experiment name
- compute cluster name to match the existing one
- dataset name
- AutoML settings & config to match the previous experiment
- Once the workspace, cluster, dataset and model have been either retrieved or created, we start by creating a pipeline and running it:
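A sketch of that pipeline, following the structure of the Udacity-provided notebook; the step name, output names and description are assumptions:

```python
from azureml.pipeline.core import Pipeline, PipelineData, TrainingOutput
from azureml.pipeline.steps import AutoMLStep

ds = ws.get_default_datastore()

# Outputs of the AutoML step: metrics and the best model
metrics_data = PipelineData(name="metrics_data", datastore=ds,
                            pipeline_output_name="metrics_output",
                            training_output=TrainingOutput(type="Metrics"))
model_data = PipelineData(name="model_data", datastore=ds,
                          pipeline_output_name="best_model_output",
                          training_output=TrainingOutput(type="Model"))

# Single-step pipeline wrapping the AutoML configuration from the previous experiment
automl_step = AutoMLStep(name="automl_module",
                         automl_config=automl_config,
                         outputs=[metrics_data, model_data],
                         allow_reuse=True)

pipeline = Pipeline(workspace=ws,
                    steps=[automl_step],
                    description="pipeline_with_automlstep")

pipeline_run = exp.submit(pipeline)
pipeline_run.wait_for_completion(show_output=True)
```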
-
After downloading the result outputs and examining them, we retrieve the best model and test it:
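A sketch of retrieving and testing that model; the output name follows the pipeline sketch above, and the label column "y" is assumed:

```python
import joblib

# Download the best model artifact produced by the pipeline run
best_model_output = pipeline_run.get_pipeline_output("best_model_output")
best_model_output.download(".", show_progress=True)

# Load the serialized model from the downloaded path and score a few rows as a sanity check
best_model = joblib.load(best_model_output.path_on_datastore)

df = dataset.to_pandas_dataframe()
X_sample = df.drop(columns=["y"]).head(10)
print(best_model.predict(X_sample))
```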
-
Last, we publish the pipeline and test it:
- We can see here the Published Pipeline Overview showing the REST endpoint and a status of ACTIVE.
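A sketch of publishing the pipeline and triggering it through its REST endpoint; the pipeline name, version and the experiment name in the payload are assumptions:

```python
import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Publish the pipeline from the completed pipeline run
published_pipeline = pipeline_run.publish_pipeline(
    name="Bankmarketing Train",
    description="Training bankmarketing pipeline",
    version="1.0",
)

# Retrieve an authentication header and call the REST endpoint to trigger a new run
auth_header = InteractiveLoginAuthentication().get_authentication_header()
response = requests.post(published_pipeline.endpoint,
                         headers=auth_header,
                         json={"ExperimentName": "pipeline-rest-endpoint"})
print("Submitted run id:", response.json().get("Id"))
```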
- The 5 min Pipeline screencast can be accessed here: https://youtu.be/JBdv4biEUS8
- A full version of the 1st part of the project regarding the Model can be viewed here: https://youtu.be/2ScCChpkOxg
- Quickly after understanding what was asked for this project, I decided to try to deliver it using mostly the Azure Python SDK rather than the GUI of Azure ML Studio.
This has proven more challenging but more rewarding, and it helped me better understand some topics.
Also, as I want this project to be a go-back-to resource whenever I need it, I tried to document my work and each step extensively.
This can also be seen in the extensive "bonus" video I recorded to track back every single step.
- In terms of improvement, I believe there is first an investigation to be made into the struggle to register a model and then deploy it with ready-generated swagger info. Also, I believe the entry script and environment settings of some sort are keeping it from running properly when deploying with the score.py script; for the purpose of this project, a simple dummy script works. The issue lies in the ability to reference a model (in order to call the predict function) from the score.py script.
- Sources tried: https://knowledge.udacity.com/questions/414299 https://knowledge.udacity.com/questions/419852 https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=python#registermodel https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-advanced-entry-script#load-registered-models
The model itself could also be improved: we do have acceptable first results with an accuracy of ~0.918, but by fine-tuning the algorithm used we could maybe reach an even better result. Also, some additional preparation steps could help (e.g. normalization).
SOURCES: https://docs.microsoft.com/en-us/azure/architecture/data-science-process/prepare-data