This is a no-code tutorial project that involves training a machine learning model using Automated ML/AutoML in Azure Machine Learning Studio.
For demonstration purposes, this tutorial will train a simple classification model to predict whether a client will subscribe to a fixed-term deposit with a financial institution or not.
- Azure account & subscription.
- Familiarity with basic Azure concepts such as resource groups, subscriptions, Azure Storage, Azure Compute, & some familiarity with Azure Machine Learning Studio is recommended.
- Basic understanding of machine learning concepts such as supervised learning, classification, regression, & evaluation metrics.
1. Create a workspace
2. Upload a dataset as a data asset
3. Create an automated machine learning job
4. Explore trained model/s
5. Deploy & test trained model/s
Option 1: From the Azure Machine Learning Studio
- Go to https://ml.azure.com.
- In the left navigation pane, go to Workspaces, then click + New.
- In the Name section, enter a name for your workspace. (Note: You can choose any name.)
- In the Subscription section, select the subscription you want to use for the workspace. (Note: You can choose any subscription you have.)
- In the Resource group section, select an existing resource group from your Azure account or create a new one by clicking Create new. This resource group will store the instance for your Azure ML Studio workspace.
- In the Region section, choose the region where you want your workspace to be deployed. (Note: Select a region based on accessibility & availability. For this project, it will be deployed in "East US 2" due to its high availability.)
- Click Create.
Option 2: From the Azure Portal
- Go to https://portal.azure.com.
- Under Azure services, click Create a resource.
- Search for "Azure Machine Learning" & click on the Azure Machine Learning service, then click Create.
- In the Subscription section, select the subscription you want to use for the workspace. (Note: You can choose any subscription you have.)
- In the Resource group section, select an existing resource group from your Azure account or create a new one by clicking Create new. This resource group will store the instance for your Azure ML Studio workspace.
- In the Name section, enter a name for your workspace. (Note: You can choose any name.)
- In the Region section, choose the region where you want your workspace to be deployed. (Note: Select a region based on accessibility & availability. For this project, it will be deployed in "East US 2" due to its high availability.)
- Click Review + create.
- After passing the validation, click Create.
- After successful deployment, go to https://ml.azure.com.
- In the left navigation pane, go to Workspaces.
- Go to the workspace you created.
- In the left navigation pane, go to Data.
- In the Data page, click + Create.
- In the Name section, enter a name for your data asset. (Note: You can choose any name, but for this project, it will be named "bankmarketing".) Next, in the Type section, select the type of data stored in your dataset. (Note: For this project, the type of data we'll use is "Tabular".) After that, click Next.
- Select the source from which your dataset will be imported. (Note: For this project, we'll select "From local files".) After that, click Next.
- In the Datastore type section, select the type of storage where your dataset will be stored. (Note: For this project, we'll use the default option, which is "Azure Blob Storage".)
- Select a datastore from the list of existing datastores or create a new one by clicking Create new datastore. (Note: For this project, we'll use the default option, which is "workspaceblobstore".) After that, click Next.
- Select "Upload files" from the Upload files or folder drop-down, then upload your dataset. (Note: For this project, I recommend downloading & using this dataset, as this is what we'll use.) After the dataset has been uploaded, click Next.
- In the Data preview section, verify that the data in the dataset is populated as follows.
- Verify that the data is properly formatted. After you verify that the data is populated & properly formatted, click Next. (Note: For this project, keep the data format as it is: File format set to "Delimited", Delimiter set to "Comma", Column headers set to "All files have same headers", Encoding set to "UTF-8", Skip rows set to "None", & Dataset contains multi-line data left un-ticked.)
- On Schema, ensure that the data Type of each column in the dataset is correct & modify the columns you want to include. After everything is verified, click Next.
- On Review, ensure that all information matches what was previously configured for your data asset. Once everything is verified, click Create.
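The format settings kept in the wizard above (comma-delimited, UTF-8, a header row, no skipped rows) can be sanity-checked locally before uploading. The following is a minimal sketch using only the Python standard library. The function name `check_delimited_file` and the sample rows are hypothetical; only the target column name `y` comes from this tutorial.

```python
import csv
import io


def check_delimited_file(raw_bytes, delimiter=","):
    """Check that raw CSV bytes match the format settings chosen in the wizard."""
    # Encoding "UTF-8": a UnicodeDecodeError here means the file needs re-encoding.
    text = raw_bytes.decode("utf-8")

    # File format "Delimited", Delimiter "Comma", Skip rows "None":
    # the very first row is treated as the header.
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not rows:
        raise ValueError("file is empty")
    header, data = rows[0], rows[1:]

    # Every data row should have exactly as many fields as the header.
    # Line numbers start at 2 because line 1 is the header.
    bad_rows = [i for i, row in enumerate(data, start=2) if len(row) != len(header)]
    return {"columns": header, "row_count": len(data), "bad_rows": bad_rows}


# Hypothetical three-column sample; real files will have more columns.
sample = b"age,job,y\n30,admin.,no\n45,technician,yes\n"
report = check_delimited_file(sample)
print(report)
```

To check a real file, pass the result of `open(path, "rb").read()` instead of `sample`; a non-empty `bad_rows` list points at lines whose field count differs from the header.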
- In the left navigation pane, go to Automated ML, then click + New Automated ML job.
- In the Job name section, enter a name for your training job. (Note: You can choose any name, but for this project, you can simply name it "Deposit-Subscription-Prediction".)
- In the Experiment name section, select "Create new". (Note: If this is your first time creating an experiment/job or you don't have any existing experiments, this section might be grayed out & defaulted to "Create new". If this is the case, leave it as is.)
- In the New experiment name section, enter a name for your experiment. (Note: You can choose any name, but for this project, you can name it "Binary-Classification" since the model we will train is a binary classification model.)
- In the Description section, you can also put a description about your experiment. (Optional)
- Click Next.
- Select "Classification" from the Select task type drop-down. Then, in the Select data section, select the "bankmarketing" dataset we uploaded. After that, click Next.
- In the Target column section, select "y (string)" as this column indicates whether the client subscribed to a term deposit or not, which corresponds to what we want our model to predict.
- Select View additional configuration settings & ensure the following configurations to better control the training job: set Primary metric to "AUCWeighted", enable Explain best model & Use all supported models, ensure no models are checked in the Blocked models section, & leave the Positive class label section blank. After configuring these settings, click Save.
- In the Limits dropdown, ensure the following configuration: set Max nodes to "6", set Metric score threshold to "0.8" (equivalent to 80%) since the model we're training is only a baseline model.
- In the Validation type dropdown, select "k-fold cross-validation", then enter "2" as your Number of cross validations. (Note: This section is optional & may be left as is, but for this project, a k-fold cross-validation will be implemented.)
- Click Next.
- In the Select compute type dropdown, select "Compute cluster".
- In the Select Azure ML compute cluster section, either create a new compute cluster or select an existing one. If no compute cluster is available, you can create a new one by clicking + New.
- If you decide to create one, select "East US 2" in the Location dropdown. (Note: Select a region based on accessibility & availability. For this project, it will be deployed in "East US 2" due to its high availability.)
- In the Virtual machine tier section, select "Dedicated", then choose "CPU" as your Virtual machine type. (Note: You can opt for a "GPU" for faster training, but this may result in higher costs & quicker consumption of your Azure credits.)
- In the Virtual machine size section, select "Select from all options", then find & choose "Standard_DS12_v2" from the options. (Note: You may choose other virtual machine sizes based on your requirements, but for this tutorial, we will use "Standard_DS12_v2" as it offers a balanced combination of CPU, memory, & storage for most workloads.)
- Click Next.
- In the Compute name section, enter a name for your compute & leave the other configurations as they are. (Note: You can choose any name, but for this project, you can simply name it "automl-compute".)
- Click Create.
- Once you have successfully created a new compute cluster, select the newly created cluster in the Select Azure ML compute cluster dropdown & click Next.
- Click Submit training job.
- To monitor the training progress, navigate to the Jobs tab under Assets in the left navigation pane, then click on "Deposit-Subscription-Prediction." You can check the training status under the Status section. If it says "Completed," your model/s have finished training. (Note: With our current training job setup & dataset, the training process could take anywhere from 15 minutes to 1 hour under typical conditions. However, these are general estimates, & the actual time may vary.)
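The validation setting chosen for this job, k-fold cross-validation with 2 folds, means the training rows are split into two parts, and each part serves once as validation data while the other part is used for training. The sketch below only illustrates that idea in plain Python. It is not AutoML's internal implementation, the function name `k_fold_indices` is hypothetical, and it uses simple contiguous folds without shuffling or stratification.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, validation_indices) pairs for k contiguous folds."""
    if k < 2 or k > n_rows:
        raise ValueError("k must be between 2 and n_rows")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, validation
        start += size


# Ten rows and k=2, matching the "Number of cross validations" set above.
folds = list(k_fold_indices(n_rows=10, k=2))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each row is validated exactly once, so the metric reported for a model is based on predictions for data that model did not see during that fold's training.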
- To explore the models you've trained, go to the Jobs tab in the Assets section of the left navigation pane, then select the "Deposit-Subscription-Prediction" job.
- First, we can check the evaluation metrics of the model/s in the Models & Child Jobs tab. Then, click on the name of the model's algorithm, which is "MaxAbsScaler, LightGBM".
- Under Model Summary > AUC Weighted, you can see the score that represents the model's overall performance in distinguishing between positive & negative classes. Since "AUC Weighted" is the primary metric used in this job, a higher AUC value indicates better model performance.
- Aside from AUC Weighted, you can also view other metrics by clicking View all other metrics or going to the Metrics tab. (Note: If you don't see any metric in the Metrics tab, just click Refresh.)
- In the Metrics tab, you can filter & view only the metrics you want by using the Select Metrics panel on the left side. (Note: Click the double-arrow button pointing to the right next to the "Select Metrics" text to access the filtering options.)
- Back in the Model tab, you can also view the hyperparameters used to improve the performance of the model/s under Model Summary > AUC Weighted > View hyperparameters.
- You can also view an explanation of the model/s & see which data features (raw or engineered) influenced a particular model's predictions in the Explanations (preview) tab.
- Test predictions can also be performed in the Test results (preview) tab. In this tab, click Test model (preview) & configure the following settings: set Select compute type to "Compute cluster", set Select Azure ML compute cluster to "automl-compute", & choose the dataset you want to use under Select a dataset, then click Test. (Note: Ensure you have a test dataset available for making predictions. It is recommended to use a different dataset than the one used to train the model.)
- You can monitor the progress of the testing in the table within the Test results (preview) tab. If the table is empty, simply click Refresh to update the view. Once the testing is complete, you will see the testing job's AUC score result in the "AUC Weighted" column, & the status will indicate "Completed".
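To make the AUC Weighted score shown in these tabs more concrete, here is a minimal pure-Python sketch of how such a score can be computed: a one-vs-rest AUC per class, averaged with weights equal to each class's share of the true labels. The function names, the five example predictions, and the "yes"/"no" label values are assumptions for illustration; the Studio computes the metric for you.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    # Count pairwise wins; a tie counts as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def auc_weighted(true_classes, class_probabilities):
    """One-vs-rest AUC per class, weighted by how often each class occurs."""
    total = len(true_classes)
    result = 0.0
    for cls, probs in class_probabilities.items():
        is_cls = [y == cls for y in true_classes]
        result += (sum(is_cls) / total) * binary_auc(is_cls, probs)
    return result


# Hypothetical predictions for five clients.
y_true = ["no", "no", "yes", "no", "yes"]
p_yes = [0.1, 0.4, 0.35, 0.2, 0.8]
p_no = [1 - p for p in p_yes]
score = auc_weighted(y_true, {"yes": p_yes, "no": p_no})
print(round(score, 4))
```

In this example, 5 of the 6 possible (subscriber, non-subscriber) pairs are ranked correctly, so the score is about 0.83. For a binary problem like this one, both classes yield the same one-vs-rest AUC, so the weighted average equals the ordinary AUC.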
- Using the Automated ML/AutoML interface, you can also deploy the models you trained by clicking Deploy. (Note: Ensure you are still in Assets > Jobs > "Deposit-Subscription-Prediction" > Models and Child Jobs > "MaxAbsScaler, LightGBM" to see the Deploy dropdown button.)
- In the Deploy dropdown button, select "Real-time endpoint". (Note: For this tutorial, we will select the "Real-time endpoint" option to enable individual real-time predictions.)
- If you don't have an existing endpoint, select the "New" option & leave the configured settings as they are, then click Deploy. If you have an existing endpoint, select the "Existing" option & choose the desired endpoint from the Endpoint name dropdown.
- To check the endpoint's deployment status, navigate to Assets > Endpoints & click the name of the endpoint you just created. If it is successfully deployed, the Provisioning state under the Endpoint attributes section will indicate "Succeeded".
- After the deployment of the endpoint, you can also check the model's deployment status by navigating to Assets > Endpoints & clicking the name of the endpoint you just created. If it is successfully deployed, the Provisioning state under the Deployment <deployment_name> section will indicate "Succeeded".
- To test the deployed model & predict whether a client will subscribe to a fixed-term deposit with a financial institution or not, navigate to the Test tab & input data in JSON format in the Sample inference > Input editor. (Note: For your convenience, you can use this example input data. This also includes an explanation of each column to give you an idea on why each input data is used.)
- To view the prediction results, scroll down to the bottom & see the results in the jsonOutput section.
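The same JSON that goes into the Test tab's input editor can also be sent to the endpoint from a script. The sketch below builds, but does not send, such a request with the Python standard library. The scoring URL, the key, the payload layout, and the column names are all hypothetical placeholders: use the endpoint's actual URL and key, and treat the JSON accepted in the Test tab as the source of truth for the payload shape.

```python
import json
import urllib.request

# Hypothetical placeholders; replace with your endpoint's real values.
SCORING_URI = "https://example-endpoint.eastus2.inference.ml.azure.com/score"
API_KEY = "replace-with-your-endpoint-key"


def build_scoring_request(payload, scoring_uri=SCORING_URI, api_key=API_KEY):
    """Build (but do not send) a POST request carrying the payload as JSON."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Key-based authentication passes the key as a bearer token.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Assumed payload layout with two made-up columns; a real request needs
# every input column the model was trained on.
sample_payload = {
    "input_data": {
        "columns": ["age", "job"],
        "data": [[30, "admin."]],
    }
}
request = build_scoring_request(sample_payload)
print(request.get_method(), request.full_url)

# Sending requires a live endpoint, so the call is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Keeping request construction separate from sending makes it easy to inspect the exact body and headers before any call is made against a billed endpoint.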
Option 1: Delete only the deployment instance & keep the resource group and workspace
- Go to https://ml.azure.com.
- In the left navigation pane, go to Assets > Endpoints, select the deployment instance you created for this tutorial, & then click Delete.
- Click Delete.
Option 2: Delete all resources used in this tutorial
- Go to https://portal.azure.com.
- Under Azure services, select Resource groups.
- In Resource groups, click the resource group you created for this tutorial.
- Select Delete resource group.
- Enter the name of the resource group in the Enter resource group name to confirm deletion field & click Delete.