In this repository, we present a deployment-ready AWS stack that uses AWS Step Functions to orchestrate AutoML workflows with AutoGluon on Amazon SageMaker.
A complete description can be found in the corresponding blog post.

| Main State Machine | Training State Machine | Deployment State Machine |
|---|---|---|
| ![]() | ![]() | ![]() |

Requirements:

- Node.js 16.13.1
- Python 3.7.10

1. Clone this repository to your cloud environment of choice (Cloud9, an EC2 instance, a local AWS environment, ...).
2. Create the IAM role needed to deploy the stack (skip to step 3 if you already have a role with sufficient permissions and the required trust relationship).
   - Using the AWS CLI:
     - Configure the AWS CLI profile you would like to use, if not configured yet, with `aws configure` and follow the instructions.
     - Create a new IAM role that CloudFormation can use with `aws iam create-role --role-name {YOUR_ROLE_NAME} --assume-role-policy-document file://trust_policy.json`
     - Attach the permissions policy to the new role with `aws iam put-role-policy --role-name {YOUR_ROLE_NAME} --policy-name {YOUR_POLICY_NAME} --policy-document file://permissions_policy.json`
   - Alternatively, you can create the role using the AWS IAM Management Console. Once it is created, make sure to update its Trust Relationship with `trust_policy.json` and attach a custom Permissions Policy based on `permissions_policy.json`.
3. Create a new Python virtual environment: `python3 -m venv .venv`
4. Activate the environment: `source .venv/bin/activate`
5. Install AWS CDK: `npm install -g [email protected]`
6. Install the requirements: `pip install -r requirements.txt`
7. Bootstrap AWS CDK for your AWS account: `cdk bootstrap aws://{AWS_ACCOUNT_ID}/{REGION}`. If your account has already been bootstrapped with [email protected], you may need to manually delete the `CDKToolkit` stack from the AWS CloudFormation console to avoid compatibility issues with [email protected]. Once de-bootstrapped, proceed by re-bootstrapping.
8. Deploy the stack with `cdk deploy -r {NEW_ROLE_ARN}`
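
The repository defines the actual contents of `trust_policy.json`, which are not reproduced here. The following is a minimal sketch that assumes the role only needs to be assumable by AWS CloudFormation, the service that receives the role passed with `cdk deploy -r`. It builds a trust policy of that shape and the standard IAM role ARN format used for `{NEW_ROLE_ARN}`. `TRUST_POLICY`, `trust_policy_json` and `role_arn` are illustrative names that are not part of the repository. Compare the result with the repository's file before using it.

```python
import json

# Assumption: only CloudFormation needs to assume the deployment role.
# The repository's trust_policy.json is authoritative; this is a sketch.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "cloudformation.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def trust_policy_json() -> str:
    """Serialize the policy, e.g. to save it as a file referenced via file:// in the CLI."""
    return json.dumps(TRUST_POLICY, indent=2)


def role_arn(account_id: str, role_name: str) -> str:
    """Build a role ARN (no IAM path) in the form expected by `cdk deploy -r`."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"
```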

Once the stack is deployed, you can familiarize yourself with the resources using the tutorial notebook `notebooks/AutoML Walkthrough.ipynb`.
Action flows defined using AWS Step Functions are called State Machines.
Each State Machine has parameters that can be defined at runtime (i.e. execution-specific), which are specified through an input JSON object. Some examples of input parameters are provided in `notebooks/input/`. Although they are meant to be used in the notebook tutorial, you can also copy and paste them directly into the AWS Console.
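
Input documents can also be submitted programmatically. The sketch below assumes a boto3-style Step Functions client, meaning any object that exposes `start_execution(stateMachineArn=..., input=..., name=...)`. `start_automl_execution` is a hypothetical helper. You need to look up the State Machine ARN from your deployed stack.

```python
import json
from typing import Any, Dict, Optional


def start_automl_execution(
    client: Any,
    state_machine_arn: str,
    input_document: Dict[str, Any],
    name: Optional[str] = None,
) -> str:
    """Start one State Machine execution and return its execution ARN.

    `client` is assumed to behave like boto3.client("stepfunctions").
    """
    kwargs = {
        "stateMachineArn": state_machine_arn,
        # Step Functions receives the execution input as a JSON string.
        "input": json.dumps(input_document),
    }
    if name is not None:
        # Optional execution name; Step Functions rejects names already in use.
        kwargs["name"] = name
    response = client.start_execution(**kwargs)
    return response["executionArn"]
```

With AWS credentials configured, you would pass `boto3.client("stepfunctions")` as `client` and a dictionary loaded from one of the files in `notebooks/input/`. Injecting the client keeps the helper testable without AWS access.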
Request Syntax
```
{
  "Parameters": {
    "Flow": {
      "Train": true|false,
      "Evaluate": true|false,
      "Deploy": true|false
    },
    "PretrainedModel": {
      "Name": "string"
    },
    "Train": {
      "TrainDataPath": "string",
      "TestDataPath": "string",
      "TrainingOutput": "string",
      "InstanceCount": int,
      "InstanceType": "string",
      "FitArgs": "string",
      "InitArgs": "string"
    },
    "Evaluation": {
      "Threshold": float,
      "Metric": "string"
    },
    "Deploy": {
      "InstanceCount": int,
      "InstanceType": "string",
      "Mode": "endpoint"|"batch",
      "BatchInputDataPath": "string",
      "BatchOutputDataPath": "string"
    }
  }
}
```
Parameters

- Flow
  - Train (bool) - (REQUIRED) indicates whether a new AutoGluon SageMaker Training Job is required. Set to `false` to deploy a pretrained model.
  - Evaluate (bool) - set to `true` if evaluation is required. If selected, an AWS Lambda function retrieves the model's performance on the test set and evaluates it against a user-defined threshold. If model performance is not satisfactory, deployment is skipped.
  - Deploy (bool) - (REQUIRED) indicates whether the model has to be deployed.
- PretrainedModel
  - Name (string) - the pretrained model to be used for deployment, referenced by its SageMaker Model Name. If `Flow.Train = true`, this field is ignored; otherwise, it is required.
- Train (REQUIRED if `Flow.Train = true`)
  - TrainDataPath (string) - S3 URI where the training `csv` is stored. Header and target variable are required. AutoGluon automatically performs a holdout split for validation.
  - TestDataPath (string) - S3 URI where the test `csv` is stored. Header and target variable are required. This dataset is used to evaluate model performance on samples not seen during training.
  - TrainingOutput (string) - S3 URI where model artifacts are stored at the end of the training job.
  - InstanceCount (int) - number of instances to be used for training.
  - InstanceType (string) - AWS instance type to be used for training (e.g. `ml.m4.2xlarge`). See the full list here.
  - FitArgs (string) - double JSON-encoded dictionary of parameters passed to the model's `.fit()`. List of available parameters here. The dictionary needs to be encoded twice because it is decoded both by the State Machine and by the SageMaker Training Job.
  - InitArgs (string) - double JSON-encoded dictionary of parameters used when the model is initialized with `TabularPredictor()`. List of available parameters here. The dictionary needs to be encoded twice because it is decoded both by the State Machine and by the SageMaker Training Job. Common parameters are `label`, `problem_type` and `eval_metric`.
- Evaluation (REQUIRED if `Flow.Evaluate = true`)
  - Threshold (float) - metric threshold for considering model performance satisfactory. All metrics are maximized (e.g. losses are represented as negative losses).
  - Metric (string) - metric name used for evaluation. Accepted metrics correspond to the available AutoGluon `eval_metric` values.
- Deploy (REQUIRED if `Flow.Deploy = true`)
  - InstanceCount (int) - number of instances to be used for deployment.
  - InstanceType (string) - AWS instance type to be used for deployment (e.g. `ml.m4.2xlarge`). See the full list here.
  - Mode (string) - model deployment mode. Supported modes are `batch` for a SageMaker Batch Transform Job and `endpoint` for a SageMaker Endpoint.
  - BatchInputDataPath (string) - (REQUIRED if `Mode = batch`) S3 URI of the dataset against which predictions are generated. Data must be stored in `csv` format, without a header and with the same column order as the training dataset.
  - BatchOutputDataPath (string) - (REQUIRED if `Mode = batch`) S3 URI where batch predictions are stored.
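
This sketch shows one reading of "double JSON-encoded" for `FitArgs` and `InitArgs`. It assumes the phrase means applying `json.dumps` twice to the dictionary before placing the resulting string in the input document, so that each of the two consumers (State Machine, then Training Job) removes one layer. The argument values and S3 paths below are placeholders. Check the AutoGluon parameters against the documentation for your AutoGluon version.

```python
import json
from typing import Any, Dict


def encode_twice(args: Dict[str, Any]) -> str:
    """JSON-encode a dictionary twice (assumed format of FitArgs/InitArgs)."""
    return json.dumps(json.dumps(args))


def decode_twice(value: str) -> Dict[str, Any]:
    """Undo both encoding layers, one per consumer."""
    return json.loads(json.loads(value))


# Example AutoGluon arguments (illustrative values only).
init_args = {"label": "target", "problem_type": "binary", "eval_metric": "roc_auc"}
fit_args = {"presets": "best_quality", "time_limit": 3600}

# "Train" section of the input document; bucket and paths are placeholders.
train_section = {
    "TrainDataPath": "s3://YOUR_BUCKET/data/train.csv",
    "TestDataPath": "s3://YOUR_BUCKET/data/test.csv",
    "TrainingOutput": "s3://YOUR_BUCKET/output/",
    "InstanceCount": 1,
    "InstanceType": "ml.m4.2xlarge",
    "FitArgs": encode_twice(fit_args),
    "InitArgs": encode_twice(init_args),
}
```

When the whole input document is later serialized for submission, that serialization adds the usual outer layer. The outer layer is separate from the two layers above.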
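
You can check the conditional requirements above before starting an execution. The following is a hypothetical client-side validator that encodes only the rules stated in this README. It is not part of the repository, and the State Machines may enforce additional or different constraints.

```python
from typing import Any, Dict, List


def validate_input(document: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the documented checks passed."""
    errors: List[str] = []
    params = document.get("Parameters", {})
    flow = params.get("Flow", {})

    # Flow.Train and Flow.Deploy are required booleans; Flow.Evaluate is optional.
    for key in ("Train", "Deploy"):
        if not isinstance(flow.get(key), bool):
            errors.append("Flow.%s must be a boolean" % key)
    if "Evaluate" in flow and not isinstance(flow["Evaluate"], bool):
        errors.append("Flow.Evaluate must be a boolean")

    # Either a Train section or a pretrained model name is needed.
    if flow.get("Train") is True:
        if "Train" not in params:
            errors.append("Train section is required when Flow.Train is true")
    elif flow.get("Train") is False:
        if not params.get("PretrainedModel", {}).get("Name"):
            errors.append("PretrainedModel.Name is required when Flow.Train is false")

    if flow.get("Evaluate") is True and "Evaluation" not in params:
        errors.append("Evaluation section is required when Flow.Evaluate is true")

    if flow.get("Deploy") is True:
        deploy = params.get("Deploy")
        if deploy is None:
            errors.append("Deploy section is required when Flow.Deploy is true")
        else:
            mode = deploy.get("Mode")
            if mode not in ("endpoint", "batch"):
                errors.append('Deploy.Mode must be "endpoint" or "batch"')
            elif mode == "batch":
                # Batch Transform needs both an input and an output location.
                for key in ("BatchInputDataPath", "BatchOutputDataPath"):
                    if not deploy.get(key):
                        errors.append("Deploy.%s is required when Mode is batch" % key)
    return errors
```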
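
All evaluation metrics are maximized, so a loss limit has to be expressed as a negative threshold. The helpers below are a hypothetical sketch of the evaluation gate. Treating a score equal to the threshold as satisfactory is an assumption, since the README does not say whether the comparison is strict.

```python
from typing import Any, Dict, Optional


def loss_to_threshold(max_loss: float) -> float:
    """Express 'loss must be at most max_loss' in the maximized convention."""
    return -max_loss


def should_deploy(
    flow: Dict[str, Any],
    score: Optional[float] = None,
    threshold: Optional[float] = None,
) -> bool:
    """Decide whether the deployment branch runs (sketch).

    flow: the "Flow" object from the input document.
    score: test-set metric value, already in maximized form (losses negated).
    """
    if not flow.get("Deploy"):
        return False
    if flow.get("Evaluate"):
        # Assumption: a score equal to the threshold counts as satisfactory (>=).
        return score is not None and threshold is not None and score >= threshold
    return True
```

For example, to accept models with a log loss of at most 0.3, you would set `Threshold` to `-0.3`. A model scoring a log loss of 0.25 is then reported as `-0.25` and passes.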
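
Batch input must be a `csv` without a header, with columns in the training column order. Here is a small standard-library sketch of that conversion. `to_batch_csv` is hypothetical. The caller decides which columns to keep (for example, whether to drop the target), because the README does not specify this.

```python
import csv
import io
from typing import Sequence


def to_batch_csv(csv_text: str, columns: Sequence[str]) -> str:
    """Rewrite a CSV that has a header as header-less rows, with `columns` in the given order.

    `columns` should follow the column order of the training dataset.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Select and reorder by column name; no header row is written.
        writer.writerow([row[c] for c in columns])
    return out.getvalue()
```

You could then upload the resulting text to the `BatchInputDataPath` location, for instance with boto3's `s3.put_object(Bucket=..., Key=..., Body=...)`.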

- `app.py` - entrypoint
- `stepfunctions_automl_workflow/lambdas/` - AWS Lambda source scripts
- `stepfunctions_automl_workflow/utils/` - utility functions used across stack generation
- `stepfunctions_automl_workflow/stack.py` - CDK stack definition
- `notebooks/` - Jupyter notebooks to familiarize yourself with the artifacts
- `notebooks/input/` - input examples to be fed into the State Machines

WARNING: While you can still keep the SageMaker artifacts, the AWS Step Functions State Machines will be deleted along with their execution history.
Clean up all resources with `cdk destroy`.

- `cdk ls` - list all stacks in the app
- `cdk synth` - emit the synthesized CloudFormation template
- `cdk deploy` - deploy this stack to your default AWS account/region
- `cdk diff` - compare the deployed stack with the current state
- `cdk docs` - open the CDK documentation

