- Train the model
- Generate predictions
- Deploy the model to AWS using SageMaker
    |-- data
    |   |-- submission.csv
    |-- sagemaker
    |   |-- data
    |   |-- models
    |   |-- AWS - SageMaker.ipynb
    |   |-- featurizer_local.py
    |   |-- featurizer_remote.py
    |-- src
    |   |-- models
    |       |-- baseline_model.py: class for training the binary classification model
    |       |-- transformers.py: ColumnSelector transformer for preprocessing
    |-- api_request_exmpl.json
    |-- pickled_model.pickle
    |-- Model.ipynb: model evaluation notebook
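The `ColumnSelector` transformer in `src/models/transformers.py` lets the preprocessing pipeline pick a subset of DataFrame columns. A minimal sketch of such a transformer (the exact interface in the repo may differ; the example data is illustrative):

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Select a subset of DataFrame columns inside an sklearn pipeline."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        return X[self.columns]


# Hypothetical example frame; the real dataset is private
df = pd.DataFrame({"age": [25, 40], "income": [30_000, 52_000], "name": ["a", "b"]})
selected = ColumnSelector(["age", "income"]).fit_transform(df)
```

Because it implements `fit`/`transform`, the selector composes with any downstream estimator in an sklearn `Pipeline`.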
- python3.8
- aws-cli
- A detailed explanation of the model training and validation can be found here.
- A CSV file with the resulting predictions can be found here.
- Read and explore the dataset (excluded from this repo for privacy reasons)
- Train a baseline default prediction model
- Evaluate and compare different models (Logistic Regression and tree-based ensemble models) with a set of metrics
- Estimate feature importance
- Generate predictions (`/data/submission.csv`)
- Dump the best estimator (`/pickled_model.pickle`)
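The evaluate-compare-dump steps above can be sketched as follows. The dataset is private, so synthetic data, the candidate models, and the ROC-AUC metric here are illustrative assumptions, not the notebook's exact setup:

```python
import pickle

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the private default-prediction dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))
y = (X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

# Candidate models: a linear baseline and a tree-based ensemble
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "gbm": GradientBoostingClassifier(random_state=42),
}
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

# Dump the best estimator, as the notebook does with pickled_model.pickle
best = candidates[max(scores, key=scores.get)]
with open("pickled_model.pickle", "wb") as f:
    pickle.dump(best, f)
```

In practice the notebook also compares several metrics rather than a single score; ROC-AUC stands in here for brevity.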
- Define Sagemaker session and role
- Preprocess the data and train the model
- Create SageMaker Scikit Estimator
- Batch transform training data
- Fit a Tree-based Model with the preprocessed data
- Serial Inference Pipeline with Scikit preprocessor and classifier
- Deploy model
- Make a request to the pipeline endpoint
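The serial inference pipeline chains the Scikit preprocessor and the classifier into one deployable model. Its local analogue is an sklearn `Pipeline`; the sketch below shows that composition with synthetic data and assumed component choices (the SageMaker version wraps the same idea in a `PipelineModel` of two fitted models):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed training data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Preprocessor and classifier chained serially, mirroring the
# SageMaker serial inference pipeline
pipeline = Pipeline([
    ("preprocessor", StandardScaler()),
    ("classifier", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pipeline.fit(X, y)
acc = pipeline.score(X, y)
```

A request to the deployed endpoint passes raw features through the same two stages, so clients never need to replicate the preprocessing.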
Besides AWS SageMaker, I tested other AWS deployment options, including the AWS Serverless Application Model (SAM).
It allows deploying ML models behind a serverless API (AWS Lambda).
To try AWS Lambda, I exposed an API endpoint backed by a hello-world application.
- ECR: Container & Registry
- AWS Lambda: Serving API
- SAM: Serverless Framework
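The hello-world application behind the Lambda endpoint boils down to a handler like the one below. This is a minimal sketch; the actual handler name and SAM template in the deployment are assumptions:

```python
import json


def lambda_handler(event, context):
    """Minimal AWS Lambda handler returning the hello-world payload."""
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "hello world"}),
    }
```

API Gateway invokes this handler for each request and relays the `body` back to the caller, which produces the response shown below.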
GET request:
https://x3jp27x3t3.execute-api.eu-central-1.amazonaws.com/test/hello
Expected response:
{
"message": "hello world"
}
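The same request can be issued from Python with the standard library alone. The sketch only builds the request object; the actual call (commented out) requires the endpoint to be deployed and live:

```python
from urllib import request

URL = "https://x3jp27x3t3.execute-api.eu-central-1.amazonaws.com/test/hello"

# Build the GET request against the API Gateway test stage
req = request.Request(URL, method="GET")

# Uncomment once the endpoint is live:
# with request.urlopen(req) as resp:
#     print(resp.read().decode())  # expected: {"message": "hello world"}
```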