Training

This folder contains the CDK app that deploys the training part of the MLOps pipeline. This part of the pipeline is responsible for model building and is orchestrated using Amazon SageMaker Pipelines. It consists of the following steps:

Processing step that loads the latest images and bounding boxes from SageMaker Feature Store and transforms the data into the RecordIO format required for model training.
Training step that trains the model using Apache MXNet and YOLOv3.
Processing step that evaluates the model against a test set.
Register Model step that registers the model in SageMaker Model Registry if its performance is above the threshold.
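
The conditional registration in the last step can be illustrated with a small, library-free sketch. It is not the code used by this app: `EvaluationReport`, `should_register`, `run_pipeline`, and the step names are hypothetical, and treating "above threshold" as a strict comparison is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvaluationReport:
    """Hypothetical summary produced by the evaluation step."""
    metric_name: str
    value: float


def should_register(report: EvaluationReport, threshold: float) -> bool:
    """Return True only when the metric is strictly above the threshold (assumption)."""
    return report.value > threshold


def run_pipeline(report: EvaluationReport, threshold: float) -> List[str]:
    """Return the names of the steps that would run, in order."""
    # The first three steps always run; names are illustrative only.
    steps = ["preprocess", "train", "evaluate"]
    # The register step is gated on the evaluation result.
    if should_register(report, threshold):
        steps.append("register_model")
    return steps
```

For example, `run_pipeline(EvaluationReport("mAP", 0.8), threshold=0.5)` includes `"register_model"`, while a report with value `0.3` stops after `"evaluate"`.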

This is what the final SageMaker Pipelines workflow looks like:

CI/CD pipeline

For the SageMaker Pipelines workflow to run successfully, a number of assets such as Lambda functions and IAM roles need to be deployed beforehand. This deployment is automated using a CDK app. The pipeline is triggered on a schedule as well as on git commit, using two pipelines in AWS CodePipeline. This is what the architecture of the CI/CD infrastructure deployed by this CDK app looks like:

Note that the CodePipeline deployed here is a self-mutating pipeline that updates itself during the run. It is deployed using CDK Pipelines. Check out this blog if you want an intro to CDK Pipelines.

Repository layout

This is the layout of the training CDK app:

├── bin
│   └── app.ts                                 - cdk app definition
├── cdk.json                                   - cdk configuration
├── cleanup.sh                                 - cleanup script to delete all relevant resources
├── config.yaml                                - pipeline config, changes to the training workflow are done here
├── lib
│   ├── assets
│   │   ├── docker                             - assets to build custom docker files used in training and processing jobs
│   │   │   ├── Dockerfile                     - Docker file for custom container used in training and processing
│   │   │   ├── evaluate.py                    - script to evaluate model and calculate mAP metric for a given test set
│   │   │   ├── im2rec.py                      - utility script to convert lst files to recordIO format
│   │   │   ├── preprocess.py                  - preprocessing script to load images and bounding boxes from feature store, split dataset and convert to recordio for training
│   │   │   └── train_yolo.py                  - script to train yolov3 model to detect scratches in an image
│   │   └── sagemaker_pipeline                 - assets related to SageMaker pipeline definition
│   │       ├── pipeline_helper.py             - helper script to work with sagemaker pipelines API
│   │       ├── requirements.txt               - requirements definition for code build job which starts sagemaker pipeline
│   │       └── construct_and_run_pipeline.py  - script which defines pipeline and kicks off execution
│   ├── constructs
│   │   └── training-pipeline-assets.ts        - assets required to execute model building pipeline
│   └── stacks
│       ├── training-pipeline.ts               - stack which defines asset build code pipeline
│       └── training-sagemaker-pipeline.ts     - stack which defines pipeline responsible for sagemaker pipelines execution
├── package.json                               - node dependency definition
└── tsconfig.json                              - typescript config
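
The mAP metric mentioned for evaluate.py is commonly built on intersection over union (IoU) between predicted and ground-truth bounding boxes. The sketch below only illustrates that overlap measure. It is not taken from evaluate.py, and the `iou` function and the `(xmin, ymin, xmax, ymax)` box layout are assumptions made for this example.

```python
from typing import Tuple

# Assumed box layout for this sketch: (xmin, ymin, xmax, ymax)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen cutoff; the cutoff used by this app is not stated here.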

Changing configuration

You can change the behaviour of the training pipeline by changing the values in config.yaml. Check out the file to learn more about the properties you can change.
