GitHub - cjmcgraw/mlmodel-manager: MLModel Manager - a webservice to simplify deploying machine learning models

What is model.manager?

model.mananger is a tool to deploy ATG machine learning models. It consists of 3 python webservices developed in house(master, config_manager, remote_model_puller and one external dependency(tfserving) running as docker container orchestraed in docker swarm cluster.

Our machine training models are currently trained in Google cloud service periodically and stored in Google storage. This model.manager is developed to get the latest models into ATG network automatically for production usage.

System Diagram

Structure of model.manager

tfserving

It is Tensorflow serving container which serves and represents our ML models. It is called by Serch Elasticsearch via gRPC connections. It reads a model configuration file from local file systems and load the corresponding model assest stored on local. This container should be deployed globally along with config_manager and remote_model_puller, and takes most of the node resources available on each node.

master

Responsible of administrative operations to occur on the cluster. The master container keeps a list of registered config_manager and remote_model_puller nodes. It allows users to perform operations across the entire cluster, and to read/change remote model state as well.

remote_manager

Periodically this webservice tracks what models are available on a defined gcs bucket, then pulls the most recent/current models down to local file system. This local file system is shared with config_manager It is deployed globably in the cluster along with config_manager and tfserving. It is s only accessible within the cluster via port of 8001 to communicate with master. The master container will call it to initiate admin calls.

config_manager

Periodically this webservice tracks what models are available on local and tracks what tfserving knows about, and keeps tfserving up to date with available models, also removes models that are no longer valid to clean up disk space. It is only availale by port 8002 within the cluster to communicate with master. The master container will call it to initiate admin calls.

Local Development

Run docker-compose up --build -d and have all containers running and healthy. You can check container status by running docker ps.
Run docker-compose run fill_remote_directory_with_models to create some fake models on remote GCS bucket for testing purpose.
Run docker-compose run tests to run integeration tests.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.wiki/.assets		.wiki/.assets
config_manager		config_manager
master		master
model_manager_lib		model_manager_lib
remote_model_puller		remote_model_puller
tests		tests
tfserving		tfserving
.dockerignore		.dockerignore
.env		.env
.gitignore		.gitignore
Jenkinsfile		Jenkinsfile
LICENSE		LICENSE
README.md		README.md
RUNBOOK.md		RUNBOOK.md
check-current-host-containers-healthy.sh		check-current-host-containers-healthy.sh
check-current-host-docker-state.sh		check-current-host-docker-state.sh
check-local-config.sh		check-local-config.sh
check-local-model-state.sh		check-local-model-state.sh
docker-compose.yml		docker-compose.yml
doggy-hooks.sh		doggy-hooks.sh
swarm-deploy.yml		swarm-deploy.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

What is model.manager?

System Diagram

Structure of model.manager

tfserving

master

remote_manager

config_manager

Local Development

About

Releases 3

Packages

Contributors 2

Languages

License

cjmcgraw/mlmodel-manager

Folders and files

Latest commit

History

Repository files navigation

What is model.manager?

System Diagram

Structure of model.manager

tfserving

master

remote_manager

config_manager

Local Development

About

Resources

License

Stars

Watchers

Forks

Releases 3

Packages 0

Contributors 2

Languages

Packages