Flow Diagrams

Training job

Also compare with the similar flow diagram for IDE integration with SageMaker Studio.

The following diagram describes the flow of events for the training job use case:

[Diagram: flow of events for the training job use case]

  1. A data scientist (DS) starts a training job with the SSH Helper library as a dependency.
  2. The SageMaker Python SDK starts the job, sending the train.py script and the SSH Helper library as code dependencies.
  3. The Amazon SageMaker control plane starts the host and the container.
  4. The user script (train.py) starts and launches SSH Helper, which fetches the AWS SSM agent and other packages from the Internet and installs them.
  5. SSH Helper starts the SSM agent.
  6. Through SSM, SSH Helper registers the container as an SSM managed instance and tags it with the DS's AWS user/role name.
  7. SSH Helper prints the managed instance ID to the job log, which is streamed to CloudWatch Logs.
  8. The DS tails the training job logs, manually or automatically, to find the managed instance ID.
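
Steps 7 and 8 come down to scanning the job's log stream for the managed instance ID. The Python sketch below assumes only that the ID appears in a log line in the usual SSM form for managed (hybrid) instances, mi- followed by 17 hexadecimal characters; the exact message SSH Helper prints is not reproduced, so the sample lines are invented. Reading the lines from CloudWatch Logs is left out, and find_managed_instance_ids is a hypothetical helper that accepts any iterable of strings.

```python
import re
from typing import Iterable, List

# Assumption: the ID appears as "mi-" followed by 17 lowercase hex characters,
# the usual form of an SSM managed (hybrid) instance ID.
MANAGED_INSTANCE_ID = re.compile(r"\bmi-[0-9a-f]{17}\b")


def find_managed_instance_ids(log_lines: Iterable[str]) -> List[str]:
    """Return distinct managed instance IDs in order of first appearance."""
    found: List[str] = []
    for line in log_lines:
        for instance_id in MANAGED_INSTANCE_ID.findall(line):
            if instance_id not in found:
                found.append(instance_id)
    return found


if __name__ == "__main__":
    # Invented log lines for illustration; not actual SSH Helper output.
    sample_log = [
        "Installing SSM agent ...",
        "Registered managed instance mi-0123456789abcdef0",
        "Epoch 1/10 ...",
    ]
    print(find_managed_instance_ids(sample_log))
```

Deduplicating keeps the result stable if the same ID shows up in several log lines.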

9-12. Optionally, the DS starts a process that copies their SSH public key to the container. The key is needed to set up port forwarding over SSH (e.g., for remote debugging).
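
The text does not say how SSH Helper transfers the key, so the sketch below covers only the end state that steps 9-12 need on the container side: the DS's public key listed in the SSH server's authorized keys. It assumes a standard OpenSSH layout (~/.ssh/authorized_keys, with the directory at mode 0700 and the file at 0600); install_public_key is a hypothetical helper, not part of SSH Helper.

```python
import os
import tempfile
from pathlib import Path


def install_public_key(public_key: str, home: Path) -> bool:
    """Add public_key to home/.ssh/authorized_keys unless it is already there.

    Hypothetical helper assuming a standard OpenSSH layout. Returns True if the
    key was added and False if it was already authorized.
    """
    key = public_key.strip()
    if not key:
        raise ValueError("empty public key")

    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(parents=True, exist_ok=True)
    # Set modes explicitly (mkdir's default mode is filtered by the umask);
    # sshd with StrictModes ignores keys in group- or world-writable locations.
    os.chmod(ssh_dir, 0o700)

    auth_keys = ssh_dir / "authorized_keys"
    content = auth_keys.read_text() if auth_keys.exists() else ""
    if key in (line.strip() for line in content.splitlines()):
        return False

    with auth_keys.open("a") as f:
        # Start the new key on its own line if the file lacks a trailing newline.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(key + "\n")
    os.chmod(auth_keys, 0o600)
    return True


if __name__ == "__main__":
    demo_key = "ssh-ed25519 AAAAexampleonly dev@laptop"  # placeholder, not a real key
    with tempfile.TemporaryDirectory() as tmp:
        print(install_public_key(demo_key, Path(tmp)))  # adds the key
        print(install_public_key(demo_key, Path(tmp)))  # already present, no change
```

Appending only when the key is missing makes the step idempotent, so reconnecting to the same training job does not pile up duplicate entries.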

  13. The DS uses the AWS SSM CLI to start a shell, passing the managed instance ID as a parameter. Optionally, the DS starts an SSM session with SSH port forwarding by using the helper command sm-local-ssh-training connect <<training_job_name>>.
  14. AWS IAM rules for SSM verify that the user is allowed to take this action and that the instance is tagged with the DS's AWS user/role name. Once verified, a session is created with the SSM agent running in the container.
  15. The SSM agent spawns a new bash shell process for the session. Optionally, SSH port forwarding runs over the SSM connection, letting the user connect to remote processes over TCP in both directions.
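
Steps 13 to 15 can be approximated with the Python sketch below, which builds command lines rather than running them. The AWS CLI command aws ssm start-session and the AWS-managed document AWS-StartPortForwardingSession are real, and running them requires the Session Manager plugin and valid credentials. Everything else is assumed for illustration: the function names, the port numbers, the root user, and the SSHOwner tag key in session_allowed, which is a toy stand-in for IAM policy evaluation and not how SSH Helper's policies are written. The sketch is not meant to reproduce what sm-local-ssh-training connect does internally.

```python
import json
import shlex
from typing import Dict, List, Sequence, Set


def session_allowed(allowed_actions: Set[str], caller: str,
                    instance_tags: Dict[str, str],
                    owner_tag_key: str = "SSHOwner") -> bool:
    """Step 14, toy model: the caller may start sessions AND owns the instance.

    Real IAM evaluation is much richer; "SSHOwner" is a hypothetical tag key.
    """
    return ("ssm:StartSession" in allowed_actions
            and instance_tags.get(owner_tag_key) == caller)


def ssm_shell_cmd(instance_id: str) -> List[str]:
    """Step 13: open an interactive shell on the managed instance via SSM."""
    return ["aws", "ssm", "start-session", "--target", instance_id]


def ssm_port_forward_cmd(instance_id: str, remote_port: int = 22,
                         local_port: int = 10022) -> List[str]:
    """Step 13, optional: forward a local port to the container's SSH port.

    Port numbers are hypothetical defaults chosen for this example.
    """
    params = {"portNumber": [str(remote_port)],
              "localPortNumber": [str(local_port)]}
    return ["aws", "ssm", "start-session", "--target", instance_id,
            "--document-name", "AWS-StartPortForwardingSession",
            "--parameters", json.dumps(params)]


def ssh_tunnel_cmd(ssh_port: int = 10022, user: str = "root",
                   to_container: Sequence[int] = (),
                   to_laptop: Sequence[int] = ()) -> List[str]:
    """Step 15: SSH through the forwarded port with tunnels in both directions."""
    cmd = ["ssh", "-p", str(ssh_port)]
    for port in to_container:
        # -L: connections to this laptop port reach the same port on the container
        cmd += ["-L", f"{port}:localhost:{port}"]
    for port in to_laptop:
        # -R: connections to this container port reach the same port on the laptop
        cmd += ["-R", f"{port}:localhost:{port}"]
    cmd.append(f"{user}@localhost")  # the forwarded SSH port listens on the laptop
    return cmd


if __name__ == "__main__":
    instance_id = "mi-0123456789abcdef0"  # made-up ID
    print(session_allowed({"ssm:StartSession"}, "alice", {"SSHOwner": "alice"}))
    # Print copy-pasteable command lines; pass the lists to subprocess.run to execute.
    print(shlex.join(ssm_shell_cmd(instance_id)))
    print(shlex.join(ssm_port_forward_cmd(instance_id)))
    print(shlex.join(ssh_tunnel_cmd(to_container=[5678], to_laptop=[12345])))
```

In ssh_tunnel_cmd, -L forwards a laptop port to the container (e.g., to reach a process listening there), while -R forwards a container port back to the laptop (e.g., for a debugger that listens locally); together they are what "TCP in both directions" amounts to.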