Added tutorial for using torchserve on aws sagemaker #2671
Conversation
@pytorchbot label "docathon-h2-2023"
TorchServe is easy to use. It comes with a convenient CLI for deploying locally, and it is easy to package into a container and scale out with Amazon SageMaker or Amazon EKS. With default handlers for common problems such as image classification, object detection, image segmentation, and text classification, you can deploy with just a few lines of code; no more writing lengthy service handlers for initialization, preprocessing, and post-processing. TorchServe is open source, which means it is fully open and extensible to fit your deployment needs.

To get started with TorchServe, you can refer to this tutorial: `TorchServe QuickStart <https://pytorch.org/serve/getting_started.html>`_
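As a companion to the quickstart link, here is a hedged sketch of a local TorchServe deployment. The model name, weights file, and handler are illustrative assumptions carried over from the tutorial, not values verified in this PR; the server commands are shown commented out so the sketch stays self-contained:

```shell
# Hedged sketch, assuming `pip install torchserve torch-model-archiver`
# has been run and a serialized weights file densenet161.pth exists
# (both are assumptions for illustration).
mkdir -p model_store
# Package the model into a .mar archive:
# torch-model-archiver --model-name densenet161 --version 1.0 \
#     --serialized-file densenet161.pth --handler image_classifier \
#     --export-path model_store
# Start the server and register the model:
# torchserve --start --ncs --model-store model_store --models densenet161.mar
# Send a test inference request:
# curl http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg
# torchserve --stop
```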
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SageMaker has two different endpoint types, and their deployment differs slightly. Please include this information:
- single model
- multi-model
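To make the reviewer's distinction concrete, here is a hedged sketch of how the container definition differs between the two endpoint types when calling the AWS CLI. The bucket, image URI, account, and role below are placeholders, not values from this PR:

```shell
# Placeholders -- substitute your own account, region, bucket, and role.
IMAGE="123456789012.dkr.ecr.us-east-1.amazonaws.com/torchserve:latest"

# Single-model endpoint: ModelDataUrl points at one tar.gz archive.
SINGLE_CONTAINER="{\"Image\":\"$IMAGE\",\"ModelDataUrl\":\"s3://my-bucket/models/densenet161.tar.gz\"}"

# Multi-model endpoint: Mode=MultiModel and ModelDataUrl is an S3 prefix
# under which many model archives live.
MULTI_CONTAINER="{\"Image\":\"$IMAGE\",\"Mode\":\"MultiModel\",\"ModelDataUrl\":\"s3://my-bucket/models/\"}"

# Then pass one of the definitions to create-model (shown commented out,
# since it requires AWS credentials):
# aws sagemaker create-model --model-name densenet161 \
#     --primary-container "$MULTI_CONTAINER" \
#     --execution-role-arn arn:aws:iam::123456789012:role/SageMakerRole
echo "$MULTI_CONTAINER"
```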
Please add the user manual links of using TorchServe on SM:
* https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-models-frameworks-torchserve.html
* https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-tutorials-torchserve.html
Hello @lxning,
Thanks for the review. I have already added these links in the references at the end of the tutorial.
#. Create a compressed tar.gz file out of the densenet161.mar file, because Amazon SageMaker expects models to be packaged in a tar.gz file.

   .. code:: shell

      tar cvfz $model_file_name.tar.gz densenet161.mar
- You can skip this extra step by using ``torch-model-archiver --archive-format tgz``.
- For large models, we recommend using ``torch-model-archiver --archive-format no-archive``, leveraging the SageMaker uncompressed model artifact feature (currently only available on SageMaker single-model endpoints). See https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-tutorials-torchserve.html for details.
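For reference, the alternatives mentioned in this review can be sketched as follows. The model name and file names are assumptions carried over from the tutorial; only the plain-``tar`` fallback at the end is actually executed in this sketch, with a stand-in ``.mar`` file:

```shell
# Option 1 -- emit a tar.gz directly, skipping the manual tar step:
# torch-model-archiver --model-name densenet161 --version 1.0 \
#     --serialized-file densenet161.pth --handler image_classifier \
#     --archive-format tgz --export-path model_store
# Option 2 -- for large models, skip archiving entirely and upload the
# directory as an uncompressed artifact (single-model endpoints only):
# torch-model-archiver --model-name densenet161 --version 1.0 \
#     --serialized-file densenet161.pth --handler image_classifier \
#     --archive-format no-archive --export-path model_store
# Fallback used in the tutorial -- archive an existing .mar by hand:
model_file_name=densenet161
touch densenet161.mar    # stand-in file so this sketch is self-contained
tar cvfz "$model_file_name.tar.gz" densenet161.mar
```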
.. code:: shell

   aws s3 cp $model_file_name.tar.gz s3://{bucket_name}/{prefix}/model
Creating an Amazon ECR registry
This step is needed ONLY if you are going to BYOD or BYOC; otherwise it is not needed.
Metrics
~~~~~~~~

TorchServe supports both system-level and model-level metrics. You can enable metrics in either log format mode or Prometheus mode through the environment variable ``TS_METRICS_MODE``. You can use the TorchServe central metrics config file ``metrics.yaml`` to specify the types of metrics to be tracked, such as request counts, latency, memory usage, GPU utilization, and more. By referring to this file, you can gain insights into the performance and health of the deployed models and effectively monitor the TorchServe server's behavior in real time. For more detailed information, see the `TorchServe metrics documentation <https://github.com/pytorch/serve/blob/master/docs/metrics.md#torchserve-metrics>`_. You can access TorchServe metrics logs, which are similar to the StatsD format, through the Amazon CloudWatch log filter. The following is an example of a TorchServe metrics log:
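As a hedged illustration of the mechanism described above: the mode is selected via the environment variable before server start, and the log-format lines follow a StatsD-like shape. The metric values and hostname below are invented for this sketch, not captured from a real run:

```shell
# Select the metrics mode before starting TorchServe ("log" is the default).
export TS_METRICS_MODE=prometheus
# torchserve --start --model-store model_store --models densenet161.mar

# Illustrative log-format metrics line (values invented for this sketch):
echo 'CPUUtilization.Percent:1.5|#Level:Host|#hostname:ip-10-0-0-1,timestamp:1697000000'
```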
SageMaker does not support the Prometheus format; users can only use regex search over TorchServe metrics logs. (Prometheus format is not yet supported for viewing TorchServe metric logs: boto/boto3#3437)
Hello @svekars @sekyondaMeta, just a gentle reminder to kindly review my PR. Should I remove the
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as
Fixes #2345
Added a tutorial on how to use ``torchserve`` on ``AWS Sagemaker``. The tutorial focuses on the features of AWS SageMaker and other AWS services that can be used to serve a PyTorch model, rather than emphasizing the various features provided by TorchServe. I have, however, provided external links to tutorials wherever more can be done with TorchServe features, such as handler customization via ``torch-model-archiver``.
Checklist
cc @msaroufim @agunapal @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen