
Expose all CPUs to TorchServe #100

Open · wants to merge 2 commits into master
Conversation

@vdantu commented May 19, 2021

Handles the error described in the SageMaker docs here.

Issue #, if available:

Description of changes:
This change allows the PyTorch DLC containers to expose all the available CPUs to TorchServe.
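The "non-default JVM parameters" warning in the test log below suggests the CPUs are exposed by setting vmargs (a standard TorchServe config property) so the JVM stops deriving its processor count from the container's cgroup cpu-shares. A minimal sketch of that idea, assuming a config-generation step similar to the toolkit's; the file path and flag list mirror the log output, and the helper itself is hypothetical, not the actual change:

# Hypothetical sketch: append a vmargs line to the generated TorchServe
# properties file so the JVM ignores cgroup-derived CPU limits.
# Path and flags are taken from the log in this PR, not from the toolkit code.
TS_CONFIG_FILE = "/etc/sagemaker-ts.properties"

JVM_ARGS = " ".join([
    "-XX:-UseContainerSupport",      # report all host CPUs, not cpu-shares / 1024
    "-XX:InitialRAMPercentage=8.0",
    "-XX:MaxRAMPercentage=10.0",
    "-XX:-UseLargePages",
    "-XX:+UseG1GC",
    "-XX:+ExitOnOutOfMemoryError",
])

def add_vmargs(config_file: str = TS_CONFIG_FILE) -> None:
    """Append the vmargs property that TorchServe reads at startup."""
    with open(config_file, "a") as f:
        f.write(f"vmargs={JVM_ARGS}\n")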

Tested by running

docker run --name pytorch -p 8080:8080 -p 8081:8081 -v /home/ubuntu/models:/models -itd --cpu-shares=512 pytorch18:latest serve
docker logs pytorch | less

['torchserve', '--start', '--model-store', '/.sagemaker/ts/models', '--ts-config', '/etc/sagemaker-ts.properties', '--log-config', '/opt/conda/lib/python3.6/site-packages/sagemaker_pytorch_serving_container/etc/log4j.properties', '--models', 'model.mar']
Removing orphan pid file.
Warning: TorchServe is using non-default JVM parameters: -XX:-UseContainerSupport -XX:InitialRAMPercentage=8.0 -XX:MaxRAMPercentage=10.0 -XX:-UseLargePages -XX:+UseG1GC -XX:+ExitOnOutOfMemoryError
2021-05-19 22:49:40,925 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.3.1
TS Home: /opt/conda/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 16
Max heap size: 3110 M
Python executable: /opt/conda/bin/python3.6
Config file: /etc/sagemaker-ts.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Metrics address: http://127.0.0.1:8082
Model Store: /.sagemaker/ts/models
Initial Models: model.mar
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 16
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
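The config dump above shows the effect: "Number of CPUs: 16" and "Default workers per model: 16" even though the container was started with --cpu-shares=512. A small sanity check one could run inside the container to see why the JVM flag matters; the cgroup v1 path and the share-to-CPU heuristic are assumptions about this test environment and older HotSpot builds, not anything taken from this PR:

import math
import os

# cgroup v1 path (assumed for the Docker setup in this test);
# cgroup v2 exposes cpu.weight instead.
shares_path = "/sys/fs/cgroup/cpu/cpu.shares"

if os.path.exists(shares_path):
    with open(shares_path) as f:
        shares = int(f.read().strip())
    # With container support enabled, HotSpot derives its processor count
    # roughly as ceil(cpu.shares / 1024), so --cpu-shares=512 maps to 1 CPU.
    jvm_cpus = max(1, math.ceil(shares / 1024))
    print(f"cpu.shares={shares} -> JVM with container support would see {jvm_cpus} CPU(s)")

# Python (and TorchServe, once -XX:-UseContainerSupport is set) ignores
# cpu-shares and reports the host count, matching "Number of CPUs: 16" above.
print(f"os.cpu_count() = {os.cpu_count()}")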

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 35d0963
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-container-pr
  • Commit ID: ee87eb0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)


@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: ee87eb0
  • Result: FAILED
  • Build Logs (available for 30 days)

