
Serving a model using a custom container, instance runs out of disk #112

Open
HamidShojanazeri opened this issue Nov 19, 2021 · 4 comments
@HamidShojanazeri

Describe the bug
Using a custom container to serve a PyTorch model, defined as below, throws "No space left on device":

# sm is a boto3 SageMaker client, e.g.:
#   import boto3
#   sm = boto3.client("sagemaker")
container = {"Image": image, "ModelDataUrl": model_artifact}

create_model_response = sm.create_model(
    ModelName=model_name, ExecutionRoleArn=role, PrimaryContainer=container
)

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": "ml.g4dn.8xlarge",
            "InitialVariantWeight": 1,
            "InitialInstanceCount": 1,
            "ModelName": model_name,
            "VariantName": "AllTraffic",
        }
    ],
)

The Docker image is 17 GB and the TorchServe .mar file is 8 GB. I was wondering if there is any way to increase the storage for the instances that serve the model. Going through the docs for endpoint configuration, there seems to be no setting for instance-specific storage.
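As a rough pre-deployment sanity check, the combined footprint of the image and the model artifact can be compared against the host volume before creating the endpoint. A minimal sketch (function name, margin, and the 30 GB volume figure are illustrative; the image and artifact sizes are the ones from this issue):

```python
def fits_on_volume(image_gb, model_gb, volume_gb, safety_margin_gb=2.0):
    """Rough check: do the Docker image and the model artifact fit on the
    host storage volume, with some scratch space to spare?"""
    return image_gb + model_gb + safety_margin_gb <= volume_gb

# Sizes from this issue: 17 GB image, 8 GB .mar, ~30 GB GPU host volume.
fits = fits_on_volume(image_gb=17, model_gb=8, volume_gb=30)
```

With these numbers the check barely passes, which matches how close this deployment is to the limit.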

-- CloudWatch log

(screenshot: "No space left on device" error)

Expected behavior

Having knobs to set the storage for the serving instances.

@HamidShojanazeri
Author

cc @nskool


HamidShojanazeri commented Nov 19, 2021

I believe exposing a few knobs for some of these settings, including storage for the host instances, would be helpful. Thanks @lxning for the offline discussion; it would be great if this could be added as a feature to the SageMaker SDK.


lxning (Contributor) commented Nov 19, 2021

According to the SM hosting team, the SM SDK currently does not support storage size configuration. The only available workaround is to change the instance type. Please refer to the host-instance-storage-volumes-table.
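Since storage is tied to the instance type, one way to apply this workaround is to pick the instance type programmatically from a lookup of host volume sizes. A minimal sketch, assuming a hand-maintained table; the sizes below are placeholders, and the real values must come from the host-instance-storage-volumes table:

```python
# Placeholder volume sizes -- replace with the figures from the official
# host-instance-storage-volumes table before using this for real.
HOST_STORAGE_GB = {
    "ml.g4dn.8xlarge": 30,
    "ml.g4dn.12xlarge": 50,
}

def pick_instance_type(required_gb, candidates=HOST_STORAGE_GB):
    """Return the smallest candidate whose host volume fits the
    deployment footprint, or None if nothing fits."""
    for instance_type, volume_gb in sorted(candidates.items(),
                                           key=lambda kv: kv[1]):
        if volume_gb >= required_gb:
            return instance_type
    return None
```

This only shuffles the problem onto a bigger instance, which is why a real storage knob in the SDK would be preferable.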


HamidShojanazeri commented Nov 19, 2021

@lxning this is a limiting factor, as it is easy to hit the limit, mostly the 30 GB on GPU instances: some NVIDIA Docker images, like the one in this case, can reach 21 GB, and heavier workloads that chain multiple models can end up with a model_artifact large enough to exceed the limit.
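To make the constraint concrete with the numbers from this thread: TorchServe typically extracts the .mar archive into its model store, so the artifact can occupy roughly twice its size on disk (archive plus extracted contents) -- that doubling is an assumption worth verifying for a given setup. A quick back-of-the-envelope sketch:

```python
def headroom_gb(volume_gb, image_gb, mar_gb, mar_unpacks=True):
    """Free space left on the host volume. Assumes the extracted .mar
    roughly doubles the artifact's on-disk footprint."""
    model_footprint = mar_gb * 2 if mar_unpacks else mar_gb
    return volume_gb - image_gb - model_footprint

# Numbers from this thread: 17 GB image, 8 GB .mar, 30 GB volume.
# With extraction, the footprint is 17 + 16 = 33 GB -- already over 30 GB.
deficit = headroom_gb(volume_gb=30, image_gb=17, mar_gb=8)
```

Under that assumption, this deployment is over the limit even before any runtime scratch space is counted.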
