Hosting OpenAI Whisper Model on Amazon SageMaker Asynchronous Inference Endpoint using SageMaker PyTorch DLC

This is a CDK Python project to host the OpenAI Whisper model on Amazon SageMaker Asynchronous Inference Endpoint.

OpenAI Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680,000 hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. SageMaker JumpStart is the machine learning (ML) hub of SageMaker that provides access to foundation models, in addition to built-in algorithms and end-to-end solution templates, to help you quickly get started with ML.

The cdk.json file tells the CDK Toolkit how to execute your app.

This project is set up like a standard Python project. The initialization process also creates a virtualenv within this project, stored under the .venv directory. To create the virtualenv it assumes that there is a python3 (or python for Windows) executable in your path with access to the venv package. If for any reason the automatic creation of the virtualenv fails, you can create the virtualenv manually.

To manually create a virtualenv on MacOS and Linux:

$ python3 -m venv .venv

After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.

$ source .venv/bin/activate

If you are on a Windows platform, you activate the virtualenv like this:

% .venv\Scripts\activate.bat

Once the virtualenv is activated, you can install the required dependencies.

(.venv) $ pip install -r requirements.txt

To add additional dependencies, for example other CDK libraries, just add them to your setup.py file and rerun the pip install -r requirements.txt command.

Prerequisites

In order to host the model on Amazon SageMaker, the first step is to save the model artifacts. These artifacts refer to the essential components of a machine learning model needed for various applications, including deployment and retraining. They can include model parameters, configuration files, pre-processing components, as well as metadata, such as version details, authorship, and any notes related to its performance.

  1. Install required packages

    (.venv) $ cat requirements-dev.txt
    accelerate==0.30.1
    datasets==2.16.1
    librosa==0.10.2.post1
    openai-whisper>=20230918
    soundfile==0.12.1
    torch==2.1.0
    torchaudio==2.1.0
    transformers==4.38.0
    
    (.venv) $ pip install -r requirements-dev.txt
    
  2. Save model artifacts

    The following instructions work well on either Ubuntu or SageMaker Studio.

    (1) Create a directory for model artifacts.

    (.venv) mkdir -p model
    

    (2) Run the following Python code to download the OpenAI Whisper model artifacts from the Hugging Face model hub.

    from transformers import (
        AutoModelForSpeechSeq2Seq,
        WhisperProcessor,
        WhisperTokenizer,
    )
    
    # Define a directory where you want to save the model
    save_directory = "./model"
    
    model_id = "openai/whisper-medium"
    model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
    model.save_pretrained(save_directory)
    
    tokenizer = WhisperTokenizer.from_pretrained(model_id)
    tokenizer.save_pretrained(save_directory)
    
    processor = WhisperProcessor.from_pretrained(model_id)
    processor.save_pretrained(save_directory)

    (3) Create model.tar.gz with model artifacts including your custom inference scripts.

    (.venv) tar cvf model.tar --exclude=".ipynb_checkpoints" -C model/ .
    (.venv) tar rvf model.tar --exclude=".ipynb_checkpoints" -C src/ code
    (.venv) gzip model.tar
    

    ℹ️ For more information about the directory structure of model.tar.gz, see Model Directory Structure for Deploying Pre-trained PyTorch Models
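
    With the commands above, the resulting model.tar.gz typically contains the Whisper artifacts at the top level and the inference code under code/, for example:

    model.tar.gz
    ├── config.json
    ├── model.safetensors (or pytorch_model.bin)
    ├── preprocessor_config.json
    ├── tokenizer files (vocab.json, merges.txt, ...)
    └── code/
        ├── inference.py
        └── requirements.txt

    The custom inference script follows the handler contract of the SageMaker PyTorch inference toolkit (model_fn, input_fn, predict_fn, output_fn). The following is a minimal sketch only, not this project's actual src/code/inference.py; it assumes the request payload is raw WAV audio and that librosa is listed in code/requirements.txt.

    # inference.py -- a minimal sketch; adapt to your own payload format.
    import io
    import json

    import torch
    from transformers import AutoModelForSpeechSeq2Seq, WhisperProcessor


    def model_fn(model_dir):
        # Load the Whisper model and processor saved under model_dir.
        device = "cuda" if torch.cuda.is_available() else "cpu"
        processor = WhisperProcessor.from_pretrained(model_dir)
        model = AutoModelForSpeechSeq2Seq.from_pretrained(model_dir).to(device)
        model.eval()
        return {"model": model, "processor": processor, "device": device}


    def input_fn(request_body, content_type="audio/wav"):
        # Decode the raw audio bytes into a mono 16 kHz waveform.
        import librosa  # assumed to be declared in code/requirements.txt
        waveform, _ = librosa.load(io.BytesIO(request_body), sr=16000, mono=True)
        return waveform


    def predict_fn(waveform, model_artifacts):
        # Run speech recognition on the decoded waveform.
        model = model_artifacts["model"]
        processor = model_artifacts["processor"]
        inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
        input_features = inputs.input_features.to(model_artifacts["device"])
        with torch.no_grad():
            generated_ids = model.generate(input_features)
        text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
        return {"transcription": text}


    def output_fn(prediction, accept="application/json"):
        # Serialize the transcription as JSON.
        return json.dumps(prediction)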

    (4) Upload the model.tar.gz file to Amazon S3.

    (.venv) export MODEL_URI="s3://{bucket_name}/{key_prefix}/model.tar.gz"
    (.venv) aws s3 cp model.tar.gz ${MODEL_URI}
    

    ⚠️ Replace bucket_name and key_prefix with your own values.

  3. Set up cdk.context.json

    Next, set the CDK context configuration file, cdk.context.json, appropriately.

    For example,

    {
      "model_id": "openai/whisper-medium",
      "model_data_source": {
        "s3_bucket_name": "sagemaker-us-east-1-123456789012",
        "s3_object_key_name": "openai-whisper/model.tar.gz"
      }
    }
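
    The stack typically reads these values through the CDK context API. Below is a minimal sketch of how the app might consume them; the names mirror the example above, and the actual stack code in this project may read them differently.

    # app-level sketch only -- the project's CDK app may differ.
    import aws_cdk as cdk

    app = cdk.App()

    model_id = app.node.try_get_context("model_id")
    model_data_source = app.node.try_get_context("model_data_source")

    # Rebuild the S3 URI of model.tar.gz from the context values.
    model_uri = "s3://{}/{}".format(
        model_data_source["s3_bucket_name"],
        model_data_source["s3_object_key_name"],
    )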
    

Deploy

At this point you can synthesize the CloudFormation template for this code.

(.venv) $ export CDK_DEFAULT_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
(.venv) $ export CDK_DEFAULT_REGION=$(aws configure get region)
(.venv) $ cdk synth --all

Use the cdk deploy command to create the stack.

(.venv) $ cdk deploy --require-approval never --all
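
Once the stack is deployed, you can exercise the asynchronous endpoint. With asynchronous inference, the request payload is read from S3 and the result is written back to S3. The snippet below is a minimal sketch; the endpoint name and S3 paths are placeholders that you should replace with the values from your deployed stack.

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

# Asynchronous inference reads the input payload from S3 instead of the request body.
response = sagemaker_runtime.invoke_endpoint_async(
    EndpointName="whisper-async-endpoint",  # placeholder: use your endpoint name
    InputLocation="s3://{bucket_name}/{key_prefix}/sample.wav",  # placeholder audio file
    ContentType="audio/wav",
)

# The transcription is written to this S3 location when the request completes.
print(response["OutputLocation"])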

Clean Up

Delete the CloudFormation stack by running the command below.

(.venv) $ cdk destroy --force --all

Useful commands

  • cdk ls list all stacks in the app
  • cdk synth emits the synthesized CloudFormation template
  • cdk deploy deploy this stack to your default AWS account/region
  • cdk diff compare deployed stack with current state
  • cdk docs open CDK documentation

Enjoy!

References