Distributed GPU learning and deploying NLP models on the cloud with real-time performance analysis


Experiment tracking

We use MLflow to track our experiments and store our models, and the MLflow Tracking UI to view our experiments. We use DVC as a central location to store and track all of our experiments. We then spin up the MLflow server (managed alternatives such as Comet also work):

export MODEL_REGISTRY=$(python -c "from mlops import config; print(config.MODEL_REGISTRY)")
mlflow server -h 0.0.0.0 -p 8080 --backend-store-uri $MODEL_REGISTRY
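
As a quick sketch of what logging against this server looks like: the snippet below uses MLflow's standard logging API, but the parameter and metric values are illustrative, not the project's actual config (the experiment name matches the EXPERIMENT_NAME used in the Serving section).

# Minimal MLflow logging sketch (illustrative values)
import mlflow

mlflow.set_tracking_uri("http://0.0.0.0:8080")  # the server started above
mlflow.set_experiment("llm")                    # matches EXPERIMENT_NAME below

with mlflow.start_run():
    mlflow.log_params({"lr": 1e-4, "batch_size": 32})
    mlflow.log_metrics({"val_loss": 0.42})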

Anyscale Services

Now we serve our model to production. The service is configured in deploy/services/serve_model.yaml:

ray_serve_config:
  import_path: deploy.services.serve_model:entrypoint
  runtime_env:
    working_dir: .
    upload_path: s3://mlops/$GITHUB_USERNAME/services  
    env_vars:
      GITHUB_USERNAME: $GITHUB_USERNAME  
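
The import_path above points to an entrypoint object in deploy/services/serve_model.py. A minimal sketch of what such an entrypoint might look like with Ray Serve follows; the class name and body are assumptions, not the repository's actual code.

# deploy/services/serve_model.py (hypothetical sketch, not the repo's actual code)
from ray import serve
from starlette.requests import Request

@serve.deployment
class ServeModel:
    def __init__(self):
        # Load the best model from the registry here (assumption)
        self.model = ...

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()  # {"title": ..., "description": ...}
        return {"prediction": "..."}    # run inference with self.model

# Ray Serve resolves import_path to this bound application
entrypoint = ServeModel.bind()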

Now we're ready to launch our service:

# Rollout service
anyscale service rollout -f deploy/services/serve_model.yaml

# Query
curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $SECRET_TOKEN" -d '{
  "title": "recommendation system",
  "description": ""
}' $SERVICE_ENDPOINT/predict/

# Rollback (to previous version of the Service)
anyscale service rollback -f $SERVICE_CONFIG --name $SERVICE_NAME

# Terminate
anyscale service terminate --name $SERVICE_NAME

Run the following command in your Anyscale Workspace terminal to generate the public URL for your MLflow server.

APP_PORT=8080
echo https://$APP_PORT-port-$ANYSCALE_SESSION_DOMAIN

Continual learning

With this foundation in place, it becomes easy to extend toward continual learning: scheduled runs (cron), drift detection, online evaluation, and so on.
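
For example, a scheduled (cron) job could compare a recent online-evaluation metric against a threshold and trigger retraining when it degrades. The threshold, metric, and training entrypoint below are purely illustrative assumptions.

# Hypothetical drift-check sketch for a scheduled run; names are illustrative
import subprocess

VAL_LOSS_THRESHOLD = 0.5  # assumed acceptable bound

def check_and_retrain(current_val_loss: float) -> None:
    # Kick off retraining when online evaluation degrades past the threshold
    if current_val_loss > VAL_LOSS_THRESHOLD:
        subprocess.run(["python", "madewithml/train.py"], check=True)  # assumed training entrypoint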

Serving

# Start
ray start --head
# Set up
export EXPERIMENT_NAME="llm"
export RUN_ID=$(python madewithml/predict.py get-best-run-id --experiment-name $EXPERIMENT_NAME --metric val_loss --mode ASC)
python madewithml/serve.py --run_id $RUN_ID
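
The get-best-run-id command sorts runs by the chosen metric. A sketch of how it might be implemented with MLflow's search API follows; this is an assumption about the repo's internals, not its actual code.

# Hypothetical sketch of get-best-run-id, assuming MLflow's search API
import mlflow

def get_best_run_id(experiment_name: str, metric: str, mode: str = "ASC") -> str:
    runs = mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=[f"metrics.{metric} {mode}"],
    )
    return runs.iloc[0].run_id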

Once the application is running, we can use it via cURL, Python, etc.:

# via Python
import json
import requests

# Example payload (assumed to mirror the cURL example above)
data = {"title": "recommendation system", "description": ""}
json_data = json.dumps({"data": data})
requests.post("http://127.0.0.1:8000/predict", data=json_data).json()

Once you're done, shut down the Ray cluster:

ray stop  # shutdown
