A few minor fixes for benchmarks #784

Merged (1 commit) on Aug 27, 2024
benchmarks/README.md (40 changes: 26 additions & 14 deletions)
@@ -15,13 +15,13 @@
via these terraform scripts on a Standard cluster that you've created yourself.
This tutorial assumes you have access to use google storage APIs via Application Default Credentials (ADC).
To login, you can run the following:

-```
+```sh
gcloud auth application-default login
```
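
As a quick check that Application Default Credentials are working, you can request an access token:

```sh
# Exits non-zero if ADC is not configured; prints a confirmation otherwise.
gcloud auth application-default print-access-token > /dev/null && echo "ADC is configured"
```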

### Terraform

-Install Terraform by following the documentation at https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli.
+Install Terraform by following the documentation at <https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli>.
This requires a minimum Terraform version of 1.7.4.
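
You can confirm the installed version with:

```sh
terraform version
```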

### python
@@ -34,10 +34,11 @@
You may need to run pip install -r benchmark/dataset/ShareGPT_v3_unflitered_clea

This section goes over an end to end example to deploy and benchmark the Falcon 7b model using [TGI](https://huggingface.co/docs/text-generation-inference/en/index) on a Standard GKE Cluster with GPUs.

-Each step below has more details in their respective directoy README.md. It is recommended
+Each step below has more details in their respective directory README.md. It is recommended
that you read through the available options at least once when testing your own models.

At a high level, running an inference benchmark in GKE involves these five steps:

1. Create the cluster
2. Configure the cluster
3. Deploy the inference server
@@ -51,7 +52,8 @@
Set up the infrastructure by creating a GKE cluster with appropriate accelerator
configuration.

To create a GPU cluster, in the ai-on-gke/benchmarks folder run:
-```
+
+```sh
# Stage 1 creates the cluster.
cd infra/stage-1

@@ -68,8 +70,10 @@
terraform plan
# Run apply if the changes look good by confirming the prompt.
terraform apply
```

To verify that the cluster has been set up correctly, run
-```
+
+```sh
# Get credentials using fleet membership
gcloud container fleet memberships get-credentials <cluster-name> --project <project-id>

@@ -80,7 +84,8 @@
kubectl get nodes
### 2. Configure the cluster

To configure the cluster to run inference workloads we need to set up workload identity, GCS Fuse and DCGM for GPU metrics. In the ai-on-gke/benchmarks folder run:
-```
+
+```sh
# Stage 2 configures the cluster for running inference workloads.
cd infra/stage-2

@@ -104,7 +109,8 @@
terraform apply
### 3. Deploy the inference server

To deploy TGI with a sample model, in the ai-on-gke/benchmarks folder run:
-```
+
+```sh
# text-generation-inference is the inference workload we'll deploy.
cd inference-server/text-generation-inference

@@ -125,17 +131,20 @@
terraform apply
```

It may take a minute or two for the inference server to be ready to serve. To verify that the model is running, you can run:
-```
+
+```sh
kubectl get deployment -n benchmark
```

This will show the status of the running TGI server.
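
For a deeper readiness check, you can port-forward to the server and send a small request. This is only a sketch: the deployment name, namespace, and port are assumptions based on a typical TGI setup, not values taken from these modules.

```sh
# Forward a local port to the TGI deployment (name and port assumed).
kubectl port-forward -n benchmark deployment/tgi 8080:80 &

# Ask the model for a few tokens; a JSON response means it is serving.
curl -s http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Hello, my name is", "parameters": {"max_new_tokens": 10}}'
```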

### 4. Deploy the benchmark

#### Prepare the benchmark dataset

To prepare the dataset for the Locust inference benchmark, in the ai-on-gke/benchmarks folder run:
-```
+
+```sh
# This folder contains a script that prepares the prompts for ShareGPT_v3_unflitered_cleaned_split dataset
# that works out of the box with the locust benchmarking expected format.
cd benchmark/dataset/ShareGPT_v3_unflitered_cleaned_split
@@ -147,7 +156,8 @@
python3 upload_sharegpt.py --gcs_path="gs://${PROJECT_ID}-ai-gke-benchmark-fuse/
#### Deploy the benchmarking tool

To deploy the Locust inference benchmark with the above model, in the ai-on-gke/benchmarks folder run:
-```
+
+```sh
# This folder contains the benchmark tool that generates requests for your workload
cd benchmark/tools/locust-load-inference

@@ -173,7 +183,7 @@
To further interact with the Locust inference benchmark, view the README.md file

An end to end Locust benchmark that runs for a given amount of time can be triggered via a curl command to the Locust Runner service:

-```
+```sh
# get the locust runner endpoint
kubectl get service -n benchmark locust-runner-api

@@ -182,7 +192,8 @@
curl -XGET http://$RUNNER_ENDPOINT_IP:8000/run
```
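
If the runner service is exposed through a LoadBalancer, the RUNNER_ENDPOINT_IP used above can be captured like this (a sketch that assumes an external LoadBalancer IP has been assigned):

```sh
# Store the external IP of the Locust runner service for the curl call above.
RUNNER_ENDPOINT_IP=$(kubectl get service -n benchmark locust-runner-api \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
```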

A results file will appear in the GCS bucket specified as output_bucket in the input variables once the benchmark completes. Metrics and Locust statistics are visible under the [Cloud Monitoring metrics explorer](http://pantheon.corp.google.com/monitoring/metrics-explorer). In the ai-on-gke/benchmarks/benchmark/tools/locust-load-inference folder, run the following command to create a sample custom dashboard for the example above:
-```
+
+```sh
# apply the sample dashboard to easily view and explore metrics
gcloud monitoring dashboards create --config-from-file ./sample-dashboards/tgi-dashboard.yaml
```
@@ -192,9 +203,10 @@
View the results in the [Cloud Monitoring Dashboards](https://pantheon.corp.goog
For more ways to interact with the locust benchmarking tooling, see the instructions in the [locust-load-inference README.md here](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/benchmarks/benchmark/tools/locust-load-inference/README.md#step-9-start-an-end-to-end-benchmark).

### 6. Clean Up

To clean up the above setup, in the ai-on-gke/benchmarks folder run:

-```
+```sh
# Run destroy on locust load generator
cd benchmark/tools/locust-load-inference
terraform destroy
@@ -215,4 +227,4 @@
terraform destroy
# Run destroy on infra/stage-1 resources
cd ../stage-1
terraform destroy
-```
\ No newline at end of file
+```
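
The four destroy steps can also be scripted. A sketch, assuming you start from the ai-on-gke/benchmarks folder; note that -auto-approve skips Terraform's confirmation prompts:

```sh
# Tear down in reverse order of creation, using the paths from the steps above.
for dir in benchmark/tools/locust-load-inference \
           inference-server/text-generation-inference \
           infra/stage-2 \
           infra/stage-1; do
  (cd "$dir" && terraform destroy -auto-approve)
done
```
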
benchmark/dataset/ShareGPT_v3_unflitered_cleaned_split/upload_sharegpt.py
@@ -95,7 +95,7 @@ def main(gcs_path: str, overwrite: bool):
parser.add_argument('--overwrite', default=False,
action=argparse.BooleanOptionalAction)
args = parser.parse_args()
-gcs_uri_pattern = "^gs:\/\/[a-z0-9.\-_]{3,63}\/(.+\/)*(.+)$"
+gcs_uri_pattern = "^gs:\\/\\/[a-z0-9.\\-_]{3,63}\\/(.+\\/)*(.+)$"
if not re.match(gcs_uri_pattern, args.gcs_path):
raise ValueError(
f"Invalid GCS path: {args.gcs_path}, expecting format \"gs://$BUCKET/$FILENAME\"")
benchmark/tools/locust-load-inference (sample Terraform variables file)
@@ -7,7 +7,9 @@
project_id = "$PROJECT_ID"
namespace = "benchmark"
ksa = "benchmark-ksa"

-k8s_hf_secret = "hf-token"
+
+# This is needed for loading the gated models on HF
+# k8s_hf_secret = "hf-token"

# Locust service configuration
artifact_registry = "us-central1-docker.pkg.dev/$PROJECT_ID/ai-benchmark"
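
If you do enable k8s_hf_secret for a gated model, the referenced secret must already exist in the cluster. A minimal sketch (the namespace and the key name HF_TOKEN are assumptions, not values taken from these modules):

```sh
# Create the Hugging Face token secret that the tfvars entry above references.
kubectl create secret generic hf-token \
  --namespace=benchmark \
  --from-literal=HF_TOKEN="<your-hf-access-token>"
```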