Decrease scrape interval of metrics from TGI and DCGM to 15s in benchmark (#772)

Decrease the scrape interval of metrics from TGI and DCGM to 15s (previously 20s for TGI and 30s for DCGM). To facilitate this, also decrease the DCGM exporter's collection interval to 5s so that each scrape sees fresh metrics.
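
The reasoning behind pairing the two changes can be checked with a little arithmetic. The sketch below is illustrative only and is not part of this commit: the function names are hypothetical, and the 30000 ms "before" value assumes the exporter was previously running with a 30s collection interval, which the diff itself does not show. It treats the worst-case delay before a GPU-side change becomes visible as one collection period plus one scrape period.

```python
# Rough staleness arithmetic for an exporter that collects on one timer
# and is scraped on another. All names here are hypothetical helpers.

ASSUMED_OLD_COLLECT_MS = 30000  # assumption: collection interval before this change
NEW_COLLECT_MS = 5000           # value passed via --collect-interval in this commit


def max_sample_age_s(collect_interval_ms: int) -> float:
    """Oldest a served sample can be: just under one collection period."""
    return collect_interval_ms / 1000.0


def max_visibility_lag_s(collect_interval_ms: int, scrape_interval_s: float) -> float:
    """Worst-case delay before a change shows up in a scraped sample.

    The change may happen right after a collection (wait one collection
    period), and the fresh sample may land right after a scrape (wait one
    scrape period).
    """
    return max_sample_age_s(collect_interval_ms) + scrape_interval_s


def scrapes_can_repeat(collect_interval_ms: int, scrape_interval_s: float) -> bool:
    """True if consecutive scrapes can return the same collected sample."""
    return collect_interval_ms / 1000.0 > scrape_interval_s


if __name__ == "__main__":
    cases = [
        ("old collect, old scrape", ASSUMED_OLD_COLLECT_MS, 30),
        ("old collect, new scrape", ASSUMED_OLD_COLLECT_MS, 15),
        ("new collect, new scrape", NEW_COLLECT_MS, 15),
    ]
    for label, collect_ms, scrape_s in cases:
        lag = max_visibility_lag_s(collect_ms, scrape_s)
        repeat = scrapes_can_repeat(collect_ms, scrape_s)
        print(f"{label}: worst-case lag {lag:.0f}s, repeated samples possible: {repeat}")
```

Under these assumptions, lowering only the scrape interval would leave some scrapes returning an unchanged sample, which is why the collection interval is lowered as well.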
raywainman committed Aug 27, 2024
1 parent 2c2fec4 commit 6afaa13
Showing 3 changed files with 3 additions and 3 deletions.
@@ -26,7 +26,7 @@ spec:
 app: nvidia-dcgm-exporter
 endpoints:
 - port: metrics
-interval: 30s
+interval: 15s
 metricRelabeling:
 # Change the DCGM metric name that we want to use in HPA to lowercase.
 # This is because HPA doesn't work with uppercase external metrics:
@@ -9,4 +9,4 @@ spec:
 app: tgi
 endpoints:
 - port: 80
-interval: 20s
+interval: 15s
@@ -54,7 +54,7 @@ spec:
 image: nvcr.io/nvidia/k8s/dcgm-exporter:3.1.8-3.1.5-ubuntu20.04
 command: ["/bin/bash", "-c"]
 args:
-- hostname $NODE_NAME; dcgm-exporter --remote-hostengine-info $(NODE_IP) --collectors /etc/dcgm-exporter/counters.csv
+- hostname $NODE_NAME; dcgm-exporter --remote-hostengine-info $(NODE_IP) --collect-interval 5000 --collectors /etc/dcgm-exporter/counters.csv
 ports:
 - name: metrics
 containerPort: 9400
