Kubeflow update dashboard and readme (#18600)
* update readme

* update readme

* [Release] Bumped kubeflow version to 1.0.0

* revert release
HadhemiDD authored Sep 17, 2024
1 parent 426d18b commit fcfc8f1
Showing 3 changed files with 78 additions and 11 deletions.
46 changes: 46 additions & 0 deletions kubeflow/README.md
@@ -27,6 +27,52 @@ For the Agent to start collecting metrics, the `kubeflow` pods need to be annota

Kubeflow has metrics endpoints that can be accessed on port `9090`.

To expose Kubeflow metrics through Prometheus, you might need to enable Prometheus service monitoring for the component in question.

You can use kube-prometheus-stack or a custom Prometheus installation.

##### How to install kube-prometheus-stack:
1. Add the Helm repository:
```
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
```

2. Install the chart:
```
helm install prometheus-stack prometheus-community/kube-prometheus-stack
```

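3. Forward the Prometheus service to a local port to reach the Prometheus UI. With the chart's default naming, the Prometheus service created by this release is `prometheus-stack-kube-prom-prometheus`; confirm the name with `kubectl get svc`:
```
kubectl port-forward svc/prometheus-stack-kube-prom-prometheus 9090:9090
```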
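With the chart defaults, the Prometheus instance deployed by kube-prometheus-stack only selects ServiceMonitors that carry the Helm release label (here `release: prometheus-stack`). If you would rather not add that label to every ServiceMonitor, one option is to relax the selector through the chart values. A minimal sketch, assuming you install with a local values file (the file name `values.yaml` is illustrative):

```yaml
# values.yaml (illustrative file name), passed at install time with:
#   helm install prometheus-stack prometheus-community/kube-prometheus-stack -f values.yaml
prometheus:
  prometheusSpec:
    # When false, the default empty serviceMonitorSelector selects all
    # ServiceMonitors (subject to the namespace selector) instead of only
    # those labeled with this Helm release.
    serviceMonitorSelectorNilUsesHelmValues: false
```

Keeping the default and labeling each ServiceMonitor with `release: prometheus-stack` works just as well; this setting only removes the need for that label.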

##### Set Up ServiceMonitors for Kubeflow Components:

Prometheus discovers scrape targets through ServiceMonitor resources, so you need to configure a ServiceMonitor for each Kubeflow component you want to monitor.
If the component already exposes Prometheus metrics by default, this is the only configuration Prometheus needs to scrape them.

The ServiceMonitor would look like this:

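```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: <kubeflow-component>-monitor
  labels:
    # Matches the Helm release name so the kube-prometheus-stack Prometheus selects this ServiceMonitor
    release: prometheus-stack
spec:
  # Adjust to the namespace where the component's Service runs
  namespaceSelector:
    matchNames:
      - kubeflow
  selector:
    matchLabels:
      app: <kubeflow-component-name>
  endpoints:
    - port: http      # name (not number) of the Service port to scrape
      path: /metrics
```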
Replace `<kubeflow-component>` with `pipelines`, `kserve`, or `katib`, and `<kubeflow-component-name>` with the matching value: `ml-pipeline`, `kserve`, or `katib`.
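A ServiceMonitor selects Services, not pods: `spec.selector.matchLabels` is matched against the labels of a Service, and `endpoints[].port` refers to the *name* of one of that Service's ports. A minimal sketch of a Service that such a ServiceMonitor would pick up, assuming the component runs in the `kubeflow` namespace and serves metrics on port `9090`; every name and number here is a hypothetical placeholder, and your Kubeflow manifests may already define an equivalent Service:

```yaml
# Hypothetical Service, for illustration only: names, namespace and port
# numbers are placeholders, not values taken from the Kubeflow manifests.
apiVersion: v1
kind: Service
metadata:
  name: <kubeflow-component-name>
  namespace: kubeflow                # assumption: component deployed in "kubeflow"
  labels:
    app: <kubeflow-component-name>   # matched by the ServiceMonitor's spec.selector.matchLabels
spec:
  selector:
    app: <kubeflow-component-name>   # routes traffic to the component's pods
  ports:
    - name: http                     # matched by the ServiceMonitor's endpoints[].port
      port: 9090                     # assumption: metrics served on port 9090
      targetPort: 9090
```

If the ServiceMonitor finds no matching Service (wrong label, wrong port name, or wrong namespace), Prometheus silently shows no target for it, so these three fields are the first thing to check.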


**Note**: The listed metrics can only be collected if they are available (depending on the Kubeflow version). Some metrics are generated only when certain actions are performed.

The only parameter required for configuring the `kubeflow` check is `openmetrics_endpoint`. This parameter should be set to the location where the Prometheus-formatted metrics are exposed. The default port is `9090`. In containerized environments, `%%host%%` should be used for [host autodetection][3].
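In Kubernetes, one way to set this parameter is an Autodiscovery template in the pod annotations. A minimal sketch, assuming an Agent version that supports the `ad.datadoghq.com/<CONTAINER_NAME>.checks` annotation format and that metrics are served at the `/metrics` path; `<CONTAINER_NAME>` is a placeholder for the name of the container exposing the metrics:

```yaml
# Fragment of a Pod manifest: only the Autodiscovery annotation is shown.
metadata:
  annotations:
    # The Agent replaces %%host%% with the container's IP address at runtime.
    ad.datadoghq.com/<CONTAINER_NAME>.checks: |
      {
        "kubeflow": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:9090/metrics"
            }
          ]
        }
      }
```

Because `openmetrics_endpoint` is the only required parameter, the instance object contains nothing else; optional settings can be added alongside it in the same object.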
43 changes: 32 additions & 11 deletions kubeflow/assets/dashboards/overview.json
@@ -46,7 +46,7 @@
},
"id": 4610707819074916,
"layout": {
- "height": 3,
+ "height": 4,
"width": 3,
"x": 0,
"y": 3
@@ -67,7 +67,7 @@
},
"id": 8366490141273904,
"layout": {
- "height": 3,
+ "height": 4,
"width": 2,
"x": 3,
"y": 3
@@ -77,7 +77,7 @@
},
"id": 3375620455700908,
"layout": {
- "height": 7,
+ "height": 8,
"width": 5,
"x": 0,
"y": 0
@@ -93,8 +93,8 @@
"widgets": [
{
"definition": {
- "background_color": "blue",
- "content": "The service checks show the Kubeflow OpenMetrics/Prometheus endpoint status.\n\nThe monitor summary shows you any active alerts for the most crucial Metrics. ",
+ "background_color": "pink",
+ "content": "If many widgets are empty, you may be using a version of Kubeflow that does not expose certain metrics. Refer to the metadata.csv file for the list of metrics. \n\nReach out to support to report version incompatibilities.",
"font_size": "14",
"has_padding": true,
"show_tick": true,
@@ -112,6 +112,27 @@
"y": 0
}
},
+ {
+ "definition": {
+ "background_color": "blue",
+ "content": "The service checks show the Kubeflow OpenMetrics/Prometheus endpoint status.\n\nThe monitor summary shows you any active alerts for the most crucial Metrics. ",
+ "font_size": "14",
+ "has_padding": true,
+ "show_tick": true,
+ "text_align": "left",
+ "tick_edge": "left",
+ "tick_pos": "50%",
+ "type": "note",
+ "vertical_align": "center"
+ },
+ "id": 6145599891700518,
+ "layout": {
+ "height": 1,
+ "width": 7,
+ "x": 0,
+ "y": 2
+ }
+ },
{
"definition": {
"color_preference": "text",
@@ -134,14 +155,14 @@
"height": 4,
"width": 7,
"x": 0,
- "y": 2
+ "y": 3
}
}
]
},
"id": 3510698085005998,
"layout": {
- "height": 7,
+ "height": 8,
"width": 7,
"x": 5,
"y": 0
@@ -273,7 +294,7 @@
"height": 5,
"width": 12,
"x": 0,
- "y": 7
+ "y": 8
}
},
{
@@ -988,7 +1009,7 @@
"is_column_break": true,
"width": 12,
"x": 0,
- "y": 12
+ "y": 13
}
},
{
@@ -1224,7 +1245,7 @@
"height": 8,
"width": 12,
"x": 0,
- "y": 28
+ "y": 29
}
},
{
@@ -1424,7 +1445,7 @@
"height": 7,
"width": 12,
"x": 0,
- "y": 36
+ "y": 37
}
}
]
File renamed without changes.
