Merge branch 'master' into fix-mounting-secrets
Signed-off-by: Yee Hing Tong <[email protected]>
wild-endeavor authored Apr 11, 2024
2 parents e4a2dea + ab95f7e commit b867fac
Showing 42 changed files with 10,957 additions and 626 deletions.
8 changes: 4 additions & 4 deletions charts/flyte/README.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion charts/flyte/values.yaml
@@ -532,7 +532,7 @@ flyte:
container: container
sidecar: sidecar
container_array: k8s-array
bigquery_query_job_task: agent-service
sensor: agent-service


# -- Kubernetes specific Flyte configuration
6 changes: 3 additions & 3 deletions deployment/sandbox/flyte_helm_generated.yaml
@@ -634,9 +634,9 @@ data:
tasks:
task-plugins:
default-for-task-types:
bigquery_query_job_task: agent-service
container: container
container_array: k8s-array
sensor: agent-service
sidecar: sidecar
enabled-plugins:
- container
@@ -7173,7 +7173,7 @@ spec:
template:
metadata:
annotations:
configChecksum: "4fd54a75274d84bbb9a90cc421f7aece12c202911984a436a9ec5fe52e942eb"
configChecksum: "673119651fe870e114e1b95cfbc27a6e5c2418215569ab9d0b9451385c32a51"
labels:
app.kubernetes.io/name: flytepropeller
app.kubernetes.io/instance: flyte
@@ -7247,7 +7247,7 @@ spec:
app.kubernetes.io/name: flyte-pod-webhook
app.kubernetes.io/version: v1.11.1-b1
annotations:
configChecksum: "4fd54a75274d84bbb9a90cc421f7aece12c202911984a436a9ec5fe52e942eb"
configChecksum: "673119651fe870e114e1b95cfbc27a6e5c2418215569ab9d0b9451385c32a51"
spec:
securityContext:
fsGroup: 65534
4 changes: 2 additions & 2 deletions docker/sandbox-bundled/manifests/complete-agent.yaml
@@ -816,7 +816,7 @@ type: Opaque
---
apiVersion: v1
data:
haSharedSecret: b21Wb1RDSEJTTlZtdE9kdw==
haSharedSecret: RUtrQlNrYU9tQ21hT2NQdg==
proxyPassword: ""
proxyUsername: ""
kind: Secret
@@ -1412,7 +1412,7 @@ spec:
metadata:
annotations:
checksum/config: 8f50e768255a87f078ba8b9879a0c174c3e045ffb46ac8723d2eedbe293c8d81
checksum/secret: ec1d3f5f583d49c1391ba826ce8902ccab1176d54ec85fddf650af30e9a4288a
checksum/secret: f32ac7770d546bb970d5cdfb8280be16ee0a852fc6f9e23f8be29bc3cdcdc080
labels:
app: docker-registry
release: flyte-sandbox
4 changes: 2 additions & 2 deletions docker/sandbox-bundled/manifests/complete.yaml
@@ -796,7 +796,7 @@ type: Opaque
---
apiVersion: v1
data:
haSharedSecret: RGlPWTNTd2FSalUyeExhRw==
haSharedSecret: OVJPbVVSY1pnbGhYZ3VnMA==
proxyPassword: ""
proxyUsername: ""
kind: Secret
@@ -1360,7 +1360,7 @@ spec:
metadata:
annotations:
checksum/config: 8f50e768255a87f078ba8b9879a0c174c3e045ffb46ac8723d2eedbe293c8d81
checksum/secret: b9b7e397079b78ef59f2319194edbbd8304404b1cc83fddae42be22028f8f9de
checksum/secret: 78488724c19da8da25ffdbe6f64179a0ff50e13ad607d9ad62f6ed26f39f391b
labels:
app: docker-registry
release: flyte-sandbox
4 changes: 2 additions & 2 deletions docker/sandbox-bundled/manifests/dev.yaml
@@ -499,7 +499,7 @@ metadata:
---
apiVersion: v1
data:
haSharedSecret: Q1lhdnJPSUpuNE1INFpldQ==
haSharedSecret: d2ZQSFBRbTdndktaVG1uYQ==
proxyPassword: ""
proxyUsername: ""
kind: Secret
@@ -934,7 +934,7 @@ spec:
metadata:
annotations:
checksum/config: 8f50e768255a87f078ba8b9879a0c174c3e045ffb46ac8723d2eedbe293c8d81
checksum/secret: 395260d1bf8400be7613e9cc87617407754212fc015fb1f216f1ed4e8119ec59
checksum/secret: 82243571f71a234dddb18728159976b6d944626310a65e5f2c2e5a39b0497415
labels:
app: docker-registry
release: flyte-sandbox
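
The `haSharedSecret` values in the sandbox manifests above are base64-encoded Kubernetes Secret data. A minimal Python sketch, using only the old and new values copied from `complete-agent.yaml`, decodes them to show that each is a 16-character string. That the value is regenerated at random each time the manifests are rendered (which would also explain the changed `checksum/secret` annotations) is an assumption, not something this diff states.

```python
import base64

# Old and new haSharedSecret values copied from the complete-agent.yaml hunk.
OLD_SECRET = "b21Wb1RDSEJTTlZtdE9kdw=="
NEW_SECRET = "RUtrQlNrYU9tQ21hT2NQdg=="


def decode_secret(encoded: str) -> str:
    """Decode a base64-encoded Kubernetes Secret value into text."""
    return base64.b64decode(encoded).decode("ascii")


if __name__ == "__main__":
    for label, value in (("old", OLD_SECRET), ("new", NEW_SECRET)):
        decoded = decode_secret(value)
        print(f"{label}: {len(decoded)} characters, alphanumeric={decoded.isalnum()}")
```

Because the decoded strings differ on every render, changes to these lines are likely noise rather than a meaningful configuration change.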
12 changes: 12 additions & 0 deletions docs/deployment/agents/chatgpt.rst
@@ -38,6 +38,12 @@ Specify agent configuration
agent-service:
supportedTaskTypes:
- chatgpt
# Configuring the timeout is optional.
# Tasks like using ChatGPT with a large model might require a longer time,
# so we have the option to adjust the timeout setting here.
defaultAgent:
timeouts:
ExecuteTaskSync: 10s
.. group-tab:: Flyte core

@@ -66,6 +72,12 @@ Specify agent configuration
agent-service:
supportedTaskTypes:
- chatgpt
# Configuring the timeout is optional.
# Tasks like using ChatGPT with a large model might require a longer time,
# so we have the option to adjust the timeout setting here.
defaultAgent:
timeouts:
ExecuteTaskSync: 10s
Add the OpenAI API token
-------------------------------
56 changes: 48 additions & 8 deletions docs/deployment/configuration/monitoring.rst
@@ -85,15 +85,55 @@ Use Published Dashboards to Monitor Flyte Deployment

Flyte Backend is written in Golang and exposes stats using Prometheus. The stats are labeled with workflow, task, project & domain, wherever appropriate.

The dashboards are divided into two types:
Both ``flyteadmin`` and ``flytepropeller`` are instrumented to expose metrics. To visualize these metrics, Flyte provides three Grafana dashboards, each with a different focus:

- **User-facing dashboards**: Dashboards that can be used to triage/investigate/observe performance and characteristics of workflows and tasks.
The user-facing dashboard is published under Grafana marketplace ID `13980 <https://grafana.com/grafana/dashboards/13980>`__.
The user-facing dashboard is published under ID `13980 <https://grafana.com/grafana/dashboards/13980>`__ in the Grafana marketplace.

- **System Dashboards**: Dashboards that are useful for the system maintainer to maintain their Flyte deployments. These are further divided into:
- DataPlane/FlytePropeller dashboards published @ `13979 <https://grafana.com/grafana/dashboards/13979>`__
- ControlPlane/Flyteadmin dashboards published @ `13981 <https://grafana.com/grafana/dashboards/13981>`__
- **System Dashboards**: Dashboards that are useful for the system maintainer to investigate the status and performance of their Flyte deployments. These are further divided into:
- `DataPlane/FlytePropeller <https://grafana.com/grafana/dashboards/13979>`__: execution engine status and performance.
- `ControlPlane/Flyteadmin <https://grafana.com/grafana/dashboards/13981>`__: API-level monitoring.

The corresponding JSON files for each dashboard are also located at ``deployment/stats/prometheus``.

.. note::

The dashboards are basic dashboards and do not include all the metrics exposed by Flyte.
Feel free to use the scripts provided `here <https://github.com/flyteorg/flyte/tree/master/stats>`__ to improve and -hopefully- contribute the improved dashboards.

How to use the dashboards
~~~~~~~~~~~~~~~~~~~~~~~~~

1. We recommend installing and configuring the Prometheus operator as described in `their docs <https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md>`__.
This is especially true if you plan to use the Service Monitors provided by the `flyte-core <https://github.com/flyteorg/flyte/blob/master/charts/flyte-core/templates/propeller/service-monitor.yaml>`__ Helm chart.

2. Enable the Prometheus instance to use Service Monitors in the namespace where Flyte is running, configuring the following keys in the ``prometheus`` resource:

.. code-block:: yaml
spec:
serviceMonitorSelector: {}
serviceMonitorNamespaceSelector: {}
.. note::

The above example configuration lets Prometheus use any ``ServiceMonitor`` in any namespace in the cluster. Adjust the configuration to reduce the scope if needed.

3. Once you have installed and configured the Prometheus operator, enable the Service Monitors in the Helm chart by configuring the following keys in your ``values`` file:

.. code-block:: yaml
flyteadmin:
serviceMonitor:
enabled: true
flytepropeller:
serviceMonitor:
enabled: true
.. note::

By default, the ``ServiceMonitor`` is configured with a ``scrapeTimeout`` of 30s and an ``interval`` of 60s. You can customize these values if needed.

With the above configuration in place you should be able to import the dashboards in your Grafana instance.

The above mentioned are basic dashboards and do no include all the metrics exposed by Flyte.
Please help us improve the dashboards by contributing to them 🙏.
Refer to the build scripts `here <https://github.com/flyteorg/flyte/tree/master/stats>`__.
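
The `ServiceMonitor` note in the monitoring.rst change above quotes a default `scrapeTimeout` of 30s and `interval` of 60s. When customizing them, keep in mind that Prometheus rejects a scrape timeout longer than the scrape interval. The Python sketch below shows that sanity check; the helper names are hypothetical, and it only handles simple single-unit durations such as the ones quoted in the note.

```python
import re

# Seconds per unit for the simple durations used here (e.g. "30s", "1m").
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text: str) -> float:
    """Convert a single-unit duration such as "30s" into seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def scrape_settings_valid(scrape_timeout: str, interval: str) -> bool:
    """Return True when the scrape timeout does not exceed the interval."""
    return parse_duration(scrape_timeout) <= parse_duration(interval)


if __name__ == "__main__":
    # Defaults quoted in the note.
    print(scrape_settings_valid("30s", "60s"))
    # A timeout longer than the interval.
    print(scrape_settings_valid("90s", "60s"))
```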
10 changes: 6 additions & 4 deletions docs/flyte_agents/developing_agents.md
@@ -184,10 +184,7 @@ kubectl set image deployment/flyteagent flyteagent=ghcr.io/flyteorg/flyteagent:l
kubectl rollout restart deployment flytepropeller -n flyte
```

### 5.


### Canary deployment
### 5. Canary deployment

Agents can be deployed independently in separate environments. Decoupling agents from the
production environment ensures that if any specific agent encounters an error or issue, it will not impact the overall production system.
@@ -210,7 +207,12 @@ you can route particular task requests to designated agent services by adjusting
endpoint: "dns:///flyteagent.flyte.svc.cluster.local:8000"
insecure: true
timeouts:
# CreateTask, GetTask and DeleteTask are for async agents.
# ExecuteTaskSync is for sync agents.
CreateTask: 5s
GetTask: 5s
DeleteTask: 5s
ExecuteTaskSync: 10s
defaultTimeout: 10s
agents:
custom_agent:
Original file line number Diff line number Diff line change
@@ -57,7 +57,7 @@ tasks:
- sidecar
- K8S-ARRAY
default-for-task-types:
- bigquery_query_job_task: agent-service
- sensor: agent-service
- container: container
- container_array: K8S-ARRAY
```
@@ -69,7 +69,12 @@ plugins:
endpoint: "localhost:8000" # your grpc agent server port
insecure: true
timeouts:
GetTask: 10s
# CreateTask, GetTask and DeleteTask are for async agents.
# ExecuteTaskSync is for sync agents.
CreateTask: 5s
GetTask: 5s
DeleteTask: 5s
ExecuteTaskSync: 10s
defaultTimeout: 10s
```
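
The `timeouts` map and `defaultTimeout` in the config above suggest a per-operation lookup with a fallback. The Python sketch below illustrates that reading. The fallback behaviour, the function name `resolve_timeout`, and the operation name `SomeOtherOperation` are assumptions for illustration, not a description of FlytePropeller's actual code.

```python
# Values copied from the agent-service config in the diff above.
AGENT_CONFIG = {
    "timeouts": {
        "CreateTask": "5s",        # async agents
        "GetTask": "5s",           # async agents
        "DeleteTask": "5s",        # async agents
        "ExecuteTaskSync": "10s",  # sync agents
    },
    "defaultTimeout": "10s",
}


def resolve_timeout(config: dict, operation: str) -> str:
    """Return the operation's timeout, or defaultTimeout when none is set."""
    return config.get("timeouts", {}).get(operation, config["defaultTimeout"])


if __name__ == "__main__":
    # An operation with an explicit entry, then a hypothetical one without.
    for op in ("GetTask", "SomeOtherOperation"):
        print(op, resolve_timeout(AGENT_CONFIG, op))
```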
2 changes: 1 addition & 1 deletion docs/index.md
@@ -106,7 +106,7 @@ Below are the API reference to the different components of Flyte:
- Flyte's official Python SDK.
* - {doc}`FlyteCTL <flytectl/docs_index>`
- Flyte's command-line interface for interacting with a Flyte cluster.
* - {doc}`FlyteIDL <flytectl/docs_index>`
* - {doc}`FlyteIDL <reference_flyteidl>`
- Flyte's core specification language.
```

106 changes: 0 additions & 106 deletions flyte.yaml

This file was deleted.
