Description

Istio service mesh is the component for which we have the highest number of internal and customer incidents. In many, if not most, cases the problem is caused by misconfiguration on the customer side. We should make troubleshooting easier for both customers and SREs to lower the number of cases that get wired to Goats.
Reasons
- Istio is difficult to troubleshoot without extensive knowledge.
- It is often hard to find the correct tooling required to pinpoint the issue: the Istio documentation is huge, and some of the most useful istioctl commands are hidden under the so-called experimental `istioctl x` command group.
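For reference, a few commands that typically help here (illustrative only - exact flags and the experimental status of subcommands vary across istioctl versions):

```shell
# Validate the mesh configuration in all namespaces
istioctl analyze -A

# Explain which Istio configuration applies to a given pod
# (lives under the experimental "x" group)
istioctl x describe pod <pod-name> -n <namespace>

# Check whether the sidecars are in sync with istiod
istioctl proxy-status
```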
Tasks to do/discuss (❓)

- Improve the Istio Custom Resource status and make it more fine-grained #444
  - Split up the `description` field into additional fields that would be easier to aggregate.
  - Add more information to the Custom Resource when a `Warning` happened.
  - Add conditions to the status (a rough sketch follows the task list).
- Troubleshooting guide for debugging Istio issues with concrete commands.
- Provide solutions and best practices based on the state of the cluster:
  - Warnings in Busola
  - ❓ Use the output of `istioctl analyze` - IDEA: a separate istio-agent that can be used to retrieve info about the service-mesh state.
- ❓ Provide an admission webhook that would block resources with configuration that is not allowed in Kyma (for example, Authorization Policies that block the ingress-gateway `healthz` endpoint); a possible starting point is sketched below.
- ❓ Cluster reconciliation/upgrade check: run a check before reconciliation (or an Istio upgrade) to evaluate whether the current cluster should be reconciled. This way we might reduce incidents by setting the Istio CR to a warning state (user action required) and skipping reconciliation if a resource/field is used on the cluster that would lead to an error state, e.g., EnvoyFilter (proxy_protocol).
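To make the conditions task concrete, here is a rough sketch of what a fine-grained status could look like, using the reconciliation-check idea as the example. All condition types, reasons, and messages below are placeholders for discussion, not an agreed design:

```yaml
# Hypothetical Istio CR status - names are illustrative only
status:
  state: Warning
  conditions:
    - type: Ready
      status: "False"
      reason: IncompatibleResourceDetected
      message: "EnvoyFilter using proxy_protocol found on the cluster; reconciliation skipped, user action required"
      lastTransitionTime: "2023-06-01T12:00:00Z"
    - type: ProxySidecarsUpToDate
      status: "True"
      reason: RestartSucceeded
      message: "All sidecars run the current proxy version"
      lastTransitionTime: "2023-06-01T12:00:00Z"
```

Sticking to the standard Kubernetes conditions convention would also let customers and SREs query the state with plain `kubectl get istio -o yaml` or `kubectl wait --for=condition=Ready`.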
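For the admission webhook idea, registration could start from a standard `ValidatingWebhookConfiguration`. The service name, namespace, and path below are made-up placeholders to show the shape, not an existing component:

```yaml
# Hypothetical webhook registration - the backing service does not exist yet
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: istio-config-guard
webhooks:
  - name: authorizationpolicies.istio-guard.kyma-project.io
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore  # do not break the API path if the guard itself is unavailable
    rules:
      - apiGroups: ["security.istio.io"]
        apiVersions: ["v1beta1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["authorizationpolicies"]
    clientConfig:
      service:
        name: istio-config-guard
        namespace: kyma-system
        path: /validate
```

The backing service would then reject, for example, an AuthorizationPolicy whose rules would cut off the ingress-gateway `healthz` endpoint.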