Improve troubleshooting of Istio #440
Labels
area/service-mesh
Issues or PRs related to service-mesh
Epic
kind/feature
Categorizes issue or PR as related to a new feature.
Description
The Istio service mesh is the component with the highest number of internal and customer incidents. In many, if not most, cases the problem is caused by misconfiguration on the customer side. We should make troubleshooting easier for both customers and SREs to lower the number of cases that get forwarded to Goats.
Reasons
experimental
istioctl x
command
Tasks to do/discuss (❓)
Improve and make Istio Custom Resource status more fine grained #444
description field into additional ones that would be easier to aggregate
Warning happened
Troubleshooting guide for debugging Istio issues with concrete commands.
Provide solutions and best practices based on the state of the cluster:
istioctl analyze
- IDEA: Separate istio-agent that can be used to retrieve info about service-mesh state
ingress-gateway healthz endpoint)
❓ Cluster reconciliation/upgrade check: Run a check before reconciliation (or an Istio upgrade) to evaluate whether the current cluster should be reconciled. This way we might reduce incidents by setting the Istio CR to a warning state (user action required) and skipping reconciliation if a resource or field is used on the cluster that would lead to an error state, e.g. EnvoyFilter (proxy_protocol).
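To make the pre-reconciliation check idea more concrete, here is a minimal Python sketch. It assumes the relevant resources have already been fetched from the cluster and parsed into dictionaries (for example from `kubectl get envoyfilter -A -o json`). The names `BLOCKING_RULES`, `CheckResult`, and `pre_reconcile_check` are hypothetical, and the substring match on `proxy_protocol` is only an illustrative rule, not an existing implementation.

```python
import json
from dataclasses import dataclass, field

# Hypothetical rule set: (resource kind, substring to look for in the serialized spec).
# A real check would likely need more precise matching per rule.
BLOCKING_RULES = [
    ("EnvoyFilter", "proxy_protocol"),
]


@dataclass
class CheckResult:
    state: str = "Ready"  # "Ready" or "Warning" (user action required)
    reasons: list = field(default_factory=list)

    @property
    def should_reconcile(self):
        # Reconciliation is skipped as soon as one rule matched.
        return self.state == "Ready"


def pre_reconcile_check(resources):
    """Scan already-fetched resource manifests (dicts) against BLOCKING_RULES."""
    result = CheckResult()
    for res in resources:
        kind = res.get("kind", "")
        meta = res.get("metadata", {})
        spec_text = json.dumps(res.get("spec", {}))
        for rule_kind, needle in BLOCKING_RULES:
            if kind == rule_kind and needle in spec_text:
                namespace = meta.get("namespace", "default")
                name = meta.get("name", "<unnamed>")
                result.state = "Warning"
                result.reasons.append(f"{kind} {namespace}/{name} uses {needle}")
    return result
```

A caller could set the Istio CR to the warning state and surface `reasons` to the user whenever `should_reconcile` is false, and proceed with reconciliation otherwise.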