Description
Our team continually receives requests that we are not responsible for, or that users could easily resolve themselves if they found the right information about the issue.
The idea is to create a guide that different parties (SRE, L2 support, and users) can use to perform an initial investigation when they suspect Istio is the cause of a problem.
The guide should help rule out Istio as the cause so that our team is involved only when necessary. We also want to document how to fix common problems, which could overlap with the documentation on operational awareness.
Our goal is to reduce the effort required to investigate issues within our team, so the guide needs to be easy to understand and consume.
The following ideas originate from a conversation in Slack:
- Before creating a new issue for a cluster, check for existing issues for that cluster and verify whether they are related.
- For connection issues, check the NetworkPolicies before forwarding the issue to the Istio module team. In the past, issues were forwarded to us that a quick look at the NetworkPolicies would have shown were not Istio-related.
- When an Istio problem is reported for a Kyma module, SRE should first check whether it also occurs for other modules. If the problem occurs only with a specific module, the team that owns that module should investigate first before involving us.
- Check the Istio CR status. If it is in the `Warning` state, the problem requires user action.
- Check PeerAuthentication resources for rules that block IPs.
- Consider running `istioctl analyze` to detect configuration problems.
- Check the response flags in the Envoy access logs. DC (downstream client terminated the connection) and UC (upstream terminated the connection) are out of scope for the Istio team, since they relate to client or workload application behaviour.
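Two of the checks above (Istio CR status and response flags) are mechanical enough to sketch as a small triage helper. This is a hypothetical illustration, not part of any Istio or Kyma tooling: the function names are made up, it assumes the Istio CR state string (for example `Ready` or `Warning`) and the Envoy `%RESPONSE_FLAGS%` value have already been read from the cluster, and it assumes multiple flags appear comma-separated, with `-` meaning no flags.

```python
# Hypothetical triage sketch; names are illustrative, not from Istio or Kyma.
# Flags the Istio team treats as out of scope (client/workload behaviour).
OUT_OF_SCOPE_FLAGS = {
    "DC": "downstream client terminated the connection",
    "UC": "upstream terminated the connection",
}


def parse_response_flags(field: str) -> list[str]:
    """Split an Envoy %RESPONSE_FLAGS% value; '-' or empty means no flags."""
    field = field.strip()
    if not field or field == "-":
        return []
    # Assumption: multiple flags are comma-separated.
    return [flag.strip() for flag in field.split(",") if flag.strip()]


def triage(istio_cr_state: str, response_flags: str) -> str:
    """Suggest a next step for a reported issue (sketch, not a decision rule)."""
    # A Warning state on the Istio CR points to something the user must fix.
    if istio_cr_state == "Warning":
        return "user action required: check the Istio CR status"
    flags = parse_response_flags(response_flags)
    # Only classify as out of scope when every flag is DC or UC;
    # mixed flags still need a closer look.
    if flags and all(f in OUT_OF_SCOPE_FLAGS for f in flags):
        reasons = "; ".join(OUT_OF_SCOPE_FLAGS[f] for f in flags)
        return "client or workload behaviour, not Istio: " + reasons
    return "continue: check NetworkPolicies, PeerAuthentication, istioctl analyze"


if __name__ == "__main__":
    print(triage("Ready", "UC"))
    print(triage("Ready", "UF"))
```

A real guide would likely keep these checks as written steps. The helper only shows how the stated rules combine: the CR status is checked first because a `Warning` state already assigns the issue to the user.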
We need to decide where to place this documentation so that it is visible to every party. One proposal is to create a document "How to report an Istio-related issue", for example in the Troubleshooting section.
There is also a Troubleshooting page in the Istio GitHub wiki that can be referenced or checked for ideas.
DoD: