First Istio troubleshooting guide #543

triffer · 2023-12-28T14:22:14Z

Description
Our team continually receives requests that are either outside our responsibility or that users could easily resolve themselves if they had the right information about the issue.
The idea is to create a guide that different parties (SRE, L2 support, and users) can use to perform an initial investigation of a problem when they suspect Istio is the cause.
This guide should help to rule out Istio as the cause of a problem so that our team does not have to be involved. For the same reason, we also want to add documentation on how to fix problems. The latter topic could overlap with the documentation on operational awareness.

Our goal is to reduce the effort required to investigate issues within our team, so the guide needs to be easy to understand and consume.

The following ideas originate from a conversation in Slack:

Before creating a new issue for a cluster, check for already existing issues for that cluster and verify if the issues are related.

For connection issues, check the NetworkPolicies before forwarding the issue to the Istio module team. In the past, issues were forwarded to the Istio team even though a look at the NetworkPolicies would have quickly shown that they were not Istio-related.

When an Istio problem is reported for a Kyma module, SRE should first check whether it also occurs for other modules. If the problem only occurs with a specific module, the team that owns that module should start investigating first before involving us.

Check the Istio CR status. If the status is Warning, the issue requires user action.

Check the PeerAuthentication (peerauth) resources for configuration that blocks IPs.

Consider using istioctl analyze.

Check the response flags: DC (downstream client terminated connection) and UC (upstream terminated connection) are out of scope for the Istio team, since they relate to client or workload application behaviour.
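The response-flag check could be sketched roughly as follows. This is a minimal illustration, not an agreed tool: the function name `triage_response_flags` and the `OUT_OF_SCOPE_FLAGS` table are hypothetical, and the sketch assumes the flags field has already been extracted from the sidecar access log, where Envoy writes `-` when no flag is set and separates multiple flags with commas.

```python
# Hypothetical triage helper; the names here are illustrative and not part of Istio.

# Flags that the idea above treats as out of scope for the Istio team.
OUT_OF_SCOPE_FLAGS = {
    "DC": "downstream client terminated connection",
    "UC": "upstream terminated connection",
}


def triage_response_flags(flags_field: str) -> dict:
    """Split an Envoy response-flags field and sort the flags into two groups.

    Assumes "-" means no flag and that multiple flags are comma-separated,
    for example "UF,URX".
    """
    flags = [f.strip() for f in flags_field.split(",")]
    flags = [f for f in flags if f not in ("", "-")]
    return {
        # Client or workload behaviour: start with the application owner.
        "out_of_scope": [f for f in flags if f in OUT_OF_SCOPE_FLAGS],
        # Anything else still needs a closer look before ruling Istio out.
        "needs_review": [f for f in flags if f not in OUT_OF_SCOPE_FLAGS],
    }


if __name__ == "__main__":
    for field in ("DC", "UC", "-", "UF,URX"):
        print(field, triage_response_flags(field))
```

Flags other than DC and UC are deliberately not classified here, because the ideas above only name those two as out of scope.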

There is also a Troubleshooting page in the Istio GitHub wiki that can be referenced or checked for ideas.
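As a starting point for the guide, the cluster-side checks listed above (NetworkPolicies, Istio CR status, PeerAuthentication, and istioctl analyze) could be collected in a small script. The sketch below is only an assumption about how that might look: the helper names are hypothetical, the Istio CR resource name `istios.operator.kyma-project.io` is assumed and should be verified against the cluster, and the script prints the commands by default instead of running them.

```python
import shlex
import subprocess
from typing import List


def build_triage_commands(namespace: str) -> List[List[str]]:
    """Return the read-only commands for a first look at a suspected Istio issue.

    `namespace` is the namespace of the affected workload.
    """
    return [
        # NetworkPolicies in the workload namespace (connection issues).
        ["kubectl", "get", "networkpolicies", "-n", namespace],
        # Istio CR status; the resource name is an assumption, verify it first.
        ["kubectl", "get", "istios.operator.kyma-project.io", "--all-namespaces"],
        # PeerAuthentication resources across the cluster.
        ["kubectl", "get", "peerauthentications.security.istio.io", "--all-namespaces"],
        # Configuration analysis for the workload namespace.
        ["istioctl", "analyze", "-n", namespace],
    ]


def run_triage(namespace: str, dry_run: bool = True) -> None:
    """Print each command, and execute it only when dry_run is False."""
    for cmd in build_triage_commands(namespace):
        print("$ " + shlex.join(cmd))
        if not dry_run:
            # check=False so one failing command does not stop the others.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # "default" is a placeholder namespace; dry run only prints the commands.
    run_triage("default", dry_run=True)
```

Keeping the default as a dry run means the script can be shared with SRE, L2 support, and users as a checklist without requiring cluster access to read it.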

We need to decide where to place this documentation, as it should be visible to every party. There was a proposal to create a document “How to report Istio related issue” in the Troubleshooting section.

DoD:

Provide documentation.

triffer added the kind/feature label Dec 28, 2023

kolodziejczak self-assigned this Mar 8, 2024

triffer self-assigned this Mar 18, 2024

kolodziejczak unassigned triffer Mar 25, 2024

strekm closed this as completed Mar 27, 2024
