First Istio troubleshooting guide #543

Closed
1 task done
triffer opened this issue Dec 28, 2023 · 0 comments

triffer commented Dec 28, 2023

Description
Our team regularly receives requests that fall outside our responsibility, or that users could resolve themselves if they found the right information about the issue.
The idea is to create a guide that different parties (SRE, L2 support, and users) can use for an initial investigation when they suspect that Istio is causing a problem.
The guide should help rule out Istio as the cause so that our team does not have to be involved. It should also document how to fix problems without our help. This second topic could overlap with the documentation on operational awareness.

Our goal is to reduce the effort required to investigate issues within our team, so the guide needs to be easy to understand and consume.

The following ideas originate from a conversation in Slack:

  • Before creating a new issue for a cluster, check whether issues already exist for that cluster and whether they are related.
  • For connection issues, check NetworkPolicies before forwarding the issue to the Istio module team. In the past, issues were forwarded to the Istio team even though a look at the NetworkPolicies quickly showed that they were not Istio-related.
  • When an Istio problem is reported for a Kyma module, SRE should first check whether it also occurs for other modules. If the problem occurs only with a specific module, the team that owns that module should investigate first before involving us.
  • Consider checking the status of the Istio CR. If it is in the Warning state, the user needs to take action.
  • Consider checking PeerAuthentication resources for configuration that blocks IPs.
  • Consider using istioctl analyze.
  • Check response flags: DC (downstream client terminated connection) and UC (upstream terminated connection) are out of scope for the Istio team because they relate to client or workload application behaviour.
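
The checks above could be sketched roughly as follows. This is only an illustration, not an agreed procedure: the names `FIRST_PASS_COMMANDS` and `triage_response_flags` are hypothetical, the commands are listed rather than executed, and the resource name used for the Istio CR (`istios.operator.kyma-project.io`) is an assumption that should be verified against the cluster. The flag descriptions are the ones given in this issue.

```python
from typing import Dict, List

# Read-only commands for a first look, kept as data so they can be reviewed
# before anyone runs them. The Istio CR resource name is an assumption.
FIRST_PASS_COMMANDS: Dict[str, List[str]] = {
    "network_policies": ["kubectl", "get", "networkpolicies", "--all-namespaces"],
    "istio_cr_status": ["kubectl", "get", "istios.operator.kyma-project.io", "--all-namespaces"],
    "peer_authentication": ["kubectl", "get", "peerauthentications.security.istio.io", "--all-namespaces"],
    "istioctl_analyze": ["istioctl", "analyze", "--all-namespaces"],
}

# Response flags that this issue lists as out of scope for the Istio team.
OUT_OF_SCOPE_FLAGS: Dict[str, str] = {
    "DC": "Downstream client terminated connection",
    "UC": "Upstream terminated connection",
}


def triage_response_flags(flags_field: str) -> dict:
    """Split an access-log response-flags field and sort the flags.

    The field may hold several comma-separated flags, or "-" when no flag is set.
    """
    flags = [f.strip() for f in flags_field.split(",")]
    flags = [f for f in flags if f and f != "-"]
    out_of_scope = {f: OUT_OF_SCOPE_FLAGS[f] for f in flags if f in OUT_OF_SCOPE_FLAGS}
    needs_review = [f for f in flags if f not in OUT_OF_SCOPE_FLAGS]
    return {"out_of_scope": out_of_scope, "needs_review": needs_review}
```

For example, `triage_response_flags("UC,UF")` puts `UC` under `out_of_scope` and leaves `UF` under `needs_review`, so only the second flag would need a closer look before involving the Istio team.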

There is also a Troubleshooting page in the Istio GitHub wiki that can be referenced or checked for ideas.

We need to decide where to place this documentation, as it should be visible to every party. There was a proposal to create a document “How to report Istio related issue” in the Troubleshooting section.

DoD:

  • Provide documentation.

@kolodziejczak kolodziejczak self-assigned this Mar 8, 2024
@triffer triffer self-assigned this Mar 18, 2024
@strekm strekm closed this as completed Mar 27, 2024