We run on Amazon EKS. Communication between our clusters previously went through an NLB, using api-gateway. We have since switched to Linkerd multicluster for pod-to-pod communication.
However, we now see intermittent latency spikes on pod-to-pod requests, roughly every 30 seconds, with response times exceeding 100ms. The same client application averaged around 10ms over the previous route.
Looking at the traces in our APM (Datadog), we see untracked spans that consume significant time, which likely points to a bottleneck somewhere in the Linkerd communication path.
It appears that something in the communication handled by Linkerd is causing periodic delays. How should we proceed? What additional information would be helpful to provide?
linkerd version: stable-2.14.2
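To back up the "approximately every 30 seconds" observation with numbers, a minimal sketch like the following could be run over exported request timings. The function name `spike_intervals`, the 100ms default threshold, the 1-second merge window, and the synthetic sample data are all illustrative assumptions. Nothing here comes from Linkerd or Datadog.

```python
from statistics import median


def spike_intervals(samples, threshold_ms=100.0, merge_window_s=1.0):
    """Return the gaps (in seconds) between successive latency spikes.

    samples: iterable of (timestamp_seconds, latency_ms), sorted by time.
    Consecutive slow requests closer than merge_window_s to the previous
    slow request are treated as part of the same spike.
    """
    spike_starts = []
    last_spike = None
    for ts, latency in samples:
        if latency > threshold_ms:
            if last_spike is None or ts - last_spike > merge_window_s:
                spike_starts.append(ts)
            last_spike = ts
    return [b - a for a, b in zip(spike_starts, spike_starts[1:])]


if __name__ == "__main__":
    # Synthetic data (assumption): one request per second for two minutes,
    # with a slow request every 30 seconds.
    samples = [(float(t), 120.0 if t % 30 == 0 else 10.0) for t in range(120)]
    gaps = spike_intervals(samples)
    print("gaps between spikes (s):", gaps)
    print("median gap (s):", median(gaps))
```

If the gaps cluster tightly around one value, that suggests a timer-driven cause rather than load-dependent behavior, and the measured interval is worth including in the report.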