receive: io error: Connection timed out (os error 110) Errors #1446

Open
ricoberger opened this issue Feb 7, 2025 · 2 comments

Comments

@ricoberger

ricoberger commented Feb 7, 2025

Hi, we are currently trying to migrate from Istio with Sidecars to Ambient Mode. With Ambient Mode enabled we are seeing a lot of the following errors in ztunnel:

receive: io error: Connection timed out (os error 110)

These errors occur very regularly: between services, when calling the Kubernetes API, and when calling an external service (like Azure Blob Storage).

We are using Istio 1.24.2, which should include #1377. With debug logging enabled, it also looks like the keepalive settings are applied: set keepalive: Ok(())

@ricoberger
Author

Error log:

{"level":"error","time":"2025-02-04T12:58:13.683616Z","scope":"access","message":"connection complete","src.addr":"10.244.14.105:54344","src.workload":"strimzi-cluster-operator-865d9675d9-dv4d7","src.namespace":"strimzi","dst.addr":"10.234.0.4:443","dst.service":"kubernetes.default.svc.cluster.local","dst.workload":"kubernetes","dst.namespace":"default","direction":"outbound","bytes_sent":2858,"bytes_recv":2749,"duration":"1315643ms","error":"receive: io error: Connection timed out (os error 110)"}
{"level":"error","time":"2025-02-04T12:52:35.299743Z","scope":"access","message":"connection complete","src.addr":"10.244.149.41:41994","src.workload":"kobs-cluster-8f95f8bfb-r9v8s","src.namespace":"kobs","dst.addr":"10.244.8.128:27017","dst.workload":"krusty-mongodb-1","dst.namespace":"email","direction":"outbound","bytes_sent":1165,"bytes_recv":3054,"duration":"1005865ms","error":"receive: io error: Connection timed out (os error 110)"}
{"level":"error","time":"2025-02-04T12:52:05.043579Z","scope":"access","message":"connection complete","src.addr":"10.244.14.105:58028","src.workload":"strimzi-cluster-operator-865d9675d9-dv4d7","src.namespace":"strimzi","dst.addr":"10.234.0.4:443","dst.service":"kubernetes.default.svc.cluster.local","dst.workload":"kubernetes","dst.namespace":"default","direction":"outbound","bytes_sent":2858,"bytes_recv":2945,"duration":"1435470ms","error":"receive: io error: Connection timed out (os error 110)"}
{"level":"error","time":"2025-02-04T12:48:50.019713Z","scope":"access","message":"connection complete","src.addr":"10.244.149.41:33062","src.workload":"kobs-cluster-8f95f8bfb-r9v8s","src.namespace":"kobs","dst.addr":"10.244.8.128:27017","dst.workload":"krusty-mongodb-1","dst.namespace":"email","direction":"outbound","bytes_sent":1165,"bytes_recv":3054,"duration":"1009570ms","error":"receive: io error: Connection timed out (os error 110)"}
{"level":"error","time":"2025-02-04T12:48:45.923725Z","scope":"access","message":"connection complete","src.addr":"10.244.149.41:33050","src.workload":"kobs-cluster-8f95f8bfb-r9v8s","src.namespace":"kobs","dst.addr":"10.244.8.128:27017","dst.workload":"krusty-mongodb-1","dst.namespace":"email","direction":"outbound","bytes_sent":1368,"bytes_recv":2592,"duration":"1005475ms","error":"receive: io error: Connection timed out (os error 110)"}
{"level":"error","time":"2025-02-04T12:46:43.507594Z","scope":"access","message":"connection complete","src.addr":"10.244.14.105:36604","src.workload":"strimzi-cluster-operator-865d9675d9-dv4d7","src.namespace":"strimzi","dst.addr":"10.234.0.4:443","dst.service":"kubernetes.default.svc.cluster.local","dst.workload":"kubernetes","dst.namespace":"default","direction":"outbound","bytes_sent":2851,"bytes_recv":2905,"duration":"1417886ms","error":"receive: io error: Connection timed out (os error 110)"}
{"level":"error","time":"2025-02-04T12:46:23.027596Z","scope":"access","message":"connection complete","src.addr":"10.244.14.105:56610","src.workload":"strimzi-cluster-operator-865d9675d9-dv4d7","src.namespace":"strimzi","dst.addr":"10.234.0.4:443","dst.service":"kubernetes.default.svc.cluster.local","dst.workload":"kubernetes","dst.namespace":"default","direction":"outbound","bytes_sent":2858,"bytes_recv":2912,"duration":"1305391ms","error":"receive: io error: Connection timed out (os error 110)"}
{"level":"error","time":"2025-02-04T12:45:21.583593Z","scope":"access","message":"connection complete","src.addr":"10.244.14.105:48532","src.workload":"strimzi-cluster-operator-865d9675d9-dv4d7","src.namespace":"strimzi","dst.addr":"10.234.0.4:443","dst.service":"kubernetes.default.svc.cluster.local","dst.workload":"kubernetes","dst.namespace":"default","direction":"outbound","bytes_sent":2863,"bytes_recv":2749,"duration":"1258038ms","error":"receive: io error: Connection timed out (os error 110)"}
{"level":"error","time":"2025-02-04T12:42:45.939583Z","scope":"access","message":"connection complete","src.addr":"10.244.14.105:35540","src.workload":"strimzi-cluster-operator-865d9675d9-dv4d7","src.namespace":"strimzi","dst.addr":"10.234.0.4:443","dst.service":"kubernetes.default.svc.cluster.local","dst.workload":"kubernetes","dst.namespace":"default","direction":"outbound","bytes_sent":2858,"bytes_recv":3141,"duration":"1513534ms","error":"receive: io error: Connection timed out (os error 110)"}
{"level":"error","time":"2025-02-04T12:41:19.459714Z","scope":"access","message":"connection complete","src.addr":"10.244.149.41:55622","src.workload":"kobs-cluster-8f95f8bfb-r9v8s","src.namespace":"kobs","dst.addr":"10.244.8.128:27017","dst.workload":"krusty-mongodb-1","dst.namespace":"email","direction":"outbound","bytes_sent":1168,"bytes_recv":2253,"duration":"1010925ms","error":"receive: io error: Connection timed out (os error 110)"}
{"level":"error","time":"2025-02-04T12:41:15.363733Z","scope":"access","message":"connection complete","src.addr":"10.244.149.41:55604","src.workload":"kobs-cluster-8f95f8bfb-r9v8s","src.namespace":"kobs","dst.addr":"10.244.8.128:27017","dst.workload":"krusty-mongodb-1","dst.namespace":"email","direction":"outbound","bytes_sent":1368,"bytes_recv":2592,"duration":"1006833ms","error":"receive: io error: Connection timed out (os error 110)"}
{"level":"error","time":"2025-02-04T12:39:06.803602Z","scope":"access","message":"connection complete","src.addr":"10.244.14.105:48754","src.workload":"strimzi-cluster-operator-865d9675d9-dv4d7","src.namespace":"strimzi","dst.addr":"10.234.0.4:443","dst.service":"kubernetes.default.svc.cluster.local","dst.workload":"kubernetes","dst.namespace":"default","direction":"outbound","bytes_sent":2855,"bytes_recv":688,"duration":"1482258ms","error":"receive: io error: Connection timed out (os error 110)"}
{"level":"error","time":"2025-02-04T12:37:51.023600Z","scope":"access","message":"connection complete","src.addr":"10.244.14.105:43892","src.workload":"strimzi-cluster-operator-865d9675d9-dv4d7","src.namespace":"strimzi","dst.addr":"10.234.0.4:443","dst.service":"kubernetes.default.svc.cluster.local","dst.workload":"kubernetes","dst.namespace":"default","direction":"outbound","bytes_sent":2843,"bytes_recv":676,"duration":"1208402ms","error":"receive: io error: Connection timed out (os error 110)"}
{"level":"error","time":"2025-02-04T12:32:21.299564Z","scope":"access","message":"connection complete","src.addr":"10.244.14.105:40548","src.workload":"strimzi-cluster-operator-865d9675d9-dv4d7","src.namespace":"strimzi","dst.addr":"10.234.0.4:443","dst.service":"kubernetes.default.svc.cluster.local","dst.workload":"kubernetes","dst.namespace":"default","direction":"outbound","bytes_sent":2843,"bytes_recv":676,"duration":"1467678ms","error":"receive: io error: Connection timed out (os error 110)"}

Output of ss -ont on one of the affected Pods:

State                Recv-Q           Send-Q                               Local Address:Port                                 Peer Address:Port            Process
FIN-WAIT-1           0                54                                    10.244.14.15:41730                                  10.234.0.4:443              timer:(on,32sec,10)
ESTAB                0                0                                     10.244.14.15:59128                                  10.234.0.4:443
ESTAB                0                0                                     10.244.14.15:44074                                  10.234.0.4:443
ESTAB                0                0                                     10.244.14.15:59116                                  10.234.0.4:443
FIN-WAIT-1           0                54                                    10.244.14.15:57114                                  10.234.0.4:443              timer:(on,38sec,8)
ESTAB                0                0                                     10.244.14.15:36774                                  10.234.0.4:443
ESTAB                0                0                                     10.244.14.15:57146                                10.244.21.17:15008
ESTAB                0                0                                     10.244.14.15:59182                                  10.234.0.4:443
ESTAB                0                0                                     10.244.14.15:51602                                  10.234.0.4:443
FIN-WAIT-1           0                54                                    10.244.14.15:43022                                  10.234.0.4:443              timer:(on,32sec,12)
ESTAB                0                0                                     10.244.14.15:59152                                  10.234.0.4:443
ESTAB                0                0                                     10.244.14.15:59166                                  10.234.0.4:443
ESTAB                0                0                                     10.244.14.15:60916                                 10.244.21.3:15008
ESTAB                0                0                                     10.244.14.15:59142                                  10.234.0.4:443
ESTAB                0                0                                     10.244.14.15:51858                                  10.234.0.4:443
ESTAB                0                0                                     10.244.14.15:59160                                  10.234.0.4:443
FIN-WAIT-1           0                54                                    10.244.14.15:34334                                  10.234.0.4:443              timer:(on,1min50sec,13)
ESTAB                0                0                                     10.244.14.15:59192                                  10.234.0.4:443
ESTAB                0                0                                     10.244.14.15:43826                                10.244.25.19:15008
ESTAB                0                0                                     10.244.14.15:46702                                  10.234.0.4:443
ESTAB                0                0                                     10.244.14.15:59138                                  10.234.0.4:443
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:50108
CLOSE-WAIT           0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:47318
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:47464
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:50142
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:50150
ESTAB                0                0                            [::ffff:10.244.14.15]:50096                           [::ffff:10.0.0.1]:443
ESTAB                0                0                            [::ffff:10.244.14.15]:50130                           [::ffff:10.0.0.1]:443
CLOSE-WAIT           0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:51032
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:51624
ESTAB                0                0                            [::ffff:10.244.14.15]:50150                           [::ffff:10.0.0.1]:443
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:50130
ESTAB                0                0                            [::ffff:10.244.14.15]:50160                           [::ffff:10.0.0.1]:443
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:50120
ESTAB                0                0                            [::ffff:10.244.14.15]:43680                           [::ffff:10.0.0.1]:443
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:43680
ESTAB                0                0                            [::ffff:10.244.14.15]:51624                           [::ffff:10.0.0.1]:443
ESTAB                0                0                            [::ffff:10.244.14.15]:50120                           [::ffff:10.0.0.1]:443
ESTAB                0                0                            [::ffff:10.244.14.15]:52260                           [::ffff:10.0.0.1]:443
ESTAB                0                0                            [::ffff:10.244.14.15]:50142                           [::ffff:10.0.0.1]:443
ESTAB                0                0                            [::ffff:10.244.14.15]:47464                           [::ffff:10.0.0.1]:443
FIN-WAIT-2           0                0                            [::ffff:10.244.14.15]:53598                           [::ffff:10.0.0.1]:443              timer:(timewait,52sec,0)
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:60696
ESTAB                0                0                            [::ffff:10.244.14.15]:50090                           [::ffff:10.0.0.1]:443
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:50096
ESTAB                0                0                            [::ffff:10.244.14.15]:50086                           [::ffff:10.0.0.1]:443
CLOSE-WAIT           0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:47554
ESTAB                0                0                            [::ffff:10.244.14.15]:60696                           [::ffff:10.0.0.1]:443
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:52260
ESTAB                0                0                            [::ffff:10.244.14.15]:50108                           [::ffff:10.0.0.1]:443
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:50086
CLOSE-WAIT           0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:53598
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:50160
ESTAB                0                0                               [::ffff:127.0.0.1]:15001                       [::ffff:10.244.14.15]:50090

@howardjohn
Member

So timer:(on,32sec,10) implies it has tried to retransmit 10 times. This is not actually a keepalive failure at all, but a packet loss scenario.

Ref https://github.com/sivasankariit/iproute2/blob/1179ab033c31d2c67f406be5bcd5e4c0685855fe/misc/ss.c#L449C20-L449C28 and idiag_timer from https://man7.org/linux/man-pages/man7/sock_diag.7.html.

In simulated packet loss I see persist, which is a zero window probe timer, fwiw. The expected timer is keepalive.
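
As a rough aid for reading the `ss -ont` output, the timer column can be decoded as below. This is a sketch, not part of ss or ztunnel: `parse_timer` is a hypothetical helper, the name-to-meaning table follows the timer names ss prints and the `idiag_timer` description in sock_diag(7), and the third value is treated as the retransmit count.

```python
import re

# Timer names as printed by ss, with the meanings sock_diag(7)
# gives for the corresponding idiag_timer values.
TIMER_MEANING = {
    "off": "no timer active",
    "on": "retransmit timer",
    "keepalive": "keep-alive timer",
    "timewait": "TIME_WAIT timer",
    "persist": "zero window probe timer",
}

# Matches e.g. "timer:(on,32sec,10)" or "timer:(on,1min50sec,13)".
TIMER_RE = re.compile(r"timer:\((\w+),([^,]+),(\d+)\)")


def parse_timer(line):
    """Return the decoded timer field of one ss output line, or None."""
    match = TIMER_RE.search(line)
    if match is None:
        return None  # line has no timer column
    name, remaining, retrans = match.groups()
    return {
        "name": name,
        "meaning": TIMER_MEANING.get(name, "unknown"),
        "remaining": remaining,
        "retransmits": int(retrans),
    }


if __name__ == "__main__":
    import sys

    # Usage: ss -ont | python parse_timer.py
    for raw in sys.stdin:
        info = parse_timer(raw)
        if info is not None:
            print(raw.split()[0], info)
```

Applied to the FIN-WAIT-1 rows above, every timer decodes as `on` (retransmit) with between 8 and 13 retransmits, and none decodes as `keepalive`.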
