Many nodes have more open connections than expected #2532

Open
sameh-farouk opened this issue Feb 9, 2025 · 1 comment

@sameh-farouk
Member

Describe the bug

I checked the new open-connections metric that the latest zos update exposes on Testnet, and many nodes appear to have more open connections than expected.

For example, node 13 (twin 16) on Testnet has 17 open connections, versus 5 on other nodes on Devnet or QAnet.

We need to debug this behavior and find out what leads the nodes to open these extra connections.

Expected behavior

About 5 connections are expected, as communicated in a related issue and observed on the Devnet and QAnet networks.

Logs

{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":4000797868032,"mru":202800259072,"ipv4u":0},"used":{"cru":10,"sru":357556027392,"hru":0,"mru":54337774387,"ipv4u":1},"system":{"cru":0,"sru":16106127360,"hru":0,"mru":20280025907,"ipv4u":0},"users":{"deployments":64,"workloads":70,"last_deployment_timestamp":1737469014},"open_connections":7}', 'schema': 'application/json', 'epoch': 1739095835, 'twin_src': '3'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202800259072,"ipv4u":0},"used":{"cru":0,"sru":214748364800,"hru":0,"mru":20280025907,"ipv4u":1},"system":{"cru":0,"sru":10737418240,"hru":0,"mru":20280025907,"ipv4u":0},"users":{"deployments":11,"workloads":13,"last_deployment_timestamp":1730715372},"open_connections":5}', 'schema': 'application/json', 'epoch': 1739095835, 'twin_src': '5'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":168975097856,"ipv4u":0},"used":{"cru":3,"sru":160511819776,"hru":0,"mru":25923651993,"ipv4u":2},"system":{"cru":0,"sru":16106127360,"hru":0,"mru":16897509785,"ipv4u":0},"users":{"deployments":11,"workloads":17,"last_deployment_timestamp":1737651948},"open_connections":5}', 'schema': 'application/json', 'epoch': 1739095835, 'twin_src': '10'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":6001185964032,"mru":202800123904,"ipv4u":0},"used":{"cru":2,"sru":88046829568,"hru":0,"mru":28769283686,"ipv4u":0},"system":{"cru":0,"sru":10737418240,"hru":0,"mru":20280012390,"ipv4u":0},"users":{"deployments":7,"workloads":8,"last_deployment_timestamp":1730717256},"open_connections":12}', 'schema': 'application/json', 'epoch': 1739095835, 'twin_src': '11'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202800259072,"ipv4u":0},"used":{"cru":2,"sru":88046829568,"hru":0,"mru":28769297203,"ipv4u":0},"system":{"cru":0,"sru":10737418240,"hru":0,"mru":20280025907,"ipv4u":0},"users":{"deployments":5,"workloads":6,"last_deployment_timestamp":1730717286},"open_connections":8}', 'schema': 'application/json', 'epoch': 1739095836, 'twin_src': '13'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202800259072,"ipv4u":0},"used":{"cru":8,"sru":325343772672,"hru":0,"mru":54237111091,"ipv4u":0},"system":{"cru":0,"sru":16106127360,"hru":0,"mru":20280025907,"ipv4u":0},"users":{"deployments":11,"workloads":15,"last_deployment_timestamp":1730717604},"open_connections":7}', 'schema': 'application/json', 'epoch': 1739095836, 'twin_src': '14'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202800259072,"ipv4u":0},"used":{"cru":4,"sru":110570242048,"hru":0,"mru":29843039027,"ipv4u":2},"system":{"cru":0,"sru":10737418240,"hru":0,"mru":20280025907,"ipv4u":0},"users":{"deployments":39,"workloads":44,"last_deployment_timestamp":1738588689},"open_connections":9}', 'schema': 'application/json', 'epoch': 1739095836, 'twin_src': '15'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202800123904,"ipv4u":0},"used":{"cru":8,"sru":259296067584,"hru":0,"mru":44237876838,"ipv4u":2},"system":{"cru":0,"sru":10737418240,"hru":0,"mru":20280012390,"ipv4u":0},"users":{"deployments":15,"workloads":22,"last_deployment_timestamp":1738588760},"open_connections":17}', 'schema': 'application/json', 'epoch': 1739095836, 'twin_src': '16'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202802475008,"ipv4u":0},"used":{"cru":2,"sru":120259084288,"hru":0,"mru":24575214796,"ipv4u":0},"system":{"cru":0,"sru":10737418240,"hru":0,"mru":20280247500,"ipv4u":0},"users":{"deployments":3,"workloads":4,"last_deployment_timestamp":1730718432},"open_connections":9}', 'schema': 'application/json', 'epoch': 1739095836, 'twin_src': '17'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":6001185964032,"mru":202800259072,"ipv4u":0},"used":{"cru":0,"sru":10737418240,"hru":0,"mru":20280025907,"ipv4u":0},"system":{"cru":0,"sru":10737418240,"hru":0,"mru":20280025907,"ipv4u":0},"users":{"deployments":1,"workloads":1,"last_deployment_timestamp":1730718432},"open_connections":9}', 'schema': 'application/json', 'epoch': 1739095836, 'twin_src': '19'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202800259072,"ipv4u":0},"used":{"cru":8,"sru":394063249408,"hru":0,"mru":37459895091,"ipv4u":1},"system":{"cru":0,"sru":5368709120,"hru":0,"mru":20280025907,"ipv4u":0},"users":{"deployments":3,"workloads":5,"last_deployment_timestamp":1730719314},"open_connections":10}', 'schema': 'application/json', 'epoch': 1739095836, 'twin_src': '21'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":5000991916032,"mru":202800259072,"ipv4u":0},"used":{"cru":6,"sru":303843770368,"hru":0,"mru":38533636915,"ipv4u":3},"system":{"cru":0,"sru":10737418240,"hru":0,"mru":20280025907,"ipv4u":0},"users":{"deployments":7,"workloads":13,"last_deployment_timestamp":1737664174},"open_connections":7}', 'schema': 'application/json', 'epoch': 1739095837, 'twin_src': '22'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":6001185964032,"mru":202800259072,"ipv4u":0},"used":{"cru":2,"sru":120259084288,"hru":0,"mru":24574993203,"ipv4u":1},"system":{"cru":0,"sru":10737418240,"hru":0,"mru":20280025907,"ipv4u":0},"users":{"deployments":4,"workloads":6,"last_deployment_timestamp":1735130826},"open_connections":5}', 'schema': 'application/json', 'epoch': 1739095837, 'twin_src': '23'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":6001185964032,"mru":202799075328,"ipv4u":0},"used":{"cru":1,"sru":21999124480,"hru":0,"mru":20816778444,"ipv4u":1},"system":{"cru":0,"sru":10737418240,"hru":0,"mru":20279907532,"ipv4u":0},"users":{"deployments":4,"workloads":6,"last_deployment_timestamp":1737657414},"open_connections":8}', 'schema': 'application/json', 'epoch': 1739095837, 'twin_src': '26'}
{'version': 1, 'data': '{"total":{"cru":32,"sru":3000614658048,"hru":16003126444032,"mru":135008747520,"ipv4u":0},"used":{"cru":2,"sru":587336777728,"hru":0,"mru":17795842048,"ipv4u":0},"system":{"cru":0,"sru":16106127360,"hru":0,"mru":13500874752,"ipv4u":0},"users":{"deployments":3,"workloads":7,"last_deployment_timestamp":1737469029},"open_connections":5}', 'schema': 'application/json', 'epoch': 1739095837, 'twin_src': '154'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202800259072,"ipv4u":0},"used":{"cru":0,"sru":21474836480,"hru":0,"mru":20280025907,"ipv4u":0},"system":{"cru":0,"sru":21474836480,"hru":0,"mru":20280025907,"ipv4u":0},"users":{"deployments":1,"workloads":1,"last_deployment_timestamp":1730720034},"open_connections":5}', 'schema': 'application/json', 'epoch': 1739095837, 'twin_src': '470'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202800095232,"ipv4u":0},"used":{"cru":6,"sru":225485783040,"hru":0,"mru":28874138419,"ipv4u":1},"system":{"cru":0,"sru":10737418240,"hru":0,"mru":20280009523,"ipv4u":0},"users":{"deployments":3,"workloads":4,"last_deployment_timestamp":1733734554},"open_connections":13}', 'schema': 'application/json', 'epoch': 1739095837, 'twin_src': '471'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202800136192,"ipv4u":0},"used":{"cru":5,"sru":132569366528,"hru":0,"mru":30379897651,"ipv4u":3},"system":{"cru":0,"sru":21474836480,"hru":0,"mru":20280013619,"ipv4u":0},"users":{"deployments":12,"workloads":19,"last_deployment_timestamp":1739022195},"open_connections":12}', 'schema': 'application/json', 'epoch': 1739095838, 'twin_src': '475'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202800136192,"ipv4u":0},"used":{"cru":0,"sru":5368709120,"hru":0,"mru":20280013619,"ipv4u":0},"system":{"cru":0,"sru":5368709120,"hru":0,"mru":20280013619,"ipv4u":0},"users":{"deployments":0,"workloads":0,"last_deployment_timestamp":0},"open_connections":10}', 'schema': 'application/json', 'epoch': 1739095838, 'twin_src': '477'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202800119808,"ipv4u":0},"used":{"cru":0,"sru":5368709120,"hru":0,"mru":20280011980,"ipv4u":0},"system":{"cru":0,"sru":5368709120,"hru":0,"mru":20280011980,"ipv4u":0},"users":{"deployments":0,"workloads":0,"last_deployment_timestamp":0},"open_connections":5}', 'schema': 'application/json', 'epoch': 1739095838, 'twin_src': '478'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202800119808,"ipv4u":0},"used":{"cru":0,"sru":10737418240,"hru":0,"mru":20280011980,"ipv4u":0},"system":{"cru":0,"sru":10737418240,"hru":0,"mru":20280011980,"ipv4u":0},"users":{"deployments":0,"workloads":0,"last_deployment_timestamp":0},"open_connections":13}', 'schema': 'application/json', 'epoch': 1739095838, 'twin_src': '479'}
{'version': 1, 'data': '{"total":{"cru":24,"sru":512110190592,"hru":9001778946048,"mru":202800259072,"ipv4u":0},"used":{"cru":0,"sru":5368709120,"hru":0,"mru":20280025907,"ipv4u":0},"system":{"cru":0,"sru":5368709120,"hru":0,"mru":20280025907,"ipv4u":0},"users":{"deployments":0,"workloads":0,"last_deployment_timestamp":0},"open_connections":12}', 'schema': 'application/json', 'epoch': 1739095838, 'twin_src': '480'}
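
To make the outliers in the responses above easier to spot, here is a minimal Python sketch that parses each logged line and lists the twins reporting more than the expected 5 connections. It assumes every line is a Python dict repr whose `data` field holds a JSON string, as in the dump above. The function names and the abbreviated `SAMPLE_LINES` (only `open_connections` is kept in `data`) are illustrative and not part of zos.

```python
import ast
import json

# Baseline reported for Devnet and QAnet in this issue.
EXPECTED_CONNECTIONS = 5

# Abbreviated sample lines (assumption: real responses carry the full
# "total"/"used"/"system"/"users" payload, trimmed here for brevity).
SAMPLE_LINES = [
    """{'version': 1, 'data': '{"open_connections":17}', 'schema': 'application/json', 'epoch': 1739095836, 'twin_src': '16'}""",
    """{'version': 1, 'data': '{"open_connections":5}', 'schema': 'application/json', 'epoch': 1739095835, 'twin_src': '5'}""",
    """{'version': 1, 'data': '{"open_connections":12}', 'schema': 'application/json', 'epoch': 1739095835, 'twin_src': '11'}""",
]


def parse_response(line):
    """Parse one logged response into (twin_id, open_connections)."""
    envelope = ast.literal_eval(line)      # outer layer is a Python dict repr
    stats = json.loads(envelope["data"])   # inner payload is a JSON string
    return int(envelope["twin_src"]), stats["open_connections"]


def nodes_over_baseline(lines, baseline=EXPECTED_CONNECTIONS):
    """Return (twin_id, open_connections) pairs above the baseline, highest first."""
    parsed = [parse_response(line) for line in lines if line.strip()]
    over = [(twin, count) for twin, count in parsed if count > baseline]
    return sorted(over, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for twin, count in nodes_over_baseline(SAMPLE_LINES):
        print(f"twin {twin}: {count} open connections")
```

Running the same function over the full dump would give a ranked list of twins to look at first.
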
@sameh-farouk
Member Author

sameh-farouk commented Feb 10, 2025

I checked the number of open connections again today to see how often it increases. Node 95 has opened one extra connection in the last 24 hours (6 vs 5 yesterday).
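
A day-over-day comparison like this could be automated with a small helper. The sketch below assumes two snapshots stored as plain dicts mapping node ID to its reported `open_connections`. The `connection_deltas` name and the snapshot layout are hypothetical; only the 5 and 6 values for node 95 come from this comment.

```python
def connection_deltas(previous, current):
    """Return {node_id: (old, new)} for nodes whose count increased."""
    return {
        node: (previous[node], count)
        for node, count in current.items()
        if node in previous and count > previous[node]
    }


# Hypothetical snapshots holding only the node 95 values quoted above.
yesterday = {95: 5}
today = {95: 6}

print(connection_deltas(yesterday, today))  # {95: (5, 6)}
```
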

node 95 logs:
https://mon.grid.tf/explore?orgId=1&left=%7B%22datasource%22:%22Loki-ZOS%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnetwork%3D%5C%22testing%5C%22,node%3D%5C%225HfEiceBtjJ7skLRp3ose2XgVaWbw7D5w7TtkqNTvGVmoDBC%5C%22%7D%22%7D%5D,%22range%22:%7B%22from%22:%221739006366568%22,%22to%22:%221739092766568%22%7D%7D

Is it related to this?

[+] mycelium: 2025-02-09T18:36:37.992520Z ERROR mycelium::peer: Failed to flush buffered peer connection control packets: Connection reset by peer (os error 104)
[+] noded: 2025-02-09T18:36:26Z info utc time from tfchain: 2025-02-09 18:36:25 +0000 UTC
[+] mycelium: 2025-02-09T18:36:23.994905Z ERROR mycelium::peer: Frame error from TCP 192.168.1.159:39106 <-> 188.40.132.242:9651: Connection reset by peer (os error 104)
[+] mycelium: 2025-02-09T18:36:23.994881Z ERROR mycelium::peer: Frame error from TCP 192.168.1.159:40418 <-> 136.243.47.186:9651: Connection reset by peer (os error 104)
[-] dhcp-zos: zos: adding default route via fe80::3a07:16ff:fe0d:fca0
[+] noded: 2025-02-09T18:36:19Z info running NTP check against tfchain
[-] dhcp-zos: zos: deleting default route via fe80::3a07:16ff:fe0d:fca0
[-] dhcp-zos: zos: fe80::3a07:16ff:fe0d:fca0: no longer a default router
[+] api-gateway: 2025-02-09T18:35:54Z error error="connection stalling"
[-] dhcp-zos: zos: adding default route via fe80::3a07:16ff:fe0d:fca0
[-] dhcp-zos: zos: deleting default route via fe80::3a07:16ff:fe0d:fca0
[-] dhcp-zos: zos: fe80::3a07:16ff:fe0d:fca0: no longer a default router
----
[+] mycelium: 2025-02-09T12:07:18.826226Z ERROR mycelium::peer: Failed to flush buffered peer connection control packets: Connection reset by peer (os error 104)
[-] dhcp-zos: zos: adding default route via fe80::3a07:16ff:fe0d:fca0
[+] mycelium: 2025-02-09T12:07:01.524658Z ERROR mycelium::peer_manager: Couldn't connect to endpoint, turn on debug logging for more details endpoint.address=65.21.231.58:9651 endpoint.proto=Tcp
[-] dhcp-zos: zos: deleting default route via fe80::3a07:16ff:fe0d:fca0
[-] dhcp-zos: zos: fe80::3a07:16ff:fe0d:fca0: no longer a default router
[+] mycelium: 2025-02-09T12:06:59.426775Z ERROR mycelium::peer: Frame error from TCP 192.168.1.159:33110 <-> 65.21.231.58:9651: Connection reset by peer (os error 104)
[+] mycelium: 2025-02-09T12:06:59.403989Z ERROR mycelium::peer: Frame error from TCP 192.168.1.159:56622 <-> 136.243.47.186:9651: Connection reset by peer (os error 104)
[+] api-gateway: 2025-02-09T12:06:58Z error error="connection stalling"
[+] api-gateway: 2025-02-09T12:06:54Z error error="connection stalling"
[-] dhcp-zos: zos: adding default route via fe80::3a07:16ff:fe0d:fca0
[-] dhcp-zos: zos: deleting default route via fe80::3a07:16ff:fe0d:fca0
[-] dhcp-zos: zos: fe80::3a07:16ff:fe0d:fca0: no longer a default router
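
To check whether the resets cluster around particular peers, the mycelium "Frame error" lines can be tallied by remote endpoint. This is a rough sketch whose regular expression is based only on the line format seen in the excerpts above, so other mycelium versions may log differently. `count_resets_by_remote` is an illustrative name, and a high count would not by itself prove that reconnects are leaking connections.

```python
import re
from collections import Counter

# Matches lines such as:
# "... Frame error from TCP 192.168.1.159:39106 <-> 188.40.132.242:9651:
#  Connection reset by peer (os error 104)"
FRAME_ERROR = re.compile(
    r"Frame error from TCP (?P<local>\S+) <-> (?P<remote>[\d.]+:\d+): "
    r"Connection reset by peer"
)

# Lines copied from the node 95 excerpts above.
SAMPLE_LOG = [
    "[+] mycelium: 2025-02-09T18:36:23.994905Z ERROR mycelium::peer: Frame error from TCP 192.168.1.159:39106 <-> 188.40.132.242:9651: Connection reset by peer (os error 104)",
    "[+] mycelium: 2025-02-09T18:36:23.994881Z ERROR mycelium::peer: Frame error from TCP 192.168.1.159:40418 <-> 136.243.47.186:9651: Connection reset by peer (os error 104)",
    "[+] mycelium: 2025-02-09T12:06:59.426775Z ERROR mycelium::peer: Frame error from TCP 192.168.1.159:33110 <-> 65.21.231.58:9651: Connection reset by peer (os error 104)",
    "[+] mycelium: 2025-02-09T12:06:59.403989Z ERROR mycelium::peer: Frame error from TCP 192.168.1.159:56622 <-> 136.243.47.186:9651: Connection reset by peer (os error 104)",
    "[+] api-gateway: 2025-02-09T12:06:58Z error error=\"connection stalling\"",
]


def count_resets_by_remote(log_lines):
    """Count 'Connection reset by peer' frame errors per remote endpoint."""
    counts = Counter()
    for line in log_lines:
        match = FRAME_ERROR.search(line)
        if match:
            counts[match.group("remote")] += 1
    return counts


if __name__ == "__main__":
    for remote, count in count_resets_by_remote(SAMPLE_LOG).most_common():
        print(f"{remote}: {count} reset(s)")
```

In the excerpts, each reset involves a different local port (39106, 40418, 33110, 56622), so comparing these counts against the `open_connections` metric over time may show whether reconnects after a reset leave the old connection counted.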
