share/p2p/shrexeds: Error handling #2176

distractedm1nd · 2023-05-09T15:07:53Z

A lot of new errors are appearing client side against BSR.

Application Error times out after 60s

2023-05-09T15:30:49.015+0200	DEBUG	share/getters	getters/shrex.go:171	eds: request failed	{"hash": "FB78B3FEC22B77F98C4A23BB27EB557E202172AD74A018800CB9A09299A99EC8", "peer": "12D3KooWBc9nKfmKJZKqkto9UHvQce1ZZZy3MoNdZzDQpgi2SFkD", "attempt": 1, "err": "failed to read status from stream: Application error 0x0 (remote)", "finished (s)": 57.119983625}

Failed to read status from stream: deadline exceeded after 60s

Why does it happen at this point? If ratelimited, the err should be io.EOF. The deadline should be smaller here

2023-05-09T16:14:34.920+0200	DEBUG	shrex/eds	shrexeds/client.go:57	client: eds request to peer failed	{"peer": "12D3KooWFC3v38kFFJYWVNjR92iX962YpSVpAbGsDwYcgVqvTWk8", "hash": "9EB61119363E061DF7525B5DDEC77D13D63B36C4D479DBF29C4A2E3926148165", "error": "failed to read status from stream: deadline exceeded"}
2023-05-09T16:14:34.920+0200	DEBUG	share/getters	getters/shrex.go:172	eds: request failed	{"hash": "9EB61119363E061DF7525B5DDEC77D13D63B36C4D479DBF29C4A2E3926148165", "peer": "12D3KooWFC3v38kFFJYWVNjR92iX962YpSVpAbGsDwYcgVqvTWk8", "attempt": 1, "err": "context deadline exceeded", "finished (s)": 60.000769791}

Somehow we get a peer that doesnt support the protocol (almost immediate error)

How is this possible with our current nodes?
We should probably just blacklist

2023-05-09T15:24:15.781+0200	DEBUG	shrex/eds	shrexeds/client.go:57	client: eds request to peer failed	{“peer”: “12D3KooWCV61dtrSkykyEJ4Z8i6LTgvYhaWVuEjChs8Crk4phLXJ”, “hash”: “2C159F3964034F0DE630D9309735AD80E267E012C7BA271A6D93B17E610A444E”, “error”: “failed to open stream: protocols not supported: [/blockspacerace-0/shrex/eds/v0.0.1]“}
2023-05-09T15:24:15.781+0200	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{“peer”: “12D3KooWCV61dtrSkykyEJ4Z8i6LTgvYhaWVuEjChs8Crk4phLXJ”, “hash”: “2C159F3964034F0DE630D9309735AD80E267E012C7BA271A6D93B17E610A444E”, “err”: “failed to open stream: protocols not supported: [/blockspacerace-0/shrex/eds/v0.0.1]“}
2023-05-09T15:24:15.781+0200	DEBUG	share/getters	getters/shrex.go:171	eds: request failed	{“hash”: “2C159F3964034F0DE630D9309735AD80E267E012C7BA271A6D93B17E610A444E”, “peer”: “12D3KooWCV61dtrSkykyEJ4Z8i6LTgvYhaWVuEjChs8Crk4phLXJ”, “attempt”: 3, “err”: “failed to open stream: protocols not supported: [/blockspacerace-0/shrex/eds/v0.0.1]“, “finished (s)“: 0.019803125}

Stateless reset with token after 22s

Has something to do w QUIC

2023-05-09T15:02:30.163+0200	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{“peer”: “12D3KooWD4uxio9FnK2AUBoydXo5LqiiXbC7QZR658wUt6Hf5xfD”, “hash”: “D7A4B103159BCE6FC9AD0B3E8995F1932B2F9C76357BFD1E886298119E4D12E1", “err”: “failed to read status from stream: received a stateless reset with token a7b84d1da38f64eec20c9e2b8b395729"}
2023-05-09T15:02:30.163+0200	DEBUG	share/getters	getters/shrex.go:170	eds: request failed	{“hash”: “D7A4B103159BCE6FC9AD0B3E8995F1932B2F9C76357BFD1E886298119E4D12E1", “peer”: “12D3KooWD4uxio9FnK2AUBoydXo5LqiiXbC7QZR658wUt6Hf5xfD”, “attempt”: 1, “err”: “failed to read status from stream: received a stateless reset with token a7b84d1da38f64eec20c9e2b8b395729”, “finished (s)“: 22.121177}

Failed to open stream: context deadline exceeded (after 60s)

We should not let the opening of the stream take more than maybe 5 seconds.

2023-05-09T14:32:34.940+0200	DEBUG	shrex/eds	shrexeds/client.go:57	client: eds request to peer failed	{“peer”: “12D3KooWBbc9ixMAH4SzF3EH6SwSfo8gRW1bVrDUX19oxC4DaRbz”, “hash”: “C71DA660939D4B174129A6F5839DB73B9F353544365C5EEADF2C7F7324F4C38A”, “error”: “failed to open stream: context deadline exceeded”}
2023-05-09T14:32:34.940+0200	DEBUG	share/getters	getters/shrex.go:170	eds: request failed	{“hash”: “C71DA660939D4B174129A6F5839DB73B9F353544365C5EEADF2C7F7324F4C38A”, “peer”: “12D3KooWBbc9ixMAH4SzF3EH6SwSfo8gRW1bVrDUX19oxC4DaRbz”, “attempt”: 1, “err”: “context deadline exceeded”, “finished (s)“: 60.000322042}

~~7. RequestEDS checks child context for ctx error, then returns parent context error, leading to returning of nil, nil and causing panic~~ Resolved

"failed to read eds from ods bytes: share: reading next car entry: stream reset"

2023-05-13T17:15:36.413+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "80730570DB5B2F39DCC2BE8091A3322E3B0B404ABAA2EC22F570E9615A60161C", "err": "failed to read eds from ods bytes: share: reading next car entry: stream reset"}

Not an actual error, but server node could always return ErrNotFound and not get any punishment

No recent network activity.

Happens for multiple peers at the same time, so probably related to client connectivity issues. Needs to be handled accordingly (no remote peer punishment)

2023-05-13T20:03:30.190+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWBASepmTQtATM196L1wWH6kVd4TBgpUN57koyfjT2vheX", "hash": "91DFD05708F4D96E2C1B13B040C07C962830D970922E8192423DF4F579B80EFC", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "91DFD05708F4D96E2C1B13B040C07C962830D970922E8192423DF4F579B80EFC", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "3F2C7DE3EB02FE219BF7CE931030E6E79C0E687000187B5FF456DEE2D9FB010D", "err": "failed to read eds from ods bytes: share: reading next car entry: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "A5723C5C033FB1F6A54BE906639C901FBC3C01278F5EDD8C26CB651BB6D96997", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "76044121A20F6AB9FAE153F52CE671C4363385478FC1A12BF8717113F1A96F6A", "err": "failed to read status from stream: timeout: no recent network activity"}

Failed to open stream: failed to dial

Happens right after 10, so must be related to connectivity issue on client side.
Needs to be handled too

2023-05-13T21:03:59.627+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "83676B211F75ECC843F411D3E2A57B8BD368DD1B8786CB8A73AFDFB08FA1E45B", "err": "failed to open stream: failed to dial 12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng:\n  * [/ip4/68.183.28.164/tcp/2121] dial tcp4 0.0.0.0:2121->68.183.28.164:2121: i/o timeout\n  * [/ip4/68.183.28.164/udp/2121/quic-v1] timeout: no recent network activity"}

Stream reset on reading response status.

The only reason for stream reset at this stage is if request was invalid. But since running client works fine with majority of peers, it seems to be server issue.

2023-05-13T21:03:02.747+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWBTFMz9doATz6gxb9VUZ5d2iJ1udBfJCkho6ixgeu6nhE", "hash": "4EA6E0541DAEF1C7206EC5B5CA9C00F6546F6E32E12D6711F6B4C60E69BD3C8D", "err": "failed to read status from stream: stream reset"}

Routing not found

It is extremely unlikely that peer has disconnected at the point of shrexeds request. The fact that there are 3 peers disconnected in short sequence requires investigation.

2023-05-13T21:03:08.633+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "9F78CE8A2E753B78A4380C530F4AA8B6F8125CC38F38EE9BDDAD8932E083672C", "err": "failed to open stream: failed to find peers: routing: not found"}
2023-05-13T21:03:20.434+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "4657B89D6D7618A6928B1612A26DB9D03BA3D6182A4388D4A0165DDD759C0FB1", "err": "failed to open stream: failed to find peers: routing: not found"}
2023-05-13T21:03:42.354+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWBCfAdTYhtpuzbKqnc5WM3qh7Zkp3DmjrbAwgTsgiKQPW", "hash": "B366AFE879E7979DF8A132A3D3E5833F48FCF7F9434F6C4E4AE81CC0B1F7277C", "err": "failed to open stream: failed to find peers: routing: not found"}

The text was updated successfully, but these errors were encountered:

Resolves item 7 on #2176

walldiss · 2023-05-13T14:44:01Z

Added 3 more possible errors and edge case that needs handling for edsclient

walldiss · 2023-05-15T06:52:28Z

Application Error times out after 60s

From libp2p docs:

``{12D... Application error 0x0: conn-27571160: system: cannot reserve inbound connection: resource limit exceeded}```

This error occurred when a connection had been made with a remote peer but the remote peer dropped the connection (due to it exceeding limits).
This was actually an `Application error` emitted by `quic-go` and it was a bug in go-libp2p that we sent the whole message.
For now, we decided to stop sending this confusing error message. In the future, we will report such errors via [error codes](https://github.com/libp2p/specs/issues/479).

Action plan:

Need to check if it is possible to make libp2p gracefully drop active connections, without interrupting active transfer.
Check connection stats for full nodes on bsr and revise connection limit.
The reason for error could be too many peers trying to sample from the same full node after shrex.Sub msg. Need to check if error happens only for recent headers sampling or for historic too.

distractedm1nd · 2023-05-15T07:06:52Z

On the client side, it sounds like Application Error should be handled as a RateLimit/NotFound request, and be scheduled for a retry (so just cooldown the peer)

walldiss · 2023-05-15T07:11:38Z

No recent network activity.

Happens for multiple peers at the same time, so probably related to client connectivity issues. Needs to be handled accordingly (no remote peer punishment)

2023-05-13T20:03:30.190+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWBASepmTQtATM196L1wWH6kVd4TBgpUN57koyfjT2vheX", "hash": "91DFD05708F4D96E2C1B13B040C07C962830D970922E8192423DF4F579B80EFC", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "91DFD05708F4D96E2C1B13B040C07C962830D970922E8192423DF4F579B80EFC", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "3F2C7DE3EB02FE219BF7CE931030E6E79C0E687000187B5FF456DEE2D9FB010D", "err": "failed to read eds from ods bytes: share: reading next car entry: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "A5723C5C033FB1F6A54BE906639C901FBC3C01278F5EDD8C26CB651BB6D96997", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "76044121A20F6AB9FAE153F52CE671C4363385478FC1A12BF8717113F1A96F6A", "err": "failed to read status from stream: timeout: no recent network activity"}

Failed to open stream: failed to dial

Happens right after 10, so must be related to connectivity issue on client side.
Needs to be handled too

2023-05-13T21:03:59.627+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "83676B211F75ECC843F411D3E2A57B8BD368DD1B8786CB8A73AFDFB08FA1E45B", "err": "failed to open stream: failed to dial 12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng:\n  * [/ip4/68.183.28.164/tcp/2121] dial tcp4 0.0.0.0:2121->68.183.28.164:2121: i/o timeout\n  * [/ip4/68.183.28.164/udp/2121/quic-v1] timeout: no recent network activity"}

Stream reset on reading response status.

The only reason for stream reset at this stage is if request was invalid. But since running client works fine with majority of peers, it seems to be server issue.

2023-05-13T21:03:02.747+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWBTFMz9doATz6gxb9VUZ5d2iJ1udBfJCkho6ixgeu6nhE", "hash": "4EA6E0541DAEF1C7206EC5B5CA9C00F6546F6E32E12D6711F6B4C60E69BD3C8D", "err": "failed to read status from stream: stream reset"}

Routing not found

It is extremely unlikely that peer has disconnected at the point of shrexeds request. The fact that there are 3 peers disconnected in short sequence requires investigation.

2023-05-13T21:03:08.633+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "9F78CE8A2E753B78A4380C530F4AA8B6F8125CC38F38EE9BDDAD8932E083672C", "err": "failed to open stream: failed to find peers: routing: not found"}
2023-05-13T21:03:20.434+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "4657B89D6D7618A6928B1612A26DB9D03BA3D6182A4388D4A0165DDD759C0FB1", "err": "failed to open stream: failed to find peers: routing: not found"}
2023-05-13T21:03:42.354+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWBCfAdTYhtpuzbKqnc5WM3qh7Zkp3DmjrbAwgTsgiKQPW", "hash": "B366AFE879E7979DF8A132A3D3E5833F48FCF7F9434F6C4E4AE81CC0B1F7277C", "err": "failed to open stream: failed to find peers: routing: not found"}

walldiss · 2023-05-15T07:13:10Z

Stateless reset with token after 22s

Happens for multiple peers at the same time. Probably related to closing connections over the limit on libp2p level.

2023-05-14T00:34:49.049+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWJkrQ6tRXik4gyN6A6BsKd6UsMhMDH2KLXnj6sGXHJJn2", "hash": "EBD6143BDA97E7AB966E959D95CFCBE234AF7B48CF65BDFDEC2423EE90D4192F", "err": "failed to read status from stream: received a stateless reset with token bd133665910f96aa331ea6e530ec4c30"}
2023-05-14T00:34:49.262+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWEkFUj5rrWoeQL9stJR6AkC2F1MBT4MUeiqaoi8ZUAADA", "hash": "EBD6143BDA97E7AB966E959D95CFCBE234AF7B48CF65BDFDEC2423EE90D4192F", "err": "failed to read status from stream: received a stateless reset with token a2ff5632a83e43d3ca13a3499dcd9704"}
2023-05-14T00:34:49.499+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWDWJUWz3zRfB2bfXJxK7WPSM9QsDDPeME1BYJqq2h9oTh", "hash": "EBD6143BDA97E7AB966E959D95CFCBE234AF7B48CF65BDFDEC2423EE90D4192F", "err": "failed to read status from stream: received a stateless reset with token eda08205ebeb0307b4104d76523c8477"}
2023-05-14T00:34:49.558+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWHB32cdsPLemcuJdjyLUJdWqp1R4GJK6s2ueqJvfxobdg", "hash": "EBD6143BDA97E7AB966E959D95CFCBE234AF7B48CF65BDFDEC2423EE90D4192F", "err": "failed to read status from stream: received a stateless reset with token b2f223721890f860f6fcd2677df4d9ad"}

Wondertan · 2023-05-15T09:22:02Z

@walldiss

Need to check if it is possible to make libp2p gracefully drop active connections, without interrupting active transfer.

It's not and there is not so much value in doing so.

Check connection stats for full nodes on bsr and revise connection limit.

FN/BNs have unlimited resources. The limits are only on LN.

The reason for error could be too many peers trying to sample from the same full node after shrex.Sub msg. Need to check if error happens only for recent headers sampling or for historic too.

FN does not have limits. This error is coming from remote LN.

We should not be doing anything with this error and simply ignore it and handle with general error handling flow.

Wondertan · 2023-05-15T09:33:13Z

No recent network activity.

No traffic was over the connection for over 30secs. Might be a connection that was ungracefully dropped, e.g. application was killed without telling the remote. Should go over general error-handling flow as well.

Failed to open stream: failed to dial
Happens right after 10, so must be related to connectivity issue on client side.

Or the server side

Needs to be handled too

It does not need a special error case and can go over the general error path.

Stream reset on reading response status.
The only reason for stream reset at this stage is if request was invalid. But since running client works fine with majority of peers, it seems to be server issue.

There are other cases when this can happen. The client can gracefully stop their node while doing a request which results in a stream reset. It's unlikely to be a bug

Wondertan · 2023-05-15T09:41:31Z

To summarize. Most of the errors above are unrelated to shrex. Similarly, shrex should not be responsible for handling them. They are various manifestations of issues on the remote side, which shrex can do nothing about. It should just ignore this peer, let it go and move on.

However, I really appreciate the effort here to document each.

Wondertan · 2023-05-15T09:51:22Z

Stateless reset with token after 22s

Happens for multiple peers at the same time. Probably related to closing connections over the limit on libp2p level.

Do we know when this will happen? AFAUR, this can happen when you kill all the connections(reboot) and reconnect to the same peers, which has not yet figured out that the old connection was dead.

EDIT: quic-go/quic-go#2836 (comment), seems true

walldiss · 2023-05-15T09:58:01Z

FN does not have limits. This error is coming from remote LN.

Light node don't serve eds via shrex server. So the error happens when FN / BN is reseting the stream on server side

walldiss · 2023-05-15T10:01:06Z

It's not and there is not so much value in doing so.

If we can prevent them to happen by properly configuring networking stack it would benefit shrex stability. Why do you think there is no much value to fix this error case?

Wondertan · 2023-05-15T10:21:17Z

If we can prevent them to happen by properly configuring networking stack it would benefit shrex stability. Why do you think there is no much value to fix this error case?

I mean, there is not so much value in trying to save active transfers.

bert2002 · 2023-06-05T11:58:49Z

Hi,
we are seeing a similar or same issue with v0.10.3 on mocha. We run it in docker with

celestia full start --core.ip rpc-mocha.pops.one --gateway --gateway.addr 0.0.0.0 --gateway.port 26659 --p2p.network mocha

and experience following issue:

Jun 05 11:32:59 node systemd[1]: Started celestia.service.
Jun 05 11:32:59 node celestia[32404]:  Network celestia_default  Creating
Jun 05 11:33:00 node celestia[32404]:  Network celestia_default  Created
Jun 05 11:33:00 node celestia[32404]:  Container celestia-celestia-1  Creating
Jun 05 11:33:00 node celestia[32404]:  Container celestia-celestia-1  Created
Jun 05 11:33:00 node celestia[32404]: Attaching to celestia-celestia-1
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.332Z        INFO        node        nodebuilder/module.go:22        Accessing keyring...
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.335Z        WARN        module/gateway        gateway/module.go:25        Gateway is enabled, however gateway endpoints are not authenticated. Use with caution!
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.354Z        INFO        badger        [email protected]/levels.go:183        All 0 tables opened in 0s
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  |
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023/06/05 11:33:01 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size for details.
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.603Z        INFO        module/header        header/config.go:58        No trusted peers in config, initializing with default bootstrappers as trusted peers
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.615Z        INFO        header/p2p        p2p/exchange.go:84        client: starting client        {"protocol ID": "/mocha/header-ex/v0.0.3"}
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.615Z        INFO        fraudserv        fraudserv/helpers.go:29        joining topic        {"id": "/mocha/fraud-sub/badencoding/v0.0.1"}
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.615Z        INFO        fraudserv        fraudserv/service.go:97        starting fraud proof service        {"protocol ID": "/mocha/fraud/v0.0.1"}
Jun 05 11:33:02 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:02.608Z        INFO        share/discovery        discovery/discovery.go:254        discovering peers        {"want": 5}
Jun 05 11:33:04 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:04.584Z        INFO        share/discovery        discovery/discovery.go:300        discovered wanted peers        {"amount": 5}
Jun 05 11:33:04 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:04.990Z        INFO        share/discovery        discovery/discovery.go:300        discovered wanted peers        {"amount": 6}
Jun 05 11:33:06 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:06.584Z        INFO        basichost        basic/natmgr.go:93        DiscoverNAT error:no NAT found
Jun 05 11:33:07 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:07.010Z        INFO        canonical-log        swarm/swarm_dial.go:500        CANONICAL_PEER_STATUS: peer=12D3KooWJPvkxWQzJYnaiVQMgrXTuJzvExvhQRbwuQg5aRdKT4yA addr=/ip4/85.239.239.131/udp/2121/quic-v1 sample_rate=100 connection_status="established" dir="outbound"
Jun 05 11:33:08 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:08.281Z        INFO        canonical-log        swarm/swarm_dial.go:500        CANONICAL_PEER_STATUS: peer=12D3KooWGpNJx4iSahk1HC34qj7DLtH2yn3KaaTSnwNCstMk3jiA addr=/ip4/89.117.57.31/udp/2121/quic-v1 sample_rate=100 connection_status="established" dir="outbound"
Jun 05 11:33:13 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:13.502Z        INFO        canonical-log        swarm/swarm_dial.go:500        CANONICAL_PEER_STATUS: peer=12D3KooWEvcLVzhzBbegE8ot8LsA514vXjrkzDiNWq1mVbysXZkv addr=/ip4/84.115.217.205/udp/45637/quic-v1 sample_rate=100 connection_status="established" dir="outbound"
Jun 05 11:34:09 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:34:09.139Z        INFO        dagstore        [email protected]/dagstore_control.go:48        dagstore closed
Jun 05 11:34:09 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:34:09.146Z        INFO        badger        [email protected]/db.go:1027        Storing value log head: {Fid:0 Len:29 Offset:1650}
Jun 05 11:34:09 node celestia[32404]: celestia-celestia-1  |
Jun 05 11:34:09 node celestia[32404]: celestia-celestia-1  | Error: node: failed to start: header/p2p: failed to open a new stream: protocols not supported: [/mocha/header-ex/v0.0.3]
Jun 05 11:34:09 node celestia[32404]: celestia-celestia-1 exited with code 0
Jun 05 11:34:09 node celestia[32785]:  Container celestia-celestia-1  Stopping
Jun 05 11:34:09 node celestia[32785]:  Container celestia-celestia-1  Stopping
Jun 05 11:34:20 node celestia[32785]:  Container celestia-celestia-1  Stopped
Jun 05 11:34:20 node celestia[32785]:  Container celestia-celestia-1  Removing
Jun 05 11:34:20 node celestia[32785]:  Container celestia-celestia-1  Removed
Jun 05 11:34:20 node celestia[32785]:  Network celestia_default  Removing
Jun 05 11:34:20 node celestia[32785]:  Network celestia_default  Removed
Jun 05 11:34:20 node systemd[1]: celestia.service: Succeeded.

This is causing the container to stop and needs to be restarted (though same problem happens again).

Any advice or more information are needed?

Thanks,
bert

Wondertan · 2023-06-05T12:00:55Z

Hey @bert2002, please open a separate issue for the issue. The good news is that we know what went wrong

Wondertan · 2023-06-05T12:22:18Z

I am closing this one is resolved and in favor of #2234

If you (@distractedm1nd and @walldiss) think there are essential errors here that were not resolved, let's move them into a separate issue. This one has lots of unrelated context

distractedm1nd added bug Something isn't working area:shares Shares and samples labels May 9, 2023

derrandz mentioned this issue May 9, 2023

fix(share/p2p/shrexeds): return correct err on context error #2177

Merged

5 tasks

distractedm1nd pushed a commit that referenced this issue May 10, 2023

fix(share/p2p/shrexeds): return correct err on context error (#2177)

a57ad56

Resolves item 7 on #2176

renaynay changed the title ~~bug: shrex-eds error handling~~ share/p2p/shrexeds: Error handling May 10, 2023

renaynay mentioned this issue May 10, 2023

[EPIC] shrex stability + performance #2106

Open

distractedm1nd self-assigned this May 10, 2023

Wondertan closed this as completed Jun 5, 2023

bert2002 mentioned this issue Jun 6, 2023

node: failed to start: header/p2p: failed to open a new stream: protocols not supported: [/mocha/header-ex/v0.0.3] #2327

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

share/p2p/shrexeds: Error handling #2176

share/p2p/shrexeds: Error handling #2176

distractedm1nd commented May 9, 2023 •

edited

Loading

walldiss commented May 13, 2023

walldiss commented May 15, 2023 •

edited

Loading

distractedm1nd commented May 15, 2023

walldiss commented May 15, 2023

walldiss commented May 15, 2023

Wondertan commented May 15, 2023

Wondertan commented May 15, 2023 •

edited

Loading

Wondertan commented May 15, 2023 •

edited

Loading

Wondertan commented May 15, 2023 •

edited

Loading

walldiss commented May 15, 2023

walldiss commented May 15, 2023

Wondertan commented May 15, 2023 •

edited

Loading

bert2002 commented Jun 5, 2023

Wondertan commented Jun 5, 2023 •

edited

Loading

Wondertan commented Jun 5, 2023

share/p2p/shrexeds: Error handling #2176

share/p2p/shrexeds: Error handling #2176

Comments

distractedm1nd commented May 9, 2023 • edited Loading

walldiss commented May 13, 2023

walldiss commented May 15, 2023 • edited Loading

distractedm1nd commented May 15, 2023

walldiss commented May 15, 2023

walldiss commented May 15, 2023

Wondertan commented May 15, 2023

Wondertan commented May 15, 2023 • edited Loading

Wondertan commented May 15, 2023 • edited Loading

Wondertan commented May 15, 2023 • edited Loading

walldiss commented May 15, 2023

walldiss commented May 15, 2023

Wondertan commented May 15, 2023 • edited Loading

bert2002 commented Jun 5, 2023

Wondertan commented Jun 5, 2023 • edited Loading

Wondertan commented Jun 5, 2023

distractedm1nd commented May 9, 2023 •

edited

Loading

walldiss commented May 15, 2023 •

edited

Loading

Wondertan commented May 15, 2023 •

edited

Loading

Wondertan commented May 15, 2023 •

edited

Loading

Wondertan commented May 15, 2023 •

edited

Loading

Wondertan commented May 15, 2023 •

edited

Loading

Wondertan commented Jun 5, 2023 •

edited

Loading