Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

share/p2p/shrexeds: Error handling #2176

Closed
Tracked by #2106
distractedm1nd opened this issue May 9, 2023 · 15 comments
Closed
Tracked by #2106

share/p2p/shrexeds: Error handling #2176

distractedm1nd opened this issue May 9, 2023 · 15 comments
Assignees
Labels
area:shares Shares and samples bug Something isn't working

Comments

@distractedm1nd
Copy link
Collaborator

distractedm1nd commented May 9, 2023

A lot of new errors are appearing client side against BSR.

  1. Application Error times out after 60s
2023-05-09T15:30:49.015+0200	DEBUG	share/getters	getters/shrex.go:171	eds: request failed	{"hash": "FB78B3FEC22B77F98C4A23BB27EB557E202172AD74A018800CB9A09299A99EC8", "peer": "12D3KooWBc9nKfmKJZKqkto9UHvQce1ZZZy3MoNdZzDQpgi2SFkD", "attempt": 1, "err": "failed to read status from stream: Application error 0x0 (remote)", "finished (s)": 57.119983625}
  1. Failed to read status from stream: deadline exceeded after 60s
  • Why does it happen at this point? If ratelimited, the err should be io.EOF. The deadline should be smaller here
2023-05-09T16:14:34.920+0200	DEBUG	shrex/eds	shrexeds/client.go:57	client: eds request to peer failed	{"peer": "12D3KooWFC3v38kFFJYWVNjR92iX962YpSVpAbGsDwYcgVqvTWk8", "hash": "9EB61119363E061DF7525B5DDEC77D13D63B36C4D479DBF29C4A2E3926148165", "error": "failed to read status from stream: deadline exceeded"}
2023-05-09T16:14:34.920+0200	DEBUG	share/getters	getters/shrex.go:172	eds: request failed	{"hash": "9EB61119363E061DF7525B5DDEC77D13D63B36C4D479DBF29C4A2E3926148165", "peer": "12D3KooWFC3v38kFFJYWVNjR92iX962YpSVpAbGsDwYcgVqvTWk8", "attempt": 1, "err": "context deadline exceeded", "finished (s)": 60.000769791}
  1. Somehow we get a peer that doesnt support the protocol (almost immediate error)
  • How is this possible with our current nodes?
  • We should probably just blacklist
2023-05-09T15:24:15.781+0200	DEBUG	shrex/eds	shrexeds/client.go:57	client: eds request to peer failed	{“peer”: “12D3KooWCV61dtrSkykyEJ4Z8i6LTgvYhaWVuEjChs8Crk4phLXJ”, “hash”: “2C159F3964034F0DE630D9309735AD80E267E012C7BA271A6D93B17E610A444E”, “error”: “failed to open stream: protocols not supported: [/blockspacerace-0/shrex/eds/v0.0.1]“}
2023-05-09T15:24:15.781+0200	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{“peer”: “12D3KooWCV61dtrSkykyEJ4Z8i6LTgvYhaWVuEjChs8Crk4phLXJ”, “hash”: “2C159F3964034F0DE630D9309735AD80E267E012C7BA271A6D93B17E610A444E”, “err”: “failed to open stream: protocols not supported: [/blockspacerace-0/shrex/eds/v0.0.1]“}
2023-05-09T15:24:15.781+0200	DEBUG	share/getters	getters/shrex.go:171	eds: request failed	{“hash”: “2C159F3964034F0DE630D9309735AD80E267E012C7BA271A6D93B17E610A444E”, “peer”: “12D3KooWCV61dtrSkykyEJ4Z8i6LTgvYhaWVuEjChs8Crk4phLXJ”, “attempt”: 3, “err”: “failed to open stream: protocols not supported: [/blockspacerace-0/shrex/eds/v0.0.1]“, “finished (s)“: 0.019803125}
  1. Stateless reset with token after 22s
  • Has something to do w QUIC
2023-05-09T15:02:30.163+0200	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{“peer”: “12D3KooWD4uxio9FnK2AUBoydXo5LqiiXbC7QZR658wUt6Hf5xfD”, “hash”: “D7A4B103159BCE6FC9AD0B3E8995F1932B2F9C76357BFD1E886298119E4D12E1", “err”: “failed to read status from stream: received a stateless reset with token a7b84d1da38f64eec20c9e2b8b395729"}
2023-05-09T15:02:30.163+0200	DEBUG	share/getters	getters/shrex.go:170	eds: request failed	{“hash”: “D7A4B103159BCE6FC9AD0B3E8995F1932B2F9C76357BFD1E886298119E4D12E1", “peer”: “12D3KooWD4uxio9FnK2AUBoydXo5LqiiXbC7QZR658wUt6Hf5xfD”, “attempt”: 1, “err”: “failed to read status from stream: received a stateless reset with token a7b84d1da38f64eec20c9e2b8b395729”, “finished (s)“: 22.121177}
  1. Failed to open stream: context deadline exceeded (after 60s)
  • We should not let the opening of the stream take more than maybe 5 seconds.
2023-05-09T14:32:34.940+0200	DEBUG	shrex/eds	shrexeds/client.go:57	client: eds request to peer failed	{“peer”: “12D3KooWBbc9ixMAH4SzF3EH6SwSfo8gRW1bVrDUX19oxC4DaRbz”, “hash”: “C71DA660939D4B174129A6F5839DB73B9F353544365C5EEADF2C7F7324F4C38A”, “error”: “failed to open stream: context deadline exceeded”}
2023-05-09T14:32:34.940+0200	DEBUG	share/getters	getters/shrex.go:170	eds: request failed	{“hash”: “C71DA660939D4B174129A6F5839DB73B9F353544365C5EEADF2C7F7324F4C38A”, “peer”: “12D3KooWBbc9ixMAH4SzF3EH6SwSfo8gRW1bVrDUX19oxC4DaRbz”, “attempt”: 1, “err”: “context deadline exceeded”, “finished (s)“: 60.000322042}

7. RequestEDS checks child context for ctx error, then returns parent context error, leading to returning of nil, nil and causing panic Resolved

  1. "failed to read eds from ods bytes: share: reading next car entry: stream reset"
2023-05-13T17:15:36.413+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "80730570DB5B2F39DCC2BE8091A3322E3B0B404ABAA2EC22F570E9615A60161C", "err": "failed to read eds from ods bytes: share: reading next car entry: stream reset"}
  1. Not an actual error, but server node could always return ErrNotFound and not get any punishment

No recent network activity.

  • Happens for multiple peers at the same time, so probably related to client connectivity issues. Needs to be handled accordingly (no remote peer punishment)
2023-05-13T20:03:30.190+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWBASepmTQtATM196L1wWH6kVd4TBgpUN57koyfjT2vheX", "hash": "91DFD05708F4D96E2C1B13B040C07C962830D970922E8192423DF4F579B80EFC", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "91DFD05708F4D96E2C1B13B040C07C962830D970922E8192423DF4F579B80EFC", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "3F2C7DE3EB02FE219BF7CE931030E6E79C0E687000187B5FF456DEE2D9FB010D", "err": "failed to read eds from ods bytes: share: reading next car entry: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "A5723C5C033FB1F6A54BE906639C901FBC3C01278F5EDD8C26CB651BB6D96997", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "76044121A20F6AB9FAE153F52CE671C4363385478FC1A12BF8717113F1A96F6A", "err": "failed to read status from stream: timeout: no recent network activity"}
  1. Failed to open stream: failed to dial
  • Happens right after 10, so must be related to connectivity issue on client side.
  • Needs to be handled too
2023-05-13T21:03:59.627+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "83676B211F75ECC843F411D3E2A57B8BD368DD1B8786CB8A73AFDFB08FA1E45B", "err": "failed to open stream: failed to dial 12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng:\n  * [/ip4/68.183.28.164/tcp/2121] dial tcp4 0.0.0.0:2121->68.183.28.164:2121: i/o timeout\n  * [/ip4/68.183.28.164/udp/2121/quic-v1] timeout: no recent network activity"}
  1. Stream reset on reading response status.
  • The only reason for stream reset at this stage is if request was invalid. But since running client works fine with majority of peers, it seems to be server issue.
2023-05-13T21:03:02.747+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWBTFMz9doATz6gxb9VUZ5d2iJ1udBfJCkho6ixgeu6nhE", "hash": "4EA6E0541DAEF1C7206EC5B5CA9C00F6546F6E32E12D6711F6B4C60E69BD3C8D", "err": "failed to read status from stream: stream reset"}
  1. Routing not found
  • It is extremely unlikely that peer has disconnected at the point of shrexeds request. The fact that there are 3 peers disconnected in short sequence requires investigation.
2023-05-13T21:03:08.633+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "9F78CE8A2E753B78A4380C530F4AA8B6F8125CC38F38EE9BDDAD8932E083672C", "err": "failed to open stream: failed to find peers: routing: not found"}
2023-05-13T21:03:20.434+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "4657B89D6D7618A6928B1612A26DB9D03BA3D6182A4388D4A0165DDD759C0FB1", "err": "failed to open stream: failed to find peers: routing: not found"}
2023-05-13T21:03:42.354+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWBCfAdTYhtpuzbKqnc5WM3qh7Zkp3DmjrbAwgTsgiKQPW", "hash": "B366AFE879E7979DF8A132A3D3E5833F48FCF7F9434F6C4E4AE81CC0B1F7277C", "err": "failed to open stream: failed to find peers: routing: not found"}
@distractedm1nd distractedm1nd added bug Something isn't working area:shares Shares and samples labels May 9, 2023
@renaynay renaynay changed the title bug: shrex-eds error handling share/p2p/shrexeds: Error handling May 10, 2023
@distractedm1nd distractedm1nd self-assigned this May 10, 2023
@walldiss
Copy link
Member

Added 3 more possible errors and edge case that needs handling for edsclient

@walldiss
Copy link
Member

walldiss commented May 15, 2023

  1. Application Error times out after 60s

From libp2p docs:

``{12D... Application error 0x0: conn-27571160: system: cannot reserve inbound connection: resource limit exceeded}```

This error occurred when a connection had been made with a remote peer but the remote peer dropped the connection (due to it exceeding limits).
This was actually an `Application error` emitted by `quic-go` and it was a bug in go-libp2p that we sent the whole message.
For now, we decided to stop sending this confusing error message. In the future, we will report such errors via [error codes](https://github.com/libp2p/specs/issues/479).

Action plan:

  • Need to check if it is possible to make libp2p gracefully drop active connections, without interrupting active transfer.
  • Check connection stats for full nodes on bsr and revise connection limit.
  • The reason for error could be too many peers trying to sample from the same full node after shrex.Sub msg. Need to check if error happens only for recent headers sampling or for historic too.

@distractedm1nd
Copy link
Collaborator Author

On the client side, it sounds like Application Error should be handled as a RateLimit/NotFound request, and be scheduled for a retry (so just cooldown the peer)

@walldiss
Copy link
Member

No recent network activity.

  • Happens for multiple peers at the same time, so probably related to client connectivity issues. Needs to be handled accordingly (no remote peer punishment)
2023-05-13T20:03:30.190+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWBASepmTQtATM196L1wWH6kVd4TBgpUN57koyfjT2vheX", "hash": "91DFD05708F4D96E2C1B13B040C07C962830D970922E8192423DF4F579B80EFC", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "91DFD05708F4D96E2C1B13B040C07C962830D970922E8192423DF4F579B80EFC", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "3F2C7DE3EB02FE219BF7CE931030E6E79C0E687000187B5FF456DEE2D9FB010D", "err": "failed to read eds from ods bytes: share: reading next car entry: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "A5723C5C033FB1F6A54BE906639C901FBC3C01278F5EDD8C26CB651BB6D96997", "err": "failed to read status from stream: timeout: no recent network activity"}
2023-05-13T20:03:30.212+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWB7GvRuTWm6DUd4pt5VTRfYfciWKLkUpCKrCtJnGNKEv7", "hash": "76044121A20F6AB9FAE153F52CE671C4363385478FC1A12BF8717113F1A96F6A", "err": "failed to read status from stream: timeout: no recent network activity"}
  1. Failed to open stream: failed to dial
  • Happens right after 10, so must be related to connectivity issue on client side.
  • Needs to be handled too
2023-05-13T21:03:59.627+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "83676B211F75ECC843F411D3E2A57B8BD368DD1B8786CB8A73AFDFB08FA1E45B", "err": "failed to open stream: failed to dial 12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng:\n  * [/ip4/68.183.28.164/tcp/2121] dial tcp4 0.0.0.0:2121->68.183.28.164:2121: i/o timeout\n  * [/ip4/68.183.28.164/udp/2121/quic-v1] timeout: no recent network activity"}
  1. Stream reset on reading response status.
  • The only reason for stream reset at this stage is if request was invalid. But since running client works fine with majority of peers, it seems to be server issue.
2023-05-13T21:03:02.747+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWBTFMz9doATz6gxb9VUZ5d2iJ1udBfJCkho6ixgeu6nhE", "hash": "4EA6E0541DAEF1C7206EC5B5CA9C00F6546F6E32E12D6711F6B4C60E69BD3C8D", "err": "failed to read status from stream: stream reset"}
  1. Routing not found
  • It is extremely unlikely that peer has disconnected at the point of shrexeds request. The fact that there are 3 peers disconnected in short sequence requires investigation.
2023-05-13T21:03:08.633+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "9F78CE8A2E753B78A4380C530F4AA8B6F8125CC38F38EE9BDDAD8932E083672C", "err": "failed to open stream: failed to find peers: routing: not found"}
2023-05-13T21:03:20.434+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWCAMGzWrcnbPradgkJEe2yTEWNYi5LPM4YqohDxDdE9Ng", "hash": "4657B89D6D7618A6928B1612A26DB9D03BA3D6182A4388D4A0165DDD759C0FB1", "err": "failed to open stream: failed to find peers: routing: not found"}
2023-05-13T21:03:42.354+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWBCfAdTYhtpuzbKqnc5WM3qh7Zkp3DmjrbAwgTsgiKQPW", "hash": "B366AFE879E7979DF8A132A3D3E5833F48FCF7F9434F6C4E4AE81CC0B1F7277C", "err": "failed to open stream: failed to find peers: routing: not found"}

@walldiss
Copy link
Member

  1. Stateless reset with token after 22s

Happens for multiple peers at the same time. Probably related to closing connections over the limit on libp2p level.

2023-05-14T00:34:49.049+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWJkrQ6tRXik4gyN6A6BsKd6UsMhMDH2KLXnj6sGXHJJn2", "hash": "EBD6143BDA97E7AB966E959D95CFCBE234AF7B48CF65BDFDEC2423EE90D4192F", "err": "failed to read status from stream: received a stateless reset with token bd133665910f96aa331ea6e530ec4c30"}
2023-05-14T00:34:49.262+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWEkFUj5rrWoeQL9stJR6AkC2F1MBT4MUeiqaoi8ZUAADA", "hash": "EBD6143BDA97E7AB966E959D95CFCBE234AF7B48CF65BDFDEC2423EE90D4192F", "err": "failed to read status from stream: received a stateless reset with token a2ff5632a83e43d3ca13a3499dcd9704"}
2023-05-14T00:34:49.499+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWDWJUWz3zRfB2bfXJxK7WPSM9QsDDPeME1BYJqq2h9oTh", "hash": "EBD6143BDA97E7AB966E959D95CFCBE234AF7B48CF65BDFDEC2423EE90D4192F", "err": "failed to read status from stream: received a stateless reset with token eda08205ebeb0307b4104d76523c8477"}
2023-05-14T00:34:49.558+0300	WARN	shrex/eds	shrexeds/client.go:72	client: eds request to peer failed	{"peer": "12D3KooWHB32cdsPLemcuJdjyLUJdWqp1R4GJK6s2ueqJvfxobdg", "hash": "EBD6143BDA97E7AB966E959D95CFCBE234AF7B48CF65BDFDEC2423EE90D4192F", "err": "failed to read status from stream: received a stateless reset with token b2f223721890f860f6fcd2677df4d9ad"}

@Wondertan
Copy link
Member

@walldiss

Need to check if it is possible to make libp2p gracefully drop active connections, without interrupting active transfer.

It's not and there is not so much value in doing so.

Check connection stats for full nodes on bsr and revise connection limit.

FN/BNs have unlimited resources. The limits are only on LN.

The reason for error could be too many peers trying to sample from the same full node after shrex.Sub msg. Need to check if error happens only for recent headers sampling or for historic too.

FN does not have limits. This error is coming from remote LN.

We should not be doing anything with this error and simply ignore it and handle with general error handling flow.

@Wondertan
Copy link
Member

Wondertan commented May 15, 2023

  1. No recent network activity.

No traffic was over the connection for over 30secs. Might be a connection that was ungracefully dropped, e.g. application was killed without telling the remote. Should go over general error-handling flow as well.

  1. Failed to open stream: failed to dial
    Happens right after 10, so must be related to connectivity issue on client side.

Or the server side

Needs to be handled too

It does not need a special error case and can go over the general error path.

  1. Stream reset on reading response status.
    The only reason for stream reset at this stage is if request was invalid. But since running client works fine with majority of peers, it seems to be server issue.

There are other cases when this can happen. The client can gracefully stop their node while doing a request which results in a stream reset. It's unlikely to be a bug

@Wondertan
Copy link
Member

Wondertan commented May 15, 2023

To summarize. Most of the errors above are unrelated to shrex. Similarly, shrex should not be responsible for handling them. They are various manifestations of issues on the remote side, which shrex can do nothing about. It should just ignore this peer, let it go and move on.

However, I really appreciate the effort here to document each.

@Wondertan
Copy link
Member

Wondertan commented May 15, 2023

Stateless reset with token after 22s

Happens for multiple peers at the same time. Probably related to closing connections over the limit on libp2p level.

Do we know when this will happen? AFAUR, this can happen when you kill all the connections(reboot) and reconnect to the same peers, which has not yet figured out that the old connection was dead.

EDIT: quic-go/quic-go#2836 (comment), seems true

@walldiss
Copy link
Member

FN does not have limits. This error is coming from remote LN.

Light node don't serve eds via shrex server. So the error happens when FN / BN is reseting the stream on server side

@walldiss
Copy link
Member

It's not and there is not so much value in doing so.

If we can prevent them to happen by properly configuring networking stack it would benefit shrex stability. Why do you think there is no much value to fix this error case?

@Wondertan
Copy link
Member

Wondertan commented May 15, 2023

If we can prevent them to happen by properly configuring networking stack it would benefit shrex stability. Why do you think there is no much value to fix this error case?

I mean, there is not so much value in trying to save active transfers.

@bert2002
Copy link

bert2002 commented Jun 5, 2023

Hi,
we are seeing a similar or same issue with v0.10.3 on mocha. We run it in docker with

celestia full start --core.ip rpc-mocha.pops.one --gateway --gateway.addr 0.0.0.0 --gateway.port 26659 --p2p.network mocha

and experience following issue:

Jun 05 11:32:59 node systemd[1]: Started celestia.service.
Jun 05 11:32:59 node celestia[32404]:  Network celestia_default  Creating
Jun 05 11:33:00 node celestia[32404]:  Network celestia_default  Created
Jun 05 11:33:00 node celestia[32404]:  Container celestia-celestia-1  Creating
Jun 05 11:33:00 node celestia[32404]:  Container celestia-celestia-1  Created
Jun 05 11:33:00 node celestia[32404]: Attaching to celestia-celestia-1
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.332Z        INFO        node        nodebuilder/module.go:22        Accessing keyring...
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.335Z        WARN        module/gateway        gateway/module.go:25        Gateway is enabled, however gateway endpoints are not authenticated. Use with caution!
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.354Z        INFO        badger        [email protected]/levels.go:183        All 0 tables opened in 0s
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  |
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023/06/05 11:33:01 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size for details.
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.603Z        INFO        module/header        header/config.go:58        No trusted peers in config, initializing with default bootstrappers as trusted peers
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.615Z        INFO        header/p2p        p2p/exchange.go:84        client: starting client        {"protocol ID": "/mocha/header-ex/v0.0.3"}
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.615Z        INFO        fraudserv        fraudserv/helpers.go:29        joining topic        {"id": "/mocha/fraud-sub/badencoding/v0.0.1"}
Jun 05 11:33:01 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:01.615Z        INFO        fraudserv        fraudserv/service.go:97        starting fraud proof service        {"protocol ID": "/mocha/fraud/v0.0.1"}
Jun 05 11:33:02 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:02.608Z        INFO        share/discovery        discovery/discovery.go:254        discovering peers        {"want": 5}
Jun 05 11:33:04 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:04.584Z        INFO        share/discovery        discovery/discovery.go:300        discovered wanted peers        {"amount": 5}
Jun 05 11:33:04 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:04.990Z        INFO        share/discovery        discovery/discovery.go:300        discovered wanted peers        {"amount": 6}
Jun 05 11:33:06 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:06.584Z        INFO        basichost        basic/natmgr.go:93        DiscoverNAT error:no NAT found
Jun 05 11:33:07 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:07.010Z        INFO        canonical-log        swarm/swarm_dial.go:500        CANONICAL_PEER_STATUS: peer=12D3KooWJPvkxWQzJYnaiVQMgrXTuJzvExvhQRbwuQg5aRdKT4yA addr=/ip4/85.239.239.131/udp/2121/quic-v1 sample_rate=100 connection_status="established" dir="outbound"
Jun 05 11:33:08 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:08.281Z        INFO        canonical-log        swarm/swarm_dial.go:500        CANONICAL_PEER_STATUS: peer=12D3KooWGpNJx4iSahk1HC34qj7DLtH2yn3KaaTSnwNCstMk3jiA addr=/ip4/89.117.57.31/udp/2121/quic-v1 sample_rate=100 connection_status="established" dir="outbound"
Jun 05 11:33:13 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:33:13.502Z        INFO        canonical-log        swarm/swarm_dial.go:500        CANONICAL_PEER_STATUS: peer=12D3KooWEvcLVzhzBbegE8ot8LsA514vXjrkzDiNWq1mVbysXZkv addr=/ip4/84.115.217.205/udp/45637/quic-v1 sample_rate=100 connection_status="established" dir="outbound"
Jun 05 11:34:09 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:34:09.139Z        INFO        dagstore        [email protected]/dagstore_control.go:48        dagstore closed
Jun 05 11:34:09 node celestia[32404]: celestia-celestia-1  | 2023-06-05T11:34:09.146Z        INFO        badger        [email protected]/db.go:1027        Storing value log head: {Fid:0 Len:29 Offset:1650}
Jun 05 11:34:09 node celestia[32404]: celestia-celestia-1  |
Jun 05 11:34:09 node celestia[32404]: celestia-celestia-1  | Error: node: failed to start: header/p2p: failed to open a new stream: protocols not supported: [/mocha/header-ex/v0.0.3]
Jun 05 11:34:09 node celestia[32404]: celestia-celestia-1 exited with code 0
Jun 05 11:34:09 node celestia[32785]:  Container celestia-celestia-1  Stopping
Jun 05 11:34:09 node celestia[32785]:  Container celestia-celestia-1  Stopping
Jun 05 11:34:20 node celestia[32785]:  Container celestia-celestia-1  Stopped
Jun 05 11:34:20 node celestia[32785]:  Container celestia-celestia-1  Removing
Jun 05 11:34:20 node celestia[32785]:  Container celestia-celestia-1  Removed
Jun 05 11:34:20 node celestia[32785]:  Network celestia_default  Removing
Jun 05 11:34:20 node celestia[32785]:  Network celestia_default  Removed
Jun 05 11:34:20 node systemd[1]: celestia.service: Succeeded.

This is causing the container to stop and needs to be restarted (though same problem happens again).

Any advice or more information are needed?

Thanks,
bert

@Wondertan
Copy link
Member

Wondertan commented Jun 5, 2023

Hey @bert2002, please open a separate issue for the issue. The good news is that we know what went wrong

@Wondertan
Copy link
Member

I am closing this one is resolved and in favor of #2234

If you (@distractedm1nd and @walldiss) think there are essential errors here that were not resolved, let's move them into a separate issue. This one has lots of unrelated context

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:shares Shares and samples bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants