fix(wallet): Ignore chain down notifications when connection is lost #5820

Open
wants to merge 4 commits into
base: develop

Conversation

friofry
Contributor

@friofry friofry commented Sep 10, 2024

Description

  • Added more fine-grained error handling to detect connection problems in network_errors.go (+tests)
  • market.go and chain/client.go will not trigger a notification event if there is a connection error.

Future improvements:

  • Flatten the provider list. Currently there is a fallback mechanism.
  • Remove rate limiter from rpc client
  • Probably refactor market.go similar to chain/client.go. (Use WalletNotifier in market.go)

fixes status-im/status-mobile#21071
refs status-im/status-mobile#21056
refs status-im/status-mobile#20736

@status-im-auto
Member

status-im-auto commented Sep 10, 2024

Jenkins Builds

Commit #️⃣ Finished (UTC) Duration Platform Result
✖️ 1caf1ad #1 2024-09-10 00:50:00 ~1 min tests 📄log
✔️ 1caf1ad #1 2024-09-10 00:51:14 ~2 min tests-rpc 📄log
✔️ 1caf1ad #1 2024-09-10 00:53:25 ~4 min linux 📦zip
✔️ 1caf1ad #1 2024-09-10 00:54:43 ~5 min android 📦aar
✔️ 1caf1ad #1 2024-09-10 00:56:20 ~7 min ios 📦zip
✔️ 0de4099 #2 2024-09-10 00:59:12 ~1 min android 📦aar
✔️ 0de4099 #2 2024-09-10 00:59:50 ~1 min linux 📦zip
✔️ 0de4099 #2 2024-09-10 01:00:11 ~2 min tests-rpc 📄log
✖️ 0de4099 #2 2024-09-10 01:00:13 ~2 min tests 📄log
✔️ 0de4099 #2 2024-09-10 01:02:54 ~5 min ios 📦zip
✔️ 44159b1 #3 2024-09-10 08:44:01 ~2 min tests-rpc 📄log
✔️ 44159b1 #3 2024-09-10 08:45:41 ~4 min linux 📦zip
✔️ 44159b1 #3 2024-09-10 08:46:37 ~5 min ios 📦zip
✔️ 44159b1 #3 2024-09-10 08:46:57 ~5 min android 📦aar
✔️ 44159b1 #3 2024-09-10 09:14:28 ~32 min tests 📄log
✔️ 932744e #4 2024-09-16 10:52:37 ~2 min tests-rpc 📄log
✔️ 932744e #4 2024-09-16 10:53:30 ~3 min ios 📦zip
✔️ 932744e #4 2024-09-16 10:53:52 ~3 min linux 📦zip
✔️ 932744e #4 2024-09-16 10:54:26 ~4 min android 📦aar
✖️ 932744e #4 2024-09-16 11:21:56 ~31 min tests 📄log
✔️ 24fe8af #5 2024-09-16 12:51:03 ~1 min android 📦aar
✔️ 24fe8af #5 2024-09-16 12:51:28 ~2 min linux 📦zip
✔️ 24fe8af #5 2024-09-16 12:51:37 ~2 min tests-rpc 📄log
✔️ 24fe8af #5 2024-09-16 12:52:07 ~2 min ios 📦zip
✖️ 24fe8af #5 2024-09-16 13:20:09 ~30 min tests 📄log
✔️ bb304ae #6 2024-09-17 11:56:43 ~1 min android 📦aar
✔️ bb304ae #6 2024-09-17 11:57:14 ~1 min linux 📦zip
✔️ bb304ae #6 2024-09-17 11:57:35 ~2 min tests-rpc 📄log
✔️ bb304ae #6 2024-09-17 11:59:15 ~4 min ios 📦zip
✔️ bb304ae #6 2024-09-17 12:27:21 ~32 min tests 📄log
✖️ b201177 #7 2024-09-24 13:39:16 ~50 sec tests 📄log
b201177 #7 2024-09-24 13:39:25 ~1 min ios 📄log
✖️ b201177 #7 2024-09-24 13:39:36 ~1 min tests-rpc 📄log
b201177 #7 2024-09-24 13:39:43 ~1 min android 📄log
b201177 #7 2024-09-24 13:39:54 ~1 min linux 📄log
5960c4a #8 2024-09-25 21:16:29 ~35 sec linux 📄log
5960c4a #8 2024-09-25 21:16:56 ~1 min android 📄log
5960c4a #8 2024-09-25 21:17:07 ~1 min ios 📄log
✖️ 5960c4a #8 2024-09-25 21:17:07 ~1 min tests-rpc 📄log
✖️ 5960c4a #8 2024-09-25 21:18:15 ~2 min tests 📄log
c6cfa03 #9 2024-09-26 07:19:15 ~36 sec android 📄log
c6cfa03 #9 2024-09-26 07:19:30 ~41 sec linux 📄log
c6cfa03 #9 2024-09-26 07:19:30 ~50 sec ios 📄log
✖️ c6cfa03 #9 2024-09-26 07:19:48 ~1 min tests-rpc 📄log
✖️ c6cfa03 #9 2024-09-26 07:21:07 ~2 min tests 📄log
✖️ 54276ee #10 2024-09-26 13:02:39 ~1 min tests 📄log
✔️ 54276ee #10 2024-09-26 13:03:22 ~2 min tests-rpc 📄log
✔️ 54276ee #10 2024-09-26 13:04:19 ~3 min linux 📦zip
✔️ 54276ee #10 2024-09-26 13:04:29 ~3 min ios 📦zip
✔️ 54276ee #10 2024-09-26 13:04:59 ~4 min android 📦aar
✖️ 3c09987 #11 2024-09-30 10:03:59 ~1 min tests 📄log
✔️ 3c09987 #11 2024-09-30 10:04:07 ~1 min android 📦aar
✔️ 3c09987 #11 2024-09-30 10:04:47 ~2 min tests-rpc 📄log
✔️ 3c09987 #11 2024-09-30 10:04:47 ~2 min linux 📦zip
✔️ 3c09987 #11 2024-09-30 10:05:53 ~3 min ios 📦zip
ff86a39 #12 2024-10-04 17:29:44 ~50 sec ios 📄log
ff86a39 #12 2024-10-04 17:29:52 ~1 min android 📄log
ff86a39 #12 2024-10-04 17:30:05 ~1 min linux 📄log
✖️ ff86a39 #12 2024-10-04 17:30:14 ~1 min tests 📄log
✖️ ff86a39 #12 2024-10-04 17:30:48 ~1 min tests-rpc 📄log
1f07487 #13 2024-10-04 17:46:32 ~41 sec android 📄log
1f07487 #13 2024-10-04 17:46:39 ~47 sec linux 📄log
✖️ 1f07487 #13 2024-10-04 17:47:08 ~1 min tests 📄log
1f07487 #13 2024-10-04 17:47:38 ~1 min ios 📄log
✖️ 1f07487 #13 2024-10-04 17:47:47 ~1 min tests-rpc 📄log
7f35f91 #14 2024-10-04 17:57:42 ~39 sec ios 📄log
7f35f91 #14 2024-10-04 17:57:42 ~40 sec android 📄log
✖️ 7f35f91 #14 2024-10-04 17:58:32 ~1 min tests 📄log
7f35f91 #14 2024-10-04 17:58:59 ~1 min linux 📄log
✖️ 7f35f91 #14 2024-10-04 17:59:07 ~1 min tests-rpc 📄log
a446459 #15 2024-10-06 21:40:56 ~42 sec ios 📄log
a446459 #15 2024-10-06 21:41:14 ~46 sec android 📄log
a446459 #15 2024-10-06 21:41:14 ~51 sec linux 📄log
✖️ a446459 #15 2024-10-06 21:41:26 ~1 min tests-rpc 📄log
✖️ a446459 #15 2024-10-06 21:41:33 ~1 min tests 📄log
6625522 #16 2024-10-06 21:57:04 ~39 sec ios 📄log
6625522 #16 2024-10-06 21:57:17 ~43 sec android 📄log
6625522 #16 2024-10-06 21:57:17 ~48 sec linux 📄log
✖️ 6625522 #16 2024-10-06 21:57:30 ~1 min tests-rpc 📄log
✖️ 6625522 #16 2024-10-06 21:57:40 ~1 min tests 📄log
3b2948a #17 2024-10-06 22:17:37 ~38 sec ios 📄log
3b2948a #17 2024-10-06 22:17:50 ~42 sec android 📄log
3b2948a #17 2024-10-06 22:17:50 ~46 sec linux 📄log
✖️ 3b2948a #17 2024-10-06 22:18:08 ~1 min tests-rpc 📄log
✖️ 3b2948a #17 2024-10-06 22:18:13 ~1 min tests 📄log
Commit #️⃣ Finished (UTC) Duration Platform Result
✖️ f4143cd #18 2024-10-09 09:48:09 ~1 min tests 📄log
✔️ f4143cd #18 2024-10-09 09:49:51 ~3 min ios 📦zip
✔️ f4143cd #18 2024-10-09 09:49:58 ~3 min tests-rpc 📄log
✔️ f4143cd #18 2024-10-09 09:50:58 ~4 min linux 📦zip
✔️ f4143cd #18 2024-10-09 09:51:17 ~5 min android 📦aar
✔️ cd6dc2b #19 2024-10-09 10:00:20 ~5 min ios 📦zip
✔️ cd6dc2b #19 2024-10-09 10:00:59 ~5 min android 📦aar
✔️ cd6dc2b #19 2024-10-09 10:01:40 ~6 min tests-rpc 📄log
✔️ cd6dc2b #19 2024-10-09 10:02:35 ~7 min linux 📦zip
✖️ cd6dc2b #19 2024-10-09 10:04:11 ~8 min tests 📄log

@friofry friofry force-pushed the ab/issue-21071-chain-down-notifications branch from 1caf1ad to 0de4099 Compare September 10, 2024 00:57
Collaborator

@alaibe alaibe left a comment

LGTM. This only works as we don't close all the circuits.

@friofry friofry force-pushed the ab/issue-21071-chain-down-notifications branch from 0de4099 to 44159b1 Compare September 10, 2024 08:41
Message: message,
At: time.Now().Unix(),
})
if pm.onlineChecker == nil || pm.onlineChecker.Online() {
Contributor

@dlipicar dlipicar Sep 11, 2024

hmm I'm not really sure about this change...

  1. I don't think we've got a guarantee that the onlinechecker.Online() state will change before the providers are detected as offline/online. This extra check will potentially inhibit the last provider-down notifications and the first provider-up ones.
  2. The providers ARE indeed down from the client's perspective (simply because there's no internet connection). Shouldn't the client decide what to do with that piece of info, instead of us not communicating the individual states?

Contributor Author

  1. Agree, the solution is not robust at all. The peer count only becomes zero after a timeout (but after the connection is re-established, it becomes greater than zero quite quickly).

  2. Users still see the "please check your internet connection" banner. So the chain-down notifications might be excessive.

If 2 is OK, I see two options here:

  • Run a background loop that accesses the provider's server with a delay (maybe with a fake key).
  • Simply skip notifications for net.Error, net.DNSError, net.OpError, tls.RecordHeaderError, context.DeadlineExceeded, http.ErrServerClosed and "connection refused", "i/o timeout" in toggleConnectionState.

I'm leaning towards the second option: I've done a few tests on desktop and it seems to work well.

Contributor Author

Implemented option 2.

Contributor Author

Alternatively, we can move this logic to the client side. I can add errors (raw or as an enum) to the "down" status, so the client can decide if it wants to display the notification. I would suggest doing this as a separate task.

@friofry friofry force-pushed the ab/issue-21071-chain-down-notifications branch 2 times, most recently from 932744e to 24fe8af Compare September 16, 2024 12:49
@friofry friofry removed the draft label Sep 16, 2024
@friofry friofry force-pushed the ab/issue-21071-chain-down-notifications branch from 24fe8af to bb304ae Compare September 17, 2024 11:54
Contributor

Please add this to the mock section of the Makefile.

Contributor

Add to the Makefile.

if !isNotFoundError(err) && !isVMError(err) && !errors.Is(err, ErrRequestsOverLimit) && !errors.Is(err, context.Canceled) {
if network_utils.IsConnectionError(err) {
connected = false
skipNotification = true
Contributor

I'm still not happy about not sending the notifications in this case...
For example, some providers could be inaccessible due to firewall rules, while others could be working just fine. The client should be able to say "hey, you cannot access these providers so you cannot do these operations, but you can do these other ones for now".

What I would do, if you feel it's reasonable, is add some enum field to the notification with more detail about the cause for the "provider down" event. One of the values could be "Network Down" which I feel would have the same value as the skipNotification flag here. Then, if the Mobile client wants to not raise banners when that's the reason, you guys can do that.

Contributor Author

As I mentioned in #5820 (comment), what do you think about making this a separate task?
It will require some clarification from the UI team.

Contributor

Sorry, perhaps I'm being stubborn. I don't want to block you unnecessarily, so I'd like to hear some other folks' opinions.

The way I see it right now, this could happen after these changes:

  • Chain providers are down for whatever reason (firewall, providers are really down, etc), so we cannot refresh balances. We inhibit the notification.
  • The completely independent "Internet connection down" state (I think it's tied to the number of Waku peers being 0 or something) does not get triggered
  • The client shows no banner about providers being down, internet connection being down or balances being out of date

Contributor Author

After discussion in the status-go guild, we agreed to add a field to the wallet notification event that identifies the type of error, so that the desktop logic does not break.

Contributor Author

I wrote up some findings from working on this task in #5854.

@friofry friofry force-pushed the ab/issue-21071-chain-down-notifications branch 2 times, most recently from 5960c4a to c6cfa03 Compare September 26, 2024 07:18
@friofry
Contributor Author

friofry commented Sep 26, 2024

Summary of discussion with @dlipicar

  • Extract web3 errors from network_errors.go into a separate file
  • Generalise the health-manager provider to use it for collectibles and market
    • Move health-manager up one level.
    • Handle only basic HTTP errors in determineErrorType, and add advanced error handling for blockchain and market providers
  • Change the provider status aggregation logic: Up if any provider is up, Down if all are down, Unknown if some are down and some are unknown
  • Remove DeadlineExceeded from IsConnectionError
  • Add Context to all selects
  • Clean up code (defer mutex.Unlock())
  • Add a method to rebuild the health manager to reset stats
    • Subscribe to testnet switch events and reset state
  • Double check slice implementation in Unsubscribe
  • Mark old event as obsolete
  • Send event directly from rpc/client.go
  • Add Stop method to rpc/client.go
  • Move health-manager to separate PR and do integration separately
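
The aggregation rule from the list above (Up if any provider is up, Down if all are down, Unknown otherwise) can be sketched as follows. This is an illustration only, not the health-manager code: the Status type and the aggregate function are hypothetical names, and treating an empty provider list as Unknown is an assumption.

```go
package main

import "fmt"

// Status is a hypothetical provider status used only for this sketch.
type Status int

const (
	Unknown Status = iota
	Up
	Down
)

func (s Status) String() string {
	switch s {
	case Up:
		return "up"
	case Down:
		return "down"
	default:
		return "unknown"
	}
}

// aggregate folds per-provider statuses into one chain status:
// Up if any provider is up, Down if every provider is down,
// Unknown otherwise (including an empty list, which is an assumption).
func aggregate(statuses []Status) Status {
	if len(statuses) == 0 {
		return Unknown
	}
	allDown := true
	for _, s := range statuses {
		if s == Up {
			return Up
		}
		if s != Down {
			allDown = false
		}
	}
	if allDown {
		return Down
	}
	return Unknown
}

func main() {
	fmt.Println(aggregate([]Status{Down, Up, Unknown}))
	fmt.Println(aggregate([]Status{Down, Down}))
	fmt.Println(aggregate([]Status{Down, Unknown}))
}
```

Under this rule a single reachable provider is enough to report the chain as up, which matches the fallback behaviour of the provider list.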

@friofry friofry force-pushed the ab/issue-21071-chain-down-notifications branch 7 times, most recently from 6625522 to 3b2948a Compare October 6, 2024 22:16
@friofry
Contributor Author

friofry commented Oct 7, 2024

added a PR for a health-manager: #5924

@friofry friofry force-pushed the ab/issue-21071-chain-down-notifications branch from 3b2948a to f4143cd Compare October 9, 2024 09:46

codecov bot commented Oct 9, 2024

Codecov Report

Attention: Patch coverage is 66.42202% with 183 lines in your changes missing coverage. Please review.

Project coverage is 10.51%. Comparing base (55bad8f) to head (cd6dc2b).
Report is 10 commits behind head on develop.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
healthmanager/provider_errors/provider_errors.go 37.96% 35 Missing and 32 partials ⚠️
healthmanager/blockchain_health_manager.go 77.19% 25 Missing and 1 partial ⚠️
rpc/client.go 67.50% 18 Missing and 8 partials ⚠️
...althmanager/provider_errors/rpc_provider_errors.go 54.28% 13 Missing and 3 partials ⚠️
healthmanager/providers_health_manager.go 79.03% 13 Missing ⚠️
healthmanager/aggregator/aggregator.go 83.56% 10 Missing and 2 partials ⚠️
healthmanager/rpcstatus/provider_status.go 50.00% 11 Missing ⚠️
circuitbreaker/circuit_breaker.go 69.56% 6 Missing and 1 partial ⚠️
rpc/chain/client.go 70.58% 3 Missing and 2 partials ⚠️

❗ There is a different number of reports uploaded between BASE (55bad8f) and HEAD (cd6dc2b). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (55bad8f) HEAD (cd6dc2b)
unit 1 0
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #5820       +/-   ##
============================================
- Coverage    47.02%   10.51%   -36.51%     
============================================
  Files          832      832               
  Lines       137725   136989      -736     
============================================
- Hits         64763    14404    -50359     
- Misses       65429   120728    +55299     
+ Partials      7533     1857     -5676     
Flag Coverage Δ
functional 10.51% <66.42%> (?)
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
node/get_status_node.go 37.01% <100.00%> (-12.84%) ⬇️
services/wallet/service.go 97.41% <ø> (+1.93%) ⬆️
rpc/chain/client.go 33.33% <70.58%> (+9.14%) ⬆️
circuitbreaker/circuit_breaker.go 67.74% <69.56%> (-27.07%) ⬇️
healthmanager/rpcstatus/provider_status.go 50.00% <50.00%> (ø)
healthmanager/aggregator/aggregator.go 83.56% <83.56%> (ø)
healthmanager/providers_health_manager.go 79.03% <79.03%> (ø)
...althmanager/provider_errors/rpc_provider_errors.go 54.28% <54.28%> (ø)
healthmanager/blockchain_health_manager.go 77.19% <77.19%> (ø)
rpc/client.go 52.14% <67.50%> (-14.53%) ⬇️
... and 1 more

... and 642 files with indirect coverage changes

@friofry friofry force-pushed the ab/issue-21071-chain-down-notifications branch from f4143cd to cd6dc2b Compare October 9, 2024 09:55
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

⛓️ ⬇️ Investigate Chain Down Issues
4 participants