Detect Websocket is connected or disconnected #20

Closed
devopsokdone opened this issue Sep 25, 2020 · 15 comments

@devopsokdone

We have upgraded the SDK to v2.0.1 and imported the networking module with a retry policy when creating the connection, as explained in the response to the question in issue 19.

However, when the app is open and the long-poll interval elapses, the Websocket disconnects, which is expected. It also sometimes disconnects when we switch between apps or when the network comes back after a brief outage.

Our question: is there a way to detect whether the socket connection is connected, connecting, or disconnected, and to trigger a reconnection if it is disconnected?

Currently we dispose and resubscribe whenever the app comes to the foreground, is restarted, or the internet state changes. For long-poll disconnections we have no way to handle this, and we also don't want to reconnect unnecessarily when the connection is already alive.

are self-assigned this Sep 25, 2020
are added status: in progress (This issue is being worked on.) and type: feature request labels Sep 25, 2020
@are
Contributor

are commented Sep 25, 2020

Hi there! I've released an RC version. Can you please test it out and let me know if this fixes your issue?

To use the RC version, you can try changing the pubnub version to 3.0.0-rc like so:

  pubnub: 3.0.0-rc

Alternatively, change your pubnub dependency in pubspec.yaml to the following to use the latest GitHub version:

  pubnub:
    git:
      url: git@github.com:pubnub/dart.git
      ref: v3.0.0-rc

In this new version, there should be a new method on the PubNub class called reconnect(). This method will reconnect all long-running connections if necessary automatically. You can call it any time you need. Let me know if this works for you.

Please keep in mind that with this new version some imports may have changed (especially for StreamLogger and RetryPolicy): they have been moved to separate module imports (package:pubnub/logging.dart and package:pubnub/networking.dart respectively).

Thanks for your cooperation!

@devopsokdone
Author

Thanks for this, @are. We managed to update using the first option. Other than reconnection, we experienced breaking changes in ChannelGroup and in fetching messageActions from history.

We spent a few hours fixing them, but then got stuck on the following errors while trying to subscribe:

1st ERROR
Set channelSet
Set channelGroupSet

selfSubscription = client.subscribe(
  withPresence: true,
  channelGroups: channelGroupSet,
  channels: channelSet);

[ERROR:flutter/lib/ui/ui_dart_state.cc(166)] Unhandled Exception: type 'EfficientLengthMappedIterable<String, String>' is not a subtype of type 'Set'
E/flutter ( 6300): #0 Subscription.presenceChannels (package:pubnub/src/subscribe/subscription.dart:37:7)
E/flutter ( 6300): #1 Manager._updateLoop. (package:pubnub/src/subscribe/manager.dart:30:50)
E/flutter ( 6300): #2 SetMixin.fold (dart:collection/set.dart:159:44)
E/flutter ( 6300): #3 Manager._updateLoop (package:pubnub/src/subscribe/manager.dart:25:35)
E/flutter ( 6300): #4 Manager.createSubscription (package:pubnub/src/subscribe/manager.dart:51:5)
E/flutter ( 6300): #5 SubscribeDx.subscribe (package:pubnub/src/subscribe/subscribe.dart:30:32)
E/flutter ( 6300): #6 WebSocketClient.connect (package:ok_done_pubnub/core/client/websocket_client.dart:150:16)
E/flutter ( 6300):
E/flutter ( 6300): #7 AuthStore.startConnectivity (package:ok_done_pubnub/stores/auth_store.dart:69:29)
E/flutter ( 6300):
E/flutter ( 6300): #8 AuthStore.checkLoggedIn (package:ok_done_pubnub/stores/auth_store.dart:50:7)
E/flutter ( 6300):
E/flutter ( 6300): #9 SplashScreen.build.. (package:ok_done_pubnub/screens/splash/splash.dart:23:23)
E/flutter ( 6300): #10 SchedulerBinding._invokeFrameCallback (package:flutter/src/scheduler/binding.dart:1117:15)
E/flutter ( 6300): #11 SchedulerBinding.handleDrawFrame (package:flutter/src/scheduler/binding.dart:1064:9)
E/flutter ( 6300): #12 SchedulerBinding.scheduleWarmUpFrame. (package:flutter/src/scheduler/binding.dart:865:7)
E/flutter ( 6300): #13 _rootRun (dart:async/zone.dart:1182:47)
E/flutter ( 6300): #14 _CustomZone.run (dart:async/zone.dart:1093:19)
E/flutter ( 6300): #15 _CustomZone.runGuarded (dart:async/zone.dart:997:7)
E/flutter ( 6300): #16 _CustomZone.bindCallbackGuarded. (dart:async/zone.dart:1037:23)
E/flutter ( 6300): #17 _rootRun (dart:async/zone.dart:1190:13)
E/flutter ( 6300): #18 _CustomZone.run (dart:async/zone.dart:1093:19)
E/flutter ( 6300): #19 _CustomZone.bindCallback. (dart:async/zone.dart:1021:23)
E/flutter ( 6300): #20 Timer._createTimer. (dart:async-patch/timer_patch.dart:18:15)
E/flutter ( 6300): #21 _Timer._runTimers (dart:isolate-patch/timer_impl.dart:397:19)
E/flutter ( 6300): #22 _Timer._handleMessage (dart:isolate-patch/timer_impl.dart:428:5)
E/flutter ( 6300): #23 _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:168:12)

If we remove withPresence: true, this error does not occur.

2nd ERROR
Before upgrading to this version, we included all channels in a channel group and subscribed at the channel-group level. In the new version, unless we specify both channelSet and channelGroupSet, we get the following error. We want to keep using channel-group-based subscription rather than subscribing to each channel individually.

selfSubscription = client.subscribe(channelGroups: channelGroupSet);

[ERROR:flutter/lib/ui/ui_dart_state.cc(166)] Unhandled Exception: NoSuchMethodError: The method 'contains' was called on null.
E/flutter ( 6864): Receiver: null
E/flutter ( 6864): Tried calling: contains("pubub302")
E/flutter ( 6864): #0 Object.noSuchMethod (dart:core-patch/object_patch.dart:51:5)
E/flutter ( 6864): #1 Subscription.resume. (package:pubnub/src/subscribe/subscription.dart:89:21)
E/flutter ( 6864): #2 _WhereStream._handleData (dart:async/stream_pipe.dart:193:24)
E/flutter ( 6864): #3 _ForwardingStreamSubscription._handleData (dart:async/stream_pipe.dart:157:13)
E/flutter ( 6864): #4 _rootRunUnary (dart:async/zone.dart:1198:47)
E/flutter ( 6864): #5 _CustomZone.runUnary (dart:async/zone.dart:1100:19)
E/flutter ( 6864): #6 _CustomZone.runUnaryGuarded (dart:async/zone.dart:1005:7)
E/flutter ( 6864): #7 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:357:11)
E/flutter ( 6864): #8 _DelayedData.perform (dart:async/stream_impl.dart:611:14)
E/flutter ( 6864): #9 _StreamImplEvents.handleNext (dart:async/stream_impl.dart:730:11)
E/flutter ( 6864): #10 _PendingEvents.schedule. (dart:async/stream_impl.dart:687:7)
E/flutter ( 6864): #11 _rootRun (dart:async/zone.dart:1182:47)
E/flutter ( 6864): #12 _CustomZone.run (dart:async/zone.dart:1093:19)
E/flutter ( 6864): #13 _CustomZone.runGuarded (dart:async/zone.dart:997:7)
E/flutter ( 6864): #14 _CustomZone.bindCallbackGuarded. (dart:async/zone.dart:1037:23)
E/flutter ( 6864): #15 _rootRun (dart:async/zone.dart:1190:13)
E/flutter ( 6864): #16 _CustomZone.run (dart:async/zone.dart:1093:19)
E/flutter ( 6864): #17 _CustomZone.runGuarded (dart:async/zone.dart:997:7)
E/flutter ( 6864): #18 _CustomZone.bindCallbackGuarded. (dart:async/zone.dart:1037:23)
E/flutter ( 6864): #19 _microtaskLoop (dart:async/schedule_microtask.dart:41:21)
E/flutter ( 6864): #20 _startMicrotaskLoop (dart:async/schedule_microtask.dart:50:5)

3rd ERROR
Same error as reported in issue #19 after upgrade to v3.0.0-rc.

@devopsokdone
Author

The main item we wanted to try was reconnect(), but due to the lack of documentation and the errors above we have not been able to figure out how to use it.

@are, can you write a short paragraph explaining how this new option is meant to be used? Per your comment: "This method will reconnect all long-running connections if necessary automatically. You can call it any time you need."

Since it says it will reconnect automatically, do we need to do anything in our code when initializing the client or subscribing for automatic reconnection to work? And if we have to call it manually, please give some guidance or an example.

Our requirements are:

  • the connection should not drop while the app is in the foreground
  • reconnection should happen when the app comes from background to foreground, or when the internet goes off and comes back
  • 2 other scenarios: a channel is added or removed and re-subscription is needed with the new set, and a notification is received via FCM and the application needs to connect.

@are
Contributor

are commented Sep 28, 2020

To fix those errors, please update the pubnub dependency to version 3.0.0-rc.1.

As for using reconnect, I will try to write something today or tomorrow, along with the reasoning for removing ChannelSet (and possibly bringing it back if you prefer it). I will link to it here once I publish it.

@devopsokdone
Author

devopsokdone commented Sep 29, 2020

Thanks @are, we have managed to upgrade, and the first 2 of the 3 exceptions are taken care of: we are able to subscribe with withPresence: true and with a channelGroup.

For the 3rd error, the steps to reproduce are the same as mentioned in issue 19.

We are awaiting your guidance on reconnect and connection reliability.

@are
Contributor

are commented Sep 29, 2020

So the issue is a bit more complex, and it's not a problem with the PubNub SDK itself. Take a look here: flutter/flutter#33427

Basically, the debugger reports some exceptions as unhandled when in reality they are handled correctly. This only happens in development (debug) mode; when the application is built for production, the code runs without issue.

This happens because of the Dio library that we use for HTTP, and multiple people have the same problem with other libraries as well. To work around it, uncheck Break On Uncaught Exceptions in your Flutter editor/debugger; the reconnection code should then work correctly. Unfortunately, until version 2.10 lands as stable, there is no other resolution for this.

@devopsokdone
Author

Will try this tomorrow and report back.

Regarding reconnection, based on our study we need to understand it at 3 levels:

  • Internet - the Supervisor module tracks whether it is up or down
  • Websocket - we could not find a way to check whether it is connected (maybe reconnect() will take care of this)
  • Subscription - pause and resume work, and we can read isPaused to trigger a resume.

We need help understanding how the SDK handles the Websocket part, and what we should do when initializing the client and subscription so that the SDK takes care of automatic reconnection. Secondly, in certain situations we need to fetch history before reconnecting, to ensure all messages are retrieved if the client was disconnected for more than 10-12 minutes. Another scenario is when a channel is added or removed and we need to resubscribe/reconnect with the new set.

The key options available from a connection perspective are subscribe, dispose, reconnect, pause, and resume. What we need to understand is when to use which, and whether we are missing something. A few examples would be very useful.

@are
Contributor

are commented Sep 29, 2020

Great news - I think I have been able to fix the issues with reconnection in Android. It's available in 3.0.0-rc.2, let me know if that works for you.

How reconnection works

In the SDK we differentiate between two types of network requests: normal and subscribe.

  • normal is the usual http request.
  • subscribe is using long-polling to connect to the PubNub service.

Just to be clear: we are not using the websocket protocol in our services because of scalability issues that this protocol may have. Please refer to this PubNub Support article for more information about it.

subscribe requests are handled by the manager. While a long-polling request is waiting for new messages from your subscription, the network status may change. Depending on the platform, various things can happen: Android throws an HttpException, while iOS does not throw anything and the long-polling request becomes stale (but the SDK does not know about this).

Meanwhile, when you make a normal request (e.g. publish or history) while the network is unavailable, it will either fail immediately or time out. This signals the supervisor that the network is down, and the supervisor in turn signals the manager that any currently running long-polls may be stale. The manager cancels those long-polls and reissues them. Because the network is down, they fail and enter the normal retry logic.

When it comes to initializing the client and subscription, there is only one thing you need to do to enable reconnection and retries: pass a RetryPolicy to the networking module. To do that, you need to do the following:

import 'package:pubnub/pubnub.dart';
import 'package:pubnub/networking.dart';

var client = PubNub(
  networking:
    NetworkingModule(retryPolicy: RetryPolicy.exponential(maxRetries: 10)),
  defaultKeyset: myKeyset,
);

You can either use the built-in RetryPolicy.exponential or implement your own RetryPolicy.
This gives the supervisor enough information to proceed with retrying the requests. You don't need to do anything else to handle basic network issues.
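
To give a feel for what an exponential policy means in terms of waiting time, here is a minimal pure-Dart sketch. The helper backoffDelay, its base delay, and its cap are assumptions made for illustration; they are not taken from the pubnub package, whose actual delays may differ.

```dart
import 'dart:math';

/// Hypothetical helper (not part of the pubnub package): delay before the
/// given retry attempt, doubling on each attempt and capped at [maxDelay].
Duration backoffDelay(int attempt,
    {Duration base = const Duration(milliseconds: 500),
    Duration maxDelay = const Duration(seconds: 30)}) {
  final factor = pow(2, attempt).toInt(); // 1, 2, 4, 8, ...
  final delay = base * factor;
  return delay > maxDelay ? maxDelay : delay;
}

void main() {
  // Print the assumed schedule for 10 attempts (matching maxRetries: 10).
  for (var attempt = 0; attempt < 10; attempt++) {
    print('attempt $attempt -> ${backoffDelay(attempt).inMilliseconds} ms');
  }
}
```

With these assumed values the delay reaches the 30-second cap by the seventh attempt, which is the kind of trade-off to keep in mind when tuning a policy.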

Using History to retrieve missed messages:

In general, a subscribe request should be able to retrieve up to 100 messages from the moment network connectivity was lost, so for small outages it should be enough (depending on your load). If you want to build custom logic based on network availability in the PubNub SDK, you can use the client.signals.networkIsUp stream, which emits each time the SDK reconnects to the servers. You can then compare the current DateTime with that of the latest message (converting its Timetoken to a DateTime using timetoken.toDateTime()) and make a request to history.
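
The comparison described above could look roughly like the following pure-Dart sketch. It does not use the SDK: timetokenToDateTime stands in for timetoken.toDateTime() and relies on PubNub timetokens counting 100-nanosecond ticks since the Unix epoch, while shouldFetchHistory and its 10-minute threshold are hypothetical names and values chosen for illustration.

```dart
/// Stand-in for the SDK's conversion: a timetoken counts 100-nanosecond
/// ticks since the Unix epoch, so dividing by 10 yields microseconds.
DateTime timetokenToDateTime(int timetoken) =>
    DateTime.fromMicrosecondsSinceEpoch(timetoken ~/ 10, isUtc: true);

/// Hypothetical rule: fetch history when the last received message is
/// older than [maxGap] at the moment the network comes back.
bool shouldFetchHistory(int lastTimetoken, DateTime now,
    {Duration maxGap = const Duration(minutes: 10)}) {
  return now.difference(timetokenToDateTime(lastTimetoken)) > maxGap;
}

void main() {
  const last = 16000000000000000; // example timetoken from September 2020
  final reconnectedAt =
      timetokenToDateTime(last).add(const Duration(minutes: 12));
  print(timetokenToDateTime(last).toIso8601String());
  print(shouldFetchHistory(last, reconnectedAt)); // true: 12 min > 10 min
}
```

In an app, the check would run inside the listener on the network signal, with the last received timetoken stored by the message handler.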

Adding or removing channels

Subscription is immutable and it is not possible to change channels after creating one, so the recommended way is to create a new Subscription based on the old one:

  var sub = pubnub.subscribe(channels: {'my1', 'my2'});

  // some time later

  await sub.cancel();
  sub = pubnub.subscribe(channels: sub.channels.difference({'my1'})); // subtracts 'my1' channel => result {'my2'}

  // some time later

  await sub.cancel();
  sub = pubnub.subscribe(channels: sub.channels.union({'my3'})); // adds 'my3' channel => result {'my2', 'my3'}

How to use pubnub.reconnect()

There are no drawbacks when calling reconnect. You can use it:

  • when you detect that app has been sent from the background to the foreground,
  • when the connectivity package signals that network is off,
  • when there has been no activity for some time.

If I missed anything, we will be updating the documentation in the coming weeks to include more comprehensive examples and tutorials on how to integrate the SDK with Flutter. Let me know if this works for you.

@devopsokdone
Author

Appreciate your help @are, we are now using v3.0.0-rc.2.

Reconnect is working, and we are also able to check whether the SDK connection is up or down.

Here is one observation:

  1. Device internet off
     • We get isNetworkConnected as false. ✔️
  2. Device internet on
     • We get the internet-up event from the OS. ✔️
  3. But the SDK only reports isNetworkConnected as true once we start typing. If we do nothing in the app and send messages from another device, they are not received on this device. Sometimes it reports isNetworkConnected as true after 1+ minute(s). So reconnect works, but re-establishing the subscription takes time or does not happen until there is some activity. ❓

We are still testing, but we have been able to reproduce this a few times.

@are
Contributor

are commented Sep 30, 2020

So this is the bigger issue with reconnection in general: unless we perform some action and that action fails, we cannot know that the network is down. You can try tweaking the RetryPolicy, using the connectivity Flutter plugin, or adding custom application logic that calls reconnect manually after, say, 15 seconds of inactivity, to catch the moment when the network goes down. This is a broader problem with network-connected apps in general, and you need to find a solution that suits your use case. If you have more questions on how you can leverage PubNub to handle this, don't hesitate to ask!
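
The inactivity idea could be sketched in plain Dart as below. InactivityWatchdog is a hypothetical helper, not an SDK class; in an app the callback would call reconnect(), and touch() would be invoked from message handlers and user input. The short timeout is only there so the example finishes quickly.

```dart
import 'dart:async';

/// Hypothetical helper: calls [onIdle] once when [touch] has not been
/// called for [timeout].
class InactivityWatchdog {
  InactivityWatchdog(this.timeout, this.onIdle);

  final Duration timeout;
  final void Function() onIdle;
  Timer? _timer;

  /// Call on every sign of activity (message received, publish, user input).
  void touch() {
    _timer?.cancel();
    _timer = Timer(timeout, onIdle);
  }

  /// Stops the pending timer, if any.
  void dispose() => _timer?.cancel();
}

Future<void> main() async {
  var idleCalls = 0;
  final watchdog = InactivityWatchdog(
    const Duration(milliseconds: 50), // would be 15 seconds in the scenario above
    () => idleCalls++, // in an app: () => pubnub.reconnect()
  );
  watchdog.touch();
  await Future<void>.delayed(const Duration(milliseconds: 120));
  print('idle callbacks: $idleCalls');
  watchdog.dispose();
}
```

The timer is one-shot by design: after the callback fires, nothing happens until the next touch(), so an idle app does not trigger reconnects in a loop.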

@devopsokdone
Author

devopsokdone commented Sep 30, 2020

We ran a few more tests; the observations are below. We are reporting them here because, despite manually calling reconnect, reconnection takes time, and the bigger worry is that once reconnection happens, retrieval of pending messages from the server is quite inconsistent (sometimes all messages are missed, sometimes some, sometimes none).

When we do not call reconnect and depend on the SDK to reconnect automatically:

10 Messages Received out of 10 sent before reconnection (No issues with the number of messages received)
I/flutter ( 6089): NetworkIsConnected false at 2020-09-30 20:47:14.351024 (when we disconnected the Internet)
I/flutter ( 6089): Device Network Turned On at 2020-09-30 20:47:57.170693 (when we connected the Internet)
I/flutter ( 6089): NetworkIsConnected true at 2020-09-30 20:48:19.983670 (when the SDK got connected with the service)

100 Messages Received out of 110 sent before reconnection (No issues with the number of messages received)
I/flutter ( 6089): NetworkIsConnected false at 2020-09-30 20:49:01.903299
I/flutter ( 6089): Device Network Turned On at 2020-09-30 20:51:15.017892
I/flutter ( 6089): NetworkIsConnected true at 2020-09-30 20:52:07.504774

When we call reconnect upon network availability:

24 Messages received out of 50 sent before reconnect - Partial messages missed
I/flutter ( 1252): NetworkIsConnected false at 2020-09-30 19:11:22.948579
I/flutter ( 1252): Device Network Turned On at 2020-09-30 19:12:58.787433
I/flutter ( 1252): NetworkIsConnected true at 2020-09-30 19:13:29.919518

No Messages Received Out of 50 sent before reconnect - All messages missed
I/flutter ( 1252): NetworkIsConnected false at 2020-09-30 19:22:11.330537
I/flutter ( 1252): Device Network Turned On at 2020-09-30 19:25:18.426564
I/flutter ( 1252): NetworkIsConnected true at 2020-09-30 19:26:15.976766

All 50 Messages received out of 50 sent before reconnect - No messages missed
I/flutter ( 1252): NetworkIsConnected false at 2020-09-30 19:29:40.888595
I/flutter ( 1252): Device Network Turned On at 2020-09-30 19:33:44.475270
I/flutter ( 1252): NetworkIsConnected true at 2020-09-30 19:33:48.049197

All 50 Messages received out of 50 sent before reconnect - No messages missed
I/flutter ( 3077): NetworkIsConnected false at 2020-09-30 19:44:49.758988
I/flutter ( 3077): Device Network Turned On at 2020-09-30 19:47:17.850071
I/flutter ( 3077): NetworkIsConnected true at 2020-09-30 19:47:55.799399

No Messages Received out of 110 sent before reconnect - All messages missed
I/flutter ( 3077): NetworkIsConnected false at 2020-09-30 20:09:47.399924
I/flutter ( 3077): Device Network Turned On at 2020-09-30 20:16:37.120838
I/flutter ( 3077): NetworkIsConnected true at 2020-09-30 20:16:54.932481

// Manual reconnection code.
Connectivity()
    .onConnectivityChanged
    .listen((ConnectivityResult result) async {
  // Any result other than "none" means the device has some network again.
  if (result != ConnectivityResult.none) {
    webSocketClient.reconnect();
  }
});

// Log whether the SDK connection is up or down.
final sub = client.signals.networkIsConnected.listen((event) {
  print("NetworkIsConnected $event at ${DateTime.now()}");
});


This is a separate observation, and it seems to be working as designed, but we want to reconfirm: networkIsConnected only emits when the state changes (false to true, or true to false). If the state is false and the internet goes on and off again, it will not print false a second time, because the state was never set to true when the internet recovered.

Why it did not turn true when the internet became available is a different matter; the logs are below.

I/flutter ( 3077): NetworkIsConnected false at 2020-09-30 19:50:13.961787
I/flutter ( 3077): Device Network Turned On at 2020-09-30 19:50:22.945806
I/flutter ( 3077): Device Network Turned On at 2020-09-30 19:51:41.432435
I/flutter ( 3077): Device Network Turned On at 2020-09-30 19:52:25.853357
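
The "emits only on state change" behavior described above can be illustrated with a plain Dart stream. This sketch assumes the signal behaves like a stream with consecutive duplicates dropped; that is an assumption about the observed behavior, not a statement about the SDK's internals. The function name stateChanges is hypothetical.

```dart
import 'dart:async';

/// Returns only the state transitions of [raw], dropping consecutive
/// repeats of the same value.
Future<List<bool>> stateChanges(Stream<bool> raw) => raw.distinct().toList();

Future<void> main() async {
  // Hypothetical sequence of internal checks: the network drops, stays down
  // across several checks, then comes back.
  final raw = Stream.fromIterable([true, false, false, false, true]);
  print(await stateChanges(raw)); // [true, false, true]
}
```

Under that assumption, a listener sees a single false for the whole outage, which matches the logs above where only one false line is printed.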

@devopsokdone
Author

Further to yesterday's message, we tested retrieval of pending messages upon automatic reconnection.

We are also publishing a dummy typing signal on the UUID channel to generate some traffic so that the connection check works. As soon as this event is sent, NetworkIsConnected shows true. But pending messages either take 25-45 seconds to start arriving or are missed entirely.

We run all of these tests in cycles of 2-3 minutes (5 at most), so we are certainly not exceeding the server's cache time of 10-16 minutes. We request that you test the reliability of retrieving messages from the server upon reconnection; it seems worth investigating, either in the SDK or in our code.

Log when all messages were missed:

No messages received even after NetworkIsConnected returned true.

I/flutter (17857): NetworkIsConnected false at 2020-10-01 13:04:56.770834
I/flutter (17857): Device Network Turned On at 2020-10-01 13:09:20.574558
I/flutter (17857): Typing False Event Sent at 2020-10-01 13:09:25.577714
I/flutter (17857): NetworkIsConnected true at 2020-10-01 13:09:25.836085

@devopsokdone
Author

Due to the unreliability of retrieving missed messages upon reconnection, we have introduced history fetching as soon as the device's internet is up.

Because of the history request, networkIsConnected is set to true, and we now get messages from history plus a repeat of some of them from the subscription. We ignore the duplicates, but this is unwanted and confusing.
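
One way to ignore such duplicates is to merge both sources keyed by timetoken, as in this minimal pure-Dart sketch. The Msg class and mergeWithoutDuplicates are hypothetical and do not mirror the SDK's message types; the sketch also assumes that a timetoken identifies a message uniquely within a channel.

```dart
/// Hypothetical message shape; the real SDK types differ.
class Msg {
  const Msg(this.timetoken, this.text);
  final int timetoken;
  final String text;
}

/// Merges history and live messages, keeping the first message seen for
/// each timetoken and returning the result ordered by timetoken.
List<Msg> mergeWithoutDuplicates(Iterable<Msg> history, Iterable<Msg> live) {
  final byTimetoken = <int, Msg>{};
  for (final m in [...history, ...live]) {
    byTimetoken.putIfAbsent(m.timetoken, () => m);
  }
  return byTimetoken.values.toList()
    ..sort((a, b) => a.timetoken.compareTo(b.timetoken));
}

void main() {
  final history = [const Msg(1, 'a'), const Msg(2, 'b')];
  final live = [const Msg(2, 'b'), const Msg(3, 'c')];
  final merged = mergeWithoutDuplicates(history, live);
  print(merged.map((m) => m.text).toList()); // [a, b, c]
}
```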

@are
Contributor

are commented Oct 1, 2020

Thanks for diving deep into this. This is something that will require a bit more time to investigate.

@are
Contributor

are commented Oct 8, 2020

@devopsokdone the fix to the original issue has been released in v3.0.0.

I will close this issue for clarity, but don't worry - the reliability issues are on our internal checklist. Feel free to open a new issue for that if you want to have it tracked.

are closed this as completed Oct 8, 2020
are added status: done (This issue is considered resolved.) and removed status: in progress (This issue is being worked on.) labels Oct 8, 2020