VoLTE vs NATPING #53
Comments
I realize it's bad in terms of draining the UE's battery, but I had it in place to ensure that SIP signaling messages are exchanged reliably and do not get lost in the IDLE <--> CONNECTED transition. Also, it can easily be disabled in pcscf.cfg, so one can tailor it to their deployment needs.
I am not sure whether I understood your point here, but in any case VoLTE SIP signaling is not NATed in mobile networks.
I haven't tested a dual-SIM phone, so I am not aware of this issue.
I agree that it's a weird name, and it misleadingly suggests a ping for NATed clients only, which is not the case. In general it's SIP OPTIONS-based pinging, which is usually needed when calling between SIP clients behind a NAT.
In order to do that, you can enable frequent keepalives for VoWiFi clients only, by checking their sip.P-Access-Network-Info SIP header and adding them to the to-be-pinged list maintained in the script only if it's a WLAN.
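To make the idea above a bit more concrete, here is a rough Python sketch of the selection logic only (not Kamailio script). It assumes the WLAN access types in P-Access-Network-Info start with "IEEE-802.11", and the function names, contacts and header values are made up for illustration.

```python
def is_wlan_access(pani_value: str) -> bool:
    """Return True when a P-Access-Network-Info value names a WLAN access type."""
    # The access type is the first token, before any ';'-separated parameters.
    access_type = pani_value.split(";", 1)[0].strip().upper()
    return access_type.startswith("IEEE-802.11")


def select_ping_targets(registrations: dict[str, str]) -> list[str]:
    """Pick the contacts to keep alive from a {contact: P-Access-Network-Info} map."""
    return [contact for contact, pani in registrations.items() if is_wlan_access(pani)]


# Hypothetical registrations: one VoLTE client and one VoWiFi client.
example = {
    "sip:alice@10.0.0.5": "3GPP-E-UTRAN-FDD;utran-cell-id-3gpp=0010100010000001",
    "sip:bob@10.0.0.9": "IEEE-802.11;i-wlan-node-id=aabbccddeeff",
}
targets = select_ping_targets(example)
```

With the example map, only the VoWiFi contact would end up on the to-be-pinged list; the VoLTE contact is left alone so it can go idle.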
Short unstable periods caused by state transitions (or anything else radio induced) should be handled through some sort of retry mechanism, not via constant ping :-) Something simple might work: try the first time with a short timeout, then a bit longer, then one last time with an even longer one. Something like: 200ms, 500ms, 1000ms. I am not expecting you to incorporate some elaborate adaptive retransmission scheme here :-)
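The retry idea above could look roughly like this in Python. This is only a sketch of the escalating-timeout pattern, not something taken from Kamailio: `send_once` is a hypothetical callable that sends the message, waits up to the given timeout, and returns the reply or None.

```python
from typing import Callable, Optional, Sequence


def send_with_retries(
    send_once: Callable[[float], Optional[str]],
    timeouts_s: Sequence[float] = (0.2, 0.5, 1.0),
) -> Optional[str]:
    """Try send_once with each timeout in turn; return the first reply, else None."""
    for timeout in timeouts_s:
        reply = send_once(timeout)
        if reply is not None:
            return reply  # Got an answer, no further attempts needed.
    return None  # All attempts timed out.
```

The defaults mirror the 200ms, 500ms, 1000ms steps suggested above, so a message is given up on after at most 1.7 seconds of waiting in total.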
That is what I meant. The signalling traffic between the UE IMS client and the P/S/I-CSCF. When there is a call, there is constant traffic anyhow :-)
This is maybe due to the nature of DSDS operation. If the VoLTE SIM for Open5GS is in the second slot, and the primary slot is occupied and actively used, this can happen. Maybe we should check whether Open5GS does hourly TAUs, as it can happen (although it shouldn't) that a long-idling GTP-U tunnel gets lost, especially if there is heavy mobility within the tracking area. It could have been a simple UE fault as well, although since I turned it off it works perfectly well. Even with very long idling sessions I have not seen any issues. Last time I intentionally waited 24 hours between two calls; in between there were only the hourly SIP session renewals.
That is clear.
As I explained, I don't think this is needed for VoWiFi either. The main reason is that the IPsec link between UE <--> ePDG can be "kept alive" (from a NAT point of view) with DPD, unless NAT is also expected between the ePDG and the P/I/S-CSCFs. Good to know about the sip.P-Access-Network-Info option though.
Thanks for confirming this. Then I can safely disable the SIP OPTIONS ping. One question though: could the keepalive be ePDG implementation specific?
If the IPsec traffic of VoWiFi terminates on the ePDG (which is not part of the IMS), then yes, we can disable it. And no, DPD is part of the IPsec base standard, so every ePDG should support it. Based on my experience, it should be set to 10 seconds or so, as most routers have a rather short NAT timeout for UDP sessions. I am not sure if Kamailio can act as an ePDG (I mean as a separate node, not part of the P/S/I-CSCF), but if it can, DPD should be configured there.
I think we need to revise other parts as well in terms of timers, let me elaborate: I traced a VoLTE session of a large commercial provider. There is no SIP keepalive in there at all (as expected), the validity time is 7200 seconds, and the SIP session renewal happens 12 minutes before that. I think we should aim for that, or half of that: 3600 seconds and renewal 6 minutes before that. Maybe in a test lab 3600 seconds is the better choice. But this affects a lot of other parameters.
What I did (besides disabling the NATPING and NAT parts) is this: tcp_connection_lifetime is changed to 3630 seconds on all 3 nodes (a bit longer than the session validity timer).
modparam("ims_registrar_pcscf", "subscription_expires", 3600)
modparam("ims_registrar_scscf", "subscription_default_expires", 3600)
Nothing changed here, everything was the Kamailio default already.
With this, from time to time my UE is not able to make any calls (after hours of idling). I am trying to catch the point where this happens and compare it with the logs.
Once this is working, it would be nice to connect calls to an actual PBX (I have experience with Asterisk) to have PBX features like echo test, MOH etc. My general problem with Kamailio is that tutorials and meaningful (e.g. in-context) documentation are non-existent...
One more question: I am looking at the IMS example files from the Kamailio source, and they are all very old. I can also see that in your branch quite a few parts are heavily modified. Would you care to elaborate on the differences? And which ones should we use: the ones in the Kamailio_IMS_Config branch, the ones in the kamailio branch, or the ones in docker_open5gs?
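For anyone following the numbers in the comment above, here is a quick Python sketch of the timer arithmetic. It assumes the refresh is simply sent a fixed lead time before the expiry (that is what the trace suggests, not something confirmed), and the function and variable names are made up for illustration.

```python
def refresh_offset_s(expires_s: int, lead_s: int) -> int:
    """Seconds after a successful registration at which the refresh would be sent."""
    if not 0 < lead_s < expires_s:
        raise ValueError("lead time must be positive and shorter than the expiry")
    return expires_s - lead_s


# Values quoted in the thread.
commercial_refresh = refresh_offset_s(7200, 12 * 60)  # 6480 s after registration
lab_refresh = refresh_offset_s(3600, 6 * 60)          # 3240 s after registration

# A tcp_connection_lifetime of 3630 s leaves this much slack over a 3600 s expiry.
tcp_margin_s = 3630 - 3600                            # 30 s
```

So with the halved values the phone would refresh 54 minutes after registering, and the TCP connection would outlive the 3600 second validity by 30 seconds.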
I think SIP session renewal is UE dependent (or maybe we are missing a mechanism in IMS to renew the session). The reason I set subscription_expires to such a high value is that I was trying to fix a bug raised a long time ago, where a user said calls were automatically dropped after 3600 seconds (i.e. the UE was not issuing a re-INVITE to renew the session).
I can understand. In Kamailio, the focus is less on IMS.
I have a branch in my forked kamailio repo that updates the example files in the Kamailio source repo, but I never managed to get it merged. I would recommend using the ones in this repo if you want a working setup. I don't think calling works if you use the examples from the Kamailio source. Regarding the differences, I can't recollect them right now, since I worked on it 4 years back or so. But I vaguely remember that they had to do with removing IPsec connections, OPTIONS pinging of the UE, and routing SIP requests/replies when IPsec connections are involved.
Not really. The UE always starts the initial REGISTER with 600000 seconds, but that is not the actual session timeout, just what the UE initially requests. If the network responds with a lower value (which is the case for large commercial operators), that takes precedence.
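Put as a tiny Python sketch, the precedence described above is just "the lower value wins". This models only what is stated in this comment; the function name is made up, and the case where the network answers with a higher value is not covered by the thread, so treating it as a plain minimum is an assumption.

```python
def effective_expires_s(ue_requested_s: int, network_granted_s: int) -> int:
    """Expiry the UE ends up with: the network's value wins when it is lower."""
    return min(ue_requested_s, network_granted_s)


# The UE asks for 600000 s, the commercial network answers with 7200 s.
commercial = effective_expires_s(600000, 7200)  # 7200
```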
Yeah, we can't really keep doing that. I checked another commercial operator and they also use 7200 seconds and also renew 12 minutes before that, exactly like the previous one I checked (both of them are large international players, not small local market operators). If you have time, you can also check one in your country with NSG. It would be nice to know how widely this is the case. As you have seen, I am also dealing with this EBI issue @ Open5GS, so I can't reliably check long-lasting sessions until that one is fixed.
To be clear, do you mean the Kamailio_IMS_Config repo's master branch here?
Yes, if you are using the kamailio master branch from source. If you are using the 5.3 tag, then use the 5.3 branch in this repo.
Is it possible to upload a PCAP here showing how the session is refreshed?
I am using the latest stable from the official branch (5.8.4), so I will use the master here. Thanks!
I can't provide a PCAP, but this is how it looks: the phone just does the same signalling procedure as at initial registration. The expiry timer sent by the network is 7200 seconds. The only question is: how does the phone know that it needs to re-register exactly 12 minutes before the 7200 seconds pass, and why 12 minutes exactly? Nothing in the signalling decoder indicates these 12 minutes anywhere...
MOD: There are some differences. The commercial provider does not send any "expires" part in the XML message body of the NOTIFY message, while Kamailio does. In the commercial signalling the expiry is only in the message header, not in the XML part. In the commercial signalling there is also no "Subscription to REG saved" message sent to the phone, but Kamailio does send that.
I managed to fix the session refresh issue. Please give this branch a try: https://github.com/herlesupreeth/docker_open5gs/tree/improve_5g_ims
Now the UE session expiry is set to 3600 seconds. If you want to test this, you can reduce the expiry to 300 seconds in icscf_init.sh, scscf_init.sh and pcscf_init.sh (SUBSCRIPTION_EXPIRES_ENV). I also disabled NATPING.
Will take a look at it, could take a couple of days. Much appreciated.
Dear @herlesupreeth
I finished setting up VoLTE with Open5GS. Calls can be made successfully within the network, and both AMR-WB and EVS work just fine.
However, I noticed that a SIP OPTIONS request is sent every 5 seconds to all VoLTE clients. This is bad for multiple reasons:
So what I did is comment out these lines in pcscf.cfg:
##!define WITH_NAT
##!define WITH_NATPING
And now for more than a day even the previously "lost" phone does work properly, and all the VoLTE clients can properly go to idle mode as there is no unnecessary constant signalling traffic.
Of course, if this P-CSCF is to be used for VoWiFi, that is a different situation, as VoWiFi does need frequent connection keepalives to be able to pass through NAT. I wonder if there would be a way to disable keepalive for VoLTE and enable it for VoWiFi connections on the same P-CSCF, or whether the solution is to run a separate P-CSCF for VoWiFi...
Using Kamailio 5.8.4.
MOD: I have to correct myself: even on VoWiFi this NATPING stuff is not needed on the P-CSCF side, as the UE's IPsec traffic terminates on the ePDG, which can keep the link up with DPD. The IPsec link itself does the NAT traversal, and this is transparent to the IMS traffic inside the tunnel.