pfkey_delete_parse hangs on (ips_refcount > 4) when NAT_TRAVERSAL is defined #484

Open
dandema opened this issue Feb 13, 2024 · 4 comments


dandema commented Feb 13, 2024

Hello community,

Setup

Openswan 2.6.51.3 (built with NAT_TRAVERSAL) on Linux 4.4.60; strongswan 5.7.1 as IKE daemon.
KLIPS stack with PF_KEY messaging.
IKEv2 connection "gateway-to-gateway".

# ipsec.conf
conn vxlan
        type = tunnel
        auto = route
        keyexchange = ikev2
        ikelifetime = 86400s
        lifetime = 86400s
        ike = aes256-sha2_256-modp2048!
        esp = aes256-sha2_256-modp2048!
        dpdaction = clear
        dpddelay = 30s
        leftupdown = <...>
        leftauth = pubkey
        left = <local-ipsec>
        leftsubnet = <local-gw>/32
        rightupdown = <...>
        rightauth = pubkey
        right = <remote-ipsec>
        rightsubnet = <remote-gw>/32
        leftid = <...>
        rightid = <...>
        leftcert = <...>

# strongswan.conf
charon {
        port = 500
        port_nat_t = 4500
        i_dont_care_about_security_and_use_aggressive_mode_psk = yes
        cisco_unity = yes
        make_before_break = yes
        plugins {
                socket-default {
                        listen4 = <ipsec-local>
                        use_ipv6 = no
                }
                kernel-netlink {
                        fwmark = !0x80/0x80
                }
                kernel-klips {
                        ipsec_dev_count = 1
                        ipsec_dev_mtu = 1554
                }
                xauth-passwd {
                        auth_groups = ipsecxauth
                }
        }
}

Issue

strongSwan hangs indefinitely after calling kernel_klips_ipsec->del_sa->pfkey_send().

Analysis

  • In Openswan, pfkey_delete_parse() gets stuck here, after ipsec_sa_getbyid() has incremented ips_refcount from 4 to 5:
	ipsp = ipsec_sa_getbyid(&(extr->ips->ips_said), IPSEC_REFSA);
        ...
	if (atomic_read(&ipsp->ips_refcount) > 4) {
		spin_unlock_bh(&tdb_lock);
		wait_event_interruptible(ipsp->ips_waitq, (atomic_read(&ipsp->ips_refcount) <= 4));

See last line in the attached log.

  • ips_refcount rose to 4 and never dropped back to 3 during an earlier call to pfkey_update_parse():
#ifdef NAT_TRAVERSAL
	if (extr->ips->ips_natt_sport || extr->ips->ips_natt_dport) {
		...
		nat_t_ips_saved = extr->ips;
		extr->ips = ipsq;
		/* NOTE: no ipsec_sa_put(ipsq, IPSEC_REFSA) here or later, so this reference is never released */
	}
	else
#endif
	{
		...
		/* this will call delchain-equivalent if refcount=>0 */
		ipsec_sa_put(ipsq, IPSEC_REFSA);
	}

See these lines in the attached log:
[ 137.642771] klips_debug:pfkey_update_parse: .
[ 137.643473] klips_debug:pfkey_update_parse: .

Proposed fix

--- a/linux/net/ipsec/pfkey_v2_parser.c
+++ b/linux/net/ipsec/pfkey_v2_parser.c
@@ -635,6 +635,7 @@ pfkey_update_parse(struct sock *sk, stru
 		 */
 
 		extr->ips = nat_t_ips_saved;
+		ipsec_sa_put(ipsq, IPSEC_REFSA);
 
 		error = 0;
 		KLIPS_PRINT(debug_pfkey,
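
The reference accounting described in the analysis can be modeled in user space. This is only a rough sketch, not KLIPS code: struct sa_model and every function below are hypothetical names made up for illustration, and the only values taken from the report are the starting count of 3 and the wait threshold of 4.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold used by the wait condition quoted from pfkey_delete_parse(). */
#define DELETE_WAIT_THRESHOLD 4

/* Hypothetical stand-in for struct ipsec_sa; only the counter is modeled. */
struct sa_model {
	int refcount;
};

/* Models a lookup that takes a reference, as ipsec_sa_getbyid() does. */
static void sa_model_get(struct sa_model *sa)
{
	sa->refcount++;
}

/* Models ipsec_sa_put(): drops one reference. */
static void sa_model_put(struct sa_model *sa)
{
	assert(sa->refcount > 0);
	sa->refcount--;
}

/*
 * Models the NAT-T branch of pfkey_update_parse(): the lookup takes a
 * reference, and it is released only when the proposed fix is applied.
 */
static void model_update_natt(struct sa_model *sa, bool with_fix)
{
	sa_model_get(sa);
	/* ... extr->ips is swapped to ipsq and later restored ... */
	if (with_fix)
		sa_model_put(sa);
}

/*
 * Models the check in pfkey_delete_parse(): after its own lookup, the
 * delete path sleeps while the counter is above the threshold.
 * Returns true if the delete would block.
 */
static bool model_delete_would_block(struct sa_model *sa)
{
	sa_model_get(sa);
	return sa->refcount > DELETE_WAIT_THRESHOLD;
}
```

Starting from a count of 3, one update without the put leaves the count at 4, so the delete path's own lookup raises it to 5 and the model reports that the delete would block. With the put in place, the count returns to 3 after the update and the delete proceeds.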

What do you think about our analysis and the way we're fixing it?

-best regards

Daniele De Matteis

full-log-with-pfkey-debug.txt


letoams commented Feb 13, 2024 via email


dandema commented Feb 13, 2024

Hi letoams,

> KLIPS (and openswan) has been abandoned code since about 2012. The fact that you use it is crazy. It doesn’t support AES_GCM or AESNI instructions or IPv6.

Thanks for your feedback, but it isn't that crazy.
We use KLIPS because we run a packet accelerator that needs a Linux network device for IPsec.
And this is a gateway-to-gateway tunnel, where we need neither IPv6 nor any ciphers other than the few I listed.


letoams commented Feb 13, 2024 via email


dandema commented Feb 13, 2024

> So use a native XFRMi interface

That would be nice, and it is on our roadmap, but KLIPS is our current software stack.

> AES-CBC is a number of factors slower than AES-GCM

In our case the cipher algorithms are decided by the ISP.
