dccp/tcp: fix routing redirect race

[ Upstream commit 45caeaa ] As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: open-power-host-os#8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . open-power-host-os#9 [] tcp_rcv_established at ffffffff81580b64 open-power-host-os#10 [] tcp_v4_do_rcv at ffffffff8158b54a open-power-host-os#11 [] tcp_v4_rcv at ffffffff8158cd02 open-power-host-os#12 [] ip_local_deliver_finish at ffffffff815668f4 open-power-host-os#13 [] ip_local_deliver at ffffffff81566bd9 open-power-host-os#14 [] ip_rcv_finish at ffffffff8156656d open-power-host-os#15 [] ip_rcv at ffffffff81566f06 open-power-host-os#16 [] __netif_receive_skb_core at ffffffff8152b3a2 open-power-host-os#17 [] __netif_receive_skb at ffffffff8152b608 open-power-host-os#18 [] netif_receive_skb at ffffffff8152b690 open-power-host-os#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] open-power-host-os#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] open-power-host-os#21 [] net_rx_action at ffffffff8152bac2 open-power-host-os#22 [] __do_softirq at ffffffff81084b4f open-power-host-os#23 [] call_softirq at ffffffff8164845c open-power-host-os#24 [] do_softirq at ffffffff81016fc5 open-power-host-os#25 [] irq_exit at ffffffff81084ee5 open-power-host-os#26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <[email protected]> Cc: Hannes Sowa <[email protected]> Signed-off-by: Jon Maxwell <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
NXPower · Mar 22, 2017 · b34c9f7 · b34c9f7
1 parent 7ebf301
commit b34c9f7
Show file tree

Hide file tree

Showing 4 changed files with 14 additions and 8 deletions.
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
@@ -289,7 +289,8 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)
 
 	switch (type) {
 	case ICMP_REDIRECT:
-		dccp_do_redirect(skb, sk);
+		if (!sock_owned_by_user(sk))
+			dccp_do_redirect(skb, sk);
 		goto out;
 	case ICMP_SOURCE_QUENCH:
 		/* Just silently ignore these. */

diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
@@ -122,10 +122,12 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	np = inet6_sk(sk);
 
 	if (type == NDISC_REDIRECT) {
-		struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);
+		if (!sock_owned_by_user(sk)) {
+			struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);
 
-		if (dst)
-			dst->ops->redirect(dst, sk, skb);
+			if (dst)
+				dst->ops->redirect(dst, sk, skb);
+		}
 		goto out;
 	}
 

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
@@ -421,7 +421,8 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
 
 	switch (type) {
 	case ICMP_REDIRECT:
-		do_redirect(icmp_skb, sk);
+		if (!sock_owned_by_user(sk))
+			do_redirect(icmp_skb, sk);
 		goto out;
 	case ICMP_SOURCE_QUENCH:
 		/* Just silently ignore these. */

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
@@ -382,10 +382,12 @@ static void tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	np = inet6_sk(sk);
 
 	if (type == NDISC_REDIRECT) {
-		struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);
+		if (!sock_owned_by_user(sk)) {
+			struct dst_entry *dst = __sk_dst_check(sk, np->dst_cookie);
 
-		if (dst)
-			dst->ops->redirect(dst, sk, skb);
+			if (dst)
+				dst->ops->redirect(dst, sk, skb);
+		}
 		goto out;
 	}