Connection is RST if data (< init window) with MPTCP options received, then MPTCP options are dropped #518

matttbe · 2024-10-02T15:29:53Z

@cpaasch reported me an issue seen in the field, due to a middlebox stripping MPTCP options later on after the 3WHS.

This packetdrill test is what we are supposed to have:

--tolerance_usecs=100000
`../common/defaults.sh`

0.0    socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
+0     setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0     bind(3, ..., ...) = 0
+0     listen(3, 1) = 0

// 3WHS is OK
+0.0     <  S   0:0(0)                    win 65535  <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey>
+0.0     >  S.  0:0(0)           ack 1               <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]>
+0.1     <   .  1:1(0)           ack 1    win 2048                                              <mpcapable v1 flags[flag_h] key[ckey=2, skey]>
+0     accept(3, ..., ...) = 4

+0.01    <  P.  1:501(500)       ack 1    win 2048                                              <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop>
// From here, the MPTCP options will be dropped by a middlebox
+0.0     >   .  1:1(0)           ack 501                                                        <dss dack8=501 dll=0 nocs>
+0.01  read(4, ..., 500) = 500
+0     write(4, ..., 100) = 100

// The server replies with data, still thinking MPTCP is being used
+0       >  P.  1:101(100)       ack 501                                                        <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
// But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options
+0.01    <   .  501:501(0)       ack 101  win 2048

+0.0     <  P.  501:601(100)     ack 101  win 2048
+0.01  `nstat -s | grep MPTcp`  // in case of issue, MPRstTx counter is 1
+0.0     >   .  101:101(0)       ack 601

But instead of sending a "plain TCP" ACK for the last packet here, we can see a RST packet with MP_TCPRST set to 1:

0.131450 R. 101:101(0) ack 601 win 1049 <mp_reset 1>

The server sends a RST, because when it detects the errors, it first checks if a fallback is possible [1] using subflow_can_fallback() which returns false due to:

return !subflow->fully_established;

Technically, the server should not be in fully_established mode here, because it didn't receive a valid DATA_ACK from the client, nor data for more than the initial window (implying a DATA_ACK has been received from the other side).

But not to change the whole behaviour, maybe easier to do something like this if it is OK with the RFC:

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index a1e28e1d8b4e..61482f8b425b 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1282,7 +1282,7 @@ static bool subflow_can_fallback(struct mptcp_subflow_context *subflow)
        else if (READ_ONCE(msk->csum_enabled))
                return !subflow->valid_csum_seen;
        else
-               return !READ_ONCE(subflow->fully_established);
+               return READ_ONCE(msk->allow_infinite_fallback);
 }
 
 static void mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)

The text was updated successfully, but these errors were encountered:

dcaratti · 2024-10-02T16:54:35Z

nit: RFC says we "SHOULD" use middlebox interference (MPTCP_RST_EMIDDLEBOX or 0x6 ) in this scenario, not the generic one (that's 0x1). currently get_mapping_status() returns MAPPING_INVALID. we might want to use a dedicated value of mapping_status for situations:

--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -960,7 +960,8 @@ enum mapping_status {
        MAPPING_EMPTY,
        MAPPING_DATA_FIN,
        MAPPING_DUMMY,
-       MAPPING_BAD_CSUM
+       MAPPING_BAD_CSUM,
+       MAPPING_NODSS
 };
 
 static void dbg_bad_map(struct mptcp_subflow_context *subflow, u32 ssn)
@@ -1103,6 +1104,10 @@ static enum mapping_status get_mapping_status(struct sock *ssk,
 
        mpext = mptcp_get_ext(skb);
        if (!mpext || !mpext->use_map) {
+
+               if (!mpext)
+                       return MAPPING_NODSS;
+
                if (!subflow->map_valid && !skb->len) {
                        /* the TCP stack deliver 0 len FIN pkt to the receive
                         * queue, that is the only 0len pkts ever expected here,
@ -1332,7 +1337,7 @@ static bool subflow_check_data_avail(struct sock *ssk)
                status = get_mapping_status(ssk, msk);
                trace_subflow_check_data_avail(status, skb_peek(&ssk->sk_receive_queue));
                if (unlikely(status == MAPPING_INVALID || status == MAPPING_DUMMY ||
-                            status == MAPPING_BAD_CSUM))
+                            status == MAPPING_BAD_CSUM || status == MAPPING_NODSS))
                        goto fallback;
 
                if (status != MAPPING_OK)
@@ -1385,7 +1390,7 @@ static bool subflow_check_data_avail(struct sock *ssk)
                         * subflow_error_report() will introduce the appropriate barriers
                         */
                        subflow->reset_transient = 0;
-                       subflow->reset_reason = MPTCP_RST_EMPTCP;
+                       subflow->reset_reason = status == MAPPING_NODSS ? MPTCP_RST_EMIDDLEBOX : MPTCP_RST_EMPTCP;

matttbe · 2024-10-02T17:34:03Z

Note that removing these two lines...

mptcp_net-next/net/mptcp/protocol.c

Lines 3361 to 3362 in ecea735

    
           if (mp_opt->suboptions & OPTION_MPTCP_MPC_ACK) 
        
           	__mptcp_subflow_fully_established(msk, subflow, mp_opt);

...makes this test fails:

v1_mp_capable_bind_no_cs_ooo.pkt:16: error handling packet: live packet field ipv6_payload_len: expected: 40 (0x28) vs actual: 32 (0x20)
script packet:  0.013904 . 1:1(0) ack 1 <nop,nop,sack 101:201,dss dack4 16777216 flags: A>
actual packet:  0.013883 . 1:1(0) ack 1 win 1050 <nop,nop,sack 101:201>

Should it do a fallback?

Also, the packetdrill I shared doesn't pass with this patch.

pabeni · 2024-10-02T20:01:11Z

Also, the packetdrill I shared doesn't pass with this patch.

I gave a quick look. The whole 'fully_established' tracking will require a more significant refactor to match accurately the RFC spec; check_fully_established() being the main curprit. Such function will need to track more status and look at more pkts.

TL;DR: this path does not look suitable for small fix. I suggest we rollback to the previous idea, using ->allow_infinite_fallback.

We can plan the fully_establish refactor for a later net-next series.

As a side effect, no new squash-to patches for "mptcp: handle consistently DSS corruption"

With the current kernel, the server incorrectly resets a connection if its first packet has data (< init window) and MPTCP options, but then the next ones no longer have MPTCP options. [1] The server should not reset this connection in this case, but do a fallback. It should reset the connection only if it has received a DATA_ACK, or data for more than the initial window (implying DATA_ACK have been received by the other peer) [2]. This new packetdrill test checks the expected behaviour. Link: multipath-tcp/mptcp_net-next#518 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2] Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>

As reported by Christoph [1], before this patch, an MPTCP connection was wrongly reset when a host received a first data packet with MPTCP options after the 3wHS, but got the next ones without. According to the MPTCP v1 specs [2], a fallback should happen in this case, because the host didn't receive a DATA_ACK from the other peer, nor receive data for more than the initial window which implies a DATA_ACK being received by the other peer. The patch here re-uses the same logic as the one used in other places: by looking at allow_infinite_fallback, which is disabled at the creation of an additional subflow. It's not looking at the first DATA_ACK (or implying one received from the other side) as suggested by the RFC, but it is in continuation with what was already done, which is safer, and it fixes the reported issue. The next step, looking at this first DATA_ACK, is tracked in [4]. This patch has been validated using the following Packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // 3WHS is OK +0.0 < S 0:0(0) win 65535 <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey> +0.0 > S. 0:0(0) ack 1 <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]> +0.1 < . 1:1(0) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[ckey=2, skey]> +0 accept(3, ..., ...) = 4 // Data from the client with valid MPTCP options (no DATA_ACK: normal) +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop> // From here, the MPTCP options will be dropped by a middlebox +0.0 > . 1:1(0) ack 501 <dss dack8=501 dll=0 nocs> +0.1 read(4, ..., 500) = 500 +0 write(4, ..., 100) = 100 // The server replies with data, still thinking MPTCP is being used +0.0 > P. 1:101(100) ack 501 <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop> // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options +0.1 < . 501:501(0) ack 101 win 2048 +0.0 < P. 501:601(100) ack 101 win 2048 // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window +0.0 > . 101:101(0) ack 601 Note that this script requires Packetdrill with MPTCP support, see [3]. Fixes: dea2b1e ("mptcp: do not reset MP_CAPABLE subflow on mapping errors") Reported-by: Christoph Paasch <[email protected]> Closes: #518 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2] Link: https://github.com/multipath-tcp/packetdrill [3] Link: #519 [4] Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Message-Id: <20241003-mptcp-gh-518 [email protected]>

matttbe · 2024-10-07T09:25:31Z

To give a bit of context to the last two messages, here is what we discussed at the last meeting:

the server is marking the subflow as 'fully_established' too early, it should do that:
- after having received a first DATA_ACK
- or after having received more than the initial window (implying DATA_ACK have been received by the client)
For the moment, the server is sending an MP_TCPRST with "MPTCP-specific error" reason, but it should be "Middlebox interference" → see Davide's patch
It would be interesting to have MIB counters for the fallback/RST cases: see Add MIB counters for fallback to TCP #514
Plan:
- Check the impact of removing these two lines because it might change the behaviour (what I did, see my last message):
  
  mptcp_net-next/net/mptcp/protocol.c
  
  Lines 3361 to 3362 in ecea735
  
  if (mp_opt->suboptions & OPTION_MPTCP_MPC_ACK)
  
  __mptcp_subflow_fully_established(msk, subflow, mp_opt);
- If the impact is too important, we can do that incrementally and starts by using the code suggested in the ticket's description
- Probably we should review how allow_infinite_fallback is being used: it should not be needed → only fallback if we have the initial subflow not in fully_established → Ticket Reset the connection once "MPTCP has been verified" instead of a fallback #519 has been created

As reported by Christoph [1], before this patch, an MPTCP connection was wrongly reset when a host received a first data packet with MPTCP options after the 3wHS, but got the next ones without. According to the MPTCP v1 specs [2], a fallback should happen in this case, because the host didn't receive a DATA_ACK from the other peer, nor receive data for more than the initial window which implies a DATA_ACK being received by the other peer. The patch here re-uses the same logic as the one used in other places: by looking at allow_infinite_fallback, which is disabled at the creation of an additional subflow. It's not looking at the first DATA_ACK (or implying one received from the other side) as suggested by the RFC, but it is in continuation with what was already done, which is safer, and it fixes the reported issue. The next step, looking at this first DATA_ACK, is tracked in [4]. This patch has been validated using the following Packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // 3WHS is OK +0.0 < S 0:0(0) win 65535 <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey> +0.0 > S. 0:0(0) ack 1 <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]> +0.1 < . 1:1(0) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[ckey=2, skey]> +0 accept(3, ..., ...) = 4 // Data from the client with valid MPTCP options (no DATA_ACK: normal) +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop> // From here, the MPTCP options will be dropped by a middlebox +0.0 > . 1:1(0) ack 501 <dss dack8=501 dll=0 nocs> +0.1 read(4, ..., 500) = 500 +0 write(4, ..., 100) = 100 // The server replies with data, still thinking MPTCP is being used +0.0 > P. 1:101(100) ack 501 <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop> // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options +0.1 < . 501:501(0) ack 101 win 2048 +0.0 < P. 501:601(100) ack 101 win 2048 // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window +0.0 > . 101:101(0) ack 601 Note that this script requires Packetdrill with MPTCP support, see [3]. Fixes: dea2b1e ("mptcp: do not reset MP_CAPABLE subflow on mapping errors") Reported-by: Christoph Paasch <[email protected]> Closes: #518 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2] Link: https://github.com/multipath-tcp/packetdrill [3] Link: #519 [4] Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>

As reported by Christoph [1], before this patch, an MPTCP connection was wrongly reset when a host received a first data packet with MPTCP options after the 3wHS, but got the next ones without. According to the MPTCP v1 specs [2], a fallback should happen in this case, because the host didn't receive a DATA_ACK from the other peer, nor receive data for more than the initial window which implies a DATA_ACK being received by the other peer. The patch here re-uses the same logic as the one used in other places: by looking at allow_infinite_fallback, which is disabled at the creation of an additional subflow. It's not looking at the first DATA_ACK (or implying one received from the other side) as suggested by the RFC, but it is in continuation with what was already done, which is safer, and it fixes the reported issue. The next step, looking at this first DATA_ACK, is tracked in [4]. This patch has been validated using the following Packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // 3WHS is OK +0.0 < S 0:0(0) win 65535 <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey> +0.0 > S. 0:0(0) ack 1 <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]> +0.1 < . 1:1(0) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[ckey=2, skey]> +0 accept(3, ..., ...) = 4 // Data from the client with valid MPTCP options (no DATA_ACK: normal) +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop> // From here, the MPTCP options will be dropped by a middlebox +0.0 > . 1:1(0) ack 501 <dss dack8=501 dll=0 nocs> +0.1 read(4, ..., 500) = 500 +0 write(4, ..., 100) = 100 // The server replies with data, still thinking MPTCP is being used +0.0 > P. 1:101(100) ack 501 <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop> // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options +0.1 < . 501:501(0) ack 101 win 2048 +0.0 < P. 501:601(100) ack 101 win 2048 // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window +0.0 > . 101:101(0) ack 601 Note that this script requires Packetdrill with MPTCP support, see [3]. Fixes: dea2b1e ("mptcp: do not reset MP_CAPABLE subflow on mapping errors") Reported-by: Christoph Paasch <[email protected]> Closes: #518 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2] Link: https://github.com/multipath-tcp/packetdrill [3] Link: #519 [4] Reviewed-by: Paolo Abeni <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>

As reported by Christoph [1], before this patch, an MPTCP connection was wrongly reset when a host received a first data packet with MPTCP options after the 3wHS, but got the next ones without. According to the MPTCP v1 specs [2], a fallback should happen in this case, because the host didn't receive a DATA_ACK from the other peer, nor receive data for more than the initial window which implies a DATA_ACK being received by the other peer. The patch here re-uses the same logic as the one used in other places: by looking at allow_infinite_fallback, which is disabled at the creation of an additional subflow. It's not looking at the first DATA_ACK (or implying one received from the other side) as suggested by the RFC, but it is in continuation with what was already done, which is safer, and it fixes the reported issue. The next step, looking at this first DATA_ACK, is tracked in [4]. This patch has been validated using the following Packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // 3WHS is OK +0.0 < S 0:0(0) win 65535 <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey> +0.0 > S. 0:0(0) ack 1 <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]> +0.1 < . 1:1(0) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[ckey=2, skey]> +0 accept(3, ..., ...) = 4 // Data from the client with valid MPTCP options (no DATA_ACK: normal) +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop> // From here, the MPTCP options will be dropped by a middlebox +0.0 > . 1:1(0) ack 501 <dss dack8=501 dll=0 nocs> +0.1 read(4, ..., 500) = 500 +0 write(4, ..., 100) = 100 // The server replies with data, still thinking MPTCP is being used +0.0 > P. 1:101(100) ack 501 <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop> // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options +0.1 < . 501:501(0) ack 101 win 2048 +0.0 < P. 501:601(100) ack 101 win 2048 // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window +0.0 > . 101:101(0) ack 601 Note that this script requires Packetdrill with MPTCP support, see [3]. Fixes: dea2b1e ("mptcp: do not reset MP_CAPABLE subflow on mapping errors") Cc: [email protected] Reported-by: Christoph Paasch <[email protected]> Closes: multipath-tcp/mptcp_net-next#518 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2] Link: https://github.com/multipath-tcp/packetdrill [3] Link: multipath-tcp/mptcp_net-next#519 [4] Reviewed-by: Paolo Abeni <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: NipaLocal <nipa@local>

…s false RFC8684 suggests use of "Middlebox interference (code 0x06)" in case of MPJ subflow that carries data at TCP level with no DSS sub-option. This is generally the case when mpext is NULL or mpext->use_map is 0: use a dedicated value of 'mapping_status' and use it before closing the socket in subflow_check_data_avail(). Link: multipath-tcp/mptcp_net-next#518 (comment) Signed-off-by: Davide Caratti <[email protected]>

…s false RFC8684 suggests use of "Middlebox interference (code 0x06)" in case of MPJ subflow that carries data at TCP level with no DSS sub-option. This is generally the case when mpext is NULL or mpext->use_map is 0: use a dedicated value of 'mapping_status' and use it before closing the socket in subflow_check_data_avail(). Link: #518 (comment) Signed-off-by: Davide Caratti <[email protected]> Message-Id: <4b2cee91613e597a172b46bb0d9d3143053c52da.1728408247.git.dcaratti@redhat.com>

As reported by Christoph [1], before this patch, an MPTCP connection was wrongly reset when a host received a first data packet with MPTCP options after the 3wHS, but got the next ones without. According to the MPTCP v1 specs [2], a fallback should happen in this case, because the host didn't receive a DATA_ACK from the other peer, nor receive data for more than the initial window which implies a DATA_ACK being received by the other peer. The patch here re-uses the same logic as the one used in other places: by looking at allow_infinite_fallback, which is disabled at the creation of an additional subflow. It's not looking at the first DATA_ACK (or implying one received from the other side) as suggested by the RFC, but it is in continuation with what was already done, which is safer, and it fixes the reported issue. The next step, looking at this first DATA_ACK, is tracked in [4]. This patch has been validated using the following Packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // 3WHS is OK +0.0 < S 0:0(0) win 65535 <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey> +0.0 > S. 0:0(0) ack 1 <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]> +0.1 < . 1:1(0) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[ckey=2, skey]> +0 accept(3, ..., ...) = 4 // Data from the client with valid MPTCP options (no DATA_ACK: normal) +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop> // From here, the MPTCP options will be dropped by a middlebox +0.0 > . 1:1(0) ack 501 <dss dack8=501 dll=0 nocs> +0.1 read(4, ..., 500) = 500 +0 write(4, ..., 100) = 100 // The server replies with data, still thinking MPTCP is being used +0.0 > P. 1:101(100) ack 501 <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop> // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options +0.1 < . 501:501(0) ack 101 win 2048 +0.0 < P. 501:601(100) ack 101 win 2048 // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window +0.0 > . 101:101(0) ack 601 Note that this script requires Packetdrill with MPTCP support, see [3]. Fixes: dea2b1e ("mptcp: do not reset MP_CAPABLE subflow on mapping errors") Cc: [email protected] Reported-by: Christoph Paasch <[email protected]> Closes: multipath-tcp/mptcp_net-next#518 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2] Link: https://github.com/multipath-tcp/packetdrill [3] Link: multipath-tcp/mptcp_net-next#519 [4] Reviewed-by: Paolo Abeni <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: NipaLocal <nipa@local>

As reported by Christoph [1], before this patch, an MPTCP connection was wrongly reset when a host received a first data packet with MPTCP options after the 3wHS, but got the next ones without. According to the MPTCP v1 specs [2], a fallback should happen in this case, because the host didn't receive a DATA_ACK from the other peer, nor receive data for more than the initial window which implies a DATA_ACK being received by the other peer. The patch here re-uses the same logic as the one used in other places: by looking at allow_infinite_fallback, which is disabled at the creation of an additional subflow. It's not looking at the first DATA_ACK (or implying one received from the other side) as suggested by the RFC, but it is in continuation with what was already done, which is safer, and it fixes the reported issue. The next step, looking at this first DATA_ACK, is tracked in [4]. This patch has been validated using the following Packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // 3WHS is OK +0.0 < S 0:0(0) win 65535 <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey> +0.0 > S. 0:0(0) ack 1 <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]> +0.1 < . 1:1(0) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[ckey=2, skey]> +0 accept(3, ..., ...) = 4 // Data from the client with valid MPTCP options (no DATA_ACK: normal) +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop> // From here, the MPTCP options will be dropped by a middlebox +0.0 > . 1:1(0) ack 501 <dss dack8=501 dll=0 nocs> +0.1 read(4, ..., 500) = 500 +0 write(4, ..., 100) = 100 // The server replies with data, still thinking MPTCP is being used +0.0 > P. 1:101(100) ack 501 <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop> // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options +0.1 < . 501:501(0) ack 101 win 2048 +0.0 < P. 501:601(100) ack 101 win 2048 // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window +0.0 > . 101:101(0) ack 601 Note that this script requires Packetdrill with MPTCP support, see [3]. Fixes: dea2b1e ("mptcp: do not reset MP_CAPABLE subflow on mapping errors") Reported-by: Christoph Paasch <[email protected]> Closes: #518 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2] Link: https://github.com/multipath-tcp/packetdrill [3] Link: #519 [4] Reviewed-by: Paolo Abeni <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>

As reported by Christoph [1], before this patch, an MPTCP connection was wrongly reset when a host received a first data packet with MPTCP options after the 3wHS, but got the next ones without. According to the MPTCP v1 specs [2], a fallback should happen in this case, because the host didn't receive a DATA_ACK from the other peer, nor receive data for more than the initial window which implies a DATA_ACK being received by the other peer. The patch here re-uses the same logic as the one used in other places: by looking at allow_infinite_fallback, which is disabled at the creation of an additional subflow. It's not looking at the first DATA_ACK (or implying one received from the other side) as suggested by the RFC, but it is in continuation with what was already done, which is safer, and it fixes the reported issue. The next step, looking at this first DATA_ACK, is tracked in [4]. This patch has been validated using the following Packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // 3WHS is OK +0.0 < S 0:0(0) win 65535 <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey> +0.0 > S. 0:0(0) ack 1 <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]> +0.1 < . 1:1(0) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[ckey=2, skey]> +0 accept(3, ..., ...) = 4 // Data from the client with valid MPTCP options (no DATA_ACK: normal) +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop> // From here, the MPTCP options will be dropped by a middlebox +0.0 > . 1:1(0) ack 501 <dss dack8=501 dll=0 nocs> +0.1 read(4, ..., 500) = 500 +0 write(4, ..., 100) = 100 // The server replies with data, still thinking MPTCP is being used +0.0 > P. 1:101(100) ack 501 <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop> // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options +0.1 < . 501:501(0) ack 101 win 2048 +0.0 < P. 501:601(100) ack 101 win 2048 // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window +0.0 > . 101:101(0) ack 601 Note that this script requires Packetdrill with MPTCP support, see [3]. Fixes: dea2b1e ("mptcp: do not reset MP_CAPABLE subflow on mapping errors") Cc: [email protected] Reported-by: Christoph Paasch <[email protected]> Closes: multipath-tcp/mptcp_net-next#518 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2] Link: https://github.com/multipath-tcp/packetdrill [3] Link: multipath-tcp/mptcp_net-next#519 [4] Reviewed-by: Paolo Abeni <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: NipaLocal <nipa@local>

As reported by Christoph [1], before this patch, an MPTCP connection was wrongly reset when a host received a first data packet with MPTCP options after the 3wHS, but got the next ones without. According to the MPTCP v1 specs [2], a fallback should happen in this case, because the host didn't receive a DATA_ACK from the other peer, nor receive data for more than the initial window which implies a DATA_ACK being received by the other peer. The patch here re-uses the same logic as the one used in other places: by looking at allow_infinite_fallback, which is disabled at the creation of an additional subflow. It's not looking at the first DATA_ACK (or implying one received from the other side) as suggested by the RFC, but it is in continuation with what was already done, which is safer, and it fixes the reported issue. The next step, looking at this first DATA_ACK, is tracked in [4]. This patch has been validated using the following Packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // 3WHS is OK +0.0 < S 0:0(0) win 65535 <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey> +0.0 > S. 0:0(0) ack 1 <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]> +0.1 < . 1:1(0) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[ckey=2, skey]> +0 accept(3, ..., ...) = 4 // Data from the client with valid MPTCP options (no DATA_ACK: normal) +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop> // From here, the MPTCP options will be dropped by a middlebox +0.0 > . 1:1(0) ack 501 <dss dack8=501 dll=0 nocs> +0.1 read(4, ..., 500) = 500 +0 write(4, ..., 100) = 100 // The server replies with data, still thinking MPTCP is being used +0.0 > P. 1:101(100) ack 501 <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop> // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options +0.1 < . 501:501(0) ack 101 win 2048 +0.0 < P. 501:601(100) ack 101 win 2048 // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window +0.0 > . 101:101(0) ack 601 Note that this script requires Packetdrill with MPTCP support, see [3]. Fixes: dea2b1e ("mptcp: do not reset MP_CAPABLE subflow on mapping errors") Reported-by: Christoph Paasch <[email protected]> Closes: #518 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2] Link: https://github.com/multipath-tcp/packetdrill [3] Link: #519 [4] Reviewed-by: Paolo Abeni <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>

RFC8684 suggests use of "Middlebox interference (code 0x06)" in case of fully established subflow that carries data at TCP level with no DSS sub-option. This is generally the case when mpext is NULL or mpext->use_map is 0: use a dedicated value of 'mapping_status' and use it before closing the socket in subflow_check_data_avail(). Link: multipath-tcp/mptcp_net-next#518 Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: NipaLocal <nipa@local>

RFC8684 suggests use of "Middlebox interference (code 0x06)" in case of fully established subflow that carries data at TCP level with no DSS sub-option. This is generally the case when mpext is NULL or mpext->use_map is 0: use a dedicated value of 'mapping_status' and use it before closing the socket in subflow_check_data_avail(). Link: #518 Signed-off-by: Davide Caratti <[email protected]>

RFC8684 suggests use of "Middlebox interference (code 0x06)" in case of fully established subflow that carries data at TCP level with no DSS sub-option. This is generally the case when mpext is NULL or mpext->use_map is 0: use a dedicated value of 'mapping_status' and use it before closing the socket in subflow_check_data_avail(). Link: multipath-tcp/mptcp_net-next#518 Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: NipaLocal <nipa@local>

RFC8684 suggests use of "Middlebox interference (code 0x06)" in case of fully established subflow that carries data at TCP level with no DSS sub-option. This is generally the case when mpext is NULL or mpext->use_map is 0: use a dedicated value of 'mapping_status' and use it before closing the socket in subflow_check_data_avail(). Link: #518 Signed-off-by: Davide Caratti <[email protected]>

RFC8684 suggests use of "Middlebox interference (code 0x06)" in case of fully established subflow that carries data at TCP level with no DSS sub-option. This is generally the case when mpext is NULL or mpext->use_map is 0: use a dedicated value of 'mapping_status' and use it before closing the socket in subflow_check_data_avail(). Link: multipath-tcp/mptcp_net-next#518 Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Signed-off-by: NipaLocal <nipa@local>

RFC8684 suggests use of "Middlebox interference (code 0x06)" in case of fully established subflow that carries data at TCP level with no DSS sub-option. This is generally the case when mpext is NULL or mpext->use_map is 0: use a dedicated value of 'mapping_status' and use it before closing the socket in subflow_check_data_avail(). Link: multipath-tcp/mptcp_net-next#518 Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

As reported by Christoph [1], before this patch, an MPTCP connection was wrongly reset when a host received a first data packet with MPTCP options after the 3wHS, but got the next ones without. According to the MPTCP v1 specs [2], a fallback should happen in this case, because the host didn't receive a DATA_ACK from the other peer, nor receive data for more than the initial window which implies a DATA_ACK being received by the other peer. The patch here re-uses the same logic as the one used in other places: by looking at allow_infinite_fallback, which is disabled at the creation of an additional subflow. It's not looking at the first DATA_ACK (or implying one received from the other side) as suggested by the RFC, but it is in continuation with what was already done, which is safer, and it fixes the reported issue. The next step, looking at this first DATA_ACK, is tracked in [4]. This patch has been validated using the following Packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // 3WHS is OK +0.0 < S 0:0(0) win 65535 <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey> +0.0 > S. 0:0(0) ack 1 <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]> +0.1 < . 1:1(0) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[ckey=2, skey]> +0 accept(3, ..., ...) = 4 // Data from the client with valid MPTCP options (no DATA_ACK: normal) +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop> // From here, the MPTCP options will be dropped by a middlebox +0.0 > . 1:1(0) ack 501 <dss dack8=501 dll=0 nocs> +0.1 read(4, ..., 500) = 500 +0 write(4, ..., 100) = 100 // The server replies with data, still thinking MPTCP is being used +0.0 > P. 1:101(100) ack 501 <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop> // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options +0.1 < . 501:501(0) ack 101 win 2048 +0.0 < P. 501:601(100) ack 101 win 2048 // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window +0.0 > . 101:101(0) ack 601 Note that this script requires Packetdrill with MPTCP support, see [3]. Fixes: dea2b1e ("mptcp: do not reset MP_CAPABLE subflow on mapping errors") Cc: [email protected] Reported-by: Christoph Paasch <[email protected]> Closes: multipath-tcp/mptcp_net-next#518 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2] Link: https://github.com/multipath-tcp/packetdrill [3] Link: multipath-tcp/mptcp_net-next#519 [4] Reviewed-by: Paolo Abeni <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

stable inclusion from stable-v6.6.57 commit a448ced30802a924fd8b1df39671fb5a50a7aaf1 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/IB2M97 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a448ced30802a924fd8b1df39671fb5a50a7aaf1 -------------------------------- commit 119d51e225febc8152476340a880f5415a01e99e upstream. As reported by Christoph [1], before this patch, an MPTCP connection was wrongly reset when a host received a first data packet with MPTCP options after the 3wHS, but got the next ones without. According to the MPTCP v1 specs [2], a fallback should happen in this case, because the host didn't receive a DATA_ACK from the other peer, nor receive data for more than the initial window which implies a DATA_ACK being received by the other peer. The patch here re-uses the same logic as the one used in other places: by looking at allow_infinite_fallback, which is disabled at the creation of an additional subflow. It's not looking at the first DATA_ACK (or implying one received from the other side) as suggested by the RFC, but it is in continuation with what was already done, which is safer, and it fixes the reported issue. The next step, looking at this first DATA_ACK, is tracked in [4]. This patch has been validated using the following Packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // 3WHS is OK +0.0 < S 0:0(0) win 65535 <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey> +0.0 > S. 0:0(0) ack 1 <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]> +0.1 < . 1:1(0) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[ckey=2, skey]> +0 accept(3, ..., ...) = 4 // Data from the client with valid MPTCP options (no DATA_ACK: normal) +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop> // From here, the MPTCP options will be dropped by a middlebox +0.0 > . 1:1(0) ack 501 <dss dack8=501 dll=0 nocs> +0.1 read(4, ..., 500) = 500 +0 write(4, ..., 100) = 100 // The server replies with data, still thinking MPTCP is being used +0.0 > P. 1:101(100) ack 501 <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop> // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options +0.1 < . 501:501(0) ack 101 win 2048 +0.0 < P. 501:601(100) ack 101 win 2048 // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window +0.0 > . 101:101(0) ack 601 Note that this script requires Packetdrill with MPTCP support, see [3]. Fixes: dea2b1e ("mptcp: do not reset MP_CAPABLE subflow on mapping errors") Cc: [email protected] Reported-by: Christoph Paasch <[email protected]> Closes: multipath-tcp/mptcp_net-next#518 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2] Link: https://github.com/multipath-tcp/packetdrill [3] Link: multipath-tcp/mptcp_net-next#519 [4] Reviewed-by: Paolo Abeni <[email protected]> Signed-off-by: Matthieu Baerts (NGI0) <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Wen Zhiwei <[email protected]>

matttbe added the bug label Oct 2, 2024

This was referenced Oct 3, 2024

mptcp: check fallback if opt dropped after first data multipath-tcp/packetdrill#157

Merged

Reset the connection once "MPTCP has been verified" instead of a fallback #519

Open

matttbe closed this as completed in c03010a Oct 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Connection is RST if data (< init window) with MPTCP options received, then MPTCP options are dropped #518

Connection is RST if data (< init window) with MPTCP options received, then MPTCP options are dropped #518

matttbe commented Oct 2, 2024 •

edited

Loading

dcaratti commented Oct 2, 2024 •

edited by matttbe

Loading

matttbe commented Oct 2, 2024

pabeni commented Oct 2, 2024

matttbe commented Oct 7, 2024

Connection is RST if data (< init window) with MPTCP options received, then MPTCP options are dropped #518

Connection is RST if data (< init window) with MPTCP options received, then MPTCP options are dropped #518

Comments

matttbe commented Oct 2, 2024 • edited Loading

dcaratti commented Oct 2, 2024 • edited by matttbe Loading

matttbe commented Oct 2, 2024

pabeni commented Oct 2, 2024

matttbe commented Oct 7, 2024

matttbe commented Oct 2, 2024 •

edited

Loading

dcaratti commented Oct 2, 2024 •

edited by matttbe

Loading