Long-term TCP sessions & MPTCP
https://github.com/multipath-tcp/mptcp/issues/153
The following issue is fixed with https://github.com/multipath-tcp/mptcp/commit/133537deb63d04e1dfb5af7fd82ed51ba243e518
wapsi commented on 20 Nov 2016 (the original issue description is reproduced at the end of this page)
cpaasch added the question label on 22 Nov 2016
cpaasch commented on 22 Nov 2016
Hello,
do you have a packet-trace of this behavior? It might be that you have a NAT on the path that is timing out.
titer commented on 22 Nov 2016
Hi,
I see the same behavior here, using MPTCP to aggregate two DSL links. My local gateway is connected to both DSL routers (NAT'ed in both cases) and maintains a long-running, MPTCP-enabled OpenVPN connection to a relay router, through which traffic gets routed. I am fairly happy with that MPTCP setup btw; it has been running for a couple of years now and has proved effective at hiding glitches on either DSL line and aggregating bandwidth with about 90% efficiency.
Sometimes a subflow dies, e.g. after one of the DSL routers restarts and ends up with a new public IP address. My current workaround is to run a background task that detects that and bounces OpenVPN - but if there is a better way to handle it, I am interested.
Running Debian kernel 4.1.35.mptcp on both endpoints
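For illustration, a rough sketch of what such a watchdog could look like, assuming a setup like the one described above (the endpoint address, port, expected subflow count and restart command are all placeholders; the subflow counting follows the netstat approach used later in this thread):

```bash
#!/bin/bash
# Hypothetical subflow watchdog: if fewer subflows than expected are
# ESTABLISHED towards the VPN/SSH endpoint, bounce the tunnel.
ENDPOINT="203.0.113.10"   # relay / server address (placeholder)
PORT="1194"               # OpenVPN or SSH port (placeholder)
EXPECTED=2                # expected number of subflows (placeholder)

while true; do
    # Count established plain-TCP subflows towards the endpoint
    # (the ^tcp filter excludes the MPTCP meta connections).
    count=$(netstat -n | grep "^tcp" | grep " ${ENDPOINT}:${PORT} " \
            | grep -c "ESTABLISHED$")
    if [ "$count" -lt "$EXPECTED" ]; then
        echo "$(date): only $count/$EXPECTED subflows up, restarting OpenVPN"
        systemctl restart openvpn@client   # or toggle 'ip link ... multipath'
    fi
    sleep 60
done
```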
cpaasch commented on 22 Nov 2016
Can you give the below patch a try? (didn't test it at all! just compiled ;))
You might have to tweak the tcp_retries* sysctls to get a faster subflow timeout.
When loading the path manager you have to set the module parameter create_on_err
to 1. Module parameters are in /sys/module/mptcp_fullmesh/parameters.
```diff
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index cb5e4cf76b23..e66b8aa295ca 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -230,6 +230,7 @@ struct mptcp_pm_ops {
 	void (*release_sock)(struct sock *meta_sk);
 	void (*fully_established)(struct sock *meta_sk);
 	void (*new_remote_address)(struct sock *meta_sk);
+	void (*subflow_error)(struct sock *meta_sk, struct sock *sk);
 	int (*get_local_id)(sa_family_t family, union inet_addr *addr,
 			    struct net *net, bool *low_prio);
 	void (*addr_signal)(struct sock *sk, unsigned *size,
diff --git a/net/mptcp/mptcp_ctrl.c b/net/mptcp/mptcp_ctrl.c
index 6045ba160225..853310cbc5d9 100644
--- a/net/mptcp/mptcp_ctrl.c
+++ b/net/mptcp/mptcp_ctrl.c
@@ -610,13 +610,13 @@ EXPORT_SYMBOL(mptcp_select_ack_sock);
 static void mptcp_sock_def_error_report(struct sock *sk)
 {
 	const struct mptcp_cb *mpcb = tcp_sk(sk)->mpcb;
+	struct sock *meta_sk = mptcp_meta_sk(sk);
 
 	if (!sock_flag(sk, SOCK_DEAD))
 		mptcp_sub_close(sk, 0);
 
 	if (mpcb->infinite_mapping_rcv || mpcb->infinite_mapping_snd ||
 	    mpcb->send_infinite_mapping) {
-		struct sock *meta_sk = mptcp_meta_sk(sk);
 
 		meta_sk->sk_err = sk->sk_err;
 		meta_sk->sk_err_soft = sk->sk_err_soft;
@@ -633,6 +633,9 @@ static void mptcp_sock_def_error_report(struct sock *sk)
 			tcp_done(meta_sk);
 	}
 
+	if (mpcb->pm_ops->subflow_error)
+		mpcb->pm_ops->subflow_error(meta_sk, sk);
+
 	sk->sk_err = 0;
 	return;
 }
diff --git a/net/mptcp/mptcp_fullmesh.c b/net/mptcp/mptcp_fullmesh.c
index 71eb2d4ad2d4..61fda6e1be3e 100644
--- a/net/mptcp/mptcp_fullmesh.c
+++ b/net/mptcp/mptcp_fullmesh.c
@@ -95,6 +95,10 @@ static int num_subflows __read_mostly = 1;
 module_param(num_subflows, int, 0644);
 MODULE_PARM_DESC(num_subflows, "choose the number of subflows per pair of IP addresses of MPTCP connection");
 
+static int create_on_err __read_mostly = 0;
+module_param(create_on_err, int, 0644);
+MODULE_PARM_DESC(create_on_err, "recreate the subflow upon a timeout");
+
 static struct mptcp_pm_ops full_mesh __read_mostly;
 
 static void full_mesh_create_subflows(struct sock *meta_sk);
@@ -1370,6 +1374,24 @@ static void full_mesh_create_subflows(struct sock *meta_sk)
 	}
 }
 
+static void full_mesh_subflow_error(struct sock *meta_sk, struct sock *sk)
+{
+	const struct mptcp_cb *mpcb = tcp_sk(meta_sk)->mpcb;
+
+	if (!create_on_err)
+		return;
+
+	if (mpcb->infinite_mapping_snd || mpcb->infinite_mapping_rcv ||
+	    mpcb->send_infinite_mapping ||
+	    mpcb->server_side || sock_flag(meta_sk, SOCK_DEAD))
+		return;
+
+	if (sk->sk_err != ETIMEDOUT)
+		return;
+
+	full_mesh_create_subflows(meta_sk);
+}
+
 /* Called upon release_sock, if the socket was owned by the user during
  * a path-management event.
  */
@@ -1799,6 +1821,7 @@ static struct mptcp_pm_ops full_mesh __read_mostly = {
 	.release_sock = full_mesh_release_sock,
 	.fully_established = full_mesh_create_subflows,
 	.new_remote_address = full_mesh_create_subflows,
+	.subflow_error = full_mesh_subflow_error,
 	.get_local_id = full_mesh_get_local_id,
 	.addr_signal = full_mesh_addr_signal,
 	.add_raddr = full_mesh_add_raddr,
```
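If you want to try the patch, enabling it could look roughly like this (assuming mptcp_fullmesh is built as a module; the tcp_retries2 value is only an example of a more aggressive timeout, see also the discussion below):

```bash
# Enable subflow re-creation on timeout (module parameter added by the patch).
echo 1 > /sys/module/mptcp_fullmesh/parameters/create_on_err
# Or set it when (re)loading the path-manager module:
# modprobe mptcp_fullmesh create_on_err=1

# Let a dead subflow hit ETIMEDOUT faster (example value).
sysctl -w net.ipv4.tcp_retries2=3
```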
wapsi commented on 22 Nov 2016
Hmmm... A packet trace is quite difficult to take because it can take anywhere from 1 hour to 2 days before this happens, so the capture file will be HUGE...
Yes, I have NAT between these MPTCP boxes. Here are the NAT TCP timeout settings (it's a Linux box):
```
[root@firewall ~]# sysctl -a|grep conntrack_tcp_timeout
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
```
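With a 432000 s (5 day) timeout for established entries the NAT should normally not expire an idle subflow, but it is easy to verify on the firewall itself by watching the remaining timeout of the conntrack entries towards the server; a small sketch (address, port and interval are placeholders):

```bash
# On the NAT box: show the conntrack entries towards the SSH server together
# with their remaining timeout (the number right after the protocol fields).
# Replace <ssh-server-ip> and <ssh-port> with the real values.
# (alternatively: conntrack -L | grep <ssh-server-ip>)
watch -n 60 'grep "dst=<ssh-server-ip> " /proc/net/nf_conntrack | grep "dport=<ssh-port> "'
```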
And I've opened the SSH tunnel with ServerAliveInterval 10 and ServerAliveCountMax 3 on the client side, and ClientAliveInterval 10, ClientAliveCountMax 3 and TCPKeepAlive yes on the server side, so if I understand those settings correctly they should avoid TCP timeout issues.
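For reference, a minimal sketch of starting such a tunnel with those keepalive options given directly on the command line (the forwarding spec, user and host are placeholders; the same options can of course live in ~/.ssh/config and sshd_config as described):

```bash
# Client side: SSH port forward with application-level keepalives every 10 s,
# giving up after 3 missed replies (matches the settings quoted above).
ssh -N -L 8080:localhost:80 \
    -o ServerAliveInterval=10 \
    -o ServerAliveCountMax=3 \
    user@ssh-server.example.com
```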
Here are some stats from netstat commands (I have 3 gateways and 3 subflows, and I exclude ^mptcp connections from this list because I want to list the subflows):
```
$ netstat -n|grep " (SSH Server IP):(SSH server port) "|grep ^tcp|grep ESTABLISHED$|grep -c " (eth0 IP):"
9
$ netstat -n|grep " (SSH Server IP):(SSH server port) "|grep ^tcp|grep ESTABLISHED$|grep -c " (eth1 IP):"
9
$ netstat -n|grep " (SSH Server IP):(SSH server port) "|grep ^tcp|grep ESTABLISHED$|grep -c " (eth2 IP):"
9
```
And after several hours those are something like:
```
$ netstat -n|grep " (SSH Server IP):(SSH server port) "|grep ^tcp|grep ESTABLISHED$|grep -c " (eth0 IP):"
7
$ netstat -n|grep " (SSH Server IP):(SSH server port) "|grep ^tcp|grep ESTABLISHED$|grep -c " (eth1 IP):"
5
$ netstat -n|grep " (SSH Server IP):(SSH server port) "|grep ^tcp|grep ESTABLISHED$|grep -c " (eth2 IP):"
4
```
So some of the subflows have really dropped out. Now, if I restart the SSH sessions, all the TCP subflows get established again. Another approach is to run the following commands:
```
$ ip link set dev eth0 multipath off ; sleep 1 ; ip link set dev eth0 multipath on
$ ip link set dev eth1 multipath off ; sleep 1 ; ip link set dev eth1 multipath on
$ ip link set dev eth2 multipath off ; sleep 1 ; ip link set dev eth2 multipath on
```
And then new subflows are established again using all available gateways.
Update: I'll try with the patch you just sent.
cpaasch commented on 22 Nov 2016
The keepalives unfortunately are not a safe solution (in today's implementation of MPTCP in Linux), because we chose to only keep the MPTCP connection alive. That means TCP keepalives are sent on at most one single subflow, so the other subflows still time out.
The keepalive handling is probably something we should rethink.
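For context, these are the standard kernel-wide TCP keepalive knobs (values are only illustrative); per the explanation above, with the current MPTCP implementation the probes they control are sent on at most one subflow, so they do not keep the other subflows' NAT mappings alive:

```bash
# Plain-TCP keepalive tuning (only applies to sockets with SO_KEEPALIVE set,
# e.g. sshd with "TCPKeepAlive yes"): start probing after 120 s of idle time,
# probe every 10 s, declare the peer dead after 3 unanswered probes.
sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_intvl=10
sysctl -w net.ipv4.tcp_keepalive_probes=3
```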
wapsi commented on 22 Nov 2016
Just tested with your patch applied and create_on_err parameter set:
```
$ cat /sys/module/mptcp_fullmesh/parameters/create_on_err
1
```
sysctl mptcp settings used:
```
net.mptcp.mptcp_binder_gateways =
net.mptcp.mptcp_checksum = 0
net.mptcp.mptcp_debug = 0
net.mptcp.mptcp_enabled = 1
net.mptcp.mptcp_path_manager = fullmesh
net.mptcp.mptcp_scheduler = default
net.mptcp.mptcp_syn_retries = 10
net.mptcp.mptcp_version = 1
```
and still some TCP subflows get disconnected (after ~5 hours). Again, if I run:
```
$ ip link set dev eth0 multipath off ; sleep 1 ; ip link set dev eth0 multipath on
$ ip link set dev eth1 multipath off ; sleep 1 ; ip link set dev eth1 multipath on
$ ip link set dev eth2 multipath off ; sleep 1 ; ip link set dev eth2 multipath on
```
the situation will be fixed and new subflows will be opened using all available gateways.
I applied your patch only on the client side; I assumed it is only necessary there.
cpaasch commented on 23 Nov 2016
Yes, it's only needed on the client side.
You should also change the tcp_retries sysctls to get faster timeouts:
sysctl -w net.ipv4.tcp_retries2=3
Please also take a packet trace to see if you really get a timeout.
cpaasch commented on 29 Nov 2016
@wapsi & @titer - do you have an update?
cpaasch added the enhancement label and removed the question label on 29 Nov 2016
wapsi commented on 29 Nov 2016
I'm not able to get a valid packet trace atm. If I try to do this with tcpdump, the cap file gets so huge before the first subflow drops that it doesn't fit on my MPTCP router's HDD. Do you have any tips on how to do this "sensibly"?
cpaasch commented on 2 Dec 2016
@wapsi You can limit the size of the packet trace with the option `-s 150`. Then, if that's not enough, add `-C 100 -W 10 -w capture`. This limits the file size to 100MB and overwrites the older files when rotating.
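Putting those options together, a rotating, size-limited capture restricted to the subflows towards the SSH server might look roughly like this (interface, address and port are placeholders):

```bash
# -s 150       : snap length, keep only the first 150 bytes of each packet
# -C 100 -W 10 : rotate through 10 files of about 100 MB each
# The host/port filter keeps the trace limited to the MPTCP subflows.
tcpdump -i any -s 150 -C 100 -W 10 -w capture \
    host <ssh-server-ip> and port <ssh-port>
```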
djbobo commented on 18 Dec 2016
Hi,
I'm seeing the same behavior on long running tcp connections.
Server with two interfaces - public (routable) ipv4
Client with two interfaces - masqueraded ipv4
The initial connection originates from the client (behind NAT); the full mesh is established as expected, in this case 2x2 = 4 subflows. However, after some time one drops and it stays at 3 subflows.
Using OpenVPN (tcp) instead of ssh.
Using your debian kernel build https://dl.bintray.com/cpaasch/deb
Building kernel and will get back.
cpaasch commented on 19 Dec 2016
It would be good if someone can test the patch and confirm whether it really solves the problem.
djbobo commented on 19 Dec 2016
I'm running the patched kernel for 9 hours.
I'd wait a little bit longer before I confirm.
Everything looks good so far.
cpaasch referenced this issue on 25 Jan
Closed: How many connections is it using? #162
added a commit that referenced this issue on 1 Feb
cpaasch commented on 10 Feb
Fixed with 133537d
cpaasch closed this on 10 Feb
Original issue description (wapsi, 20 Nov 2016):
I'm using SSH port tunneling and MPTCP, and I've noticed that after several hours or days MPTCP stops working (traffic doesn't go through all available interfaces / gateways anymore, it uses only one path). Restarting this long-term / sustained SSH/TCP session fixes the issue and MPTCP "starts to work again".
What could cause this? Is there any way to debug this problem? Is there any way to tell MPTCP to look up new paths again or something similar? I see that under /proc//net/mptcp_net/ and /proc//net/mptcp_fullmesh there are some stats available, but is there anything like `echo 1 > /proc//net/mptcp_net/discover_paths_again`?
My MPTCP settings:
[ 0.412909] MPTCP: Stable release v0.91.2
kernel.osrelease = 4.1.35.mptcp
net.ipv4.tcp_allowed_congestion_control = lia reno cubic
net.ipv4.tcp_available_congestion_control = lia reno balia wvegas cubic olia
net.ipv4.tcp_congestion_control = lia (tried other ones too but the issue remains)
net.core.wmem_max = 115343360
net.core.rmem_max = 115343360
net.ipv4.tcp_rmem = 10240 87380 115343360
net.ipv4.tcp_wmem = 10240 87380 115343360
net.mptcp.mptcp_binder_gateways =
net.mptcp.mptcp_checksum = 0
net.mptcp.mptcp_debug = 0
net.mptcp.mptcp_enabled = 1
net.mptcp.mptcp_path_manager = fullmesh
net.mptcp.mptcp_scheduler = default
net.mptcp.mptcp_syn_retries = 10
net.mptcp.mptcp_version = 1