Using the Linux SO_KEEPALIVE Option and What It Means

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 16:13

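With a connection-oriented TCP socket, an application usually needs to know whether the peer is still connected. A connection can go away in two ways:
1. Orderly close. The peer calls close() or shutdown() and the connection is closed gracefully. The local side learns of this right away: select() reports the socket as readable, recv() returns 0 (end of stream), and a later send() typically fails with EPIPE or ECONNRESET once the peer has fully closed;
2. Abnormal disconnect. The peer vanishes without closing, for example because the network is cut or the machine suddenly loses power. No FIN or RST arrives, so the local side receives no notification at all.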
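The difference between the two cases shows up in the return value of recv(). The following is a minimal sketch, not taken from the original article: `classify_recv` and `enum peer_state` are hypothetical names, and the caller is assumed to pass the errno value captured immediately after recv() on a stream socket with a non-empty buffer.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical helper: classify the result of one recv() call. */
enum peer_state { PEER_DATA, PEER_CLOSED, PEER_RETRY, PEER_ERROR };

enum peer_state classify_recv(ssize_t n, int err)
{
    if (n > 0)
        return PEER_DATA;     /* payload received */
    if (n == 0)
        return PEER_CLOSED;   /* orderly close: the peer sent FIN */

    /* n < 0: look at the errno value captured right after recv() */
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
        return PEER_RETRY;    /* nothing to read yet, or interrupted */

    return PEER_ERROR;        /* e.g. ECONNRESET or ETIMEDOUT */
}
```

Note that an abnormal disconnect produces none of these results by itself: with no traffic and no keepalive, recv() simply keeps blocking (or keeps returning EAGAIN on a non-blocking socket).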
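For the second case, there are several ways to tell that the connection is gone:
1. Write your own heartbeat. Put simply, add a thread to your program that periodically sends a packet to the peer and checks whether a reply comes back, then manage the connection according to those replies. This approach is general-purpose. It is usually implemented as an application-layer heartbeat, which is flexible and controllable, but it changes the existing protocol;
2. Use TCP's keepalive mechanism. The book Unix Network Programming is said to advise against using SO_KEEPALIVE for heartbeat detection. The article leaves the reason open; commonly cited reasons are that the default idle time is two hours, and that a probe reply only proves the peer's TCP stack is alive, not that the application itself is still working.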
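The timing logic of an application-layer heartbeat can be sketched as follows. This is only an illustration under stated assumptions: `struct heartbeat` and the `hb_*` functions are hypothetical names, the actual sending and receiving of heartbeat messages is left to the caller, and the peer is treated as dead after `max_missed` consecutive silent intervals.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-connection heartbeat state. */
struct heartbeat {
    time_t last_seen;   /* when the peer last sent anything */
    time_t last_ping;   /* when we last sent a heartbeat */
    int    interval;    /* seconds between heartbeats */
    int    max_missed;  /* silent intervals tolerated before giving up */
};

void hb_init(struct heartbeat *hb, time_t now, int interval, int max_missed)
{
    hb->last_seen  = now;
    hb->last_ping  = now;
    hb->interval   = interval;
    hb->max_missed = max_missed;
}

/* Call whenever any data (including a heartbeat reply) arrives. */
void hb_on_receive(struct heartbeat *hb, time_t now)
{
    hb->last_seen = now;
}

/* Returns true when it is time to send the next heartbeat. */
bool hb_should_ping(struct heartbeat *hb, time_t now)
{
    if (now - hb->last_ping >= hb->interval) {
        hb->last_ping = now;
        return true;
    }
    return false;
}

/* Returns true once the peer has been silent for too long. */
bool hb_is_dead(const struct heartbeat *hb, time_t now)
{
    return now - hb->last_seen >= (time_t)hb->interval * hb->max_missed;
}
```

A timer thread or event loop would call hb_should_ping() and hb_is_dead() about once per second with time(NULL), and close the socket when hb_is_dead() returns true.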
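How keepalive works: TCP has a built-in heartbeat. Taking the server side as an example, when the server sees no data on the connection for a set time (/proc/sys/net/ipv4/tcp_keepalive_time, default 7200, i.e. 2 hours), it sends a keepalive probe to the client. The client can then be in one of three states:
1. The client is reachable and the connection is fine. It returns an ACK. The server resets its timer on receiving the ACK and sends the next probe 2 hours later. If data is exchanged on the connection within those 2 hours, the next probe is pushed back to 2 hours after that exchange;
2. The client has shut down abnormally, or the network is down. The client does not respond, so the server receives no ACK. After a set interval (/proc/sys/net/ipv4/tcp_keepalive_intvl, default 75, i.e. 75 seconds) it resends the keepalive probe, up to a set number of probes (/proc/sys/net/ipv4/tcp_keepalive_probes, default 9). If none of them is answered, the server gives up on the connection and socket calls fail with ETIMEDOUT;
3. The client crashed but has since rebooted. The response the server gets to its probe is a reset (RST), and the server terminates the connection.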
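The three parameters together determine how long a dead peer goes unnoticed: the idle time before the first probe, plus one interval for each unanswered probe. A minimal sketch of that arithmetic (the function name `keepalive_detect_seconds` is made up for illustration):

```c
/* Seconds from the last data exchange until an unresponsive peer
 * is declared dead: idle time, then `probes` probes sent `intvl`
 * seconds apart, with the verdict one interval after the last probe. */
long keepalive_detect_seconds(long idle, long intvl, long probes)
{
    return idle + intvl * probes;
}
```

With the Linux defaults this gives 7200 + 75 * 9 = 7875 seconds, a little over 2 hours and 11 minutes, which is why the defaults are rarely useful for timely failure detection.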
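Changing the system defaults of the three parameters
Temporary method: write the values directly into the three files under /proc/sys/net/ipv4/; they must be set again after a reboot;
Equivalent temporary method using sysctl: sysctl -w net.ipv4.tcp_keepalive_intvl=20
Global (persistent) setting: edit /etc/sysctl.conf and add the lines below, then run sysctl -p to apply them without rebooting: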
net.ipv4.tcp_keepalive_intvl = 20
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 60
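To check which values are currently in effect, a program can read the same /proc files. The helper below is a sketch with a hypothetical name (`read_long_setting`); it assumes the file holds a single decimal integer, as the three keepalive files do.

```c
#include <stdio.h>

/* Read one integer setting from a file such as
 * /proc/sys/net/ipv4/tcp_keepalive_time.
 * Returns the value, or -1 if the file cannot be read or parsed. */
long read_long_setting(const char *path)
{
    long value = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, read_long_setting("/proc/sys/net/ipv4/tcp_keepalive_intvl") should return 20 once the settings above have been applied.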

The parameters can also be set per socket in code; the TCP_KEEP* options below override the system defaults for that one socket only:

#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* anetSetError(), ANET_ERR and ANET_OK are defined elsewhere in the
 * same code base. */

/* Set TCP keep alive option to detect dead peers. The interval option
 * is only used for Linux as we are using Linux-specific APIs to set
 * the probe send time, interval, and count. */
int anetKeepAlive(char *err, int fd, int interval)
{
    int val = 1;

    /* Enable the keepalive mechanism. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val)) == -1)
    {
        anetSetError(err, "setsockopt SO_KEEPALIVE: %s", strerror(errno));
        return ANET_ERR;
    }

#ifdef __linux__
    /* Default settings are more or less garbage, with the keepalive time
     * set to 7200 by default on Linux. Modify settings to make the feature
     * actually useful. */

    /* Send first probe after interval. */
    val = interval;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &val, sizeof(val)) < 0) {
        anetSetError(err, "setsockopt TCP_KEEPIDLE: %s\n", strerror(errno));
        return ANET_ERR;
    }

    /* Send next probes after the specified interval. Note that we set the
     * delay as interval / 3, as we send three probes before detecting
     * an error (see the next setsockopt call). */
    val = interval/3;
    if (val == 0) val = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &val, sizeof(val)) < 0) {
        anetSetError(err, "setsockopt TCP_KEEPINTVL: %s\n", strerror(errno));
        return ANET_ERR;
    }

    /* Consider the socket in error state after we send three ACK
     * probes without getting a reply. */
    val = 3;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &val, sizeof(val)) < 0) {
        anetSetError(err, "setsockopt TCP_KEEPCNT: %s\n", strerror(errno));
        return ANET_ERR;
    }
#endif

    return ANET_OK;
}
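One way to confirm that per-socket settings like those in the function above took effect is to read them back with getsockopt(). The sketch below is Linux-specific and not part of the original code: `struct keepalive_cfg` and `get_keepalive` are hypothetical names.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical container for the four per-socket keepalive settings. */
struct keepalive_cfg {
    int enabled;  /* SO_KEEPALIVE: non-zero when keepalive is on */
    int idle;     /* TCP_KEEPIDLE: seconds of idle time before the first probe */
    int intvl;    /* TCP_KEEPINTVL: seconds between probes */
    int cnt;      /* TCP_KEEPCNT: unanswered probes before giving up */
};

/* Read the settings back from the socket.
 * Returns 0 on success, -1 if any getsockopt() call fails. */
int get_keepalive(int fd, struct keepalive_cfg *out)
{
    socklen_t len;

    len = sizeof(out->enabled);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &out->enabled, &len) < 0)
        return -1;
    len = sizeof(out->idle);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &out->idle, &len) < 0)
        return -1;
    len = sizeof(out->intvl);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &out->intvl, &len) < 0)
        return -1;
    len = sizeof(out->cnt);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &out->cnt, &len) < 0)
        return -1;
    return 0;
}
```

After a call such as anetKeepAlive(err, fd, 60), the values read back should be idle = 60, intvl = 20 and cnt = 3, so a dead peer would be detected roughly 120 seconds after the last data exchange.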