UDT Congestion Control: Protocol Text and Source Code Analysis

Source: Internet  Editor: 程序博客网  Time: 2024/06/04 18:42

This part corresponds to Section 7 of the UDT protocol text.

Configurable Congestion Control (CCC)

The congestion control in UDT is an open framework, so that a user-defined control algorithm can be easily implemented and switched. In particular, the native control algorithm is also implemented with this framework. A user-defined algorithm may redefine several control routines to read and adjust several UDT parameters. The routines will be called when certain events occur. For example, when an ACK is received, the control algorithm may increase the congestion window size.


CCC Interface
UDT allows users to access two congestion control parameters: the congestion window size and the inter-packet sending interval. Users may adjust these two parameters to realize window-based control, rate-based control, or a hybrid approach. In addition, the following parameters should also be exposed.

1) RTT
2) Maximum Segment/Packet Size
3) Estimated Bandwidth
4) The latest packet sequence number that has been sent so far
5) Packet arriving rate at the receiver side

A UDT implementation may expose additional parameters as well. This information can be used in user-defined congestion control algorithms to adjust the packet sending rate.

The following control events can be redefined via CCC (e.g., by a callback function).
1) init: when the UDT socket is connected.
2) close: when the UDT socket is closed.
3) onACK: when ACK is received.
4) onLOSS: when NAK is received.
5) onTimeout: when timeout occurs.
6) onPktSent: when a data packet is sent.
7) onPktRecv: when a data packet is received.
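The two adjustable parameters and the seven events above can be pictured as a small callback interface. The sketch below is only an illustration of that shape: the class names CongestionControl and ToyAimd, the method signatures, and the AIMD rule are all assumptions made for this example, not the UDT library's own CCC class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical base class: two tunable parameters plus the seven events.
// This is NOT the UDT library's class, only a sketch of the idea.
class CongestionControl {
public:
    virtual ~CongestionControl() = default;

    virtual void init() {}                               // socket connected
    virtual void close() {}                              // socket closed
    virtual void onACK(std::int32_t /*ackSeq*/) {}       // ACK received
    virtual void onLoss(std::int32_t /*firstLost*/) {}   // NAK received
    virtual void onTimeout() {}                          // timeout occurred
    virtual void onPktSent() {}                          // data packet sent
    virtual void onPktRecv() {}                          // data packet received

    double cwnd() const { return cwnd_; }
    double sendPeriodUs() const { return sendPeriodUs_; }

protected:
    double cwnd_ = 16.0;         // congestion window size, in packets
    double sendPeriodUs_ = 0.0;  // inter-packet sending interval, in microseconds
};

// A toy window-based algorithm (AIMD), chosen only to show the callbacks in use.
class ToyAimd : public CongestionControl {
public:
    void init() override { cwnd_ = 2.0; sendPeriodUs_ = 0.0; }
    void onACK(std::int32_t) override { cwnd_ += 1.0 / cwnd_; }  // additive increase
    void onLoss(std::int32_t) override { shrink(); }             // multiplicative decrease
    void onTimeout() override { cwnd_ = 2.0; }                   // fall back to the floor

private:
    // Halve the window, but never go below 2 packets.
    void shrink() { cwnd_ = (cwnd_ / 2.0 < 2.0) ? 2.0 : cwnd_ / 2.0; }
};
```

A rate-based algorithm would leave cwnd_ alone and adjust sendPeriodUs_ in the same callbacks; a hybrid one adjusts both.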

Users can also adjust the following parameters in the user-defined control algorithms.
1) ACK interval: An ACK may be sent every fixed number of packets. Users may define this interval. A value of -1 means that no ACK will be sent based on packet count.
2) ACK Timer: An ACK will also be sent at a fixed time interval. This is mandatory in UDT. The maximum and default ACK time interval is SYN.
3) RTO: UDT uses 4 * RTT + RTTVar to compute RTO. Users may redefine this.

Detailed description and discussion of UDT/CCC can be found in [GG05].
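The three adjustable parameters can be made concrete with a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, SYN is assumed to be 0.01 s (10,000 us), and treating every non-positive ACK interval as "disabled" is a simplification of the -1 rule in the text.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kSynUs = 10000;  // SYN, assumed to be 0.01 s

// Default RTO from the text: 4 * RTT + RTTVar (all values in microseconds).
std::int64_t defaultRtoUs(std::int64_t rttUs, std::int64_t rttVarUs) {
    return 4 * rttUs + rttVarUs;
}

// An ACK is due when either trigger fires:
//  - packet-count trigger: optional, disabled when ackInterval is -1
//    (this sketch treats any non-positive value as disabled);
//  - timer trigger: mandatory, with a period of at most SYN.
bool ackDue(int pktsSinceAck, int ackInterval,
            std::int64_t usSinceAck, std::int64_t ackTimerUs = kSynUs) {
    assert(ackTimerUs > 0 && ackTimerUs <= kSynUs);
    const bool byCount = ackInterval > 0 && pktsSinceAck >= ackInterval;
    const bool byTimer = usSinceAck >= ackTimerUs;
    return byCount || byTimer;
}
```

For example, with RTT = 100 ms and RTTVar = 20 ms the default RTO is 420 ms.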

Note: items 1) and 2) of the adjustable parameters show that an ACK has two triggers: a packet-count (cumulative) trigger, which sends an ACK after a fixed number of received data packets, and a timer trigger. The packet-count trigger is optional and can be switched off by setting the ACK interval to -1; the timer trigger is mandatory.

UDT's Native Control Algorithm
UDT has a native and default control algorithm, which will be used if no user-defined algorithm is implemented and configured. The native UDT algorithm should be implemented using CCC. UDT's native algorithm is a hybrid congestion control algorithm, hence it adjusts both the congestion window size and the inter-packet interval. The native algorithm uses timer-based ACK and the ACK interval is SYN.

The initial congestion window size is 16 packets and the initial inter-packet interval is 0. The algorithm starts in the Slow Start phase, which lasts until the first ACK or NAK arrives.

Note: when deciding whether to send an ACK, the code checks both the packet-count (cumulative) trigger and the timer trigger.

Note: in the source code quoted in this article, a NAK always ends slow start, whereas an ACK ends it only once the congestion window exceeds m_dMaxCWndSize.

On ACK packet received:
1) If the current status is in the slow start phase, set the congestion window size to the product of packet arrival rate and (RTT + SYN). Slow Start ends. Stop.
2) Set the congestion window size (CWND) to: CWND = A * (RTT + SYN) + 16, where A is the packet arrival rate.
3) The number of sent packets to be increased in the next SYN period (inc) is calculated as:
   if (B <= C)
      inc = min_inc;
   else
      inc = max(10^(ceil(log10((B-C)*PS*8))) * Beta/PS, min_inc);
   where B is the estimated link capacity and C is the current sending speed, both counted in packets per second. PS is the packet size (the source code uses m_iMSS). Beta is a constant value of 0.0000015. "min_inc" is the minimum increase value, 0.01, i.e., we will increase at least 1 packet per second. (On how 0.01 corresponds to 1 packet per second: inc is counted per SYN period, and SYN is 0.01 second, so raising the per-SYN packet count by 0.01 raises the sending rate by 1 packet per second.)
4) The SND period is updated as: SND = (SND * SYN) / (SND * inc + SYN).

These four parameters are used in rate decrease, and their initial values are in the parentheses: AvgNAKNum (1), NAKCount (1), DecCount (1), LastDecSeq (initial sequence number - 1).
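Steps 3 and 4 of the ACK handling are easier to follow with numbers. The sketch below is an assumption-laden illustration: the function names are hypothetical, PS is taken as the packet size in bytes, and SYN is assumed to be 0.01 s expressed in microseconds.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

constexpr double kSynUs = 10000.0;   // SYN, assumed to be 0.01 s
constexpr double kBeta = 0.0000015;  // Beta, from the text
constexpr double kMinInc = 0.01;     // min_inc, from the text

// Step 3: packets to add per SYN period.
// B and C are in packets per second, PS is the packet size in bytes (assumed).
double increasePerSyn(double B, double C, double PS) {
    if (B <= C) return kMinInc;
    const double spareBitsPerSec = (B - C) * PS * 8.0;
    const double step =
        std::pow(10.0, std::ceil(std::log10(spareBitsPerSec))) * kBeta / PS;
    return std::max(step, kMinInc);
}

// Step 4: new inter-packet sending period, in microseconds.
double nextSendPeriodUs(double sndUs, double inc) {
    return (sndUs * kSynUs) / (sndUs * inc + kSynUs);
}
```

With 10,000 packets/s of spare capacity and 1500-byte packets, the spare bandwidth is 1.2 * 10^8 bit/s, which rounds up to 10^9, so inc = 10^9 * 0.0000015 / 1500 = 1 packet per SYN. A period of 100 us (100 packets per SYN) then shrinks to about 99.01 us (101 packets per SYN).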

Note: the packet arrival rate used in steps 1) and 2) is measured by the receiver and carried back to the sender in the ACK.

onACK source code

void CUDTCC::onACK(int32_t ack)   // called when an ACK packet is received
{
   int64_t B = 0;
   double inc = 0;
   // Note: 1/24/2012
   // The minimum increase parameter is increased from "1.0 / m_iMSS" to 0.01
   // because the original was too small and caused sending rate to stay at low level
   // for long time.
   const double min_inc = 0.01;

   uint64_t currtime = CTimer::getTime();
   if (currtime - m_LastRCTime < (uint64_t)m_iRCInterval)
      return;

   m_LastRCTime = currtime;   // record the time of this rate-control update

   if (m_bSlowStart)   // in the slow start phase
   {
      m_dCWndSize += CSeqNo::seqlen(m_iLastAck, ack);   // grow the window by the newly acknowledged packets
      m_iLastAck = ack;

      if (m_dCWndSize > m_dMaxCWndSize)   // the window exceeds the maximum window
      {
         m_bSlowStart = false;   // leave slow start
         if (m_iRcvRate > 0)   // receiving rate fed back by the ACK
            m_dPktSndPeriod = 1000000.0 / m_iRcvRate;   // inter-packet sending interval, in us
         else
            m_dPktSndPeriod = m_dCWndSize / (m_iRTT + m_iRCInterval);
      }
   }
   else   // not in slow start
      // m_iRcvRate is in packets/s while m_iRTT and m_iRCInterval are in us, hence the
      // division by 1000000.0; the window is receiving rate * (RTT + RCInterval) + 16.
      m_dCWndSize = m_iRcvRate / 1000000.0 * (m_iRTT + m_iRCInterval) + 16;

   // During Slow Start, no rate increase
   if (m_bSlowStart)
      return;

   if (m_bLoss)
   {
      m_bLoss = false;   // a loss was reported since the last update: clear the flag and skip this increase
      return;
   }

   B = (int64_t)(m_iBandwidth - 1000000.0 / m_dPktSndPeriod);   // in theory, the spare bandwidth in packets/s
   // While the period is still longer than it was before the last decrease, B is capped
   // at 1/9 of the estimated bandwidth; the rationale for this cap is not clear to me.
   if ((m_dPktSndPeriod > m_dLastDecPeriod) && ((m_iBandwidth / 9) < B))
      B = m_iBandwidth / 9;

   if (B <= 0)
      inc = min_inc;
   else
   {
      // inc = max(10 ^ ceil(log10( B * MSS * 8 ) * Beta / MSS, 1/MSS)
      // Beta = 1.5 * 10^(-6)
      inc = pow(10.0, ceil(log10(B * m_iMSS * 8.0))) * 0.0000015 / m_iMSS;
      if (inc < min_inc)
         inc = min_inc;
   }

   m_dPktSndPeriod = (m_dPktSndPeriod * m_iRCInterval) / (m_dPktSndPeriod * inc + m_iRCInterval);

   /*
   Analysis of the formula above:
   m_dPktSndPeriod = m_dPktSndPeriod * (m_iRCInterval / (m_dPktSndPeriod * inc + m_iRCInterval)),
   where the factor m_iRCInterval / (m_dPktSndPeriod * inc + m_iRCInterval) is less than 1.
   The larger inc is, the smaller this factor, so m_dPktSndPeriod shrinks and the sending
   rate grows. Put differently, the number of packets sent per interval,
   m_iRCInterval / m_dPktSndPeriod, grows by exactly inc.
   */
}
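The period update at the end of onACK can be read in terms of packets per interval. A short derivation, writing SND for m_dPktSndPeriod, SYN for m_iRCInterval, and N = SYN / SND for the number of packets sent per SYN period (these symbol choices are a reading of the code, not notation from the source):

```latex
\begin{aligned}
\mathrm{SND}' &= \frac{\mathrm{SND}\cdot\mathrm{SYN}}{\mathrm{SND}\cdot inc + \mathrm{SYN}} \\[4pt]
N' = \frac{\mathrm{SYN}}{\mathrm{SND}'}
   &= \frac{\mathrm{SND}\cdot inc + \mathrm{SYN}}{\mathrm{SND}}
    = inc + \frac{\mathrm{SYN}}{\mathrm{SND}}
    = N + inc
\end{aligned}
```

Under this reading the formula is simply an additive increase of inc packets per SYN period, written in terms of the period rather than the rate.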

We define a congestion period as the period between two NAKs in which the first biggest lost packet sequence number is greater than the LastDecSeq, which is the biggest sequence number when last time the packet sending rate is decreased. AvgNAKNum is the average number of NAKs in a congestion period. NAKCount is the current number of NAKs in the current period.
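Whether a NAK opens a new congestion period comes down to one sequence-number comparison. The sketch below is a hypothetical stand-in for that test: the names seqCompare and startsNewCongestionPeriod are invented here, sequence numbers are assumed to occupy 31 bits and to wrap around, and the top bit of a loss-list entry is assumed to be a flag, as the 0x7FFFFFFF mask in the onLoss source of this article suggests.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Assumed: half of the 31-bit sequence space, used to detect wrap-around.
constexpr std::int32_t kSeqThreshold = 0x3FFFFFFF;

// Wrap-aware comparison: > 0 if a is after b, < 0 if before, 0 if equal.
// Both arguments must lie in [0, 0x7FFFFFFF].
std::int32_t seqCompare(std::int32_t a, std::int32_t b) {
    assert(a >= 0 && b >= 0);
    return (std::abs(a - b) < kSeqThreshold) ? (a - b) : (b - a);
}

// A NAK starts a new congestion period when its first lost sequence number
// is after LastDecSeq, the largest sequence number sent at the last decrease.
bool startsNewCongestionPeriod(std::int32_t firstLostRaw, std::int32_t lastDecSeq) {
    const std::int32_t firstLost = firstLostRaw & 0x7FFFFFFF;  // drop the flag bit
    return seqCompare(firstLost, lastDecSeq) > 0;
}
```

Losses of packets sent before the last decrease therefore belong to the period already being handled and do not trigger a fresh immediate decrease.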


On NAK packet received:
1) If it is in slow start phase, slow start ends. If a receiving rate has been observed, set the inter-packet interval to 1/recvrate and stop. Otherwise, set the sending rate according to the current window size, cwnd / (rtt + SYN), and continue to decrease it with step 2.
2) If this NAK starts a new congestion period, increase the inter-packet interval (snd) to snd = snd * 1.125; update AvgNAKNum, reset NAKCount to 1, and set DecRandom to a random (uniformly distributed) number between 1 and AvgNAKNum. Update LastDecSeq. Stop.
3) If DecCount <= 5, and NAKCount == DecCount * DecRandom:
   a. Update SND period: SND = SND * 1.125;
   b. Increase DecCount by 1;
   c. Record the current largest sent sequence number (LastDecSeq).

The native UDT control algorithm is designed for bulk data transfer over high BDP networks. [GHG04a]
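The arithmetic behind the decrease steps can be checked with two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, the 0.875 / 0.125 weights are the ones used in the onLoss source quoted in this article, and the ceil() rounding that the source applies to the period is left out for clarity.

```cpp
#include <cassert>
#include <cmath>

// Smoothed number of NAKs per congestion period (weighted moving average).
int updateAvgNakNum(int avgNakNum, int nakCount) {
    return static_cast<int>(std::ceil(avgNakNum * 0.875 + nakCount * 0.125));
}

// Sending period after n decrease steps, each multiplying the period by 1.125.
double periodAfterDecreases(double sndUs, int n) {
    return sndUs * std::pow(1.125, n);
}
```

If the period is raised five times within one congestion period, it grows by 1.125^5, about 1.80, so the sending rate falls to roughly 55% of its earlier value. That is consistent with the intent stated in the source comment that the rate should not be cut by more than half within a congestion period.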

Note: in step 1), "stop" corresponds to the return statement in the code.

onLoss source code

void CUDTCC::onLoss(const int32_t* losslist, int)   // called when a NAK packet is received
{
   //Slow Start stopped, if it hasn't yet
   if (m_bSlowStart)   // in the slow start phase
   {
      m_bSlowStart = false;   // leave slow start

      // m_iRcvRate is in packets/s. It is set while processing an ACK, from the
      // arrival rate measured at the receiver, and is used to pace the sender.
      if (m_iRcvRate > 0)
      {
         // Set the sending rate to the receiving rate.
         m_dPktSndPeriod = 1000000.0 / m_iRcvRate;   // microseconds per packet
         return;
      }

      // If no receiving rate is observed, we have to compute the sending
      // rate according to the current window size, and decrease it
      // using the method below.
      // (Note that the expression as quoted is window / time.)
      m_dPktSndPeriod = m_dCWndSize / (m_iRTT + m_iRCInterval);
   }

   m_bLoss = true;

   if (CSeqNo::seqcmp(losslist[0] & 0x7FFFFFFF, m_iLastDecSeq) > 0)
   {
      m_dLastDecPeriod = m_dPktSndPeriod;
      m_dPktSndPeriod = ceil(m_dPktSndPeriod * 1.125);

      m_iAvgNAKNum = (int)ceil(m_iAvgNAKNum * 0.875 + m_iNAKCount * 0.125);   // weighted moving average
      m_iNAKCount = 1;
      m_iDecCount = 1;

      m_iLastDecSeq = m_iSndCurrSeqNo;

      // remove global synchronization using randomization
      srand(m_iLastDecSeq);
      m_iDecRandom = (int)ceil(m_iAvgNAKNum * (double(rand()) / RAND_MAX));
      if (m_iDecRandom < 1)
         m_iDecRandom = 1;
   }
   else if ((m_iDecCount++ < 5) && (0 == (++m_iNAKCount % m_iDecRandom)))
   {
      // 0.875^5 = 0.51, rate should not be decreased by more than half within a congestion period
      m_dPktSndPeriod = ceil(m_dPktSndPeriod * 1.125);   // lengthen the inter-packet sending interval
      m_iLastDecSeq = m_iSndCurrSeqNo;
   }
}

1 0