UDT4 Protocol Source Code Analysis: Data Sending and Receiving

Source: Internet | Editor: 程序博客网 | Time: 2024/04/28 17:40

Protocol Analysis

Data Sending and Receiving

Each UDT entity has two logical parts: the sender and the receiver. The sender sends (and retransmits) application data according to the flow control and congestion control. The receiver receives both data packets and control packets, and sends out control packets according to the received packets and the timers.

The receiver is responsible for triggering and processing all control events, including congestion control and reliability control, and their related mechanisms.

UDT always tries to pack application data into fixed size packets (the maximum packet size negotiated during connection setup), unless there is not enough data to be sent.

We explained the rationale of some of the UDT data sending/receiving schemes in [GHG04b].

[GHG04b] Yunhong Gu, Xinwei Hong, and Robert L. Grossman, Experiences in Design and Implementation of a High Performance Transport Protocol, SC 2004, Nov 6 - 12, Pittsburgh, PA, USA.

The Sender's Algorithm

Data Structures and Variables:

1) Sender's Loss List: The sender's loss list is used to store the sequence numbers of the lost packets fed back by the receiver through NAK packets or inserted in a timeout event. The numbers are stored in increasing order.

Data Sending Algorithm:

1) If the sender's loss list is not empty, retransmit the first packet in the list and remove it from the list. Go to 5).

2) In messaging mode, if a packet has been in the loss list for longer than the application-specified TTL (time-to-live), send a message drop request and remove all related packets from the loss list. Go to 1).
3) Wait until there is application data to be sent.
4) a. If the number of unacknowledged packets exceeds the flow/congestion window size, wait until an ACK comes. Go to 1).
b. Pack a new data packet and send it out.
5) If the sequence number of the current packet is 16n, where n is an integer, go to 2).
6) Wait (SND - t) time, where SND is the inter-packet interval updated by congestion control and t is the total time used by step 1 to step 5. Go to 1).
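The sender's steps above can be condensed into a decision function plus a pacing rule. This is a minimal sketch under stated assumptions, not UDT's implementation. The names SenderState, nextAction and waitBeforeNext are hypothetical. Step 2 (the message-mode TTL drop) is left out. Sequence numbers are assumed not to wrap, and time is an abstract integer unit.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical model of one pass through the sender's loop (steps 1, 3, 4).
enum class SendAction { Retransmit, SendNew, WaitForAck, WaitForData };

struct SenderState {
    std::deque<int32_t> lossList;   // sender's loss list, ascending order
    int32_t unacked = 0;            // packets sent but not yet acknowledged
    int32_t window = 16;            // min(flow window, congestion window)
    bool appHasData = false;        // step 3: application data pending
};

// Decide the next action; for Retransmit/SendNew, 'seq' receives the packet number.
SendAction nextAction(SenderState& s, int32_t& nextNewSeq, int32_t& seq) {
    assert(s.window > 0);
    if (!s.lossList.empty()) {                        // step 1: retransmission first
        seq = s.lossList.front();
        s.lossList.pop_front();
        return SendAction::Retransmit;
    }
    if (!s.appHasData) return SendAction::WaitForData;            // step 3
    if (s.unacked >= s.window) return SendAction::WaitForAck;     // step 4a
    seq = nextNewSeq++;                                            // step 4b
    ++s.unacked;
    return SendAction::SendNew;
}

// Steps 5-6: after packet 16n, skip the wait so 16n+1 follows back-to-back
// (a probing packet pair); otherwise wait SND - t, never less than zero.
int64_t waitBeforeNext(int32_t seq, int64_t sndInterval, int64_t elapsed) {
    if ((seq & 0xF) == 0) return 0;
    int64_t w = sndInterval - elapsed;
    return w > 0 ? w : 0;
}
```

A retransmission does not increase `unacked`, because the packet was already counted when it was first sent.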

The Receiver's Algorithm

Data Structures and Variables:

1) Receiver's Loss List: It is a list of tuples whose values include: the sequence numbers of detected lost data packets, the latest feedback time of each tuple, and a parameter k that is the number of times each one has been fed back in NAK. Values are stored in the increasing order of packet sequence numbers.
2) ACK History Window: A circular array of each sent ACK and the time it was sent out. When the array has no free space, the most recent value overwrites the oldest one.
3) PKT History Window: A circular array that records the arrival time of each data packet.
4) Packet Pair Window: A circular array that records the time interval between each probing packet pair.
5) LRSN: A variable to record the largest received data packet sequence number. LRSN is initialized to the initial sequence number minus 1.
6) ExpCount: A variable to record the number of consecutive EXP time-out events.
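The ACK History Window in item 2) is essentially a ring buffer. The sketch below uses a hypothetical class; AckHistoryWindow is not a UDT name. It stores (ACK sequence number, ACK number, departure time). When an ACK2 arrives, it returns the RTT sample that ACK2 processing feeds into the RTT estimate. It assumes microsecond timestamps and uses a simple linear search over the slots.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical ring buffer of sent ACKs; the newest record overwrites the oldest.
template <std::size_t N>
class AckHistoryWindow {
    static_assert(N > 0, "window must have at least one slot");
public:
    void store(int32_t ackSeqNo, int32_t ackNo, int64_t sentUs) {
        slots_[head_] = Slot{ackSeqNo, ackNo, sentUs, true};
        head_ = (head_ + 1) % N;             // wrap around
    }
    // On ACK2: find the ACK by its ACK sequence number. Returns the RTT sample
    // (ACK2 arrival time minus ACK departure time) and the ACK number, or -1.
    int64_t acknowledge(int32_t ackSeqNo, int64_t nowUs, int32_t& ackNo) const {
        for (const Slot& s : slots_) {
            if (s.used && s.ackSeqNo == ackSeqNo) {
                ackNo = s.ackNo;
                return nowUs - s.sentUs;
            }
        }
        return -1;                           // already overwritten or never stored
    }
private:
    struct Slot {
        int32_t ackSeqNo;   // unique, increasing ACK sequence number
        int32_t ackNo;      // data sequence number carried by that ACK
        int64_t sentUs;     // departure time
        bool used;
    };
    std::array<Slot, N> slots_{};            // value-initialized: all slots unused
    std::size_t head_ = 0;
};
```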

Data Receiving Algorithm:
1) Query the system time to check whether the ACK, NAK, or EXP timer has expired. If any has, process the event (as described below in this section) and reset the associated time variables. For ACK, also check the ACK packet interval.
2) Start a time-bounded UDP receive. If no packet arrives, go to 1).
3) Reset the ExpCount to 1. If there is no unacknowledged data packet, or if this is an ACK or NAK control packet, reset the EXP timer.
4) Check the flag bit of the packet header. If it is a control packet, process it according to its type and go to 1).

5) If the sequence number of the current data packet is 16n + 1, where n is an integer, record the time interval between this packet and the last data packet in the Packet Pair Window.
6) Record the packet arrival time in PKT History Window.
7) a. If the sequence number of the current data packet is greater than LRSN + 1, put all the sequence numbers between (but excluding) these two values into the receiver's loss list and send them to the sender in an NAK packet.
b. If the sequence number is less than LRSN, remove it from the receiver's loss list.
8) Update LRSN. Go to 1).
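Steps 7) and 8) amount to gap detection against LRSN. The sketch below assumes sequence numbers never wrap and models the loss list as a std::set. The numbers it reports are LRSN + 1 through seq - 1, which matches the processData code later in this article. ReceiverLossState and onDataPacket are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical receiver state for steps 7-8 (no sequence-number wraparound).
struct ReceiverLossState {
    int32_t lrsn;                 // largest received sequence number
    std::set<int32_t> lossList;   // receiver's loss list, ascending
};

// Processes one data packet; returns the numbers that would go into a NAK.
std::vector<int32_t> onDataPacket(ReceiverLossState& r, int32_t seq) {
    std::vector<int32_t> nak;
    if (seq > r.lrsn + 1) {                              // 7a: a gap was detected
        for (int32_t s = r.lrsn + 1; s < seq; ++s) {     // strictly between LRSN and seq
            r.lossList.insert(s);
            nak.push_back(s);
        }
    } else if (seq < r.lrsn) {                           // 7b: a retransmission fills a hole
        r.lossList.erase(seq);
    }
    if (seq > r.lrsn) r.lrsn = seq;                      // 8: update LRSN
    return nak;
}
```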

ACK Event Processing:
1) Find the sequence number prior to which all the packets have been received by the receiver (ACK number) according to the following rule: if the receiver's loss list is empty, the ACK number is LRSN + 1; otherwise it is the smallest sequence number in the receiver's loss list.
2) If (a) the ACK number equals the largest ACK number ever acknowledged by ACK2, or (b) it is equal to the ACK number in the last ACK and the time interval between these two ACK packets is less than 2 RTTs, stop (do not send this ACK).
3) Assign this ACK a unique increasing ACK sequence number. Pack the ACK packet with RTT, RTT Variance, and flow window size (available receiver buffer size). If this ACK is not triggered by ACK timers, send out this ACK and stop.
4) Calculate the packet arrival speed according to the following algorithm: calculate the median value AI of the last 16 packet arrival intervals, using the values stored in PKT History Window. From these 16 values, remove those either greater than AI*8 or less than AI/8. If more than 8 values are left, calculate their average AI'; the packet arrival speed is 1/AI' (packets per second). Otherwise, return 0.
5) Calculate the estimated link capacity according to the following algorithm: calculate the median value PI of the last 16 packet pair intervals, using the values in Packet Pair Window; the link capacity is 1/PI (packets per second).
6) Pack the packet arrival speed and estimated link capacity into the ACK packet and send it out.
7) Record the ACK sequence number, ACK number and the departure time of this ACK in the ACK History Window.
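Steps 4) and 5) can be written out as follows. This sketch follows the textual description and assumes intervals are given in microseconds. The text does not say which median to use for an even sample count, so the sketch takes the upper median as an assumption. arrivalSpeed and linkCapacity are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Upper median of a copy of the samples (choice of median is an assumption).
int64_t medianOf(std::vector<int64_t> v) {
    assert(!v.empty());
    std::nth_element(v.begin(), v.begin() + v.size() / 2, v.end());
    return v[v.size() / 2];
}

// Step 4: packet arrival speed (packets/s) from arrival intervals in microseconds.
int64_t arrivalSpeed(const std::vector<int64_t>& intervalsUs) {
    int64_t ai = medianOf(intervalsUs);
    int64_t sum = 0;
    int count = 0;
    for (int64_t x : intervalsUs) {
        if (x <= ai * 8 && x >= ai / 8) {    // drop values > AI*8 or < AI/8
            sum += x;
            ++count;
        }
    }
    if (count <= 8 || sum == 0) return 0;    // not enough consistent samples
    return static_cast<int64_t>(1000000.0 * count / sum);   // 1 / AI'
}

// Step 5: link capacity (packets/s) from packet-pair intervals in microseconds.
int64_t linkCapacity(const std::vector<int64_t>& pairIntervalsUs) {
    int64_t pi = medianOf(pairIntervalsUs);
    return pi > 0 ? 1000000 / pi : 0;
}
```

The median filter keeps a few delayed or bunched packets from skewing the average. The "more than 8 left" rule refuses to report a speed when the samples are too scattered.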

Author's notes on ACK event processing: an ACK is triggered by the ACK timer or after packet loss. The ACK number in step 1) is a data packet sequence number. In step 2)(b), "the last ACK" refers to the last ACK sent out for which the corresponding ACK2 has not yet been received. The ACK sequence number assigned in step 3) is a separate sequence used only for ACK packets.

NAK Event Processing:
Search the receiver's loss list and find all sequence numbers whose last feedback time was at least k*RTT ago, where k is initialized to 2 and increased by 1 each time the number is fed back. Compress these numbers (according to section 6.4) and send them back to the sender in an NAK packet.
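The compression referred to above appears to match the encoding visible in processData and in the NAK handler later in this article. A run of consecutive lost numbers is sent as two words, and the first word has its top bit (0x80000000) set. An isolated loss is sent as a single plain word. The encoder below is a hypothetical sketch that ignores sequence-number wraparound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical NAK loss-list encoder: input is ascending lost sequence numbers.
// A run a..b (a < b) becomes {a | 0x80000000, b}; a single loss stays {a}.
std::vector<int32_t> compressLossList(const std::vector<int32_t>& lost) {
    std::vector<int32_t> out;
    std::size_t i = 0;
    while (i < lost.size()) {
        std::size_t j = i;
        while (j + 1 < lost.size() && lost[j + 1] == lost[j] + 1) ++j;   // extend the run
        if (j == i) {
            out.push_back(lost[i]);                                        // isolated loss
        } else {
            out.push_back(static_cast<int32_t>(static_cast<uint32_t>(lost[i]) | 0x80000000u));
            out.push_back(lost[j]);                                        // end of the run
        }
        i = j + 1;
    }
    return out;
}
```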

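Author's note on NAK event processing: a NAK is triggered after packet loss. In the latest UDT version the NAK timer has been removed ("we are not sending back repeated NAK anymore and rely on the sender's EXP for retransmission"). A NAK is therefore sent once, immediately when the loss is detected, rather than being repeated every k*RTT.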

EXP Event Processing:
1) Put all the unacknowledged packets into the sender's loss list.
2) If ExpCount > 16 and at least 3 seconds have elapsed since ExpCount was last reset to 1, or if 3 minutes have elapsed, close the UDT connection and exit.
3) If the sender's loss list is empty, send a keep-alive packet to the peer side.
4) Increase ExpCount by 1.
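Step 2) can be read as a single predicate. The text does not say what the 3 minutes are measured from, so this sketch assumes both limits count from the last reset of ExpCount to 1. shouldCloseOnExp is a hypothetical name. The UDT4 checkTimers code later in this article uses a different rule: ExpCount > 16 and more than 5 seconds since the last response, with no 3-minute clause.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical predicate for EXP step 2 as written in the specification text.
// Assumption: both limits are measured from the last time ExpCount was reset to 1.
bool shouldCloseOnExp(int expCount, int64_t usSinceExpReset) {
    const int64_t kThreeSeconds = 3LL * 1000 * 1000;
    const int64_t kThreeMinutes = 180LL * 1000 * 1000;
    return (expCount > 16 && usSinceExpReset >= kThreeSeconds) ||
           usSinceExpReset >= kThreeMinutes;
}
```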

On ACK packet received:
1) Update the largest acknowledged sequence number.
2) Send back an ACK2 with the same ACK sequence number in this ACK.
3) Update RTT and RTTVar.
4) Update both ACK and NAK period to 4 * RTT + RTTVar + SYN.
5) Update flow window size.
6) If this is a Light ACK, stop.
7) Update packet arrival rate: A = (A * 7 + a) / 8, where a is the value carried in the ACK.
8) Update estimated link capacity: B = (B * 7 + b) / 8, where b is the value carried in the ACK.
9) Update sender's buffer (by releasing the buffer that has been acknowledged).
10) Update sender's loss list (by removing all the sequence numbers that have been acknowledged).
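Steps 7) and 8) are exponentially weighted moving averages with weight 1/8 on the new sample. The sketch follows the processCtrl code later in this article in two respects: a non-positive sample is ignored, and the division by 8 is a right shift. Computing in 64 bits to avoid overflow is an added assumption. smoothRate is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// A = (A * 7 + a) / 8 (also used for B with b). A sample <= 0 means "no estimate"
// and leaves the current value unchanged.
int32_t smoothRate(int32_t current, int32_t sample) {
    if (sample <= 0) return current;
    return static_cast<int32_t>((static_cast<int64_t>(current) * 7 + sample) >> 3);
}
```

Because of the integer truncation, repeated samples of the same value converge to slightly below that value rather than exactly to it.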

Author's note on ACK packet processing: step 4), updating the ACK and NAK period to 4 * RTT + RTTVar + SYN, does not appear to be implemented in the latest source code.

On NAK packet received:
1) Add all sequence numbers carried in the NAK into the sender's loss list.
2) Update the SND period by rate control (see section 3.6).
3) Reset the EXP time variable.

On ACK2 packet received:
1) Locate the related ACK in the ACK History Window according to the ACK sequence number in this ACK2.
2) Update the largest ACK number that has ever been acknowledged.
3) Calculate new rtt according to the ACK2 arrival time and the ACK departure time, and update the RTT value as: RTT = (RTT *7 + rtt) / 8.
4) Update RTTVar by: RTTVar = (RTTVar * 3 + abs(RTT - rtt)) / 4.
5) Update both ACK and NAK period to 4 * RTT + RTTVar + SYN. (Note: updating the NAK period no longer has any effect in the latest code, because NAKs are no longer triggered by a timer.)
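Read literally, steps 3) and 4) update RTT first and then compute RTTVar against the new RTT. The ACK2 handler in processCtrl later in this article does it the other way round: it updates RTTVar first, using the old RTT. The sketch shows both orders side by side. RttEstimator and its methods are hypothetical names, and the values are in microseconds.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical RTT estimator contrasting the two update orders.
struct RttEstimator {
    int rtt;      // smoothed RTT (microseconds)
    int rttVar;   // RTT variance (microseconds)

    // Specification order: RTT first, then RTTVar using the new RTT.
    void onSampleSpecOrder(int sample) {
        rtt = (rtt * 7 + sample) / 8;
        rttVar = (rttVar * 3 + std::abs(rtt - sample)) / 4;
    }
    // Order in CUDT::processCtrl: RTTVar first, using the old RTT.
    void onSampleCodeOrder(int sample) {
        rttVar = (rttVar * 3 + std::abs(sample - rtt)) >> 2;
        rtt = (rtt * 7 + sample) >> 3;
    }
};
```

With RTT = 100000, RTTVar = 50000 and a 60000 sample, both orders give RTT = 95000. RTTVar, however, becomes 46250 in the specification order and 47500 in the code order.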

Flow Control
The flow control window size is 16 initially.
On ACK packet received:
The flow window size is updated to the receiver's available buffer size.
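On the sending side, the flow window limits how many packets may be outstanding, together with the congestion window. This sketch is modeled on packData and processCtrl later in this article and ignores sequence-number wraparound. FlowWindow and its methods are hypothetical. The light-ACK shrinking rule comes from the code, not from the specification text above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sender-side flow-window bookkeeping.
struct FlowWindow {
    int flowWindow = 16;   // initial flow window size
    int sndLastAck = 0;    // first unacknowledged data sequence number

    // Full ACK: the window becomes the receiver's available buffer size.
    void onFullAck(int ack, int receiverAvailBuf) {
        if (ack >= sndLastAck) { flowWindow = receiverAvailBuf; sndLastAck = ack; }
    }
    // Light ACK (no buffer info): shrink the window by the newly acknowledged count.
    void onLightAck(int ack) {
        if (ack >= sndLastAck) { flowWindow -= ack - sndLastAck; sndLastAck = ack; }
    }
    // May packet 'nextSeq' be sent? Same test as packData:
    // min(flow, congestion) >= number of packets from sndLastAck to nextSeq inclusive.
    bool canSend(int nextSeq, double congestionWindow) const {
        int cwnd = std::min(flowWindow, static_cast<int>(congestionWindow));
        return cwnd >= nextSeq - sndLastAck + 1;
    }
};
```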

Source Code Analysis

Data Sending Source Code

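In the following code, note that self->m_pTimer->sleepto(ts); controls the sending interval between packets, while the pop function contains the window control. Within one UDT transfer control block, the packets in the send window are spaced by a sending interval, and that spacing is enforced by sleepto. In UDT's congestion control algorithm, both the window and the inter-packet interval within the window change with network conditions.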

#ifndef WIN32
   void* CSndQueue::worker(void* param)
#else
   DWORD WINAPI CSndQueue::worker(LPVOID param)
#endif
{
   CSndQueue* self = (CSndQueue*)param; // the send queue object drives this thread
   while (!self->m_bClosing)
   {
      uint64_t ts = self->m_pSndUList->getNextProcTime(); // time at which the next transfer control block in the list is due
      if (ts > 0)
      {
         // wait until next processing time of the first socket on the list
         uint64_t currtime;
         CTimer::rdtsc(currtime);
         if (currtime < ts) // has the scheduled time arrived?
            self->m_pTimer->sleepto(ts); // if not, sleep. This spaces the processing of control blocks in the list;
                                         // that spacing is exactly the inter-packet interval of each control block
         // it is time to send the next pkt
         sockaddr* addr;
         CPacket pkt;
         if (self->m_pSndUList->pop(addr, pkt) < 0) // take a packet from the due control block; window control happens inside
            continue;
         self->m_pChannel->sendto(addr, pkt); // send the packet to destination addr
      }
      else
      {
         // wait here if there is no sockets with data to be sent
         #ifndef WIN32
            pthread_mutex_lock(&self->m_WindowLock);
            if (!self->m_bClosing && (self->m_pSndUList->m_iLastEntry < 0))
               pthread_cond_wait(&self->m_WindowCond, &self->m_WindowLock);
            pthread_mutex_unlock(&self->m_WindowLock); // woken by the signal raised from send()
         #else
            WaitForSingleObject(self->m_WindowCond, INFINITE);
         #endif
      }
   }
   #ifndef WIN32
      return NULL;
   #else
      SetEvent(self->m_ExitCond);
      return 0;
   #endif
}

Source code of self->m_pSndUList->pop(addr, pkt)

int CSndUList::pop(sockaddr*& addr, CPacket& pkt) // send UDT list
{
   // CSndUList stores its entries as CSNode structures
   CGuard listguard(m_ListLock);
   if (-1 == m_iLastEntry) // -1 means m_pHeap is empty
      return -1;
   // no pop until the next scheduled time
   uint64_t ts;
   CTimer::rdtsc(ts); // ts is refreshed here
   if (ts < m_pHeap[0]->m_llTimeStamp) // m_pHeap holds CSNode: a pointer to the UDT control block, a timestamp, and the node's index in m_pHeap
      return -1; // the time to process this control block has not arrived yet
   CUDT* u = m_pHeap[0]->m_pUDT; // take the UDT control block at the top of m_pHeap
   remove_(u); // and remove it from m_pHeap
   if (!u->m_bConnected || u->m_bBroken)
      return -1;
   // pack a packet from the socket
   if (u->packData(pkt, ts) <= 0) // ts is used mainly to control the inter-packet interval
      return -1;
   addr = u->m_pPeerAddr;
   // insert a new entry, ts is the next processing time
   if (ts > 0) // ts was updated in packData
      insert_(ts, u); // ts becomes the new m_llTimeStamp and u is reinserted into m_pHeap at the position given by ts
   return 1;
}
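The scheduling idea behind CSndUList can be modeled with a standard min-heap. Sockets are ordered by their next processing time, and pop hands out the earliest one only once its time has come. The caller then reinserts that socket with the next time computed by packData. In this sketch std::priority_queue stands in for UDT's hand-written m_pHeap array, and SendScheduler and integer socket IDs are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical min-heap scheduler keyed by next processing time.
class SendScheduler {
public:
    void insert(uint64_t when, int socketId) { heap_.emplace(when, socketId); }

    // Returns the socket to serve at time 'now', or -1 if none is due yet.
    int pop(uint64_t now) {
        if (heap_.empty() || now < heap_.top().first) return -1;
        int id = heap_.top().second;
        heap_.pop();
        return id;
    }
    bool empty() const { return heap_.empty(); }
private:
    using Entry = std::pair<uint64_t, int>;   // (next processing time, socket)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap_;
};
```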

Source code of u->packData(pkt, ts)

int CUDT::packData(CPacket& packet, uint64_t& ts)
{
   int payload = 0;
   bool probe = false;
   uint64_t entertime;
   CTimer::rdtsc(entertime);
   if ((0 != m_ullTargetTime) && (entertime > m_ullTargetTime)) // m_ullTargetTime starts at 0
      m_ullTimeDiff += entertime - m_ullTargetTime;
   // Loss retransmission always has higher priority.
   if ((packet.m_iSeqNo = m_pSndLossList->getLostSeq()) >= 0) // there is a lost packet to retransmit
   {
      // protect m_iSndLastDataAck from updating by ACK processing
      CGuard ackguard(m_AckLock);
      int offset = CSeqNo::seqoff(m_iSndLastDataAck, packet.m_iSeqNo);
      if (offset < 0)
         return 0;
      int msglen;
      payload = m_pSndBuffer->readData(&(packet.m_pcData), offset, packet.m_iMsgNo, msglen);
      if (-1 == payload) // message mode
      {
         int32_t seqpair[2];
         seqpair[0] = packet.m_iSeqNo;
         seqpair[1] = CSeqNo::incseq(seqpair[0], msglen);
         sendCtrl(7, &packet.m_iMsgNo, seqpair, 8);
         // only one msg drop request is necessary
         m_pSndLossList->remove(seqpair[1]);
         // skip all dropped packets
         if (CSeqNo::seqcmp(m_iSndCurrSeqNo, CSeqNo::incseq(seqpair[1])) < 0)
             m_iSndCurrSeqNo = CSeqNo::incseq(seqpair[1]);
         return 0;
      }
      else if (0 == payload)
         return 0;
      ++ m_iTraceRetrans;
      ++ m_iRetransTotal;
   }
   else // no lost packet
   {
      // If no loss, pack a new packet.
      // check congestion/flow window limit
      int cwnd = (m_iFlowWindowSize < (int)m_dCongestionWindow) ? m_iFlowWindowSize : (int)m_dCongestionWindow; // effective window = min(flow window, congestion window)
      if (cwnd >= CSeqNo::seqlen(m_iSndLastAck, CSeqNo::incseq(m_iSndCurrSeqNo))) // the next packet still fits in the window
      {
         if (0 != (payload = m_pSndBuffer->readData(&(packet.m_pcData), packet.m_iMsgNo)))
         {
            m_iSndCurrSeqNo = CSeqNo::incseq(m_iSndCurrSeqNo);
            m_pCC->setSndCurrSeqNo(m_iSndCurrSeqNo);
            packet.m_iSeqNo = m_iSndCurrSeqNo;
            // every 16 (0xF) packets, a packet pair is sent
            if (0 == (packet.m_iSeqNo & 0xF)) // bit test instead of modulo; start of a probing packet pair
               probe = true;
         }
         else
         {
            m_ullTargetTime = 0; // scheduled time of next packet sending
            m_ullTimeDiff = 0; // aggregate difference in inter-packet time
            ts = 0;
            return 0;
         }
      }
      else
      {
         m_ullTargetTime = 0;
         m_ullTimeDiff = 0;
         ts = 0;
         return 0;
      }
   }
   packet.m_iTimeStamp = int(CTimer::getTime() - m_StartTime); // m_StartTime is initialized in CUDT::open()
   packet.m_iID = m_PeerID;
   packet.setLength(payload);
   m_pCC->onPktSent(&packet); // update the packet sending interval
   //m_pSndTimeWindow->onPktSent(packet.m_iTimeStamp);
   ++ m_llTraceSent;
   ++ m_llSentTotal;
   if (probe)
   {
      // sends out probing packet pair
      ts = entertime;
      probe = false;
   }
   else
   {
      #ifndef NO_BUSY_WAITING
         ts = entertime + m_ullInterval;
      #else
         if (m_ullTimeDiff >= m_ullInterval) // accumulated lateness already covers a whole interval
         {
            ts = entertime;
            m_ullTimeDiff -= m_ullInterval;
         }
         else // lateness is less than one interval: shorten this interval by it
         {
            ts = entertime + m_ullInterval - m_ullTimeDiff;
            m_ullTimeDiff = 0;
         }
      #endif
   }
   m_ullTargetTime = ts; // scheduled time of next packet sending
   return payload;
}
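The NO_BUSY_WAITING branch at the end of packData pays back lateness. When the thread wakes up after the scheduled time, the delay is accumulated, and the next gaps are shortened to keep the average rate. The sketch isolates that logic. Pacer is a hypothetical name, and its fields mirror m_ullInterval, m_ullTargetTime and m_ullTimeDiff. Times are abstract ticks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of packData's pacing with accumulated lateness.
struct Pacer {
    explicit Pacer(uint64_t iv) : interval(iv) {}
    uint64_t interval;        // inter-packet interval (m_ullInterval)
    uint64_t targetTime = 0;  // scheduled send time (m_ullTargetTime)
    uint64_t timeDiff = 0;    // accumulated lateness (m_ullTimeDiff)

    // Called when a packet is sent at 'enter'; returns the next send time.
    uint64_t next(uint64_t enter, bool probe) {
        if (targetTime != 0 && enter > targetTime)
            timeDiff += enter - targetTime;        // woke up late
        uint64_t ts;
        if (probe) {
            ts = enter;                            // second packet of a pair goes at once
        } else if (timeDiff >= interval) {
            ts = enter;                            // lateness covers a whole gap
            timeDiff -= interval;
        } else {
            ts = enter + interval - timeDiff;      // shorten this gap by the lateness
            timeDiff = 0;
        }
        targetTime = ts;
        return ts;
    }
};
```

For example, with an interval of 100, a packet sent 30 ticks late gets a following gap of only 70.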

Data Receiving Source Code

Analysis of CRcvQueue::worker

#ifndef WIN32
   void* CRcvQueue::worker(void* param)
#else
   DWORD WINAPI CRcvQueue::worker(LPVOID param)
#endif
{
   CRcvQueue* self = (CRcvQueue*)param;
   sockaddr* addr = (AF_INET == self->m_UnitQueue.m_iIPversion) ? (sockaddr*) new sockaddr_in : (sockaddr*) new sockaddr_in6; // storage for the peer address
   CUDT* u = NULL;
   int32_t id;
   while (!self->m_bClosing)
   {
      #ifdef NO_BUSY_WAITING
         self->m_pTimer->tick(); // signal Tick_Cond
      #endif
      // check waiting list, if new socket, insert it to the list
      while (self->ifNewEntry()) // are there new transfer control blocks?
      {
         CUDT* ne = self->getNewEntry(); // fetch a new control block and remove it from the waiting list
         if (NULL != ne)
         {
            self->m_pRcvUList->insert(ne); // add it to RcvUList
            self->m_pHash->insert(ne->m_SocketID, ne); // map socket ID -> control block in the hash table for lookup
         }
      }
      // find next available slot for incoming packet
      CUnit* unit = self->m_UnitQueue.getNextAvailUnit(); // each CUnit stores one CPacket
      if (NULL == unit)
      {
         // no space, skip this packet
         CPacket temp;
         temp.m_pcData = new char[self->m_iPayloadSize];
         temp.setLength(self->m_iPayloadSize);
         self->m_pChannel->recvfrom(addr, temp);
         delete [] temp.m_pcData; // no free CUnit: the received packet is discarded
         goto TIMER_CHECK;
      }
      unit->m_Packet.setLength(self->m_iPayloadSize);
      // reading next incoming packet, recvfrom returns -1 if nothing has been received
      if (self->m_pChannel->recvfrom(addr, unit->m_Packet) < 0)
         goto TIMER_CHECK;
      id = unit->m_Packet.m_iID;
      // ID 0 is for connection request, which should be passed to the listening socket or rendezvous sockets
      if (0 == id) // handshake request
      {
         if (NULL != self->m_pListener) // a CUDT is listening
            self->m_pListener->listen(addr, unit->m_Packet);
         else if (NULL != (u = self->m_pRendezvousQueue->retrieve(addr, id)))
         {
            // asynchronous connect: call connect here
            // otherwise wait for the UDT socket to retrieve this packet
            if (!u->m_bSynRecving)
               u->connect(unit->m_Packet);
            else
               self->storePkt(id, unit->m_Packet.clone());
         }
      }
      else if (id > 0)
      {
         if (NULL != (u = self->m_pHash->lookup(id))) // find the control block (UDT instance) for this ID
         {
            if (CIPAddress::ipcmp(addr, u->m_pPeerAddr, u->m_iIPversion))
            {
               if (u->m_bConnected && !u->m_bBroken && !u->m_bClosing)
               {
                  if (0 == unit->m_Packet.getFlag()) // data packet
                     u->processData(unit);
                  else
                     u->processCtrl(unit->m_Packet); // control packet
                  u->checkTimers(); // fire any expired timers
                  self->m_pRcvUList->update(u);
               }
            }
         }
         else if (NULL != (u = self->m_pRendezvousQueue->retrieve(addr, id)))
         {
            if (!u->m_bSynRecving)
               u->connect(unit->m_Packet);
            else
               self->storePkt(id, unit->m_Packet.clone());
         }
      }
TIMER_CHECK:
      // take care of the timing event for all UDT sockets
      uint64_t currtime;
      CTimer::rdtsc(currtime);
      CRNode* ul = self->m_pRcvUList->m_pUList;
      uint64_t ctime = currtime - 100000 * CTimer::getCPUFrequency();
      while ((NULL != ul) && (ul->m_llTimeStamp < ctime))
      {
         CUDT* u = ul->m_pUDT;
         if (u->m_bConnected && !u->m_bBroken && !u->m_bClosing)
         {
            u->checkTimers();
            self->m_pRcvUList->update(u);
         }
         else
         {
            // the socket must be removed from Hash table first, then RcvUList
            self->m_pHash->remove(u->m_SocketID);
            self->m_pRcvUList->remove(u);
            u->m_pRNode->m_bOnList = false;
         }
         ul = self->m_pRcvUList->m_pUList;
      }
      // Check connection requests status for all sockets in the RendezvousQueue.
      self->m_pRendezvousQueue->updateConnStatus();
   }
   if (AF_INET == self->m_UnitQueue.m_iIPversion)
      delete (sockaddr_in*)addr;
   else
      delete (sockaddr_in6*)addr;
   #ifndef WIN32
      return NULL;
   #else
      SetEvent(self->m_ExitCond);
      return 0;
   #endif
}

Source code of ACK event and EXP event processing

void CUDT::checkTimers()
{
   // update CC parameters
   CCUpdate(); // inter-packet interval and congestion window
   //uint64_t minint = (uint64_t)(m_ullCPUFrequency * m_pSndTimeWindow->getMinPktSndInt() * 0.9);
   //if (m_ullInterval < minint)
   //   m_ullInterval = minint;
   uint64_t currtime;
   CTimer::rdtsc(currtime);
   if ((currtime > m_ullNextACKTime) || ((m_pCC->m_iACKInterval > 0) && (m_pCC->m_iACKInterval <= m_iPktCount))) // an ACK is triggered both by the timer and by the count of received packets
   {
      // ACK timer expired or ACK interval is reached
      // (ACK interval: how many packets per ACK)
      sendCtrl(2);
      CTimer::rdtsc(currtime);
      if (m_pCC->m_iACKPeriod > 0)
         m_ullNextACKTime = currtime + m_pCC->m_iACKPeriod * m_ullCPUFrequency;
      else
         m_ullNextACKTime = currtime + m_ullACKInt;
      m_iPktCount = 0; // packet counter for ACK, reset for the count-based trigger
      m_iLightACKCount = 1;
   }
   else if (m_iSelfClockInterval * m_iLightACKCount <= m_iPktCount)
   {
      //send a "light" ACK
      sendCtrl(2, NULL, NULL, 4);
      ++ m_iLightACKCount;
   }
   // The NAK timer has been removed: a NAK is sent only once, immediately when loss is found, never by a NAK timer.
   // we are not sending back repeated NAK anymore and rely on the sender's EXP for retransmission
   //if ((m_pRcvLossList->getLossLength() > 0) && (currtime > m_ullNextNAKTime))
   //{
   //   // NAK timer expired, and there is loss to be reported.
   //   sendCtrl(3);
   //
   //   CTimer::rdtsc(currtime);
   //   m_ullNextNAKTime = currtime + m_ullNAKInt;
   //}
   uint64_t next_exp_time;
   if (m_pCC->m_bUserDefinedRTO) // if the RTO is defined by users
      next_exp_time = m_ullLastRspTime + m_pCC->m_iRTO * m_ullCPUFrequency;
   else
   {
      uint64_t exp_int = (m_iEXPCount * (m_iRTT + 4 * m_iRTTVar) + m_iSYNInterval) * m_ullCPUFrequency;
      if (exp_int < m_iEXPCount * m_ullMinExpInt)
         exp_int = m_iEXPCount * m_ullMinExpInt;
      next_exp_time = m_ullLastRspTime + exp_int;
   }
   if (currtime > next_exp_time)
   {
      // Haven't receive any information from the peer, is it dead?!
      // timeout: at least 16 expirations and must be greater than 10 seconds
      // (the check below actually uses 5 seconds)
      if ((m_iEXPCount > 16) && (currtime - m_ullLastRspTime > 5000000 * m_ullCPUFrequency)) // m_iEXPCount is reset to 1 in processData and processCtrl
      {
         //
         // Connection is broken.
         // UDT does not signal any information about this instead of to stop quietly.
         // Application will detect this when it calls any UDT methods next time.
         //
         m_bClosing = true;
         m_bBroken = true;
         m_iBrokenCounter = 30;
         // update snd U list to remove this socket
         m_pSndQueue->m_pSndUList->update(this);
         releaseSynch();
         // app can call any UDT API to learn the connection_broken error
         s_UDTUnited.m_EPoll.update_events(m_SocketID, m_sPollID, UDT_EPOLL_IN | UDT_EPOLL_OUT | UDT_EPOLL_ERR, true);
         CTimer::triggerEvent();
         return;
      }
      // EXP event handling on the sender side
      // sender: Insert all the packets sent after last received acknowledgement into the sender loss list.
      // recver: Send out a keep-alive packet
      if (m_pSndBuffer->getCurrBufSize() > 0)
      {
         if ((CSeqNo::incseq(m_iSndCurrSeqNo) != m_iSndLastAck) && (m_pSndLossList->getLossLength() == 0))
         {
            // resend all unacknowledged packets on timeout, but only if there is no packet in the loss list
            int32_t csn = m_iSndCurrSeqNo;
            int num = m_pSndLossList->insert(m_iSndLastAck, csn);
            m_iTraceSndLoss += num;
            m_iSndLossTotal += num;
         }
         m_pCC->onTimeout();
         CCUpdate();
         // immediately restart transmission
         m_pSndQueue->m_pSndUList->update(this);
      }
      else
      {
         sendCtrl(1);
      }
      ++ m_iEXPCount; // reset to 1 in processData
      // Reset last response time since we just sent a heart-beat.
      m_ullLastRspTime = currtime;
   }
}
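The default EXP interval in checkTimers grows linearly with the EXP count and is floored by a minimum interval. The sketch below restates that computation in microseconds. The source multiplies by m_ullCPUFrequency to work in CPU ticks instead. expIntervalUs is a hypothetical name, and the values in the tests are illustrative, not UDT's defaults.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// EXP interval when the RTO is not user-defined:
// expCount * (RTT + 4 * RTTVar) + SYN, but never below expCount * minExpInt.
uint64_t expIntervalUs(uint64_t expCount, uint64_t rttUs, uint64_t rttVarUs,
                       uint64_t synUs, uint64_t minExpIntUs) {
    uint64_t expInt = expCount * (rttUs + 4 * rttVarUs) + synUs;
    return std::max(expInt, expCount * minExpIntUs);
}
```

Because every expiration increments m_iEXPCount, successive timeouts wait longer and longer. Any packet from the peer resets the count to 1.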

Source code of data packet processing and NAK event processing

Because the NAK event is triggered while a data packet is being processed, the two are presented and analyzed together here.

int CUDT::processData(CUnit* unit)
{
   CPacket& packet = unit->m_Packet;
   // Just heard from the peer, reset the expiration count.
   m_iEXPCount = 1; // reset EXP count
   uint64_t currtime;
   CTimer::rdtsc(currtime);
   m_ullLastRspTime = currtime; // time stamp of last response from the peer
   m_pCC->onPktReceived(&packet); // congestion-control (CCC) callback; the author asks where this call ends up
   ++ m_iPktCount;
   // update time information
   m_pRcvTimeWindow->onPktArrival(); // record the arrival time; used to compute the packet arrival rate, which is reported back in the ACK
   // check if it is probing packet pair (used to estimate link capacity, which is reported back in the ACK)
   if (0 == (packet.m_iSeqNo & 0xF)) // first packet of a probing pair?
      m_pRcvTimeWindow->probe1Arrival(); // record the arrival time of the first packet of the pair
   else if (1 == (packet.m_iSeqNo & 0xF))
      m_pRcvTimeWindow->probe2Arrival(); // compute the pair's arrival interval and store it in m_piProbeWindow
   ++ m_llTraceRecv;
   ++ m_llRecvTotal;
   int32_t offset = CSeqNo::seqoff(m_iRcvLastAck, packet.m_iSeqNo);
   if ((offset < 0) || (offset >= m_pRcvBuffer->getAvailBufSize()))
      return -1;
   if (m_pRcvBuffer->addData(unit, offset) < 0) // put the packet into the receive buffer
      return -1;
   // Loss detection.
   if (CSeqNo::seqcmp(packet.m_iSeqNo, CSeqNo::incseq(m_iRcvCurrSeqNo)) > 0) // incseq is a static inline seq + 1: compare this packet with the next expected one
   {
      // If loss found, insert them to the receiver loss list
      m_pRcvLossList->insert(CSeqNo::incseq(m_iRcvCurrSeqNo), CSeqNo::decseq(packet.m_iSeqNo)); // add the missing sequence numbers to the loss list
      // pack loss list for NAK (NAK compression)
      int32_t lossdata[2];
      lossdata[0] = CSeqNo::incseq(m_iRcvCurrSeqNo) | 0x80000000;
      lossdata[1] = CSeqNo::decseq(packet.m_iSeqNo);
      // Generate loss report immediately.
      sendCtrl(3, NULL, lossdata, (CSeqNo::incseq(m_iRcvCurrSeqNo) == CSeqNo::decseq(packet.m_iSeqNo)) ? 1 : 2); // author's question: could this race with the low-level send in the sending thread?
      int loss = CSeqNo::seqlen(m_iRcvCurrSeqNo, packet.m_iSeqNo) - 2;
      m_iTraceRcvLoss += loss;
      m_iRcvLossTotal += loss;
   }
   // This is not a regular fixed size packet...
   //an irregular sized packet usually indicates the end of a message, so send an ACK immediately
   // (author's remark: this shows that UDT is really designed for bulk data transfer)
   if (packet.getLength() != m_iPayloadSize)
      CTimer::rdtsc(m_ullNextACKTime);
   // Update the current largest sequence number that has been received.
   // Or it is a retransmitted packet, remove it from receiver loss list.
   if (CSeqNo::seqcmp(packet.m_iSeqNo, m_iRcvCurrSeqNo) > 0)
      m_iRcvCurrSeqNo = packet.m_iSeqNo;
   else
      m_pRcvLossList->remove(packet.m_iSeqNo);
   return 0;
}
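processData relies on wrapping sequence-number helpers (seqcmp, seqoff, seqlen, incseq, decseq). The sketch below is modeled on what the CSeqNo helpers appear to do with 31-bit sequence numbers. The constants and the namespace seqno are assumptions for illustration, and seqlen is computed in 64 bits to avoid overflow. It also shows why the loss count above is `seqlen(m_iRcvCurrSeqNo, packet.m_iSeqNo) - 2`: both end points of the range are packets that did arrive.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical 31-bit wrapping sequence-number arithmetic.
namespace seqno {
constexpr int32_t kMax = 0x7FFFFFFF;        // largest sequence number, then wraps to 0
constexpr int32_t kThreshold = 0x3FFFFFFF;  // differences this large mean "wrapped"

// < 0 if a precedes b, 0 if equal, > 0 if a follows b.
inline int32_t cmp(int32_t a, int32_t b) {
    return (std::abs(a - b) < kThreshold) ? (a - b) : (b - a);
}
// Signed number of increments that take a to b.
inline int32_t off(int32_t a, int32_t b) {
    if (std::abs(a - b) < kThreshold) return b - a;
    if (a < b) return b - a - kMax - 1;
    return b - a + kMax + 1;
}
// Count of sequence numbers in [a, b], allowing wraparound.
inline int64_t len(int32_t a, int32_t b) {
    return (a <= b) ? int64_t(b) - a + 1 : int64_t(b) - a + kMax + 2;
}
inline int32_t inc(int32_t s) { return s == kMax ? 0 : s + 1; }
inline int32_t dec(int32_t s) { return s == 0 ? kMax : s - 1; }
}  // namespace seqno
```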

Control packet processing: processCtrl

It handles ACK packets (including sending ACK2), ACK2 packets, NAK packets, message drop requests, keep-alive packets, and handshake/shutdown packets.

ACK packet processing

void CUDT::processCtrl(CPacket& ctrlpkt)
{
   // Just heard from the peer, reset the expiration count.
   m_iEXPCount = 1; // reset on any control or data packet; this feeds the EXP logic in checkTimers
   uint64_t currtime;
   CTimer::rdtsc(currtime);
   m_ullLastRspTime = currtime; // same as in processData
   switch (ctrlpkt.getType())
   {
   case 2: //010 - Acknowledgement
      {
      int32_t ack;
      // process a lite ACK
      if (4 == ctrlpkt.getLength())
      {
         ack = *(int32_t *)ctrlpkt.m_pcData; // a light ACK carries only the acknowledged data sequence number
         if (CSeqNo::seqcmp(ack, m_iSndLastAck) >= 0)
         {
            m_iFlowWindowSize -= CSeqNo::seqoff(m_iSndLastAck, ack);
            m_iSndLastAck = ack;
         }
         break;
      }
      // read ACK seq. no.
      ack = ctrlpkt.getAckSeqNo(); // the ACK sequence number
      // send ACK acknowledgement, i.e. an ACK2
      // number of ACK2 can be much less than number of ACK
      uint64_t now = CTimer::getTime();
      if ((now - m_ullSndLastAck2Time > (uint64_t)m_iSYNInterval) || (ack == m_iSndLastAck2)) // compare in microseconds: m_ullSndLastAck2Time is set from getTime(), not rdtsc
      {
         sendCtrl(6, &ack);
         m_iSndLastAck2 = ack; // ACK sequence number
         m_ullSndLastAck2Time = now; // time the ACK2 was sent
      }
      // Got data ACK (distinguish the data ACK number from the ACK sequence number)
      ack = *(int32_t *)ctrlpkt.m_pcData; // the acknowledged data sequence number
      // check the validation of the ack
      if (CSeqNo::seqcmp(ack, CSeqNo::incseq(m_iSndCurrSeqNo)) > 0)
      {
         //this should not happen: attack or bug
         m_bBroken = true;
         m_iBrokenCounter = 0;
         break;
      }
      if (CSeqNo::seqcmp(ack, m_iSndLastAck) >= 0)
      {
         // Update Flow Window Size, must update before and together with m_iSndLastAck
         m_iFlowWindowSize = *((int32_t *)ctrlpkt.m_pcData + 3);
         m_iSndLastAck = ack;
      }
      // protect packet retransmission
      CGuard::enterCS(m_AckLock);
      int offset = CSeqNo::seqoff(m_iSndLastDataAck, ack);
      if (offset <= 0)
      {
         // discard it if it is a repeated ACK
         CGuard::leaveCS(m_AckLock);
         break;
      }
      // acknowledge the sending buffer
      m_pSndBuffer->ackData(offset); // free the data in the send buffer that the receiver already has
      // record total time used for sending
      m_llSndDuration += currtime - m_llSndDurationCounter;
      m_llSndDurationTotal += currtime - m_llSndDurationCounter;
      m_llSndDurationCounter = currtime;
      // update sending variables
      m_iSndLastDataAck = ack;
      m_pSndLossList->remove(CSeqNo::decseq(m_iSndLastDataAck));
      CGuard::leaveCS(m_AckLock);
      #ifndef WIN32
         pthread_mutex_lock(&m_SendBlockLock);
         if (m_bSynSending)
            pthread_cond_signal(&m_SendBlockCond);
         pthread_mutex_unlock(&m_SendBlockLock);
      #else
         if (m_bSynSending)
            SetEvent(m_SendBlockCond);
      #endif
      // acknowledge any waiting epolls to write
      s_UDTUnited.m_EPoll.update_events(m_SocketID, m_sPollID, UDT_EPOLL_OUT, true);
      // insert this socket to snd list if it is not on the list yet
      m_pSndQueue->m_pSndUList->update(this, false);
      // Update RTT
      //m_iRTT = *((int32_t *)ctrlpkt.m_pcData + 1);
      //m_iRTTVar = *((int32_t *)ctrlpkt.m_pcData + 2);
      int rtt = *((int32_t *)ctrlpkt.m_pcData + 1);
      m_iRTTVar = (m_iRTTVar * 3 + abs(rtt - m_iRTT)) >> 2;
      m_iRTT = (m_iRTT * 7 + rtt) >> 3;
      m_pCC->setRTT(m_iRTT);
      if (ctrlpkt.getLength() > 16)
      {
         // Update Estimated Bandwidth and packet delivery rate
         if (*((int32_t *)ctrlpkt.m_pcData + 4) > 0)
            m_iDeliveryRate = (m_iDeliveryRate * 7 + *((int32_t *)ctrlpkt.m_pcData + 4)) >> 3; // packet arrival rate
         if (*((int32_t *)ctrlpkt.m_pcData + 5) > 0)
            m_iBandwidth = (m_iBandwidth * 7 + *((int32_t *)ctrlpkt.m_pcData + 5)) >> 3; // estimated link capacity; the author notes it can deviate considerably from the real value
         m_pCC->setRcvRate(m_iDeliveryRate);
         m_pCC->setBandwidth(m_iBandwidth);
      }
      m_pCC->onACK(ack);
      CCUpdate();
      ++ m_iRecvACK;
      ++ m_iRecvACKTotal;
      break;
      }
...
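The handler above reads the ACK payload as 32-bit words. From the code: word 0 is the data ACK number, word 1 the RTT, word 2 the RTT variance (read only by the commented-out line), word 3 the available receiver buffer, and words 4 and 5 the arrival rate and link capacity, present when the payload exceeds 16 bytes. A 4-byte payload is a light ACK. The ACK sequence number lives in the packet header, not the payload. The decoder below is a hypothetical sketch (AckInfo and parseAck are not UDT names). It assumes host byte order and that a full ACK carries at least 4 words.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical view of an ACK payload, following the word offsets used in processCtrl.
struct AckInfo {
    bool light = false;     // 4-byte light ACK
    int32_t ackNo = 0;      // word 0: acknowledged data sequence number
    int32_t rtt = 0;        // word 1
    int32_t rttVar = 0;     // word 2
    int32_t availBuf = 0;   // word 3: becomes the flow window
    bool hasRates = false;  // words 4-5 present (payload > 16 bytes)
    int32_t rate = 0;       // word 4: packet arrival rate
    int32_t bandwidth = 0;  // word 5: estimated link capacity
};

// Returns false if the payload is too short to be an ACK.
bool parseAck(const std::vector<int32_t>& words, AckInfo& out) {
    if (words.size() == 1) {                 // light ACK
        out.light = true;
        out.ackNo = words[0];
        return true;
    }
    if (words.size() < 4) return false;      // assumption: a full ACK has >= 4 words
    out.light = false;
    out.ackNo = words[0];
    out.rtt = words[1];
    out.rttVar = words[2];
    out.availBuf = words[3];
    out.hasRates = words.size() >= 6;
    if (out.hasRates) {
        out.rate = words[4];
        out.bandwidth = words[5];
    }
    return true;
}
```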

ACK2 packet processing

   case 6: //110 - Acknowledgement of Acknowledgement
      {
      int32_t ack;
      int rtt = -1;
      // update RTT
      rtt = m_pACKWindow->acknowledge(ctrlpkt.getAckSeqNo(), ack); //ACK seq no, DATA seq no
      if (rtt <= 0)
         break;
      //if increasing delay detected...
      //   sendCtrl(4);
      // RTT EWMA
      m_iRTTVar = (m_iRTTVar * 3 + abs(rtt - m_iRTT)) >> 2;
      m_iRTT = (m_iRTT * 7 + rtt) >> 3;
      m_pCC->setRTT(m_iRTT); // update RTT
      // update last ACK that has been received by the sender
      if (CSeqNo::seqcmp(ack, m_iRcvLastAckAck) > 0)
         m_iRcvLastAckAck = ack;
      break;
      }

NAK packet processing

   case 3: //011 - Loss Report
      {
      int32_t* losslist = (int32_t *)(ctrlpkt.m_pcData);
      m_pCC->onLoss(losslist, ctrlpkt.getLength() / 4); // let congestion control recompute its parameters
      CCUpdate();
      bool secure = true;
      // decode loss list message and insert loss into the sender loss list
      for (int i = 0, n = (int)(ctrlpkt.getLength() / 4); i < n; ++ i)
      {
         if (0 != (losslist[i] & 0x80000000))
         {
            if ((CSeqNo::seqcmp(losslist[i] & 0x7FFFFFFF, losslist[i + 1]) > 0) || (CSeqNo::seqcmp(losslist[i + 1], m_iSndCurrSeqNo) > 0))
            {
               // seq_a must not be greater than seq_b; seq_b must not be greater than the most recent sent seq
               secure = false;
               break;
            }
            int num = 0;
            if (CSeqNo::seqcmp(losslist[i] & 0x7FFFFFFF, m_iSndLastAck) >= 0)
               num = m_pSndLossList->insert(losslist[i] & 0x7FFFFFFF, losslist[i + 1]);
            else if (CSeqNo::seqcmp(losslist[i + 1], m_iSndLastAck) >= 0)
               num = m_pSndLossList->insert(m_iSndLastAck, losslist[i + 1]);
            m_iTraceSndLoss += num;
            m_iSndLossTotal += num;
            ++ i;
         }
         else if (CSeqNo::seqcmp(losslist[i], m_iSndLastAck) >= 0)
         {
            if (CSeqNo::seqcmp(losslist[i], m_iSndCurrSeqNo) > 0)
            {
               //seq_a must not be greater than the most recent sent seq
               secure = false;
               break;
            }
            int num = m_pSndLossList->insert(losslist[i], losslist[i]);
            m_iTraceSndLoss += num;
            m_iSndLossTotal += num;
         }
      }
      if (!secure)
      {
         //this should not happen: attack or bug
         m_bBroken = true;
         m_iBrokenCounter = 0;
         break;
      }
      // the lost packet (retransmission) should be sent out immediately
      m_pSndQueue->m_pSndUList->update(this);
      ++ m_iRecvNAK;
      ++ m_iRecvNAKTotal;
      break;
      }
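The decoding loop above can be isolated as a pure function. It turns the compressed loss list into [first, last] ranges and rejects the malformed cases the handler checks: a range whose start is after its end, or any number beyond the most recently sent one. This is a hypothetical sketch that uses plain integer comparison (no wraparound). It omits the clipping against m_iSndLastAck, and decodeLossList is not a UDT name. It also rejects a flagged word with no following end word, a case the loop above does not check before reading losslist[i + 1].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical decoder for a compressed NAK loss list.
// Returns false on malformed input ("attack or bug" in the original comment).
bool decodeLossList(const std::vector<int32_t>& words, int32_t currSeq,
                    std::vector<std::pair<int32_t, int32_t>>& ranges) {
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (static_cast<uint32_t>(words[i]) & 0x80000000u) {   // start of a range
            if (i + 1 >= words.size()) return false;           // range end is missing
            int32_t first = words[i] & 0x7FFFFFFF;
            int32_t last = words[i + 1];
            if (first > last || last > currSeq) return false;
            ranges.emplace_back(first, last);
            ++i;                                               // skip the range end word
        } else {                                               // single lost packet
            if (words[i] > currSeq) return false;
            ranges.emplace_back(words[i], words[i]);
        }
    }
    return true;
}
```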

0 0