H264 RTP与RTCP SR

来源:互联网 发布:淘宝客服旺旺在哪里找 编辑:程序博客网 时间:2024/06/06 04:59
1. 网络抽象层单元类型 (NALU)

NALU 头由一个字节组成, 它的语法如下:

      +---------------+
      |0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+
      |F|NRI|  Type   |
      +---------------+

F: 1 个比特.
  forbidden_zero_bit. 在 H.264 规范中规定了这一位必须为 0.

NRI: 2 个比特.
  nal_ref_idc. 取 00 ~ 11, 似乎指示这个 NALU 的重要性, 如 00 的 NALU 解码器可以丢弃它而不影响图像的回放. 不过一般情况下不太关心

这个属性.

Type: 5 个比特.
  nal_unit_type. 这个 NALU 单元的类型. 简述如下:

  0     没有定义
  1-23  NAL单元  单个 NAL 单元包.
  24    STAP-A   单一时间的组合包
  24    STAP-B   单一时间的组合包
  26    MTAP16   多个时间的组合包
  27    MTAP24   多个时间的组合包
  28    FU-A     分片的单元
  29    FU-B     分片的单元
  30-31 没有定义

2. 打包模式

  下面是 RFC 3550 中规定的 RTP 头的结构.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
      |            contributing source (CSRC) identifiers             |
      |                             ....                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  负载类型 Payload type (PT): 7 bits
  序列号 Sequence number (SN): 16 bits
  时间戳 Timestamp: 32 bits
  
  H.264 Payload 格式定义了三种不同的基本的负载(Payload)结构. 接收端可能通过 RTP Payload 
  的第一个字节来识别它们. 这一个字节类似 NALU 头的格式, 而这个头结构的 NAL 单元类型字段
  则指出了代表的是哪一种结构,

  这个字节的结构如下, 可以看出它和 H.264 的 NALU 头结构是一样的.
      +---------------+
      |0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+
      |F|NRI|  Type   |
      +---------------+
  字段 Type: 这个 RTP payload 中 NAL 单元的类型. 这个字段和 H.264 中类型字段的区别是, 当 type
  的值为 24 ~ 31 表示这是一个特别格式的 NAL 单元, 而 H.264 中, 只取 1~23 是有效的值.
   
  24    STAP-A   单一时间的组合包
  24    STAP-B   单一时间的组合包
  26    MTAP16   多个时间的组合包
  27    MTAP24   多个时间的组合包
  28    FU-A     分片的单元
  29    FU-B     分片的单元
  30-31 没有定义

  可能的结构类型分别有:

  1. 单一 NAL 单元模式
     即一个 RTP 包仅由一个完整的 NALU 组成. 这种情况下 RTP NAL 头类型字段和原始的 H.264的
  NALU 头类型字段是一样的.

  2. 组合封包模式
    即可能是由多个 NAL 单元组成一个 RTP 包. 分别有4种组合方式: STAP-A, STAP-B, MTAP16, MTAP24.
  那么这里的类型值分别是 24, 25, 26 以及 27.

  3. 分片封包模式
    用于把一个 NALU 单元封装成多个 RTP 包. 存在两种类型 FU-A 和 FU-B. 类型值分别是 28 和 29.

2.1 单一 NAL 单元模式

  对于 NALU 的长度小于 MTU 大小的包, 一般采用单一 NAL 单元模式.
  对于一个原始的 H.264 NALU 单元常由 [Start Code] [NALU Header] [NALU Payload] 三部分组成, 其中 Start Code 用于标示这是一个

NALU 单元的开始, 必须是 "00 00 00 01" 或 "00 00 01", NALU 头仅一个字节, 其后都是 NALU 单元内容.
  打包时去除 "00 00 01" 或 "00 00 00 01" 的开始码, 把其他数据封包的 RTP 包即可.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |F|NRI|  type   |                                               |
      +-+-+-+-+-+-+-+-+                                               |
      |                                                               |
      |               Bytes 2..n of a Single NAL unit                 |
      |                                                               |
      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               :...OPTIONAL RTP padding        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


  如有一个 H.264 的 NALU 是这样的:

  [00 00 00 01 67 42 A0 1E 23 56 0E 2F ... ]

  这是一个序列参数集 NAL 单元. [00 00 00 01] 是四个字节的开始码, 67 是 NALU 头, 42 开始的数据是 NALU 内容.

  封装成 RTP 包将如下:

  [ RTP Header ] [ 67 42 A0 1E 23 56 0E 2F ]

  即只要去掉 4 个字节的开始码就可以了.


2.2 组合封包模式

  其次, 当 NALU 的长度特别小时, 可以把几个 NALU 单元封在一个 RTP 包中.


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                          RTP Header                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         NALU 1 Data                           |
      :                                                               :
      +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |               | NALU 2 Size                   | NALU 2 HDR    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         NALU 2 Data                           |
      :                                                               :
      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               :...OPTIONAL RTP padding        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


2.3 Fragmentation Units (FUs).

  而当 NALU 的长度超过 MTU 时, 就必须对 NALU 单元进行分片封包. 也称为 Fragmentation Units (FUs).
  
       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | FU indicator  |   FU header   |                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
      |                                                               |
      |                         FU payload                            |
      |                                                               |
      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               :...OPTIONAL RTP padding        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 14.  RTP payload format for FU-A

   The FU indicator octet has the following format:

      +---------------+
      |0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+
      |F|NRI|  Type   |
      +---------------+

   The FU header has the following format:

      +---------------+
      |0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+
      |S|E|R|  Type   |
      +---------------+

 

SR: Sender Report RTCP Packet

 
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
header |V=2|P|    RC   |   PT=SR=200   |             length            |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         SSRC of sender                        |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
sender |              NTP timestamp, most significant word             |
info   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |             NTP timestamp, least significant word             |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         RTP timestamp                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                     sender's packet count                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      sender's octet count                     |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
report |                 SSRC_1 (SSRC of first source)                 |
block  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  1    | fraction lost |       cumulative number of packets lost       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |           extended highest sequence number received           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                      interarrival jitter                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                         last SR (LSR)                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                   delay since last SR (DLSR)                  |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
report |                 SSRC_2 (SSRC of second source)                |
block  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  2    :                               ...                             :
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
       |                  profile-specific extensions                  |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
   The sender report packet consists of three sections, possibly
   followed by a fourth profile-specific extension section if defined.
   The first section, the header, is 8 octets long.  The fields have the
   following meaning:
 
   version (V): 2 bits
      Identifies the version of RTP, which is the same in RTCP packets
      as in RTP data packets.  The version defined by this specification
      is two (2).
 
   padding (P): 1 bit
      If the padding bit is set, this individual RTCP packet contains
      some additional padding octets at the end which are not part of
      the control information but are included in the length field.  The
      last octet of the padding is a count of how many padding octets
      should be ignored, including itself (it will be a multiple of
      four).  Padding may be needed by some encryption algorithms with
      fixed block sizes.  In a compound RTCP packet, padding is only
      required on one individual packet because the compound packet is
      encrypted as a whole for the method in Section 9.1.  Thus, padding
      MUST only be added to the last individual packet, and if padding
      is added to that packet, the padding bit MUST be set only on that
      packet.  This convention aids the header validity checks described
      in Appendix A.2 and allows detection of packets from some early
      implementations that incorrectly set the padding bit on the first
      individual packet and add padding to the last individual packet.
 
   reception report count (RC): 5 bits
      The number of reception report blocks contained in this packet.  A
      value of zero is valid.
 
   packet type (PT): 8 bits
      Contains the constant 200 to identify this as an RTCP SR packet.
 
   length: 16 bits
      The length of this RTCP packet in 32-bit words minus one,
      including the header and any padding.  (The offset of one makes
      zero a valid length and avoids a possible infinite loop in
      scanning a compound RTCP packet, while counting 32-bit words
      avoids a validity check for a multiple of 4.)
 
   SSRC: 32 bits
      The synchronization source identifier for the originator of this
      SR packet.
 
   The second section, the sender information, is 20 octets long and is
   present in every sender report packet.  It summarizes the data
   transmissions from this sender.  The fields have the following
   meaning:
 
   NTP timestamp: 64 bits
      Indicates the wallclock time (see Section 4) when this report was
      sent so that it may be used in combination with timestamps
      returned in reception reports from other receivers to measure
      round-trip propagation to those receivers.  Receivers should
      expect that the measurement accuracy of the timestamp may be
      limited to far less than the resolution of the NTP timestamp.  The
      measurement uncertainty of the timestamp is not indicated as it
 
      may not be known.  On a system that has no notion of wallclock
      time but does have some system-specific clock such as "system
      uptime", a sender MAY use that clock as a reference to calculate
      relative NTP timestamps.  It is important to choose a commonly
      used clock so that if separate implementations are used to produce
      the individual streams of a multimedia session, all
      implementations will use the same clock.  Until the year 2036,
      relative and absolute timestamps will differ in the high bit so
      (invalid) comparisons will show a large difference; by then one
      hopes relative timestamps will no longer be needed.  A sender that
      has no notion of wallclock or elapsed time MAY set the NTP
      timestamp to zero.
 
   RTP timestamp: 32 bits
      Corresponds to the same time as the NTP timestamp (above), but in
      the same units and with the same random offset as the RTP
      timestamps in data packets.  This correspondence may be used for
      intra- and inter-media synchronization for sources whose NTP
      timestamps are synchronized, and may be used by media-independent
      receivers to estimate the nominal RTP clock frequency.  Note that
      in most cases this timestamp will not be equal to the RTP
      timestamp in any adjacent data packet.  Rather, it MUST be
      calculated from the corresponding NTP timestamp using the
      relationship between the RTP timestamp counter and real time as
      maintained by periodically checking the wallclock time at a
      sampling instant.
 
   sender's packet count: 32 bits
      The total number of RTP data packets transmitted by the sender
      since starting transmission up until the time this SR packet was
      generated.  The count SHOULD be reset if the sender changes its
      SSRC identifier.
 
   sender's octet count: 32 bits
      The total number of payload octets (i.e., not including header or
      padding) transmitted in RTP data packets by the sender since
      starting transmission up until the time this SR packet was
      generated.  The count SHOULD be reset if the sender changes its
      SSRC identifier.  This field can be used to estimate the average
      payload data rate.
 
   The third section contains zero or more reception report blocks
   depending on the number of other sources heard by this sender since
   the last report.  Each reception report block conveys statistics on
   the reception of RTP packets from a single synchronization source.
   Receivers SHOULD NOT carry over statistics when a source changes its
   SSRC identifier due to a collision.  These statistics are:
 
   SSRC_n (source identifier): 32 bits
      The SSRC identifier of the source to which the information in this
      reception report block pertains.
 
   fraction lost: 8 bits
      The fraction of RTP data packets from source SSRC_n lost since the
      previous SR or RR packet was sent, expressed as a fixed point
      number with the binary point at the left edge of the field.  (That
      is equivalent to taking the integer part after multiplying the
      loss fraction by 256.)  This fraction is defined to be the number
      of packets lost divided by the number of packets expected, as
      defined in the next paragraph.  An implementation is shown in
      Appendix A.3.  If the loss is negative due to duplicates, the
      fraction lost is set to zero.  Note that a receiver cannot tell
      whether any packets were lost after the last one received, and
      that there will be no reception report block issued for a source
      if all packets from that source sent during the last reporting
      interval have been lost.
 
   cumulative number of packets lost: 24 bits
      The total number of RTP data packets from source SSRC_n that have
      been lost since the beginning of reception.  This number is
      defined to be the number of packets expected less the number of
      packets actually received, where the number of packets received
      includes any which are late or duplicates.  Thus, packets that
      arrive late are not counted as lost, and the loss may be negative
      if there are duplicates.  The number of packets expected is
      defined to be the extended last sequence number received, as
      defined next, less the initial sequence number received.  This may
      be calculated as shown in Appendix A.3.
 
   extended highest sequence number received: 32 bits
      The low 16 bits contain the highest sequence number received in an
      RTP data packet from source SSRC_n, and the most significant 16
      bits extend that sequence number with the corresponding count of
      sequence number cycles, which may be maintained according to the
      algorithm in Appendix A.1.  Note that different receivers within
      the same session will generate different extensions to the
      sequence number if their start times differ significantly.
 
   interarrival jitter: 32 bits
      An estimate of the statistical variance of the RTP data packet
      interarrival time, measured in timestamp units and expressed as an
      unsigned integer.  The interarrival jitter J is defined to be the
      mean deviation (smoothed absolute value) of the difference D in
      packet spacing at the receiver compared to the sender for a pair
      of packets.  As shown in the equation below, this is equivalent to
      the difference in the "relative transit time" for the two packets;
 
      the relative transit time is the difference between a packet's RTP
      timestamp and the receiver's clock at the time of arrival,
      measured in the same units.
 
      If Si is the RTP timestamp from packet i, and Ri is the time of
      arrival in RTP timestamp units for packet i, then for two packets
      i and j, D may be expressed as
 
         D(i,j) = (Rj - Ri) - (Sj - Si) = (Rj - Sj) - (Ri - Si)
 
      The interarrival jitter SHOULD be calculated continuously as each
      data packet i is received from source SSRC_n, using this
      difference D for that packet and the previous packet i-1 in order
      of arrival (not necessarily in sequence), according to the formula
 
         J(i) = J(i-1) + (|D(i-1,i)| - J(i-1))/16
 
      Whenever a reception report is issued, the current value of J is
      sampled.
 
      The jitter calculation MUST conform to the formula specified here
      in order to allow profile-independent monitors to make valid
      interpretations of reports coming from different implementations.
      This algorithm is the optimal first-order estimator and the gain
      parameter 1/16 gives a good noise reduction ratio while
      maintaining a reasonable rate of convergence [22].  A sample
      implementation is shown in Appendix A.8.  See Section 6.4.4 for a
      discussion of the effects of varying packet duration and delay
      before transmission.
 
   last SR timestamp (LSR): 32 bits
      The middle 32 bits out of 64 in the NTP timestamp (as explained in
      Section 4) received as part of the most recent RTCP sender report
      (SR) packet from source SSRC_n.  If no SR has been received yet,
      the field is set to zero.
 
   delay since last SR (DLSR): 32 bits
      The delay, expressed in units of 1/65536 seconds, between
      receiving the last SR packet from source SSRC_n and sending this
      reception report block.  If no SR packet has been received yet
      from SSRC_n, the DLSR field is set to zero.
 
      Let SSRC_r denote the receiver issuing this receiver report.
      Source SSRC_n can compute the round-trip propagation delay to
      SSRC_r by recording the time A when this reception report block is
      received.  It calculates the total round-trip time A-LSR using the
      last SR timestamp (LSR) field, and then subtracting this field to
      leave the round-trip propagation delay as (A - LSR - DLSR).  This
 
      is illustrated in Fig. 2.  Times are shown in both a hexadecimal
      representation of the 32-bit fields and the equivalent floating-
      point decimal representation.  Colons indicate a 32-bit field
      divided into a 16-bit integer part and 16-bit fraction part.
 
      This may be used as an approximate measure of distance to cluster
      receivers, although some links have very asymmetric delays.
 
   [10 Nov 1995 11:33:25.125 UTC]       [10 Nov 1995 11:33:36.5 UTC]
   n                 SR(n)              A=b710:8000 (46864.500 s)
   ---------------------------------------------------------------->
                      v                 ^
   ntp_sec =0xb44db705 v               ^ dlsr=0x0005:4000 (    5.250s)
   ntp_frac=0x20000000  v             ^  lsr =0xb705:2000 (46853.125s)
     (3024992005.125 s)  v           ^
   r                      v         ^ RR(n)
   ---------------------------------------------------------------->
                          |<-DLSR->|
                           (5.250 s)
 
   A     0xb710:8000 (46864.500 s)
   DLSR -0x0005:4000 (    5.250 s)
   LSR  -0xb705:2000 (46853.125 s)
   -------------------------------
   delay 0x0006:2000 (    6.125 s)
原创粉丝点击