GOP, scenecut and keyframe explained [scene encoding]

Source: Internet  Editor: 程序博客网  Time: 2024/05/20 09:24

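This article covers three related concepts: GOP, scenecut and keyframe. They are fundamental to understanding frame patterns (the coding structure), and they matter for choosing encoder parameters, optimizing decoder performance and configuring streaming.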

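Overview

  • GOP: Group of pictures
    The concept needs little explanation. There are two types, closed GOP and open GOP; see below.
  • scenecut
    A tool for detecting scene changes, informally also called adaptive I-frame selection. Details below.
  • keyframe
    The commonly used term "keyframe". In x264 it can generally be treated as an IDR frame.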

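scenecut: scene detection, adaptive I-frame selection

Literally, scenecut means "scene change". What it actually produces is an adaptive I-frame decision. The criterion for a "scene change" is worked backward from the outcome and is purely a bit-cost decision: when the cost of coding the current frame as a P-frame comes within some threshold of the cost of coding it as an I-frame, the frame is (preferentially) coded as an I-frame. This reflects the core philosophy of an encoder: any algorithm applied during encoding is judged by the final coding efficiency.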

For details, see akupenguin's description (2007-01-22):
1) Encode as (a really fast approximation of) a P-frame and an I-frame. (Fast decision stage.)

    if ((keyframe-distance) > keyint) then
        set IDR-frame
    else if (1 - (P-frame bits) / (I-frame bits) < (scenecut / 100) * (keyframe-distance) / keyint) then
        if ((keyframe-distance) >= minkeyint) then
            set IDR-frame
        else
            set I-frame
    else
        set P-frame
    //! keyframe-distance: distance from the previous keyframe. The larger it is, the more readily an I-frame is chosen (this linear relation is not ideal).
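  • First, --keyint sets the maximum keyframe interval. Once that interval is reached, the frame is coded as IDR. Straightforward.
  • Second, if the scenecut condition is met (a scene change occurs), --min-keyint sets the minimum keyframe interval. If the distance has not reached it, the frame becomes an ordinary I-frame; otherwise it becomes an IDR frame. (Incidentally, if an ordinary I-frame is inserted, that GOP ends up with two I-frames.)
  • About the formula:
    1) The default --scenecut is 40. At keyframe-distance == keyint the threshold is the full 40%, so a scenecut is declared when the P-frame bits exceed 60% of the I-frame bits. Put differently, with 40% the I-frame may cost at most about 2/3 more bits than the P-frame (1/0.6 ≈ 1.67) and still be chosen. At shorter distances the threshold is proportionally smaller.
    2) The threshold depends on the distance from the previous keyframe: the larger the distance, the more likely the frame is coded as an I-frame or keyframe.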

2) Encode for real. (Actual encoding stage.)

Inspecting keyframes and scenecuts

  • keyframe
    ffprobe -select_streams v -show_frames VIDEONAME |grep -E 'key_frame'
  • scenecut
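    Use a filter through -f lavfi, e.g. the select filter's scene score. Note that this is FFmpeg's own scene-change measure, not x264's scenecut decision.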

Newer x264 code

    static int scenecut_internal( x264_t *h,
                                  x264_mb_analysis_t *a,
                                  x264_frame_t **frames,
                                  int p0,
                                  int p1,
                                  int real_scenecut )
    {
        /* ... (excerpt: the declarations of icost, pcost, i_gop_size,
           f_bias and res are omitted here) ... */
        float f_thresh_max = h->param.i_scenecut_threshold / 100.0;
        float f_thresh_min = f_thresh_max * 0.25;

        /* fixed GOP length: use a single threshold */
        if( h->param.i_keyint_min == h->param.i_keyint_max )
            f_thresh_min = f_thresh_max;
        /* f_bias grows with the distance from the last keyframe */
        if( i_gop_size <= h->param.i_keyint_min / 4 || h->param.b_intra_refresh )
            f_bias = f_thresh_min / 4;
        else if( i_gop_size <= h->param.i_keyint_min )
            f_bias = f_thresh_min * i_gop_size / h->param.i_keyint_min;
        else
        {
            f_bias = f_thresh_min
                     + ( f_thresh_max - f_thresh_min )
                     * ( i_gop_size - h->param.i_keyint_min )
                     / ( h->param.i_keyint_max - h->param.i_keyint_min );
        }
        /* scenecut if the P cost is within f_bias of the I cost */
        res = pcost >= (1.0 - f_bias) * icost;
        /* ... (remainder of the function omitted) ... */
    }

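The implementation shows that a frame can still be detected as a scenecut even when the current distance is below --min-keyint, because the bias shrinks but never reaches zero. The code also handles special cases such as --keyint == --min-keyint, which makes it more robust.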

Open GOP and closed GOP:

2.7.5 Open GOP
[Figure 2.12: an example of an open GOP structure]
Figure 2.12 shows an example of an open GOP structure. A closed GOP with an IBBP pattern starts with an I frame whereas an open GOP with the same pattern may start with a B frame. Unlike the closed GOP, both I and P frames can be used for forward or backward prediction. In addition, the last P frame in a previous GOP is referenced by B frames in the current GOP. This GOP structure is commonly employed in Apple’s HTTP live streaming (HLS). It ends with a P frame, just like a closed GOP. However, unlike a closed GOP, the open GOP fully exploits the last P frame, which is used as a reference for four B frames. As a consequence, fewer P frames may be employed when compared to closed GOP structures, giving rise to a slight improvement in compression efficiency. Note that the I frame now serves as a reference for more frames (5 frames), possibly as many as the P frame. Hence, interprediction is improved over the closed GOP and both I and P frames may be buffered by the decoder for the same period of time (i.e., a time interval corresponding to 5 frames).

For the same number of B frames in an IBBP GOP, two P frames are used for an open GOP compared to three in a closed GOP, giving rise to a smaller GOP length of 9 for the open GOP. The drawback of an open GOP is that it is no longer self-contained and hence, cannot be decoded independently. This will not apply to the first GOP of the video, which will start with an I frame. Alternative frame patterns of IBP and IBBBP confirm that an additional P frame can be omitted for the open GOP structure, thereby reducing its length by 1 compared to the closed GOP ([IBPBPBPBP] vs P[BIBPBPBP] and [IBBBPBBBP] vs P[BBBIBBBP]).

[Figure 2.13: another example of an open IBBP GOP structure]

Another example of an open IBBP GOP structure is shown in Figure 2.13. Again, only two P frames are required for a GOP of length 9. This structure starts with an I frame, just like a closed GOP. In this case, the I frame is used as a reference for four B frames, including two from the previous GOP. Thus, the GOP need not end with a P frame. For the final GOP of the video, the last two B frames (i.e., B-5 and B-6) are not encoded.

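In summary, the differences between open GOP and closed GOP:
1. In a closed GOP, the I frame is referenced only by the frames after it (forward prediction). In an open GOP, it is also referenced by preceding B frames (backward prediction) ==> the I frame is referenced by more frames. See Figure 2.12.
2. The last P frame is exploited better <== because B frames of the next GOP also predict from it (forward prediction). See Figure 2.12.
3. In an open GOP, the first frame of a GOP can also be an I frame. In that case the last frame of the GOP is not necessarily a P frame, whereas in the closed IBBP GOP the last frame is always a P. See Figure 2.13.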

