GOP, scenecut, and keyframe explained [scene encoding]

Source: Internet. Editor: 程序博客网. Date: 2024/05/20 09:24

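This article covers three related concepts: GOP, scenecut, and keyframe. They are fundamental to understanding frame patterns and coding structure, and they matter when setting encoder parameters, optimizing decoder performance, and configuring streaming media.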

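Overview

GOP: Group of Pictures
This concept needs little explanation. By type, a GOP is either closed or open; see the details below.
scenecut
A tool for detecting scene changes, also informally called adaptive I-frame selection; see the details below.
keyframe
The familiar key frame; in x264 it can be treated as equivalent to an IDR frame.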

scenecut: scene detection and adaptive I-frame selection

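scenecut literally means a scene change, and its end result is an adaptive I-frame decision. The criterion for a scene change is itself derived backwards from the result and is purely a bitrate decision: when the gap between coding the current frame as a P-frame and coding it as an I-frame falls below some threshold, the frame is (preferentially) coded as an I-frame. (This is the core idea of an encoder: every algorithm applied during encoding is judged by the final coding performance.)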

For details, see akupenguin, 2007-01-22:
1) Encode as (a really fast approximation of) a P-frame and an I-frame. This is the fast decision stage.

    if ((keyframe-distance) > keyint) then
        set IDR-frame
    else if (1 - (P-frame bits) / (I-frame bits) < (scenecut / 100) * (keyframe-distance) / keyint)
        if ((keyframe-distance) >= minkeyint) then
            set IDR-frame
        else
            set I-frame
    else
        set P-frame
    //! keyframe-distance: distance from the previous keyframe.
    //! The larger the distance, the more the decision leans towards an I-frame (the linear relationship is questionable).

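First, --keyint sets the maximum keyframe interval. Once that interval is reached, the frame is coded as an IDR frame, which is straightforward.
Second, when the scenecut condition is met (a scene change has occurred), --min-keyint, the minimum keyframe interval, decides the frame type: if the distance is still below it, the frame becomes a regular I-frame, otherwise an IDR frame. (Incidentally, if a regular I-frame is inserted, that GOP contains two I-frames.)
About the formula:
1) The default scenecut is 40%, so when keyframe-distance equals keyint, a scene cut is declared when P-frame bits > I-frame bits * 60%. In other words, at 40% the I-frame may use at most 2/3 more bits than the P-frame.
2) The threshold scales with the distance from the previous keyframe: the larger the distance, the more the frame should be coded as an I-frame or keyframe.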

2) Encode for real. This is the actual encoding stage.
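The fast decision in step 1 can be sketched in Python as follows. This is only a direct transcription of the quoted pseudocode, not x264's actual code. The function name `choose_frame_type` is hypothetical, and the default values (keyint 250, min-keyint 25, scenecut 40) are example values assumed for illustration.

```python
def choose_frame_type(p_bits, i_bits, keyframe_distance,
                      keyint=250, min_keyint=25, scenecut=40):
    """Return 'IDR', 'I' or 'P' following the quoted pseudocode.

    p_bits / i_bits are the estimated costs of coding the frame as a
    P-frame or an I-frame; i_bits must be greater than zero.
    The default parameter values are assumptions for this example.
    """
    # Maximum keyframe interval exceeded: force an IDR frame.
    if keyframe_distance > keyint:
        return "IDR"

    # Relative saving of the P-frame over the I-frame.
    saving = 1.0 - p_bits / i_bits
    # The threshold grows linearly with the distance from the last keyframe.
    threshold = (scenecut / 100.0) * keyframe_distance / keyint

    if saving < threshold:
        # Scene cut detected: IDR if far enough from the last keyframe,
        # otherwise only a regular I-frame.
        return "IDR" if keyframe_distance >= min_keyint else "I"
    return "P"


if __name__ == "__main__":
    print(choose_frame_type(70, 100, 250))   # P-frame saves only 30%
    print(choose_frame_type(50, 100, 250))   # P-frame saves 50%
    print(choose_frame_type(99, 100, 10))    # close to the last keyframe
```

With keyframe_distance equal to keyint the threshold is exactly scenecut / 100, which matches the 60% rule in point 1 of the formula notes; at shorter distances the P-frame has to be almost as expensive as the I-frame before a cut is declared.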

Inspecting the information

keyframe
ffprobe -select_streams v -show_frames VIDEONAME | grep -E 'key_frame'
scenecut
Use -f lavfi, i.e. go through a filter.
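As a rough alternative to piping the keyframe command through grep, the same ffprobe output can be parsed in Python. The sketch below assumes ffprobe's default output format, in which each frame is wrapped in [FRAME] ... [/FRAME] and carries a key_frame=0 or key_frame=1 line. The helper names `probe_frames` and `keyframe_indices` are hypothetical.

```python
import subprocess


def probe_frames(path):
    """Run ffprobe on the video stream(s) and return the frame dump as text.

    Requires ffprobe on PATH; uses the same options as the command above,
    plus -v error to silence the banner.
    """
    cmd = ["ffprobe", "-v", "error", "-select_streams", "v",
           "-show_frames", path]
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout


def keyframe_indices(ffprobe_text):
    """Return the indices (in output order) of frames with key_frame=1."""
    indices = []
    frame_no = -1
    for line in ffprobe_text.splitlines():
        line = line.strip()
        if line == "[FRAME]":
            frame_no += 1              # a new frame section starts
        elif line == "key_frame=1":
            indices.append(frame_no)   # this frame is a keyframe
    return indices


if __name__ == "__main__":
    sample = (
        "[FRAME]\nmedia_type=video\nkey_frame=1\npict_type=I\n[/FRAME]\n"
        "[FRAME]\nmedia_type=video\nkey_frame=0\npict_type=P\n[/FRAME]\n"
        "[FRAME]\nmedia_type=video\nkey_frame=1\npict_type=I\n[/FRAME]\n"
    )
    print(keyframe_indices(sample))
```

On a real file the call would be keyframe_indices(probe_frames("VIDEONAME")); the differences between consecutive indices then give the keyframe distances discussed above.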

Latest x264 code

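    static int scenecut_internal( x264_t *h,
                                  x264_mb_analysis_t *a,
                                  x264_frame_t **frames,
                                  int p0,
                                  int p1,
                                  int real_scenecut)
    /* ... excerpt: declarations of icost, pcost, f_bias, i_gop_size and res are omitted ... */
        /* Maximum bias comes straight from --scenecut; the minimum is a quarter of it. */
        float f_thresh_max = h->param.i_scenecut_threshold / 100.0;
        float f_thresh_min = f_thresh_max * 0.25;

        /* Fixed keyframe interval: use a single threshold. */
        if( h->param.i_keyint_min == h->param.i_keyint_max )
            f_thresh_min = f_thresh_max;

        if( i_gop_size <= h->param.i_keyint_min / 4 || h->param.b_intra_refresh )
            f_bias = f_thresh_min / 4;
        else if( i_gop_size <= h->param.i_keyint_min )
            f_bias = f_thresh_min * i_gop_size / h->param.i_keyint_min;
        else
        {
            /* Interpolate linearly between the minimum and maximum threshold. */
            f_bias = f_thresh_min
                     + ( f_thresh_max - f_thresh_min )
                     * ( i_gop_size - h->param.i_keyint_min )
                     / ( h->param.i_keyint_max - h->param.i_keyint_min );
        }

        /* Scene cut when the P-frame cost is not sufficiently below the I-frame cost. */
        res = pcost >= (1.0 - f_bias) * icost;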

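The implementation shows that a scenecut can still be detected even when the current distance is below --min-keyint, and it also handles cases such as --keyint == --min-keyint, so it is more robust.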
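To see how the bias behaves across a GOP, the excerpt can be ported to Python. This is a sketch of the excerpt only, not of the full x264 function: `scenecut_bias` and `is_scenecut` are hypothetical names, the parameter values in the usage lines are example values, and the ValueError guard is an added assumption (the excerpt itself has no such check).

```python
def scenecut_bias(gop_size, keyint_min, keyint_max,
                  scenecut_threshold=40, intra_refresh=False):
    """Port of the f_bias computation from the excerpt above."""
    if gop_size > keyint_max:
        # Guard added for this sketch: also avoids dividing by zero
        # when keyint_min == keyint_max.
        raise ValueError("gop_size must not exceed keyint_max")

    thresh_max = scenecut_threshold / 100.0
    thresh_min = thresh_max * 0.25
    if keyint_min == keyint_max:
        thresh_min = thresh_max

    # keyint_min // 4 mirrors the integer division in the C code.
    if gop_size <= keyint_min // 4 or intra_refresh:
        return thresh_min / 4
    if gop_size <= keyint_min:
        return thresh_min * gop_size / keyint_min
    return (thresh_min
            + (thresh_max - thresh_min)
            * (gop_size - keyint_min)
            / (keyint_max - keyint_min))


def is_scenecut(pcost, icost, bias):
    """Same comparison as `res = pcost >= (1.0 - f_bias) * icost`."""
    return pcost >= (1.0 - bias) * icost


if __name__ == "__main__":
    for gop_size in (5, 10, 25, 100, 250):
        print(gop_size, scenecut_bias(gop_size, keyint_min=25, keyint_max=250))
```

With keyint_min 25 and keyint_max 250, the bias starts at a small constant for very short GOPs, ramps up to the minimum threshold at keyint_min, and then rises linearly to the full --scenecut value at keyint_max. Because the bias is never zero, a frame whose P cost is very close to its I cost still counts as a scene cut shortly after a keyframe.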

Open GOP and closed GOP:

2.7.5 Open GOP

Figure 2.12 shows an example of an open GOP structure. A closed GOP with an IBBP pattern starts with an I frame whereas an open GOP with the same pattern may start with a B frame. Unlike the closed GOP, both I and P frames can be used for forward or backward prediction. In addition, the last P frame in a previous GOP is referenced by B frames in the current GOP. This GOP structure is commonly employed in Apple’s HTTP live streaming (HLS). It ends with a P frame, just like a closed GOP. However, unlike a closed GOP, the open GOP fully exploits the last P frame, which is used as a reference for four B frames. As a consequence, fewer P frames may be employed when compared to closed GOP structures, giving rise to a slight improvement in compression efficiency. Note that the I frame now serves as a reference for more frames (5 frames), possibly as many as the P frame. Hence, interprediction is improved over the closed GOP and both I and P frames may be buffered by the decoder for the same period of time (i.e., a time interval corresponding to 5 frames).

For the same number of B frames in an IBBP GOP, two P frames are used for an open GOP compared to three in a closed GOP, giving rise to a smaller GOP length of 9 for the open GOP. The drawback of an open GOP is that it is no longer self-contained and hence, cannot be decoded independently. This will not apply to the first GOP of the video, which will start with an I frame. Alternative frame patterns of IBP and IBBBP confirm that an additional P frame can be omitted for the open GOP structure, thereby reducing its length by 1 compared to the closed GOP ([IBPBPBPBP] vs P[BIBPBPBP] and [IBBBPBBBP] vs P[BBBIBBBP]).

Another example of an open IBBP GOP structure is shown in Figure 2.13. Again, only two P frames are required for a GOP of length 9. This structure starts with an I frame, just like a closed GOP. In this case, the I frame is used as a reference for four B frames, including two from the previous GOP. Thus, the GOP need not end with a P frame. For the final GOP of the video, the last two B frames (i.e., B-5 and B-6) are not encoded.

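To summarize, the differences between open GOP and closed GOP:
1. In a closed GOP the I-frame is used only for forward prediction; in an open GOP it is used for both forward and backward prediction, so the I-frame is referenced by more frames. See Figure 2.12.
2. The last P-frame is better utilized, because B-frames in the next GOP use it for forward prediction. See Figure 2.12.
3. An open GOP may also start with an I-frame, in which case its last frame need not be a P-frame (a closed GOP always ends with a P-frame). See Figure 2.13.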
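The frame counts quoted in the excerpt can be checked with a few lines of Python. The sketch below takes the display-order patterns from the text ([IBPBPBPBP] vs [BIBPBPBP], and [IBBBPBBBP] vs [BBBIBBBP]) and counts frame types. The `closed_shape` flag is only a heuristic based on the summary above (starts with I and ends with P); it does not inspect actual reference relationships. `describe_gop` is a hypothetical helper name.

```python
from collections import Counter


def describe_gop(pattern):
    """Count frame types in a display-order GOP pattern such as 'IBBPBBP'."""
    counts = Counter(pattern)
    return {
        "length": len(pattern),
        "I": counts["I"],
        "P": counts["P"],
        "B": counts["B"],
        # Heuristic only: a closed GOP starts with I and ends with P.
        "closed_shape": pattern.startswith("I") and pattern.endswith("P"),
    }


if __name__ == "__main__":
    for closed, open_ in (("IBPBPBPBP", "BIBPBPBP"),
                          ("IBBBPBBBP", "BBBIBBBP")):
        print(closed, describe_gop(closed))
        print(open_, describe_gop(open_))
```

In both pairs the number of B frames is the same while the open pattern has one P frame fewer, which is the length reduction of 1 described in the excerpt.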
