Reference Frame Management in x264

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 23:55

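x264 is an open-source H.264 encoder. Compared with JM, its encoding performance is much higher, and it supports most of the H.264 feature tools, including: CABAC and CAVLC entropy coding, multiple reference frame prediction, all intra-predicted macroblock types (16x16 and 4x4), all forward inter-predicted (P-frame) macroblock types (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4), the most commonly used bidirectionally predicted (B-frame) macroblock types (16x16, 16x8, 8x16 and 8x8), quarter-pixel motion estimation, rate-distortion optimization, adaptive B-frame decision, and the use of B-frames as references. Starting this week I will study x264 from four angles: reference frame management, rate control, motion estimation, and macroblock mode decision.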

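Below are my study notes from this week on the reference frame management part, recording my own understanding. Reading such a large code base is really hard work for me, but I am grateful for the help of many experts whose papers and blogs were a great help, and especially to my partner and senior, 不屈号的航海长, who taught me practically hand in hand. Special thanks once again.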
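Based on my earlier study of reference frame management in JM15.1, the basic flow in JM is as follows:
1. Initialization of the reference picture lists;
2. Reordering of the reference picture lists. The main purpose of reordering is to reduce the bits needed to code the reference indices; whether reordering takes place depends on the values of sh->b_ref_pic_list_reordering_l0 and sh->b_ref_pic_list_reordering_l1;
3. Frame encoding;
4. Marking of the reference pictures.
Here I follow the x264 encoding flow and track the reference frames as the main point of interest.
The overall x264 flow has four layers (R denotes the overall route):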

A. The main() function calls Encode(), the top-level encoding function. Its for loop, bounded by i_frame_total, enters

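B. Encode_frame(), the second encoding layer, which calls x264_encoder_encode() to encode at the frame level. This function mainly performs the VCL encoding and part of the NAL (Network Abstraction Layer) encoding. Because reference frames are handled inside it, I trace its flow carefully below, labelling the steps with NO.: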

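NO.1 The frame to be encoded is stored in fenc, and the frames waiting to be encoded are put into coding order. fenc and fdec are the buffers for the frame being encoded and for its reconstructed (decoded) counterpart. Before a picture is encoded, the frame held in fdec is checked and, if it is to be kept as a reference, moved into the reference queue.
The call is at encoder.c, line 1695: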
// ok to call this before encoding any frames, since the initial values of fdec have b_kept_as_ref=0
if( x264_reference_update( h ) )
    return -1;
Inside the function:
static inline int x264_reference_update( x264_t *h )
{
    if( !h->fdec->b_kept_as_ref )
    {
        if( h->param.i_threads > 1 )
        {
            x264_frame_push_unused( h, h->fdec );
            h->fdec = x264_frame_pop_unused( h, 1 );
            if( !h->fdec )
                return -1;
        }
        return 0;
    }

    /* move frame in the buffer */
    x264_frame_push( h->frames.reference, h->fdec );
    if( h->frames.reference[h->frames.i_max_dpb] )
        x264_frame_push_unused( h, x264_frame_shift( h->frames.reference ) );
    h->fdec = x264_frame_pop_unused( h, 1 );
    if( !h->fdec )
        return -1;
    return 0;
}
The main purpose of this step is to take the frame in fdec that is to serve as a reference and move it into the reference list, so that it is available to the frames encoded next. When the list grows beyond i_max_dpb, the oldest entry is shifted out and recycled.
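To convince myself of how the buffer behaves, I find it helpful to model x264_reference_update as a sliding window over a NULL-terminated array. The following is only a minimal standalone sketch, not x264 code: frame_t, dpb_t, MAX_DPB and the helper names are simplified stand-ins I made up, the multi-thread branch is left out, and the evicted frame is returned to the caller instead of being pushed onto an unused-frame pool.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DPB 4   /* hypothetical upper bound on the DPB size for this sketch */

typedef struct { int poc; int kept_as_ref; } frame_t;   /* stand-in for x264_frame_t */

typedef struct {
    frame_t *reference[MAX_DPB + 2];   /* NULL-terminated, oldest frame first */
    int      max_dpb;                  /* plays the role of h->frames.i_max_dpb */
} dpb_t;

/* Append a frame at the end of a NULL-terminated list. */
static void frame_push(frame_t **list, frame_t *f)
{
    int i = 0;
    while (list[i])
        i++;
    list[i] = f;
}

/* Remove and return the oldest frame (index 0), moving the rest down. */
static frame_t *frame_shift(frame_t **list)
{
    frame_t *f = list[0];
    int i;
    for (i = 0; list[i]; i++)
        list[i] = list[i + 1];
    return f;
}

/* Returns the frame evicted from the window, or NULL if nothing was evicted. */
static frame_t *reference_update(dpb_t *d, frame_t *fdec)
{
    assert(d->max_dpb >= 1 && d->max_dpb <= MAX_DPB);
    if (!fdec->kept_as_ref)
        return NULL;                        /* non-reference frames never enter the list */
    frame_push(d->reference, fdec);
    if (d->reference[d->max_dpb])           /* one frame too many: drop the oldest */
        return frame_shift(d->reference);
    return NULL;
}
```

With max_dpb = 2, pushing three reference frames in a row evicts the first one on the third push, which matches the "first in, first out" behaviour described above.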

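NO.2 Set up the reference handling for the frame to be encoded, which corresponds to reference picture list initialization in JM:
(encoder.c, line 1740)
First the frame type stored in h->fenc->i_type is checked. Because the first frame to enter the encoder is an IDR frame, x264_reference_reset( h ) is called.
Purpose: reset all reference frame management, since an IDR frame uses no references, and set its reference priority to HIGHEST: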
i_nal_type    = NAL_SLICE_IDR;
i_nal_ref_idc = NAL_PRIORITY_HIGHEST;
h->sh.i_type = SLICE_TYPE_I;
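The I-, P-, and B-frames encoded later are handled in a similar way, except that the reference priority differs. X264_TYPE_BREF must be distinguished from an ordinary B-frame: the former is used as a reference, the latter is not. Finally, the POC is set to twice the frame number and fdec is associated with fenc, so that the reconstructed frame can be buffered in the DPB as a reference for other frames.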

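NO.3 Once i_nal_type, i_nal_ref_idc, and h->sh.i_type have been set according to the frame type, the reference lists are initialized, like the forward and backward reference lists list0 and list1 in JM (encoder.c, line 1783):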
x264_reference_build_list( h, h->fdec->i_poc );
static inline void x264_reference_build_list( x264_t *h, int i_poc )
{
    int i;
    int b_ok;

    /* build ref list 0/1 */
    h->i_ref0 = 0;
    h->i_ref1 = 0;
    for( i = 0; h->frames.reference[i]; i++ )
    {
        // Fill fref0 and fref1 and count the frames in each. Tracing the program
        // shows that the size of fref0 changes with the parameter settings.
        if( h->frames.reference[i]->i_poc < i_poc ) // i_poc = h->fdec->i_poc
        {
            h->fref0[h->i_ref0++] = h->frames.reference[i];
        }
        else if( h->frames.reference[i]->i_poc > i_poc )
        {
            h->fref1[h->i_ref1++] = h->frames.reference[i];
        }
    }

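In JM, reference frames are split into forward and backward references: list0 is used for forward reference and list1 for backward reference. Within each list the reference frames are further divided into short-term and long-term ones; the short-term ones are sorted in descending order and the long-term ones in ascending order.
Here, by contrast, fref0 is sorted in descending order as a whole, which differs from JM. JM additionally uses the value of list0idx as the boundary between short-term and long-term references.
********** Short-term versus long-term references do not seem to be considered here; the reference priority is simply set to HIGHEST or DISPOSABLE **********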

    /* Order ref0 from higher to lower poc */
    // Equivalent to list0 in JM: sort the reference frames in ref0 by POC
    do
    {
        b_ok = 1;
        for( i = 0; i < h->i_ref0 - 1; i++ )
        {
            if( h->fref0[i]->i_poc < h->fref0[i+1]->i_poc )
            {
                XCHG( x264_frame_t*, h->fref0[i], h->fref0[i+1] );
                b_ok = 0;
                break;
            }
        }
    } while( !b_ok );
    /* Order ref1 from lower to higher poc (bubble sort) for B-frame */
    // Equivalent to list1 (backward reference): sort ref1 by ascending POC
    do
    {
        b_ok = 1;
        for( i = 0; i < h->i_ref1 - 1; i++ )
        {
            if( h->fref1[i]->i_poc > h->fref1[i+1]->i_poc )
            {
                XCHG( x264_frame_t*, h->fref1[i], h->fref1[i+1] );
                b_ok = 0;
                break;
            }
        }
    } while( !b_ok );

    // h->i_ref0 and h->i_ref1 obtained above are now clamped.
    // Because h->frames.i_max_ref1 = 1 after parameter initialization, h->i_ref1 varies between 0 and 1,
    // while h->frames.i_max_ref0 = 2 and changes with the command-line parameter
    // h->param.i_frame_reference. Each count ends up as the minimum of its candidate values.

    h->i_ref1 = X264_MIN( h->i_ref1, h->frames.i_max_ref1 );
    h->i_ref0 = X264_MIN( h->i_ref0, h->frames.i_max_ref0 );
    h->i_ref0 = X264_MIN( h->i_ref0, h->param.i_frame_reference ); // if reconfig() has lowered the limit
    assert( h->i_ref0 + h->i_ref1 <= 16 );

    // h->mb.pic.i_fref[0] and h->mb.pic.i_fref[1] matter: macroblock encoding later
    // uses them whenever it accesses the reference frames
    h->mb.pic.i_fref[0] = h->i_ref0;
    h->mb.pic.i_fref[1] = h->i_ref1;
}
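The whole of x264_reference_build_list boils down to "split by POC, sort each side, clamp the counts". Here is a minimal standalone sketch of that idea, working on plain POC values instead of frame pointers. The type ref_lists_t, the function build_ref_lists and the use of qsort in place of the hand-written exchange loops are my own choices for illustration, not what x264 does internally.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_REFS 16

typedef struct {
    int ref0[MAX_REFS]; int n_ref0;   /* POCs below the current POC, highest first */
    int ref1[MAX_REFS]; int n_ref1;   /* POCs above the current POC, lowest first  */
} ref_lists_t;

/* qsort comparator: larger values first. */
static int cmp_desc(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x < y) - (x > y);
}

/* qsort comparator: smaller values first. */
static int cmp_asc(const void *a, const void *b)
{
    return -cmp_desc(a, b);
}

static void build_ref_lists(const int *dpb_poc, int n, int cur_poc,
                            int max_ref0, int max_ref1, ref_lists_t *out)
{
    int i;
    assert(n >= 0 && n <= MAX_REFS);
    out->n_ref0 = out->n_ref1 = 0;
    for (i = 0; i < n; i++) {
        if (dpb_poc[i] < cur_poc)
            out->ref0[out->n_ref0++] = dpb_poc[i];   /* past frame   -> list0 */
        else if (dpb_poc[i] > cur_poc)
            out->ref1[out->n_ref1++] = dpb_poc[i];   /* future frame -> list1 */
    }
    qsort(out->ref0, (size_t)out->n_ref0, sizeof(int), cmp_desc);
    qsort(out->ref1, (size_t)out->n_ref1, sizeof(int), cmp_asc);
    /* Clamp to the configured limits, nearest frames are kept. */
    if (out->n_ref0 > max_ref0) out->n_ref0 = max_ref0;
    if (out->n_ref1 > max_ref1) out->n_ref1 = max_ref1;
}
```

For example, with the DPB holding POCs {0, 8, 4, 12, 2}, a current POC of 6, and limits of 2 and 1, list0 becomes {4, 2} and list1 becomes {8}: in both directions the temporally closest frames get the smallest indices.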


// For SLICE_TYPE_B (a B-frame), macroblock bi-prediction must also be initialized.
// This fills h->mb.bipred_weight[i_ref0][i_ref1], the weights used for bi-prediction.
if( h->sh.i_type == SLICE_TYPE_B )
    x264_macroblock_bipred_init( h );

void x264_macroblock_bipred_init( x264_t *h )
{
    int i_ref0, i_ref1;
    for( i_ref0 = 0; i_ref0 < h->i_ref0; i_ref0++ )
    {
        // Everything here is derived from the POCs of the reference frames
        int poc0 = h->fref0[i_ref0]->i_poc;
        for( i_ref1 = 0; i_ref1 < h->i_ref1; i_ref1++ )
        {
            int dist_scale_factor; // the quantity from which the weight is derived
            int poc1 = h->fref1[i_ref1]->i_poc;
            int td = x264_clip3( poc1 - poc0, -128, 127 );
            if( td == 0 /* || pic0 is a long-term ref */ )
                dist_scale_factor = 256;
            else
            {
                int tb = x264_clip3( h->fdec->i_poc - poc0, -128, 127 );
                int tx = (16384 + (abs(td) >> 1)) / td;
                dist_scale_factor = x264_clip3( (tb * tx + 32) >> 6, -1024, 1023 );
            }
            h->mb.dist_scale_factor[i_ref0][i_ref1] = dist_scale_factor;

            dist_scale_factor >>= 2;
            if( h->param.analyse.b_weighted_bipred
                && dist_scale_factor >= -64
                && dist_scale_factor <= 128 )
            {
                h->mb.bipred_weight[i_ref0][i_ref1] = 64 - dist_scale_factor;
                // ssse3 implementation of biweight doesn't support the extrema.
                // if we ever generate them, we'll have to drop that optimization.
                assert( dist_scale_factor >= -63 && dist_scale_factor <= 127 );
            }
            else
                h->mb.bipred_weight[i_ref0][i_ref1] = 32;
        }
    }
    // With macroblock-adaptive frame/field coding (MBAFF):
    // the factor 2 is there because there are twice as many fields as frames
    if( h->sh.b_mbaff )
    {
        for( i_ref0 = 2*h->i_ref0-1; i_ref0 >= 0; i_ref0-- )
            for( i_ref1 = 2*h->i_ref1-1; i_ref1 >= 0; i_ref1-- )
                h->mb.bipred_weight[i_ref0][i_ref1] = h->mb.bipred_weight[i_ref0>>1][i_ref1>>1];
    }
}
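To get a feeling for the numbers, the weight computation can be pulled out into a small standalone function. This is a sketch under two assumptions of mine: weighted bi-prediction is enabled (the b_weighted_bipred test is dropped), and the names clip3 and implicit_bipred_weight are made up for the example. The arithmetic itself is copied from the function above.

```c
#include <assert.h>
#include <stdlib.h>

/* Clamp v to the range [lo, hi]. */
static int clip3(int v, int lo, int hi)
{
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Weight (out of 64) applied to the list0 reference; the list1 reference
 * gets 64 minus this value. Same arithmetic as x264_macroblock_bipred_init. */
static int implicit_bipred_weight(int poc_cur, int poc0, int poc1)
{
    int td = clip3(poc1 - poc0, -128, 127);   /* distance between the two references */
    int dist_scale_factor;
    if (td == 0)
        dist_scale_factor = 256;
    else {
        int tb = clip3(poc_cur - poc0, -128, 127);   /* distance from ref0 to the current frame */
        int tx = (16384 + (abs(td) >> 1)) / td;
        dist_scale_factor = clip3((tb * tx + 32) >> 6, -1024, 1023);
    }
    dist_scale_factor >>= 2;
    if (dist_scale_factor >= -64 && dist_scale_factor <= 128)
        return 64 - dist_scale_factor;
    return 32;   /* fall back to equal weights */
}
```

A B-frame at POC 4 exactly halfway between references at POC 0 and POC 8 gets td = 8, tb = 4, tx = 2048, a scale factor of 128, and therefore a weight of 64 - 32 = 32, i.e. a plain average. If the frame sits at POC 2 between references at POC 0 and POC 6, the weight of the list0 reference rises to 43 because that reference is closer in time.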

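NO.4 The next part concerns NAL encoding. For i_nal_type = NAL_SLICE_IDR, the PPS, SPS, and SEI bitstreams are written as well. So within a GOP, when the first IDR frame is encoded, i_nal = 4, because four kinds of NALU have to be written: PPS, SPS, SEI, and the IDR slice. For every later non-IDR frame, i_nal = 1, i.e. only the NALU for the slice itself is written.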

At this point the reference frames have been initialized and sorted. Next comes reference picture reordering: how x264 writes the reordered reference list into the bitstream and reorders ref0.
At encoder.c, line 1862, the two flags below are set. They are used again later to decide whether h->fref0 and h->fref1 need to be reordered:
h->b_ref_reorder[0] = h->b_ref_reorder[1] = 0;

for( i = 0; i < h->i_ref0 - 1; i++ )
    /* P and B-frames use different default orders. */
    if( h->sh.i_type == SLICE_TYPE_P ? h->fref0[i]->i_frame_num < h->fref0[i+1]->i_frame_num
                                     : h->fref0[i]->i_poc < h->fref0[i+1]->i_poc )
    {
        h->b_ref_reorder[0] = 1;
        break;
    }
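The reason this check can ever fire is that the decoder's default list0 order for P slices follows frame_num, while x264 has already sorted fref0 by POC. With B-frames used as references the two orders can disagree. A minimal standalone sketch of the same test, using a made-up ref_t struct and function name:

```c
#include <assert.h>

typedef struct { int frame_num; int poc; } ref_t;   /* simplified reference frame */

/* Returns 1 if list0 is not in the default order for this slice type:
 * descending frame_num for P slices, descending POC for B slices. */
static int needs_reorder_l0(const ref_t *ref0, int n, int is_p_slice)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n - 1; i++) {
        int out_of_order = is_p_slice ? ref0[i].frame_num < ref0[i + 1].frame_num
                                      : ref0[i].poc       < ref0[i + 1].poc;
        if (out_of_order)
            return 1;
    }
    return 0;
}
```

Take a list sorted by POC as {frame_num 1, POC 8}, {frame_num 2, POC 4}, {frame_num 0, POC 0}, where the middle entry is a referenced B-frame that was coded after the P-frame at POC 8. For a B slice the list is already in default order, but for a P slice the frame_num sequence 1, 2, 0 is not descending, so reordering commands have to be sent.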

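C. Next, the values of h->b_ref_reorder[0] and h->b_ref_reorder[1] set above decide whether reordering is performed. This brings us to the third part of the overall flow: the slice layer. The goal is to find the syntax elements for reordering and see how they are written into the slice header.
Execution enters x264_slice_init( h, i_nal_type, i_global_qp ) (encoder.c, line 1874), which in turn calls:
static void x264_slice_header_init (encoder.c, line 79)
At line 119 of this function are the following assignments, which copy the reordering flags into the slice header: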
sh->b_num_ref_idx_override = 0;
sh->i_num_ref_idx_l0_active = 1;
sh->i_num_ref_idx_l1_active = 1;

// Uses h->b_ref_reorder[0] and h->b_ref_reorder[1] set earlier
sh->b_ref_pic_list_reordering_l0 = h->b_ref_reorder[0];
sh->b_ref_pic_list_reordering_l1 = h->b_ref_reorder[1]; // always 0 here, list1 is never reordered

/* If the ref list isn't in the default order, construct reordering header */
/* List1 reordering isn't needed yet */
if( sh->b_ref_pic_list_reordering_l0 )
{
    int pred_frame_num = i_frame;
    for( i = 0; i < h->i_ref0; i++ )
    {
        // Compute sh->ref_pic_list_order[0][i].idc and .arg, which encode the
        // frame_num offset (sign and magnitude) used for reordering
        int diff = h->fref0[i]->i_frame_num - pred_frame_num;
        if( diff == 0 )
            x264_log( h, X264_LOG_ERROR, "diff frame num == 0\n" );
        sh->ref_pic_list_order[0][i].idc = ( diff > 0 );
        sh->ref_pic_list_order[0][i].arg = abs( diff ) - 1;
        pred_frame_num = h->fref0[i]->i_frame_num;
    }
}
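The loop above codes each entry of list0 as a signed frame_num difference relative to the previous entry, starting from the current frame. A small standalone sketch makes the idc/arg pairs easy to check by hand. reorder_cmd_t and build_reorder_cmds are names I invented; the arithmetic follows the snippet above, with idc = 0 meaning "subtract" and idc = 1 meaning "add", and arg holding the absolute difference minus one.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int idc; int arg; } reorder_cmd_t;   /* one reordering command */

/* ref_frame_num[] is list0 in the desired order; cmd[] receives n commands. */
static void build_reorder_cmds(const int *ref_frame_num, int n,
                               int cur_frame_num, reorder_cmd_t *cmd)
{
    int pred = cur_frame_num;   /* the prediction starts at the current frame */
    int i;
    for (i = 0; i < n; i++) {
        int diff = ref_frame_num[i] - pred;
        assert(diff != 0);              /* a frame cannot follow itself */
        cmd[i].idc = diff > 0;          /* 1: add to predictor, 0: subtract */
        cmd[i].arg = abs(diff) - 1;     /* magnitude minus one */
        pred = ref_frame_num[i];        /* next difference is relative to this entry */
    }
}
```

For a current frame_num of 3 and a desired list0 of frame_nums {1, 2, 0}, the differences are -2, +1, -2, giving the commands (idc 0, arg 1), (idc 1, arg 0), (idc 0, arg 1).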

After slice init has finished, execution enters:
x264_slices_write( x264_t *h )
This is the key function of the third encoding layer.

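The reordering information is written inside x264_slice_write, which is called here (encoder.c, line 1601):
if( x264_stack_align( x264_slice_write, h ) )
    return (void *)-1;
Then enter x264_slice_write (encoder.c, line 231). Here we find the part that corresponds to h->b_ref_reorder[0] and h->b_ref_reorder[1] from the slice header, at encoder.c, line 1323:

x264_slice_header_write( &h->out.bs, &h->sh, h->i_nal_ref_idc );
Inside this function, at encoder.c, line 231, is the following code, whose purpose is to write the reordering syntax: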

/* ref pic list reordering */
if( sh->i_type != SLICE_TYPE_I )
{
    bs_write1( s, sh->b_ref_pic_list_reordering_l0 );
    if( sh->b_ref_pic_list_reordering_l0 )
    {
        for( i = 0; i < sh->i_num_ref_idx_l0_active; i++ )
        {
            bs_write_ue( s, sh->ref_pic_list_order[0][i].idc );
            bs_write_ue( s, sh->ref_pic_list_order[0][i].arg );
        }
        bs_write_ue( s, 3 );
    }
}
if( sh->i_type == SLICE_TYPE_B )
{
    bs_write1( s, sh->b_ref_pic_list_reordering_l1 );
    if( sh->b_ref_pic_list_reordering_l1 )
    {
        for( i = 0; i < sh->i_num_ref_idx_l1_active; i++ )
        {
            bs_write_ue( s, sh->ref_pic_list_order[1][i].idc );
            bs_write_ue( s, sh->ref_pic_list_order[1][i].arg );
        }
        bs_write_ue( s, 3 );
    }
}
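In the code above every idc and arg is written with unsigned Exp-Golomb coding, ue(v), and the value 3 terminates the command list. To see what that costs in bits, here is a minimal standalone bit writer. It is my own sketch, not x264's bs_t: bitwriter_t, bw_put_bit and bw_put_ue are hypothetical names, and the buffer is a fixed 64 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t buf[64]; int bit_pos; } bitwriter_t;

static void bw_init(bitwriter_t *bw)
{
    memset(bw, 0, sizeof(*bw));
}

/* Append a single bit, most significant bit of each byte first. */
static void bw_put_bit(bitwriter_t *bw, int bit)
{
    assert(bw->bit_pos < (int)sizeof(bw->buf) * 8);
    if (bit)
        bw->buf[bw->bit_pos >> 3] |= (uint8_t)(0x80 >> (bw->bit_pos & 7));
    bw->bit_pos++;
}

/* ue(v): write len zero bits, then val+1 in binary using len+1 bits,
 * where len = floor(log2(val+1)). Assumes val < UINT_MAX. */
static void bw_put_ue(bitwriter_t *bw, unsigned val)
{
    unsigned code = val + 1;
    int len = 0, i;
    while ((code >> len) > 1)
        len++;
    for (i = 0; i < len; i++)
        bw_put_bit(bw, 0);
    for (i = len; i >= 0; i--)
        bw_put_bit(bw, (int)((code >> i) & 1));
}
```

Writing the flag bit 1 followed by ue(0), ue(1) and the terminating ue(3) produces the bit string 1 1 010 00100, ten bits in total. Small differences are therefore cheap, which is why the commands are coded relative to the previous entry.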

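The reordering part above still needs more thought in combination with the theory!

With reference reordering done, we return to the overall flow.
A brief look at slice-layer encoding, the third part of the encoding process:
The most important function is x264_slice_write() (encoder.c, line 1601):
if( x264_stack_align( x264_slice_write, h ) )
    return (void *)-1;
Enter x264_slice_write( x264_t *h ).
The main part of this function is the loop
while( (mb_xy = i_mb_x + i_mb_y * h->sps->i_mb_width) <= h->sh.i_last_mb )
Inside the loop each macroblock goes through intra and inter prediction, motion estimation, motion compensation, the 4×4 DCT, quantization and zigzag scanning, the P_SKIP and B_SKIP mode decisions, entropy coding, and so on. This is the core of the encoder.
x264_macroblock_analyse( h ); performs the intra and inter analysis, including motion estimation, and stores the motion vectors;
x264_macroblock_encode( h ); applies the 4×4 DCT, quantization, and zigzag scan to the residual, and reconstructs the reference frame so that it stays in sync with the decoder.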
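The loop condition simply walks the macroblocks of a slice in raster order, with mb_xy as the linear index. A minimal standalone sketch of that traversal, assuming no MBAFF; walk_slice_mbs and the visited array are made up for illustration, and the place where analysis and encoding would happen is only marked by a comment:

```c
#include <assert.h>

/* Visit macroblocks first_mb..last_mb in raster order and record their
 * linear indices in visited[]. Returns the number of macroblocks visited. */
static int walk_slice_mbs(int mb_width, int first_mb, int last_mb, int *visited)
{
    int mb_x, mb_y, mb_xy, n = 0;
    assert(mb_width > 0 && first_mb >= 0);
    mb_x = first_mb % mb_width;
    mb_y = first_mb / mb_width;
    while ((mb_xy = mb_x + mb_y * mb_width) <= last_mb) {
        visited[n++] = mb_xy;        /* analyse + encode of this macroblock would go here */
        if (++mb_x == mb_width) {    /* end of row: wrap to the start of the next one */
            mb_x = 0;
            mb_y++;
        }
    }
    return n;
}
```

For a picture 3 macroblocks wide and a slice covering indices 2 to 7, the loop visits 2, 3, 4, 5, 6, 7 in that order.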

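D. Back in the overall flow, the fourth part is macroblock-layer encoding.
The functions to analyse are x264_macroblock_analyse( h ) and
x264_macroblock_encode( h )
x264_macroblock_analyse( x264_t *h ) (analyse.c, line 2337)
evaluates the coding cost of every candidate intra and inter prediction mode in order to find the most suitable one.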

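The above mainly covers the theory side of reference frame handling. Experimental results will follow in the next post, work in progress!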