Intra Prediction as Seen from the T264 Code

Source: Internet  Editor: 程序博客网  Time: 2024/05/19 17:05

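I am appreciating more and more the benefit of reading the standard and the code side by side. Before, reading the standard alone and then x264 alone got me nowhere. Now I go through the code and the standard one part at a time, and although a lot still needs to be tied together, I can tell that I already understand some of it. Perhaps the earlier effort was the groundwork; there is no way to know. The past will not come around again, and the future still has to be tried out for myself. Maybe in a few months I will prefer some other method, but never mind that: right now I am enjoying this one.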

First, a set of declarations from T264:

/* 16x16 luma: 4 prediction modes */

void T264_predict_16x16_mode_0_c (uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_16x16_mode_1_c (uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_16x16_mode_2_c (uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_16x16_mode_20_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_16x16_mode_21_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_16x16_mode_22_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_16x16_mode_3_c (uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);

// 4x4 luma: the fine-grained prediction modes, 9 in total

// 4x4 luma (12 functions)
void T264_predict_4x4_mode_0_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_4x4_mode_1_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_4x4_mode_2_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_4x4_mode_20_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_4x4_mode_21_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_4x4_mode_22_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);

void T264_predict_4x4_mode_3_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_4x4_mode_4_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_4x4_mode_5_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_4x4_mode_6_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_4x4_mode_7_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_4x4_mode_8_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);

// Chroma: 4 prediction modes

// 8x8 chroma (7 functions)
void T264_predict_8x8_mode_0_c (uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_8x8_mode_1_c (uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_8x8_mode_2_c (uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_8x8_mode_20_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_8x8_mode_21_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_8x8_mode_22_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);
void T264_predict_8x8_mode_3_c (uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left);

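The mode counts in the comments above follow the standard and can be checked against it. The implementation (T264) has more functions than the standard has modes, but the mode types are the same. For 16x16 luma, for example, functions 20, 21 and 22 all belong to one type: mode 2, DC prediction. The same type also exists for the other two block sizes. Its description sounds impressive: "Mode 2 (DC prediction) is modified depending on which samples A–M have previously been coded; each of the other modes may only be used if all of the required prediction samples are available." Read that way, mode 2 seems remarkably capable: what trick lets it predict without requiring a fixed set of pixels to be available? That is asking too much, of course. Nobody can make an accurate prediction while knowing nothing, and mode 2 is really just a very simple fallback for when pixels are unavailable, a rather shameless one that gives up accuracy. Look at the code: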

// none available (note: neither top nor left can be used)
//
void
T264_predict_16x16_mode_22_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left)
{
    int32_t i, j;

    for(i = 0 ; i < 16 ; i ++)
    {
        for(j = 0 ; j < 16 ; j ++)
        {
            dst[j] = 128;
        }
        dst += dst_stride;
    }
}

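So when nothing is available, the prediction is simply the mid-value (256/2 = 128). Personally I find this the most shameless piece of code here, but on reflection there is no other option. Thinking about it more carefully, this case is also rare, even though it costs us some time to understand. The other two variants (20 and 21) are more reasonable, but all of them are coarser than the other modes. The corresponding functions for 4x4 luma and 8x8 chroma work in a similar way.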
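The DC variants described above can be condensed into one routine. The following is a minimal sketch and not T264's code: predict_4x4_dc_sketch is a hypothetical name, and a NULL pointer stands in for an unavailable neighbour (T264 instead chooses among the separate functions 2, 20, 21 and 22). The averaging rules are assumed to be those of H.264 4x4 DC prediction.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper, not part of T264: all four 4x4 DC cases in one place.
   Pass NULL for a neighbour row/column that is not available. */
static void predict_4x4_dc_sketch(uint8_t *dst, int32_t dst_stride,
                                  const uint8_t *top, const uint8_t *left)
{
    int sum = 0;
    int i, j;
    uint8_t dc;

    /* Add up whichever neighbours exist. */
    for (i = 0; i < 4; i++) {
        if (top)  sum += top[i];
        if (left) sum += left[i];
    }

    if (top && left)
        dc = (uint8_t)((sum + 4) >> 3);   /* rounded mean of 8 samples */
    else if (top || left)
        dc = (uint8_t)((sum + 2) >> 2);   /* rounded mean of 4 samples */
    else
        dc = 128;                         /* nothing known: mid-value  */

    /* Every pixel of the block receives the same value. */
    for (i = 0; i < 4; i++) {
        for (j = 0; j < 4; j++)
            dst[j] = dc;
        dst += dst_stride;
    }
}
```

The last branch is the 4x4 counterpart of the 16x16 function shown above: with no neighbours at all, the only remaining choice is the middle of the 8-bit range.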

The real focus, though, is the set of 4x4 modes, and discussing one of them is enough. Consider this code:

//Mode 4 Intra_4x4_DIAGONAL_DOWNRIGHT when Top and left are available
void T264_predict_4x4_mode_4_c(uint8_t* dst, int32_t dst_stride, uint8_t* top, uint8_t* left)
{
    uint8_t *cur_dst = dst;

    *(cur_dst + 12) = (left[3] + (left[2] << 1) + left[1] + 2) >> 2;
    *(cur_dst + 8) = *(cur_dst + 13) = (left[2] + (left[1] << 1) + left[0] + 2) >> 2;
    *(cur_dst + 4) = *(cur_dst + 9) = *(cur_dst + 14) = (left[1] + (left[0] << 1) + *(left - 1) + 2) >> 2;
    *(cur_dst) = *(cur_dst + 5) = *(cur_dst + 10) = *(cur_dst + 15) = (left[0] + (*(left - 1) << 1) + top[0] + 2) >> 2;
    *(cur_dst + 1) = *(cur_dst + 6) = *(cur_dst + 11) = (*(top - 1) + (top[0] << 1) + top[1] + 2) >> 2;
    *(cur_dst + 2) = *(cur_dst + 7) = (top[0] + (top[1] << 1) + top[2] + 2) >> 2;
    *(cur_dst + 3) = (top[1] + (top[2] << 1) + top[3] + 2) >> 2;
}
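
This is mode 4 of 4x4 luma prediction, diagonal down-right, for the case where both the left and the top neighbours are available. Start with the parameters. dst is the address of the destination pixels. dst_stride is the distance between two consecutive destination rows (after finishing one row, adding it takes you to the next); this code does not use it, and its offsets 0 to 15 assume rows packed 4 bytes apart. top and left are the pixels above and to the left, i.e. A-M in the standard. The ones used here are A, B, C, D and I, J, K, L, M, which are top[0..3] and left[0..3], with M being both top[-1] and left[-1]; compare with the code. Drawing the block as squares does not make the interpolation very clear, but drawing it as points works well.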

. . . . .
. x x x x
. x x x x
. x x x x
. x x x x

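Draw 45-degree diagonals from the upper left to the lower right, and the destination pixels (numbered 0-15 in row order) fall into these groups: 0, 5, 10, 15; 1, 6, 11; 2, 7; 3; 4, 9, 14; 8, 13; 12. Pixels in the same group are predicted from the same neighbouring pixels. Take the group containing 0 as an example: its diagonal runs through left[-1] (which is also top[-1]), with top[0] and left[0] on either side, so the group is computed from these three pixels. left[-1] obviously matters most, and that is expressed by giving it the larger weight. The result is round(left[0]/4 + left[-1]/2 + top[0]/4), which in the code is (left[0] + (left[-1] << 1) + top[0] + 2) >> 2. The other pixels are computed the same way: the middle pixel gets the larger weight, reflecting its higher correlation.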
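The grouping by diagonals can also be written as a loop, which makes the (1, 2, 1) weighting explicit. This is a sketch under stated assumptions, not T264's implementation: predict_4x4_ddr_sketch and the edge[] array are hypothetical names, and top[-1] is assumed to be readable and to hold M, as in the T264 function above.

```c
#include <stdint.h>

/* Hypothetical helper, not part of T264: diagonal down-right as a loop.
   edge[] lists the 9 neighbours in the order L K J I M A B C D, so that
   walking along edge[] matches moving from one diagonal to the next. */
static void predict_4x4_ddr_sketch(uint8_t *dst, int32_t dst_stride,
                                   const uint8_t *top, const uint8_t *left)
{
    uint8_t edge[9];
    int i, x, y;

    for (i = 0; i < 4; i++) {
        edge[3 - i] = left[i];   /* left[3..0] -> edge[0..3] (L K J I) */
        edge[5 + i] = top[i];    /* top[0..3]  -> edge[5..8] (A B C D) */
    }
    edge[4] = top[-1];           /* corner pixel M */

    for (y = 0; y < 4; y++) {
        for (x = 0; x < 4; x++) {
            /* All pixels with the same x - y lie on one diagonal and
               share the same centre tap; c ranges from 1 to 7. */
            int c = 4 + x - y;
            dst[y * dst_stride + x] =
                (uint8_t)((edge[c - 1] + 2 * edge[c] + edge[c + 1] + 2) >> 2);
        }
    }
}
```

Unlike the unrolled T264 function, this version uses dst_stride. With dst_stride equal to 4 it should produce the same 16 values, since each diagonal still uses a centre pixel weighted twice and its two neighbours weighted once.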

The other modes work along the same lines: the prediction method is determined by which neighbouring pixels are available.

And that is intra prediction.