x265 Code Reading (2): slicetype.cpp


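0. The comments below were derived under these settings: resolution 1280*720, preset = "medium", tune = "zerolatency", crf = "20". With these settings no B-frames are produced, but that has little effect on understanding the algorithm.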

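1. Before reading slicetype.cpp you need to read threadpool.cpp. I will just state the conclusions here: when a ThreadPool is initialized it starts one or more threads, which then go into a waiting state. At any moment a pool thread is either running or waiting. Each call to void JobProvider::tryWakeOne or int ThreadPool::tryBondPeers wakes up threads.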

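1). Threads woken by tryBondPeers: the return value is the number of threads found, and 0 means no thread was found. If every thread in the ThreadPool is running, none can be woken and tryBondPeers does nothing. If tryBondPeers wakes N threads, each of them runs both processTasks and findJob, once per thread.
2). Threads woken by tryWakeOne: this function has no return value. If every thread in the ThreadPool is running, it sets m_helpWanted to record that a job is waiting for the pool, and the pool assigns a thread to that job as soon as one finishes its current work. A thread woken by tryWakeOne runs only findJob, and only once, in contrast to the per-thread execution above. In short: after tryWakeOne, findJob is guaranteed to run once; after tryBondPeers, processTasks and findJob may each run several times. As for why processTasks and findJob can be called repeatedly without breaking anything, see their code: each uses dedicated variables to control which work actually gets executed.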
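To make the last point concrete, here is a minimal sketch of how a job counter lets any number of threads call processTasks safely. It is not the x265 implementation: DemoTaskGroup, runDemoGroup and timesRun are hypothetical names, and std::thread stands in for the pool's worker threads. Only the idea (each job index is claimed exactly once) is taken from the description above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a bonded task group: every worker runs
// processTasks(), and an atomic counter hands out each job exactly once.
struct DemoTaskGroup
{
    int jobTotal;                       // number of jobs to perform
    std::atomic<int> jobAcquired{0};    // next job index to hand out
    std::vector<int> timesRun;          // how often each job was executed

    explicit DemoTaskGroup(int total) : jobTotal(total), timesRun(total, 0)
    {
        assert(total >= 0);
    }

    // Safe to call repeatedly and from several threads at once.
    void processTasks()
    {
        for (;;)
        {
            int job = jobAcquired.fetch_add(1);
            if (job >= jobTotal)
                return;                 // nothing left to claim
            timesRun[job]++;            // only the claiming thread touches this slot
        }
    }
};

// Runs the group on `workers` helper threads plus the calling thread,
// loosely mirroring the tryBondPeers / processTasks(-1) / waitForExit sequence.
inline std::vector<int> runDemoGroup(int jobs, int workers)
{
    DemoTaskGroup group(jobs);
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; i++)
        pool.emplace_back([&group] { group.processTasks(); });
    group.processTasks();               // the caller helps too
    for (auto& t : pool)
        t.join();                       // counterpart of waitForExit()
    return group.timesRun;
}
```

Calling processTasks a second time after all jobs are claimed simply returns, which is the property that makes the "every woken thread runs it once" behaviour harmless.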

2. Searching for the Lookahead class shows that it is only used inside the Encoder class. It is initialized in void Encoder::create() and used mainly in int Encoder::encode, which calls the following Lookahead functions:

1). m_lookahead->addPicture(*inFrame, sliceType); or m_lookahead->flush();

The core of addPicture is the following lines (the argument *inFrame is curFrame below, and sliceType is always 0):

    m_inputLock.acquire();
    m_inputQueue.pushBack(curFrame);
    if (m_pool && m_inputQueue.size() >= m_fullQueueSize)
        tryWakeOne(); // causes Lookahead::findJob to run once
    m_inputLock.release();

2). frameEnc = m_lookahead->getDecidedPicture();

This fetches a picture that has already been processed. The code is as follows (the real work is actually done by findJob):

    m_outputLock.acquire();
    Frame *out = m_outputQueue.popFront();
    m_outputLock.release();
    if (out)
        return out;
    findJob(-1); /* run slicetypeDecide() if necessary */
    m_inputLock.acquire();
    bool wait = m_outputSignalRequired = m_sliceTypeBusy;
    m_inputLock.release();
    if (wait)
        m_outputSignal.wait();
    return m_outputQueue.popFront();

3). m_lookahead->getEstimatedPictureCost(frameEnc);

This is called at the end of encode and computes the SATD cost of the given picture.

4). void Lookahead::findJob

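It runs once for every call to tryWakeOne (which is only called from void Lookahead::addPicture). It is also called from getDecidedPicture.
From the code, its main job is to call slicetypeDecide when needed. The checks at the top guard against problems when findJob is invoked multiple times.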
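The guard can be pictured with the generic sketch below. It is an illustration under assumptions, not the body of Lookahead::findJob: DemoLookahead, queued, decisions and decide are hypothetical names, and std::mutex stands in for x265's own lock class. The pattern shown is the usual one: test and set a busy flag while holding the lock, do the long-running work outside the lock, then clear the flag.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, simplified model of the guard described above; the member
// names echo Lookahead but this is not the x265 implementation.
struct DemoLookahead
{
    std::mutex inputLock;
    int  queued = 0;            // frames waiting in the input queue
    int  fullQueueSize = 1;     // threshold at which a decision may run
    bool sliceTypeBusy = false; // true while some thread is deciding
    int  decisions = 0;         // how many times the decision step ran

    // Returns true if this call performed the decision step.
    bool findJob()
    {
        {
            std::lock_guard<std::mutex> lock(inputLock);
            if (queued < fullQueueSize || sliceTypeBusy)
                return false;           // nothing to do, or another caller is on it
            sliceTypeBusy = true;       // claim the work while holding the lock
        }

        decide();                       // long-running part, done without the lock

        std::lock_guard<std::mutex> lock(inputLock);
        sliceTypeBusy = false;
        return true;
    }

    // Stand-in for slicetypeDecide(): consumes one queued frame.
    void decide()
    {
        std::lock_guard<std::mutex> lock(inputLock);
        assert(queued > 0);
        queued--;
        decisions++;
    }
};
```

With this shape, a second caller that arrives while a decision is in progress returns immediately instead of running the decision twice.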

5).slicetypeDecide

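The code is below. What puzzles me is that the condition for marking a frame as I or IDR is frm.frameNum - m_lastKeyframe >= 250, which means that at 25 frames per second there is one I-frame every 10 seconds. It is not decided from the amount of information in the frame. Can a frame marked as P or B here be changed again during the actual encode?
Also, the code shows that the cost calculations go through PreLookaheadGroup::processTasks and CostEstimateGroup::singleCost respectively.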

void Lookahead::slicetypeDecide()
{
    PreLookaheadGroup pre(*this);

    Lowres* frames[X265_LOOKAHEAD_MAX + X265_BFRAME_MAX + 4];
    Frame*  list[X265_BFRAME_MAX + 4];
    memset(frames, 0, sizeof(frames));
    memset(list, 0, sizeof(list));
    int maxSearch = X265_MIN(m_param->lookaheadDepth, X265_LOOKAHEAD_MAX);
    maxSearch = X265_MAX(1, maxSearch);

    {
        ScopedLock lock(m_inputLock);

        Frame *curFrame = m_inputQueue.first();
        int j;
        for (j = 0; j < m_param->bframes + 2; j++)
        {
            if (!curFrame) break;
            list[j] = curFrame;
            curFrame = curFrame->m_next;
        }
        //if(j > 1)printf("\nliyi===========================");

        curFrame = m_inputQueue.first();
        frames[0] = m_lastNonB;
        for (j = 0; j < maxSearch; j++)
        {
            if (!curFrame) break;
            frames[j + 1] = &curFrame->m_lowres;
            if (!curFrame->m_lowresInit)
                pre.m_preframes[pre.m_jobTotal++] = curFrame;
            //else printf("\nliyi=============================");
            curFrame = curFrame->m_next;
        }
        maxSearch = j;
    }
    // Both loops above store only one frame: maxSearch == 1, list[0] = m_inputQueue.first(), list[1] = NULL
    // frames[0] = m_lastNonB (set outside the loop), frames[1] = &m_inputQueue.first()->m_lowres
    // pre.m_jobTotal == 1, pre.m_preframes[0] = m_inputQueue.first();

    /* perform pre-analysis on frames which need it, using a bonded task group */
    if (pre.m_jobTotal)
    {
        // tryBondPeers here causes processTasks(m_id) to run on ThreadPool threads, where m_id is
        // the index of the WorkerThread inside the ThreadPool, starting at 0 and increasing by one.
        // Because the first argument of tryBondPeers is a ThreadPool&, it wakes all waiting threads by default.
        // Every waiting thread runs processTasks(m_id) once; if no thread is waiting, nothing happens.
        if (m_pool)
            pre.tryBondPeers(*m_pool, pre.m_jobTotal);
        pre.processTasks(-1);
        // waitForExit returns only after all bonded ThreadPool threads have finished. It pairs with tryBondPeers.
        pre.waitForExit();
    }

    if (m_lastNonB && !m_param->rc.bStatRead &&
        ((m_param->bFrameAdaptive && m_param->bframes) ||
         m_param->rc.cuTree || m_param->scenecutThreshold ||
         (m_param->lookaheadDepth && m_param->rc.vbvBufferSize)))
    {
        slicetypeAnalyse(frames, false); // never reached with these settings
    }

    int bframes, brefs;
    for (bframes = 0, brefs = 0;; bframes++)
    {
        Lowres& frm = list[bframes]->m_lowres;
        //if (frm.sliceType != 0)printf("\nliyi========================");
        // frm.sliceType is always 0 (X265_TYPE_AUTO) at this point

        if (frm.sliceType == X265_TYPE_BREF && !m_param->bBPyramid && brefs == m_param->bBPyramid)
        {
            frm.sliceType = X265_TYPE_B;
            x265_log(m_param, X265_LOG_WARNING, "B-ref at frame %d incompatible with B-pyramid\n",
                     frm.frameNum);
        }
        else if (frm.sliceType == X265_TYPE_BREF && m_param->bBPyramid && brefs &&
                 m_param->maxNumReferences <= (brefs + 3))
        {
            frm.sliceType = X265_TYPE_B;
            x265_log(m_param, X265_LOG_WARNING, "B-ref at frame %d incompatible with B-pyramid and %d reference frames\n",
                     frm.sliceType, m_param->maxNumReferences);
        }

        // m_param->bOpenGOP is true here, meaning frames after an I-frame may reference frames before it
        // m_param->bIntraRefresh is 0 here
        // m_param->keyframeMax = 250, m_param->keyframeMin = 25
        if ((!m_param->bIntraRefresh || frm.frameNum == 0) && frm.frameNum - m_lastKeyframe >= m_param->keyframeMax)
        {
            // Entered only when frm.frameNum - m_lastKeyframe >= 250. m_lastKeyframe starts at -250, so the first frame gets in,
            // and for the same reason the first frame is an IDR.
            if (frm.sliceType == X265_TYPE_AUTO || frm.sliceType == X265_TYPE_I)
                frm.sliceType = m_param->bOpenGOP && m_lastKeyframe >= 0 ? X265_TYPE_I : X265_TYPE_IDR;
            bool warn = frm.sliceType != X265_TYPE_IDR;
            if (warn && m_param->bOpenGOP)
                warn &= frm.sliceType != X265_TYPE_I;
            if (warn)
            {
                x265_log(m_param, X265_LOG_WARNING, "specified frame type (%d) at %d is not compatible with keyframe interval\n",
                         frm.sliceType, frm.frameNum);
                frm.sliceType = m_param->bOpenGOP && m_lastKeyframe >= 0 ? X265_TYPE_I : X265_TYPE_IDR;
            }
        }
        if (frm.sliceType == X265_TYPE_I && frm.frameNum - m_lastKeyframe >= m_param->keyframeMin)
        {
            if (m_param->bOpenGOP)
            {
                m_lastKeyframe = frm.frameNum;
                frm.bKeyframe = true;
            }
            else
                frm.sliceType = X265_TYPE_IDR;
        }
        if (frm.sliceType == X265_TYPE_IDR)
        {
            // the first frame takes this branch
            /* Closed GOP */
            m_lastKeyframe = frm.frameNum;
            frm.bKeyframe = true;
            if (bframes > 0)
            {
                list[bframes - 1]->m_lowres.sliceType = X265_TYPE_P;
                bframes--;
            }
        }
        if (bframes == m_param->bframes || !list[bframes + 1])
        {
            // every frame takes this branch
            if (IS_X265_TYPE_B(frm.sliceType))
                x265_log(m_param, X265_LOG_WARNING, "specified frame type is not compatible with max B-frames\n");
            if (frm.sliceType == X265_TYPE_AUTO || IS_X265_TYPE_B(frm.sliceType))
                frm.sliceType = X265_TYPE_P;
        }
        if (frm.sliceType == X265_TYPE_BREF)
            brefs++;
        if (frm.sliceType == X265_TYPE_AUTO)
            frm.sliceType = X265_TYPE_B;
        else if (!IS_X265_TYPE_B(frm.sliceType))
            break;
    }
    // The loop above runs only once. Its main effect is to set list[0]->m_lowres.sliceType to X265_TYPE_IDR, X265_TYPE_I or
    // X265_TYPE_P. For I or IDR it additionally sets m_lastKeyframe = list[0]->m_lowres.frameNum and
    // list[0]->m_lowres.bKeyframe = true.

    list[bframes]->m_lowres.leadingBframes = bframes; // 0
    // m_lastNonB starts out as null; this is the only place where it changes
    m_lastNonB = &list[bframes]->m_lowres;
    m_histogram[bframes]++;

    /* insert a bref into the sequence */
    if (m_param->bBPyramid && bframes > 1 && !brefs) // never entered here
    {
        list[bframes / 2]->m_lowres.sliceType = X265_TYPE_BREF;
        brefs++;
    }

    /* calculate the frame costs ahead of time for estimateFrameCost while we still have lowres */
    if (m_param->rc.rateControlMode != X265_RC_CQP) // always entered here
    {
        int p0, p1, b;
        /* estimate new non-B cost */
        p1 = b = bframes + 1; // p1 = b = 1
        // From the top of the function, frames[1] = &(list[0]->m_lowres), and list[0]->m_lowres.sliceType
        // has just been assigned above. It can be IDR, I or P.
        // #define IS_X265_TYPE_I(x) ((x) == X265_TYPE_I || (x) == X265_TYPE_IDR)
        p0 = (IS_X265_TYPE_I(frames[bframes + 1]->sliceType)) ? b : 0;

        CostEstimateGroup estGroup(*this, frames);
        estGroup.singleCost(p0, p1, b);
    }

    m_inputLock.acquire();
    int64_t pts[X265_BFRAME_MAX + 1];
    for (int i = 0; i <= bframes; i++)
    {
        Frame *curFrame;
        curFrame = m_inputQueue.popFront();
        pts[i] = curFrame->m_pts;
        maxSearch--;
    }
    m_inputLock.release();
    // bframes = 0, maxSearch = 0, pts[0] = m_pts of the frame that was at the head of m_inputQueue

    m_outputLock.acquire();
    /* add non-B to output queue */
    int idx = 0;
    list[bframes]->m_reorderedPts = pts[idx++];
    m_outputQueue.pushBack(*list[bframes]);
    m_outputLock.release();
}
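The keyframe branches in the loop can be condensed into a small standalone sketch for the configuration used in these notes (no B-frames, bIntraRefresh = 0, keyframeMin not larger than keyframeMax). The names DemoSliceType, DemoKeyframeState and decideType are hypothetical and exist only for this illustration. The sketch reproduces the interval test and the IDR-versus-I choice, and leaves out the warnings and all B-frame handling.

```cpp
#include <cassert>

enum DemoSliceType { DEMO_AUTO, DEMO_P, DEMO_I, DEMO_IDR };

struct DemoKeyframeState
{
    int  lastKeyframe;   // frame number of the most recent keyframe (starts at -keyframeMax)
    int  keyframeMax;    // maximum keyframe interval
    bool openGOP;        // true: later keyframes become I instead of IDR
};

// Condensed version of the keyframe branches for the no-B-frame case:
// an AUTO frame becomes I or IDR once the interval is reached, otherwise P.
inline DemoSliceType decideType(DemoKeyframeState& s, int frameNum)
{
    assert(s.keyframeMax > 0);
    DemoSliceType type = DEMO_AUTO;
    if (frameNum - s.lastKeyframe >= s.keyframeMax)
        type = (s.openGOP && s.lastKeyframe >= 0) ? DEMO_I : DEMO_IDR;

    if (type == DEMO_I || type == DEMO_IDR)
        s.lastKeyframe = frameNum;   // both keyframe kinds reset the interval
    else
        type = DEMO_P;               // with bframes == 0, AUTO ends up as P
    return type;
}
```

Starting from lastKeyframe = -250 and keyframeMax = 250, frame 0 comes out as IDR, frames 1 to 249 as P, and frame 250 as I when openGOP is true, which matches the comments in the listing.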

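3. Searching for the PreLookaheadGroup class shows that it is only used in void Lookahead::slicetypeDecide(). It is constructed with *this. The only function called on it is:
void PreLookaheadGroup::processTasks(int workerThreadID)
Seen from void Lookahead::slicetypeDecide(), what this function does boils down to the following lines:
LookaheadTLD& tld = m_tld[1];// could also be m_tld[0]: several threads run at the same time, and whichever grabs the job does it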
Frame* preFrame = m_inputQueue.first();
preFrame->m_lowres.init(preFrame->m_fencPic, preFrame->m_poc);
tld.calcAdaptiveQuantFrame(preFrame, m_lookahead.m_param);
tld.lowresIntraEstimate(preFrame->m_lowres);
preFrame->m_lowresInit = true;
// The code of calcAdaptiveQuantFrame and lowresIntraEstimate is not reproduced here; they compute the intra cost.

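4. Searching for the CostEstimateGroup class shows that it is always constructed and used inside member functions of the Lookahead class, and the first constructor argument is always *this. The functions called on it are: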

1). int64_t CostEstimateGroup::singleCost

This function directly calls CostEstimateGroup::estimateFrameCost.

2). int64_t CostEstimateGroup::estimateFrameCost(LookaheadTLD& tld, int p0, int p1, int b, bool bIntraPenalty)
{
    // The comments below assume we came in from slicetypeDecide, so m_frames here is the
    // frames array of that function. Hence m_frames[0] = m_lastNonB, i.e. the last non-B frame already processed,
    // m_frames[1] = &m_inputQueue.first()->m_lowres, m_frames[2] = null
    Lowres*     fenc  = m_frames[b];
    x265_param* param = m_lookahead.m_param;
    int64_t     score = 0;

    // The fenc->costEst and fenc->rowSatds arrays are initialized to -1, but once assigned they are never below 0.
    // void LookaheadTLD::lowresIntraEstimate assigns costEst[0][0] and rowSatds[0][0][0],
    // so the else branch below is only taken when p0 is 0. p0 is 1 for I or IDR frames and 0 for P or B frames.
    if (fenc->costEst[b - p0][p1 - b] >= 0 && fenc->rowSatds[b - p0][p1 - b][0] != -1)
        score = fenc->costEst[b - p0][p1 - b];
    else
    {
        X265_CHECK(p0 != b, "I frame estimates should always be pre-calculated\n");

        bool bDoSearch[2];
        bDoSearch[0] = p0 < b && fenc->lowresMvs[0][b - p0 - 1][0].x == 0x7FFF; // true
        bDoSearch[1] = p1 > b && fenc->lowresMvs[1][p1 - b - 1][0].x == 0x7FFF; // false

#if CHECKED_BUILD
        X265_CHECK(!(p0 < b && fenc->lowresMvs[0][b - p0 - 1][0].x == 0x7FFE), "motion search batch duplication L0\n");
        X265_CHECK(!(p1 > b && fenc->lowresMvs[1][p1 - b - 1][0].x == 0x7FFE), "motion search batch duplication L1\n");
        if (bDoSearch[0]) fenc->lowresMvs[0][b - p0 - 1][0].x = 0x7FFE;
        if (bDoSearch[1]) fenc->lowresMvs[1][p1 - b - 1][0].x = 0x7FFE;
#endif

        fenc->weightedRef[b - p0].isWeighted = false;
        // param->bEnableWeightedPred = 1
        if (param->bEnableWeightedPred && bDoSearch[0])
            tld.weightsAnalyse(*m_frames[b], *m_frames[p0]); // always executed

        fenc->costEst[b - p0][p1 - b] = 0;
        fenc->costEstAq[b - p0][p1 - b] = 0;

        if (!m_batchMode && m_lookahead.m_numCoopSlices > 1 && ((p1 > b) || bDoSearch[0] || bDoSearch[1]))
        {
            // this branch is always taken here; the else below is never reached
            // m_numCoopSlices is initialized in the Lookahead constructor as (720/16)/10 = 45/10 = 4
            memset(&m_slice, 0, sizeof(Slice) * m_lookahead.m_numCoopSlices);

            m_lock.acquire();
            X265_CHECK(!m_batchMode, "single CostEstimateGroup instance cannot mix batch modes\n");
            m_coop.p0 = p0; // 0
            m_coop.p1 = p1; // 1
            m_coop.b = b;   // 1
            m_coop.bDoSearch[0] = bDoSearch[0]; // true
            m_coop.bDoSearch[1] = bDoSearch[1]; // false
            m_jobTotal = m_lookahead.m_numCoopSlices; // 4
            m_jobAcquired = 0;
            m_lock.release();

            tryBondPeers(*m_lookahead.m_pool, m_jobTotal);

            processTasks(-1);

            waitForExit();

            for (int i = 0; i < m_lookahead.m_numCoopSlices; i++)
            {
                fenc->costEst[b - p0][p1 - b] += m_slice[i].costEst;
                fenc->costEstAq[b - p0][p1 - b] += m_slice[i].costEstAq;
                if (p1 == b)
                    fenc->intraMbs[b - p0] += m_slice[i].intraMbs;
            }
        }
        else
        {
            bool lastRow = true;
            for (int cuY = m_lookahead.m_8x8Height - 1; cuY >= 0; cuY--)
            {
                fenc->rowSatds[b - p0][p1 - b][cuY] = 0;

                for (int cuX = m_lookahead.m_8x8Width - 1; cuX >= 0; cuX--)
                    estimateCUCost(tld, cuX, cuY, p0, p1, b, bDoSearch, lastRow, -1);

                lastRow = false;
            }
        }

        score = fenc->costEst[b - p0][p1 - b];

        if (b != p1)
            score = score * 100 / (130 + param->bFrameBias);

        fenc->costEst[b - p0][p1 - b] = score;
    }

    if (bIntraPenalty)
        // arbitrary penalty for I-blocks after B-frames
        score += score * fenc->intraMbs[b - p0] / (tld.ncu * 8);

    return score;
}

3). void CostEstimateGroup::processTasks(int workerThreadID)
// m_lookahead.m_numRowsPerSlice = 10, m_lookahead.m_8x8Height = 720/16 = 45
// The picture is split into 4 parts (m_jobTotal = 45/10 = 4) of 10 rows each,
// except the last part, which has more than 10 but fewer than 20 rows. This makes it easy for several threads to work in parallel.

4). void CostEstimateGroup::estimateCUCost

Computes the inter cost of each CU. Its input arguments are certain to be: p0 = 0, p1 = b = 1, bDoSearch[0] = true, bDoSearch[1] = false
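The row split described under 3) and the B-frame scaling in estimateFrameCost are both simple integer arithmetic, so they can be checked with the small sketch below. DemoRowRange, sliceRows and biasBFrameScore are hypothetical helper names. The rule that the last slice absorbs the leftover rows is an assumption taken from the comment above ("more than 10 but fewer than 20 rows"), not copied from the x265 source. The scaling formula is the one shown in the listing.

```cpp
#include <cassert>
#include <cstdint>

struct DemoRowRange { int firstY; int lastY; };   // inclusive row indices

// Rows covered by cooperative slice `i` when `height8x8` rows are split into
// `numSlices` slices of `rowsPerSlice` rows; the last slice takes the remainder.
inline DemoRowRange sliceRows(int i, int numSlices, int rowsPerSlice, int height8x8)
{
    assert(i >= 0 && i < numSlices);
    DemoRowRange r;
    r.firstY = rowsPerSlice * i;
    r.lastY  = (i == numSlices - 1) ? height8x8 - 1 : rowsPerSlice * (i + 1) - 1;
    return r;
}

// Score scaling applied in estimateFrameCost when the frame is a B-frame (b != p1).
inline int64_t biasBFrameScore(int64_t score, int bFrameBias)
{
    return score * 100 / (130 + bFrameBias);
}
```

For the 720p case (45 rows, 10 rows per slice, 4 slices) this gives rows 0 to 9, 10 to 19, 20 to 29 and 30 to 44, so the last slice holds 15 rows. With bFrameBias = 0 a B-frame score of 1300 is scaled down to 1000.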