SoundTouch Audio Processing Library: Source Code Analysis and Algorithm Extraction (7)


The previous installment took a rough first look at the TDStretch member function processSamples. It is invoked from TDStretch::putSamples, which recalls our earlier analysis of SoundTouch::putSamples. TDStretch::putSamples is implemented as follows:

// Adds 'numsamples' pcs of samples from the 'samples' memory position into
// the input of the object.
void TDStretch::putSamples(const SAMPLETYPE *samples, uint nSamples)
{
    // Add the samples into the input buffer
    inputBuffer.putSamples(samples, nSamples);
    // Process the samples in input buffer
    processSamples();
}

It first copies nSamples samples from samples into inputBuffer, then calls processSamples to process them. processSamples is the core of the TDStretch class, so the rest of this article analyzes its implementation in detail.

// Processes as many processing frames of the samples 'inputBuffer', store
// the result into 'outputBuffer'
void TDStretch::processSamples()
{
    int ovlSkip, offset;
    int temp;

    while ((int)inputBuffer.numSamples() >= sampleReq)
    {
        // If tempo differs from the normal ('SCALE'), scan for the best overlapping
        // position
        offset = seekBestOverlapPosition(inputBuffer.ptrBegin());

        // Mix the samples in the 'inputBuffer' at position of 'offset' with the
        // samples in 'midBuffer' using sliding overlapping
        // ... first partially overlap with the end of the previous sequence
        // (that's in 'midBuffer')
        overlap(outputBuffer.ptrEnd((uint)overlapLength), inputBuffer.ptrBegin(), (uint)offset);
        outputBuffer.putSamples((uint)overlapLength);

        // ... then copy sequence samples from 'inputBuffer' to output:
        temp = (seekLength / 2 - offset);

        // length of sequence
        temp = (seekWindowLength - 2 * overlapLength);

        // crosscheck that we don't have buffer overflow...
        if ((int)inputBuffer.numSamples() < (offset + temp + overlapLength * 2))
        {
            continue;    // just in case, shouldn't really happen
        }

        outputBuffer.putSamples(inputBuffer.ptrBegin() + channels * (offset + overlapLength), (uint)temp);

        // Copies the end of the current sequence from 'inputBuffer' to
        // 'midBuffer' for being mixed with the beginning of the next
        // processing sequence and so on
        assert((offset + temp + overlapLength * 2) <= (int)inputBuffer.numSamples());
        memcpy(pMidBuffer, inputBuffer.ptrBegin() + channels * (offset + temp + overlapLength),
            channels * sizeof(SAMPLETYPE) * overlapLength);

        // Remove the processed samples from the input buffer. Update
        // the difference between integer & nominal skip step to 'skipFract'
        // in order to prevent the error from accumulating over time.
        skipFract += nominalSkip;   // real skip size
        ovlSkip = (int)skipFract;   // rounded to integer skip
        skipFract -= ovlSkip;       // maintain the fraction part, i.e. real vs. integer skip
        inputBuffer.receiveSamples((uint)ovlSkip);
    }
}

First, sampleReq is the parameter computed in the previous installment: the number of input samples needed for one stretch step. The loop checks whether inputBuffer holds at least sampleReq samples. If it does, it calls seekBestOverlapPosition(inputBuffer.ptrBegin()) to find the position in the input buffer that is most similar to the end of the previous sequence. Here is the implementation of seekBestOverlapPosition:

// Seeks for the optimal overlap-mixing position.
int TDStretch::seekBestOverlapPosition(const SAMPLETYPE *refPos)
{
    if (channels == 2)
    {
        // stereo sound
        if (bQuickSeek)
        {
            return seekBestOverlapPositionStereoQuick(refPos);
        }
        else
        {
            return seekBestOverlapPositionStereo(refPos);
        }
    }
    else
    {
        // mono sound
        if (bQuickSeek)
        {
            return seekBestOverlapPositionMonoQuick(refPos);
        }
        else
        {
            return seekBestOverlapPositionMono(refPos);
        }
    }
}

As before, we take the mono case because it is easier to follow. Depending on the flag bQuickSeek, the function calls either seekBestOverlapPositionMonoQuick or seekBestOverlapPositionMono.

// Seeks for the optimal overlap-mixing position.
// The 'mono' version of the routine
//
// The best position is determined as the position where the two overlapped
// sample sequences are 'most alike', in terms of the highest cross-correlation
// value over the overlapping period
int TDStretch::seekBestOverlapPositionMonoQuick(const SAMPLETYPE *refPos)
{
    int j;
    int bestOffs;
    double bestCorr, corr;
    int scanCount, corrOffset, tempOffset;

    // Slopes the amplitude of the 'midBuffer' samples
    precalcCorrReferenceMono();

    bestCorr = FLT_MIN;
    bestOffs = _scanOffsets[0][0];
    corrOffset = 0;
    tempOffset = 0;

    // Scans for the best correlation value using four-pass hierarchical search.
    //
    // The look-up table 'scans' has hierarchical position adjusting steps.
    // In first pass the routine searches for the highest correlation with
    // relatively coarse steps, then rescans the neighbourhood of the highest
    // correlation with better resolution and so on.
    for (scanCount = 0; scanCount < 4; scanCount ++)
    {
        j = 0;
        while (_scanOffsets[scanCount][j])
        {
            tempOffset = corrOffset + _scanOffsets[scanCount][j];
            if (tempOffset >= seekLength) break;

            // Calculates correlation value for the mixing position corresponding
            // to 'tempOffset'
            corr = (double)calcCrossCorrMono(refPos + tempOffset, pRefMidBuffer);

            // heuristic rule to slightly favour values close to mid of the range
            double tmp = (double)(2 * tempOffset - seekLength) / seekLength;
            corr = ((corr + 0.1) * (1.0 - 0.25 * tmp * tmp));

            // Checks for the highest correlation value
            if (corr > bestCorr)
            {
                bestCorr = corr;
                bestOffs = tempOffset;
            }
            j ++;
        }
        corrOffset = bestOffs;
    }

    // clear cross correlation routine state if necessary (is so e.g. in MMX routines).
    clearCrossCorrState();

    return bestOffs;
}

and:

// Seeks for the optimal overlap-mixing position.
// The 'mono' version of the routine
//
// The best position is determined as the position where the two overlapped
// sample sequences are 'most alike', in terms of the highest cross-correlation
// value over the overlapping period
int TDStretch::seekBestOverlapPositionMono(const SAMPLETYPE *refPos)
{
    int bestOffs;
    double bestCorr, corr;
    int tempOffset;
    const SAMPLETYPE *compare;

    // Slopes the amplitude of the 'midBuffer' samples
    precalcCorrReferenceMono();

    bestCorr = FLT_MIN;
    bestOffs = 0;

    // Scans for the best correlation value by testing each possible position
    // over the permitted range.
    for (tempOffset = 0; tempOffset < seekLength; tempOffset ++)
    {
        compare = refPos + tempOffset;

        // Calculates correlation value for the mixing position corresponding
        // to 'tempOffset'
        corr = (double)calcCrossCorrMono(pRefMidBuffer, compare);

        // heuristic rule to slightly favour values close to mid of the range
        double tmp = (double)(2 * tempOffset - seekLength) / seekLength;
        corr = ((corr + 0.1) * (1.0 - 0.25 * tmp * tmp));

        // Checks for the highest correlation value
        if (corr > bestCorr)
        {
            bestCorr = corr;
            bestOffs = tempOffset;
        }
    }

    // clear cross correlation routine state if necessary (is so e.g. in MMX routines).
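    clearCrossCorrState();

    return bestOffs;
}

At first glance these two functions look very different, but they are essentially the same. Let us start with TDStretch::seekBestOverlapPositionMono, which is the straightforward implementation, again using the floating-point build as the example. Note the call to precalcCorrReferenceMono(), which is implemented as follows:

// Slopes the amplitude of the 'midBuffer' samples so that cross correlation
// is faster to calculate
void TDStretch::precalcCorrReferenceMono()
{
    int i;
    float temp;

    for (i = 0; i < (int)overlapLength; i ++)
    {
        temp = (float)i * (float)(overlapLength - i);
        pRefMidBuffer[i] = (float)(pMidBuffer[i] * temp);
    }
}

This can be read as a window function W[i] for i in [0, overlapLength). temp is the symmetric quadratic W[i] = i * (overlapLength - i), with its vertex at (overlapLength/2, overlapLength^2/4) and zeros at (0, 0) and (overlapLength, 0), so that pRefMidBuffer[i] = pMidBuffer[i] * W[i].

Next, look at calcCrossCorrMono, the function that computes the cross-correlation:

double TDStretch::calcCrossCorrMono(const float *mixingPos, const float *compare) const
{
    double corr;
    double norm;
    int i;

    corr = norm = 0;
    for (i = 1; i < overlapLength; i ++)
    {
        corr += mixingPos[i] * compare[i];
        norm += mixingPos[i] * mixingPos[i];
    }

    if (norm < 1e-9) norm = 1.0;    // to avoid div by zero
    return corr / sqrt(norm);
}

Recall the formula for the normalized cross-correlation coefficient, where E denotes summation over n and L = 0, ±1, ±2, ...:

Rxy(L) = E(x(n) y(n-L)) = E(x(n+L) y(n))
Ryx(L) = E(y(n) x(n-L)) = E(y(n+L) x(n))
Pxy(L) = Rxy(L) / sqrt(Rxx(0) Ryy(0))

Pxy lies in [-1, 1]. The computation in calcCrossCorrMono clearly differs in form from this textbook definition: it divides only by the square root of the energy of its first argument. My own reading is as follows. pMidBuffer is the middle part, where two discrete signals overlap. The usual way to make the overlapped part smoother is:

[Figure: the end of x[n] ramps down while the start of y[n] ramps up across the overlap region.]

The overlapping parts of y[n] and x[n] should look like this to give a reasonably smooth result. TDStretch::seekBestOverlapPositionMono implements this overlap scheme, only with a fair amount of optimization. That is why TDStretch::processSamples() contains:

memcpy(pMidBuffer, inputBuffer.ptrBegin() + channels * (offset + temp + overlapLength),
    channels * sizeof(SAMPLETYPE) * overlapLength);

pMidBuffer is taken directly from x[n], and compare corresponds to x[n + overlapLength]. To make it easier to follow, seekBestOverlapPositionMono can be rewritten like this:

int i, j, bestoffset = 0;
double corr, norm, tmp, bestcorr = 0;

for (i = 0; i < seekLength; i++)
{
    corr = norm = 0;    // reset the accumulators for each candidate offset
    for (j = 0; j < overlapLength; j++)
    {
        mixingPos[j] = inputBuffer[j] * (overlapLength - j);
        compare[j] = inputBuffer[i + j] * j;
        corr += compare[j] * mixingPos[j];
        norm += mixingPos[j] * mixingPos[j];
    }
    if (norm < 1e-9) norm = 1.0;    // to avoid div by zero
    corr = corr / sqrt(norm);
    tmp = (double)(2 * i - seekLength) / seekLength;
    corr = ((corr + 0.1) * (1.0 - 0.25 * tmp * tmp));
    if (corr > bestcorr)
    {
        // found new best offset candidate
        bestcorr = corr;
        bestoffset = i;
    }
}

Note that inside the j loop neither the weight (overlapLength - j) nor the weight j depends on i. To improve performance, the product inputBuffer[j] * j * (overlapLength - j) can be computed once, outside the i loop. This is what precalcCorrReferenceMono stores in pRefMidBuffer, and seekBestOverlapPositionMono is the resulting optimized structure. It can therefore be read as y[m] = x[m] * w[m] * w[N-m], where w[N-m] is the mirror image of w[m]. This y is then cross-correlated with x[n] to find the most similar position, which becomes the overlap position.

tmp = (double)(2 * i - seekLength) / seekLength;
corr = ((corr + 0.1) * (1.0 - 0.25 * tmp * tmp));

Plotting (1.0 - 0.25 * tmp * tmp) makes this easy to understand. It is a deliberate correction applied to corr: tmp runs from -1 at offset 0 to almost +1 at the end of the seek range, so the factor is 1.0 in the middle of the range and drops to 0.75 at the edges. Candidates near the middle of the seek range therefore score higher, which pulls the chosen position toward the middle.

This completes the analysis of most of the SoundTouch source code. The next installment will extract and refine the algorithm, which amounts to a summary.

This article comes from a CSDN blog; please cite the source when reposting: http://blog.csdn.net/suhetao/archive/2010/09/16/5889102.aspx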
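The exhaustive mono search discussed above can be condensed into a small standalone sketch. This is not SoundTouch code: the names weightReference, crossCorr and seekBestOffset are hypothetical, the buffers are plain std::vector<float>, and two choices are assumptions of this sketch. First, the correlation is normalized by the energy of the candidate window, which matches the argument order used in the Quick listing; in the exhaustive listing the weighted reference is passed first, so there the norm is the same for every offset. Second, the best score starts from the lowest representable double rather than FLT_MIN, because FLT_MIN is the smallest positive float and not a large negative number.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Weight the tail of the previous sequence with the parabola
// w[i] = i * (N - i), mirroring what precalcCorrReferenceMono does.
std::vector<float> weightReference(const std::vector<float>& mid) {
    const std::size_t n = mid.size();
    std::vector<float> ref(n);
    for (std::size_t i = 0; i < n; ++i) {
        ref[i] = mid[i] * static_cast<float>(i) * static_cast<float>(n - i);
    }
    return ref;
}

// Correlation between a candidate window and the weighted reference,
// divided by the square root of the candidate's energy (assumption of
// this sketch, see the lead-in). Index 0 is skipped because ref[0] == 0.
double crossCorr(const float* candidate, const std::vector<float>& ref) {
    double corr = 0.0;
    double norm = 0.0;
    for (std::size_t i = 1; i < ref.size(); ++i) {
        corr += static_cast<double>(candidate[i]) * ref[i];
        norm += static_cast<double>(candidate[i]) * candidate[i];
    }
    if (norm < 1e-9) norm = 1.0;  // avoid division by zero
    return corr / std::sqrt(norm);
}

// Test every offset in [0, seekLength) and return the one with the
// highest score after the "favour the middle of the range" correction.
int seekBestOffset(const std::vector<float>& input,
                   const std::vector<float>& mid,
                   int seekLength) {
    assert(seekLength > 0);
    assert(input.size() >= static_cast<std::size_t>(seekLength) + mid.size());

    const std::vector<float> ref = weightReference(mid);
    double bestCorr = std::numeric_limits<double>::lowest();
    int bestOffs = 0;

    for (int offs = 0; offs < seekLength; ++offs) {
        double corr = crossCorr(input.data() + offs, ref);

        // Same heuristic as in the listings: factor 1.0 mid-range, 0.75 at the edges.
        const double tmp = static_cast<double>(2 * offs - seekLength) / seekLength;
        corr = (corr + 0.1) * (1.0 - 0.25 * tmp * tmp);

        if (corr > bestCorr) {
            bestCorr = corr;
            bestOffs = offs;
        }
    }
    return bestOffs;
}
```

As a usage example, take an alternating pattern {1, -1, 1, -1, 1, -1, 1, -1} as mid, embed a copy of it at index 8 of an otherwise silent 24-sample input, and call seekBestOffset(input, mid, 16). The search returns 8: odd offsets line the pattern up with inverted sign and score negative, while the other even offsets overlap only part of the pattern.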