PCM Mixing


Mixing

The principle of PCM mixing is to add the two sets of samples together; the sum must stay within the representable range of the PCM bit width. MixFrames is hard-coded to int16_t (see AudioFrame for details), so WebRTC's mixer clearly does not support PCM audio other than 16-bit.
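To make the principle concrete: 30000 + 10000 = 40000 exceeds the int16_t maximum of 32767, so the mixed sample is clamped to 32767. Below is a minimal, self-contained sketch of the idea (my own illustration, not WebRTC code): widen each pair of 16-bit samples to 32 bits, add them, then saturate back into the int16_t range.

#include <cstddef>
#include <cstdint>

// Mix two equal-length 16-bit PCM buffers into |out| (illustration only).
// Widening to int32_t prevents wraparound; the sum is then
// saturated to [-32768, 32767].
void MixPcm16(const int16_t* a, const int16_t* b, int16_t* out, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    int32_t sum = static_cast<int32_t>(a[i]) + static_cast<int32_t>(b[i]);
    if (sum > 32767) sum = 32767;
    if (sum < -32768) sum = -32768;
    out[i] = static_cast<int16_t>(sum);
  }
}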

PCM operations include mono-to-stereo, stereo-to-mono, muting, and volume adjustment.
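These operations are simple loops over the interleaved samples. Here is a hypothetical sketch of two of them (the function names are mine, not the actual AudioFrameOperations API); stereo-to-mono is typically the average of the left and right samples, and muting is volume scaling with a gain of 0.

#include <cstddef>
#include <cstdint>

// Mono to stereo: duplicate each sample into the left and right
// channels. |dst| must hold 2 * n samples.
void MonoToStereoPcm(const int16_t* src, int16_t* dst, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    dst[2 * i] = src[i];      // left
    dst[2 * i + 1] = src[i];  // right
  }
}

// Volume adjustment: scale by |gain| (1.0 = unchanged, 0.0 = mute),
// saturating to the int16_t range.
void ScalePcm(int16_t* samples, size_t n, float gain) {
  for (size_t i = 0; i < n; ++i) {
    int32_t v = static_cast<int32_t>(samples[i] * gain);
    if (v > 32767) v = 32767;
    if (v < -32768) v = -32768;
    samples[i] = static_cast<int16_t>(v);
  }
}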

The mixing function in WebRTC lives in webrtc/modules/audio_conference_mixer/source/audio_conference_mixer_impl.cc, namely the function below.

// Mix |frame| into |mixed_frame|, with saturation protection and upmixing.
// These effects are applied to |frame| itself prior to mixing. Assumes that
// |mixed_frame| always has at least as many channels as |frame|. Supports
// stereo at most.
//
// TODO(andrew): consider not modifying |frame| here.
void MixFrames(AudioFrame* mixed_frame, AudioFrame* frame, bool use_limiter) {
  assert(mixed_frame->num_channels_ >= frame->num_channels_);
  if (use_limiter) {
    // Divide by two to avoid saturation in the mixing.
    // This is only meaningful if the limiter will be used.
    *frame >>= 1;
  }
  if (mixed_frame->num_channels_ > frame->num_channels_) {
    // We only support mono-to-stereo.
    assert(mixed_frame->num_channels_ == 2 &&
           frame->num_channels_ == 1);
    AudioFrameOperations::MonoToStereo(frame);
  }
  *mixed_frame += *frame;
}
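Note the design of the use_limiter path: each frame is attenuated by half (*frame >>= 1) before summing so that the accumulation itself is less likely to saturate, and the limiter applied after mixing can bring the level back up. A hypothetical sketch of how a caller might accumulate several participants into one output frame (the loop and participant list are my illustration, not the actual AudioConferenceMixerImpl logic):

#include <vector>

// Accumulate each participant's frame into |mixed_frame|.
// MixFrames halves |frame| when the limiter is in use, upmixes
// mono to stereo if needed, then performs the saturating +=.
void MixParticipants(AudioFrame* mixed_frame,
                     const std::vector<AudioFrame*>& frames,
                     bool use_limiter) {
  for (AudioFrame* frame : frames) {
    MixFrames(mixed_frame, frame, use_limiter);
  }
}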

The last statement is where the actual mixing happens: it invokes AudioFrame's overloaded operator +=, which performs the operation below, keeping the summed data within the int16_t range.
The file path is webrtc/modules/interface/module_common_types.h.

inline AudioFrame& AudioFrame::operator+=(const AudioFrame& rhs) {
  ...
  if (speech_type_ != rhs.speech_type_) speech_type_ = kUndefined;

  if (noPrevData) {
    memcpy(data_, rhs.data_,
           sizeof(int16_t) * rhs.samples_per_channel_ * num_channels_);
  } else {
    // IMPROVEMENT this can be done very fast in assembly
    for (int i = 0; i < samples_per_channel_ * num_channels_; i++) {
      int32_t wrapGuard =
          static_cast<int32_t>(data_[i]) + static_cast<int32_t>(rhs.data_[i]);
      if (wrapGuard < -32768) {
        data_[i] = -32768;
      } else if (wrapGuard > 32767) {
        data_[i] = 32767;
      } else {
        data_[i] = (int16_t)wrapGuard;
      }
    }
  }
  energy_ = 0xffffffff;
  return *this;
}

The final if/else clamp can be written with macros:

#define MIXER_MAX(x,y) ((x)>(y)? (x):(y))
#define MIXER_MIN(x,y) ((x)<(y)? (x):(y))
#define MIXER_CLIP3(a,b,x) (MIXER_MAX(a,MIXER_MIN(x,b)))  /* clip x between a and b */
#define MIXER_CLIP(x) MIXER_CLIP3(-32768,32767,x)

for (int i = 0; i < samples_per_channel_ * num_channels_; i++) {
  int32_t wrapGuard =
      static_cast<int32_t>(data_[i]) + static_cast<int32_t>(rhs.data_[i]);
  data_[i] = (int16_t)MIXER_CLIP(wrapGuard);
}
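One caveat with the macro version: MIXER_CLIP expands its argument more than once, so it must not be passed an expression with side effects. In C++17 the same clamp can also be written without macros as std::clamp<int32_t>(wrapGuard, -32768, 32767) from <algorithm>.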

The Android source code writes it as below, using bit shifts, which should be somewhat more efficient. I am only inferring from theory that the shifts beat the branching version; I have not actually benchmarked the two.

static inline int16_t clamp16(int32_t sample)
{
    if ((sample>>15) ^ (sample>>31))
        sample = 0x7FFF ^ (sample>>31);
    return sample;
}
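Why this works: for a sample already inside the int16_t range, sample>>15 and sample>>31 are equal (both 0 for non-negative values, both -1 for negative ones, assuming arithmetic right shift of negative values, which is implementation-defined before C++20 but universal in practice), so the XOR is 0 and the sample passes through unchanged. On overflow the two shifts differ, and 0x7FFF ^ (sample>>31) yields 32767 for positive overflow and -32768 (~0x7FFF) for negative overflow. A small harness (my own sketch) that checks clamp16 against the branching clip over the full range of possible two-sample sums:

#include <cassert>
#include <cstdint>

// Branching reference clip, as in AudioFrame::operator+=.
static int16_t clip_branch(int32_t s) {
  if (s < -32768) return -32768;
  if (s > 32767) return 32767;
  return (int16_t)s;
}

static inline int16_t clamp16(int32_t sample)
{
    if ((sample>>15) ^ (sample>>31))
        sample = 0x7FFF ^ (sample>>31);
    return sample;
}

int main() {
  // The sum of two int16_t samples lies in [-65536, 65534];
  // check every value in that range.
  for (int32_t s = -65536; s <= 65534; ++s)
    assert(clamp16(s) == clip_branch(s));
  return 0;
}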