自适应回声消除算法
欢迎留言交流
AEC算法早期用在Voip,电话这些场景中,自从智能设备诞生后,智能语音设备也要消除自身的音源,这些音源包括音乐或者TTS机器合成声音。
本文基于开源算法阐述AEC的原理和实现,基于WebRTC和speex两种算法,文末会附上两种算法的matlab实现。
回声消除原理
回声消除的基本原理是使用一个自适应滤波器对未知的回声信道:ω 进行参数辨识,根据扬声器信号与产生的多路回声的相关性为基础,建立远端信号模型,模拟回声路径,通过自适应算法调整,使其冲击响应和真实回声路径相逼近。然后将麦克风接收到的信号减去估计值,即可实现回声消除功能。
echo=x∗ω 1.1
d=s+echo 1.2
y^=x∗ω^ 1.3
e=d−y^ 1.4
式中ω是回声通道的时域冲击响应函数,x是远端语音;echo是所得回声;s是近端说话人语音,d为麦克风采集到的信号,y^是对回声信号的估计值,e为误差。
为了消除较长时间的回声,需要FIR滤波器的阶数较高,时域计算法,有两个问题,一个是实时性较差,一个是计算量大。为了在实时性/计算量以及可以消除的回声时长之间找到使这三个最优的算法,采用了频谱分块自适应滤波算法。
这里用到了很多信号处理算法,为了让算法理解起来容易些,简单罗列涉及到的算法:
FFT/IFFT
循环卷积和线性卷积的关系;重叠保留法
功率谱密度
互相关
NLMS自适应算法
NLMS权重调整
关于NLMS,可以下载http://download.csdn.net/detail/shichaog/9832657
下面直接开始WebRTC的matlab梳理,由于matlab代码和webRTC的c++代码命名几乎一致。所以c++的实现就一笔带过。
首先解释几个名词:
RERL-residual_echo_return_loss
ERL-echo return loss
ERLE echo return loss enhancement
NLP non-linear processing
首先matlab读入远端和近端信号。
%near is micphone captured signalfid=fopen('near.pcm', 'rb'); % Load far endssin=fread(fid,inf,'float32');fclose(fid);%far is speaker played musicfid=fopen('far.pcm', 'rb'); % Load fnear endrrin=fread(fid,inf,'float32');fclose(fid);
然后对一些变量赋初值
fs=16000;NLPon=1; % NLP onM = 16; % Number of partitionsN = 64; % Partition lengthL = M*N; % Filter lengthVADtd=48;alp = 0.15; % Power estimation factor alc = 0.1; % Coherence estimation factorstep = 0.1875;%0.1875; % Downward step size
上述初始化中,M=16和最新的WebRTC代码并不一致,且最新的WebRTC中支持aec3最新一代算法。
len=length(ssin);NN=len;Nb=floor(NN/N)-M;for kk=1:Nb pos = N * (kk-1) + start;
可以看出Nb是麦克风采集到的数据块数-16(分区数),这是因为第一次输入了16块,所以这里减掉了16。pos是每一次添加一块时的地址指针。
%far is speaker played music xk = rrin(pos:pos+N-1); %near is micphone captured signal dk = ssin(pos:pos+N-1);
xk和dk是读取到的64个点,这里是时域信号。
功率计算
%----------------------- far end signal process xx = [xo;xk]; xo = xk; tmp = fft(xx); XX = tmp(1:N+1); dd = [do;dk]; % Overlap do = dk; tmp = fft(dd); % Frequency domain DD = tmp(1:N+1);
将xk和上一次的数据结合在一起,做FFT变换,由于两次组合在一起,那么得到的应该是N=128点,这里可以知道得到的谱分辨率是n∗fs/N,fs前面设置过了,是16k,则得到的每一个bin的频谱分辨率是16000/128=125Hz。这里XX和DD取了前65个点,这是因为N点FFT变换后频谱是关于N/2对称的,对称的原因是奈奎斯特采样定理,如果fs=16000Hz,那么要求采样到的信号的截止频率必然小于等于fs/2=8000Hz,对于实信号,N/2~N,实际上表示的是−fs/2 ~ 0之间的频率。第一个点是直流分量,所以取65个点。和上一帧64个点信号合并在一起的另一个原因是使用重叠保(overlap-save)留法将循环卷积变成线性卷积,这里做的FFT变换,就是为了减少时域里做卷积的计算量的。
计算远端信号功率谱
pn0 = (1 - alp) * pn0 + alp * real(XX.* conj(XX)); pn = pn0;
平滑功率谱,上一次的功率谱占85%(alp=0.15),后面的频域共轭相乘等于功率是有帕斯瓦尔定理支撑的。pn0是65*1的矩阵。
滤波
XFm(:,1) = XX;
首先将远端信号频谱赋给XFm,XFm是65*16的矩阵,16就是前面初始化的M值,这里将XX给第一列,其2~16列对应的是之前的输入频谱。
for mm=0:(M-1) m=mm+1; YFb(:,m) = XFm(:,m) .* WFb(:,m); end
YFb,WFb以及XFm都是65*16的矩阵,WFb是自适应滤波器的频谱表示,XFm是原始的speaker数据,上式的意义对应于插图中的y^的频域值,变换到时域后就可以得到y的估计值y^.
yfk = sum(YFb,2); tmp = [yfk ; flipud(conj(yfk(2:N)))]; ykt = real(ifft(tmp)); ykfb = ykt(end-N+1:end);
首先yfk是65*1的矩阵,sum求和就是将估计的频谱按行求和,也就是yfk包含了最近16个块的远端频谱估计信息,这样,只要近端麦克采集到的信号里有这16个块包含的远端信号,那么就可以消掉,从这里也可以看出来,容许的延迟差 在16*64/16=64ms,也就是说,如果麦克风采集到的speaker信号滞后speaker播放超过64ms,那么这种情况是无法消掉的,当然,延迟差越小越好。
flipud(conj(yfk(2:N))是因为前面计算频谱时利用奈奎斯特定理,也即实数的FFT结果按N/2对称,所以这里为了得到正确的ifft变换结果,先把频谱不全到fs.
ykfb就是y^.后面再看WFb是如何跟新。
误差估计
ekfb = dk - ykfb;
dk是麦克风采集到的信号,ykfb是y^,这样得到的是误差信号,理想情况下,那么得到的误差信号就是需要的人声信号,而完全滤除 掉了speaker信号(远端信号)。
erfb(pos:pos+N-1) = ekfb; tmp = fft([zm;ekfb]); % FD version for cancelling part (overlap-save) Ek = tmp(1:N+1);
erfb是近端信号数组长度×1矩阵,存放的是全部样本对应的误差信号,这个保存仅仅是为了plot用的。
然后补了64个零,然后做FFT,Ek是误差信号FFT的结果。
自适应调节
Ek2 = Ek ./(pn + 0.001); % Normalized error
pn是当前帧远端信号功率谱,Ek是误差信号频谱。Ek2是归一化误差频谱。NLMS公式要求。
absEf = max(abs(Ek2), threshold); absEf = ones(N+1,1)*threshold./absEf; Ek2 = Ek2.*absEf;
max的作用是为了防止归一化后误差频谱过小,最终得到的Ek2是一个限幅矩阵,如果该点的值比门限大,则取门限,如果该点的值比门限小,则保持不变。
mEk = mufb.*Ek2;
mufb是步长,对于16000情况,步长取了0.8.NLMS公式。
PP = conj(XFm).*(ones(M,1) * mEk')'; tmp = [PP ; flipud(conj(PP(2:N,:)))]; IFPP = real(ifft(tmp)); PH = IFPP(1:N,:); tmp = fft([PH;zeros(N,M)]); FPH = tmp(1:N+1,:); WFb = WFb + FPH;
PP是将远端信号的共轭乘以误差信号频谱,这一项用于调节步长,NLMS(步长=参考信号×步长×误差)的可变步长就提现在这里。PH是频域到时域的变换值。这和前面频域到时域的变换原理一样。WFb是权中系数的更新。
if mod(kk, 10*mult) == 0 WFbEn = sum(real(WFb.*conj(WFb))); %WFbEn = sum(abs(WFb)); [tmp, dIdx] = max(WFbEn); WFbD = sum(abs(WFb(:, dIdx)),2); %WFbD = WFbD / (mean(WFbD) + 1e-10); WFbD = min(max(WFbD, 0.5), 4); end dIdxV(kk) = dIdx;
上述的作用是更新dIdx和dIdxV。这里的更新并不是每一次都更新,一来是为了稳定,而来也是变相的减少计算量,提高实时性。就算是每一次都更新dIdx,WebRTC计算速度评估的结果也是很满意的。WFb是权重向量的频谱表示,WFbEn是权重向量按列求和,得到的是16*1的矩阵。这样得到的是16个块对权重的累加和。这样的区分度比直接累加和要大。
[tmp, dIdx] = max(WFbEn);作用就是找到16个块中权重累加和最大值及其对应的索引。
WFbD首先计算了权重最大那个块dIdx的列,然后将其按行求和,得到的就是65*1矩阵。WFbD不能低于0.5也不能高于4,算法中并未使用到,plot性能分析时用到。
最后把索引值dIdx存放到dIdxV(kk)中,这样每来一帧,就会有一个最大索引值放到dIdxV向量中。
功率谱密度和相关性计算
NLP
这里的NLP不是native language processing,而是Non-linear processing的意思。
ee = [eo;ekfb]; eo = ekfb; window = wins;
上述作用是将上次的误差和ekfb组合,其中eo可以理解为error old。eo也确实保存了上一次的误差。window是简单将窗函数赋值。
tmp = fft(xx.*window); xf = tmp(1:N+1); tmp = fft(dd.*window); df = tmp(1:N+1); tmp = fft(ee.*window); ef = tmp(1:N+1);
上述代码是把xx,dd,ee加窗后做FFT变换,并且取了fs/2的频谱部分存放到xf,df以及ef中。加窗的目的是为了减小频谱泄露,提高谱分辨率。
xfwm(:,1) = xf; xf = xfwm(:,dIdx); %fprintf(1,'%d: %f\n', kk, xf(4)); dfm(:,1) = df;
将xf存放到xfwm的第一列,xfwm是65*16的矩阵,df做类似处理。
然后把dIdx指向的那一列传给xf,dIdx是之前计算的权重矩阵能量最大的那块的索引,这样从xfwm矩阵里把真正要处理近端信号对应的远端信号(speaker,参考信号)获取到。
Se = gamma*Se + (1-gamma)*real(ef.*conj(ef)); Sd = gamma*Sd + (1-gamma)*real(df.*conj(df)); Sx = gamma*Sx + (1 - gamma)*real(xf.*conj(xf));
计算ef,df和xf的平滑功率谱,gamma这里取值是0.92.相对于8k信号取值略大。它们都是65*1的矩阵,包括了直流分量的能力,剩下的64点是fs/2及以下的频点能量。
Sxd = gamma*Sxd + (1 - gamma)*xf.*conj(df); Sed = gamma*Sed + (1-gamma)*ef.*conj(df);
计算互功率谱,这里计算了远端信号和近端信号功率谱,如果该值较大,则说明它们的相关性较强,说明近端信号采集到了远端信号的概率很大,这时需要进行噪声抑制,同样的如果误差信号和近端信号功率谱较大,则说明估计的y^是比较准的,误差信号里剩余的远端信号较少,需要进行噪声抑制的概率就小。它们也都是65*1矩阵,对应频点的bin。但是上述获得的是绝对值而非相对值,门限设置不容易,需要一个归一化的过程。归一化的过程可以通过求互相关得到。
cohed = real(Sed.*conj(Sed))./(Se.*Sd + 1e-10); cohedAvg(kk) = mean(cohed(echoBandRange)); cohxd = real(Sxd.*conj(Sxd))./(Sx.*Sd + 1e-10);
如上,计算误差信号和近端信号的互相关,1e-10是为了防止除0报错。cohed越大,表示回声越小,cohxd越大,表示回声越大,这里就可以设置一个统一的门限评判上下限了。
cohedMean = mean(cohed(echoBandRange));
计算设置的echoBandRange里频点的均值,如果echoBandRange设置的过大,则低音部分如鼓点声则不易消掉。
hnled = min(1 - cohxd, cohed);
这里的作用就是把最小值赋值给hnled,该值越大,则说明越不需要消回声。之所以取二者判断,是为了最大可能性的消除掉回声。
[hnlSort, hnlSortIdx] = sort(1-cohxd(echoBandRange)); [xSort, xSortIdx] = sort(Sx);
对1-cohxd(echoBandRange)进行升序排序,同样对Sx也进行升序排序。
hnlSortQ = mean(1 - cohxd(echoBandRange));
对远端和近端信号的带内互相关求均值。hnlSortQ表示远端和近端不相关性的平均值,其值越大约没有回声,与cohed含义一致。
[hnlSort2, hnlSortIdx2] = sort(hnled(echoBandRange));
对hnled进行升序排序。
hnlQuant = 0.75; hnlQuantLow = 0.5; qIdx = floor(hnlQuant*length(hnlSort2)); qIdxLow = floor(hnlQuantLow*length(hnlSort2)); hnlPrefAvg = hnlSort2(qIdx); hnlPrefAvgLow = hnlSort2(qIdxLow);
这里主要取了两个值,一个值取在了排序后的3/4处,一个值取在了排序后的1/2处。
if cohedMean > 0.98 & hnlSortQ > 0.9 suppState = 0; elseif cohedMean < 0.95 | hnlSortQ < 0.8 suppState = 1; end
如果误差和近端信号的互相关均值大于0.98,且远端和近端频带内的互不相关大于0.9,则说明不需要进行回声抑制,将suppState标志设置成0,如果误差和近端信号小于0.95或者远端和近端频带内信号不相关性小于0.8则需要进行印制,在这个范围之外的,回声抑制标志保持前一帧的状态。
if hnlSortQ < cohxdLocalMin & hnlSortQ < 0.75 cohxdLocalMin = hnlSortQ; end
cohxdLocalMin的初始值是1,表示远端和近端完全不相关,这里判断计算得到的远端近端不相关性是否小于前一次的不相关性,如果小于且不相关性小于0.75,则更新cohxdLocalMin。
if cohxdLocalMin == 1 ovrd = 3; hnled = 1-cohxd; hnlPrefAvg = hnlSortQ; hnlPrefAvgLow = hnlSortQ; end
如果cohxdLocalMin=1,则说明要么是发现远端和近端完全不相关,要么就是cohxdLocalMin一直没有更新,既然不相关性很大,那么也说明有回声的可能性小,那么使用较小的ovrd(over-driven)值,和较大的hnled(65*1)值。
if suppState == 0 hnled = cohed; hnlPrefAvg = cohedMean; hnlPrefAvgLow = cohedMean; end
如果suppState==0,则说明不需要进行回声消除,直接用误差近端相关性修正误差信号,hnl的两个均值读取cohed的均值,在这种情况下hnled的值接近于1.
if hnlPrefAvgLow < hnlLocalMin & hnlPrefAvgLow < 0.6 hnlLocalMin = hnlPrefAvgLow; hnlMin = hnlPrefAvgLow; hnlNewMin = 1; hnlMinCtr = 0; if hnlMinCtr == 0 hnlMinCtr = hnlMinCtr + 1; else hnlMinCtr = 0; hnlMin = hnlLocalMin; SeLocalMin = SeQ; SdLocalMin = SdQ; SeLocalAvg = 0; minCtr = 0; ovrd = max(log(0.0001)/log(hnlMin), 2); divergeFact = hnlLocalMin; end end
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
hnlLocalMin是hnl系数的最小值,在满足这条判断的情况下发现了更小的值,需要对其进行更新,同时表标志设置成1,计数清0,这种情况下回声的概率较大,所以把ovrd设大以增强抑制能力。
if hnlMinCtr == 2 hnlNewMin = 0; hnlMinCtr = 0; ovrd = max(log(0.00000001)/(log(hnlMin + 1e-10) + 1e-10), 5); end
hnlMinCtr==2,说明之前有满足<0.6的块使得hnlMinCtr=2,然后其下一块又没有满足<0.6的条件,进而更新了ovrd值。默认是和3比较取最大值,这里调节成5是为了增加抑制效果,实际情况还需要针对系统调试。
hnlLocalMin = min(hnlLocalMin + 0.0008/mult, 1); cohxdLocalMin = min(cohxdLocalMin + 0.0004/mult, 1);
跟新上述两个值,mult是fs/8000,对于16kHz,就是2.就是0.0004和0.0002的差异。
if ovrd < ovrdSm ovrdSm = 0.99*ovrdSm + 0.01*ovrd; else ovrdSm = 0.9*ovrdSm + 0.1*ovrd; end
平滑更新ovrdSm,上述结果倾向于保持ovrdSm是一个较大的值,这个较大的值是为了尽量抑制回声。
ekEn = sum(Se); dkEn = sum(Sd);
按行求和,物理意义分别是误差能量和近端信号能量。
发散处理
if divergeState == 0 if ekEn > dkEn ef = df; divergeState = 1; end else if ekEn*1.05 < dkEn divergeState = 0; else ef = df; end end
如果不进行发散处理,误差能量大于近端能力,则用近端频谱更新误差频谱并把发散状态设置成1,如果误差能量的1.05倍小于近端能量,则发散处理标志设置成0.
if ekEn > dkEn*19.95 WFb=zeros(N+1,M); end
如果误差能量大于近端能量×19.95则,将权重系数矩阵设置成0.
ekEnV(kk) = ekEn; dkEnV(kk) = dkEn;
相应能量存放在相应的向量中。
hnlLocalMinV(kk) = hnlLocalMin; cohxdLocalMinV(kk) = cohxdLocalMin; hnlMinV(kk) = hnlMin;
上述三个向量保存对应值。
平滑滤波器系数和抑制指数
wCurve = [0; aggrFact*sqrt(linspace(0,1,N))' + 0.1];
权重系数曲线生成,线性均匀分布。
hnled = weight.*min(hnlPrefAvg, hnled) + (1 - weight).*hnled;
使用权重平滑hnled,wCurve分布是让65点中频率越高的点本次hnled占的比重越大,反之则以前的平滑结果所占比重大。
od = ovrdSm*(sqrt(linspace(0,1,N+1))' + 1);
产生65*1的曲线,用作更新hnled的幂指数。
hnled = hnled.^(od.*sshift);
od作为幂指数跟新hnled。
输出回声消除后的信号
hnl = hnled; ef = ef.*(hnl);
用hnl系数与误差频谱相乘,即频域卷积,就是将误差信号通过了传递函数为hnnl的滤波器。
ovrdV(kk) = ovrdSm; hnledAvg(kk) = 1-mean(1-cohed(echoBandRange)); hnlxdAvg(kk) = 1-mean(cohxd(echoBandRange)); hnlSortQV(kk) = hnlPrefAvgLow; hnlPrefAvgV(kk) = hnlPrefAvg;
相关值的暂存
没有开启舒适噪声产生,则Fmix=ef。
% Overlap and add in time domain for smoothness tmp = [Fmix ; flipud(conj(Fmix(2:N)))]; mixw = wins.*real(ifft(tmp)); mola = mbuf(end-N+1:end) + mixw(1:N); mbuf = mixw; ercn(pos:pos+N-1) = mola;
则使用重叠想加法获得时域平滑信号。
XFm(:,2:end) = XFm(:,1:end-1); YFm(:,2:end) = YFm(:,1:end-1); xfwm(:,2:end) = xfwm(:,1:end-1); dfm(:,2:end) = dfm(:,1:end-1);
全体后移一个块,进入下一块迭代。
执行完了之后,如果想听回声消除后信号的声音,使用如下命令:
sound(10*ercn,16000)
其中16000是输入信号的频率。
整体的Matlab代码如下:
% Partitioned block frequency domain adaptive filtering NLMS and% standard time-domain sample-based NLMS%near is micphone captured signalfid=fopen('near.pcm', 'rb'); % Load far endssin=fread(fid,inf,'float32');fclose(fid);%far is speaker played musicfid=fopen('far.pcm', 'rb'); % Load fnear endrrin=fread(fid,inf,'float32');fclose(fid);rand('state',13);fs=16000;mult=fs/8000;if fs == 8000cohRange = 2:3;elseif fs==16000cohRange = 2;end% FlagsNLPon=1; % NLP onCNon=0; % Comfort noise onPLTon=0; % Plotting onM = 16; % Number of partitionsN = 64; % Partition lengthL = M*N; % Filter lengthif fs == 8000 mufb = 0.6;else mufb = 0.8;endVADtd=48;alp = 0.15; % Power estimation factor alc = 0.1; % Coherence estimation factorbeta = 0.9; % Plotting factor%% Changed a little %%step = 0.1875;%0.1875; % Downward step size%%if fs == 8000 threshold=2e-6; % DTrob thresholdelse %threshold=0.7e-6; threshold=1.5e-6; endif fs == 8000 echoBandRange = ceil(300*2/fs*N):floor(1800*2/fs*N);else echoBandRange = ceil(60*2/fs*N):floor(1500*2/fs*N);endsuppState = 1;transCtr = 0;Nt=1;vt=1;ramp = 1.0003; % Upward ramprampd = 0.999; % Downward rampcvt = 20; % Subband VAD threshold;nnthres = 20; % Noise thresholdshh=logspace(-1.3,-2.2,N+1)';sh=[shh;flipud(shh(2:end-1))]; % Suppression profilelen=length(ssin);w=zeros(L,1); % Sample-based TD(time domain) NLMSWFb=zeros(N+1,M); % Block-based FD(frequency domain) NLMSWFbOld=zeros(N+1,M); % Block-based FD NLMSYFb=zeros(N+1,M);erfb=zeros(len,1);erfb3=zeros(len,1);ercn=zeros(len,1);zm=zeros(N,1);XFm=zeros(N+1,M);YFm=zeros(N+1,M);pn0=10*ones(N+1,1);pn=zeros(N+1,1);NN=len;Nb=floor(NN/N)-M;erifb=zeros(Nb+1,1)+0.1;erifb3=zeros(Nb+1,1)+0.1;ericn=zeros(Nb+1,1)+0.1;dri=zeros(Nb+1,1)+0.1;start=1;xo=zeros(N,1);do=xo;eo=xo;echoBands=zeros(Nb+1,1);cohxdAvg=zeros(Nb+1,1);cohxdSlow=zeros(Nb+1,N+1);cohedSlow=zeros(Nb+1,N+1);%overdriveM=zeros(Nb+1,N+1);cohxdFastAvg=zeros(Nb+1,1);cohxdAvgBad=zeros(Nb+1,1);cohedAvg=zeros(Nb+1,1);cohedFastAvg=zeros(Nb+1,1);hnledAvg=zeros(Nb+1,1);hnlxdAvg=zeros(Nb+1,1);ovrdV=zeros(Nb+1,1);dIdxV=zeros(Nb+1,1);SLxV=zeros(Nb+1,1);hnlSortQV=zeros(Nb+1,1);hnlPrefAvgV=zeros(Nb+1,1);mutInfAvg=zeros(Nb+1,1);%overdrive=zeros(Nb+1,1);hnled = zeros(N+1, 1);weight=zeros(N+1,1);hnlMax = zeros(N+1, 1);hnl = zeros(N+1, 1);overdrive = ones(1, N+1);xfwm=zeros(N+1,M);dfm=zeros(N+1,M);WFbD=ones(N+1,1);fbSupp = 0;hnlLocalMin = 1;cohxdLocalMin = 1;hnlLocalMinV=zeros(Nb+1,1);cohxdLocalMinV=zeros(Nb+1,1);hnlMinV=zeros(Nb+1,1);dkEnV=zeros(Nb+1,1);ekEnV=zeros(Nb+1,1);ovrd = 2;ovrdPos = floor((N+1)/4);ovrdSm = 2;hnlMin = 1;minCtr = 0;SeMin = 0;SdMin = 0;SeLocalAvg = 0;SeMinSm = 0;divergeFact = 1;dIdx = 1;hnlMinCtr = 0;hnlNewMin = 0;divergeState = 0;Sy=ones(N+1,1);Sym=1e7*ones(N+1,1);wins=[0;sqrt(hanning(2*N-1))];ubufn=zeros(2*N,1);ebuf=zeros(2*N,1);ebuf2=zeros(2*N,1);ebuf4=zeros(2*N,1);mbuf=zeros(2*N,1);cohedFast = zeros(N+1,1);cohxdFast = zeros(N+1,1);cohxd = zeros(N+1,1);Se = zeros(N+1,1);Sd = zeros(N+1,1);Sx = zeros(N+1,1);SxBad = zeros(N+1,1);Sed = zeros(N+1,1);Sxd = zeros(N+1,1);SxdBad = zeros(N+1,1);hnledp=[];cohxdMax = 0;hh=waitbar(0,'Please wait...');%progressbar(0);%spaces = ' ';%spaces = repmat(spaces, 50, 1);%spaces = ['[' ; spaces ; ']'];%fprintf(1, spaces);%fprintf(1, '\n');for kk=1:Nb pos = N * (kk-1) + start; % FD block method % ---------------------- Organize data %far is speaker played music xk = rrin(pos:pos+N-1); %near is micphone captured signal dk = ssin(pos:pos+N-1); %----------------------- far end signal process xx = [xo;xk]; xo = xk; tmp = fft(xx); XX = tmp(1:N+1); dd = [do;dk]; % Overlap do = dk; tmp = fft(dd); % Frequency domain DD = tmp(1:N+1); % ------------------------far end Power estimation pn0 = (1 - alp) * pn0 + alp * real(XX.* conj(XX)); pn = pn0;% pn = (1 - alp) * pn + alp * M * pn0; % ---------------------- Filtering XFm(:,1) = XX; for mm=0:(M-1) m=mm+1; YFb(:,m) = XFm(:,m) .* WFb(:,m); end yfk = sum(YFb,2); tmp = [yfk ; flipud(conj(yfk(2:N)))]; ykt = real(ifft(tmp)); ykfb = ykt(end-N+1:end); % ---------------------- Error estimation ekfb = dk - ykfb; %if sum(abs(ekfb)) < sum(abs(dk)) %ekfb = dk - ykfb; % erfb(pos:pos+N-1) = ekfb; %else %ekfb = dk; % erfb(pos:pos+N-1) = dk; %end%(kk-1)*(N*2)+1 erfb(pos:pos+N-1) = ekfb; tmp = fft([zm;ekfb]); % FD version for cancelling part (overlap-save) Ek = tmp(1:N+1); % ------------------------ Adaptation %Ek2 = Ek ./(M*pn + 0.001); % Normalized error Ek2 = Ek ./(pn + 0.001); % Normalized error absEf = max(abs(Ek2), threshold); absEf = ones(N+1,1)*threshold./absEf; Ek2 = Ek2.*absEf; mEk = mufb.*Ek2; PP = conj(XFm).*(ones(M,1) * mEk')'; tmp = [PP ; flipud(conj(PP(2:N,:)))]; IFPP = real(ifft(tmp)); PH = IFPP(1:N,:); tmp = fft([PH;zeros(N,M)]); FPH = tmp(1:N+1,:); WFb = WFb + FPH;% if mod(kk, 10*mult) == 0 WFbEn = sum(real(WFb.*conj(WFb))); %WFbEn = sum(abs(WFb)); [tmp, dIdx] = max(WFbEn); WFbD = sum(abs(WFb(:, dIdx)),2); %WFbD = WFbD / (mean(WFbD) + 1e-10); WFbD = min(max(WFbD, 0.5), 4);% end dIdxV(kk) = dIdx; % NLP if (NLPon) ee = [eo;ekfb]; eo = ekfb; window = wins; if fs == 8000 gamma = 0.9; else gamma = 0.93; end tmp = fft(xx.*window); xf = tmp(1:N+1); tmp = fft(dd.*window); df = tmp(1:N+1); tmp = fft(ee.*window); ef = tmp(1:N+1); xfwm(:,1) = xf; xf = xfwm(:,dIdx); %fprintf(1,'%d: %f\n', kk, xf(4)); dfm(:,1) = df; SxOld = Sx; Se = gamma*Se + (1-gamma)*real(ef.*conj(ef)); Sd = gamma*Sd + (1-gamma)*real(df.*conj(df)); Sx = gamma*Sx + (1 - gamma)*real(xf.*conj(xf)); %xRatio = real(xfwm(:,1).*conj(xfwm(:,1))) ./ ... % (real(xfwm(:,2).*conj(xfwm(:,2))) + 1e-10); %xRatio = Sx ./ (SxOld + 1e-10); %SLx = log(1/(N+1)*sum(xRatio)) - 1/(N+1)*sum(log(xRatio)); %SLxV(kk) = SLx;% freqSm = 0.9;% Sx = filter(freqSm, [1 -(1-freqSm)], Sx);% Sx(end:1) = filter(freqSm, [1 -(1-freqSm)], Sx(end:1));% Se = filter(freqSm, [1 -(1-freqSm)], Se);% Se(end:1) = filter(freqSm, [1 -(1-freqSm)], Se(end:1));% Sd = filter(freqSm, [1 -(1-freqSm)], Sd);% Sd(end:1) = filter(freqSm, [1 -(1-freqSm)], Sd(end:1)); %SeFast = ef.*conj(ef); %SdFast = df.*conj(df); %SxFast = xf.*conj(xf); %cohedFast = 0.9*cohedFast + 0.1*SeFast ./ (SdFast + 1e-10); %cohedFast(find(cohedFast > 1)) = 1; %cohedFast(find(cohedFast > 1)) = 1 ./ cohedFast(find(cohedFast>1)); %cohedFastAvg(kk) = mean(cohedFast(echoBandRange)); %cohedFastAvg(kk) = min(cohedFast); %cohxdFast = 0.8*cohxdFast + 0.2*log(SdFast ./ (SxFast + 1e-10)); %cohxdFastAvg(kk) = mean(cohxdFast(echoBandRange)); % coherence Sxd = gamma*Sxd + (1 - gamma)*xf.*conj(df); Sed = gamma*Sed + (1-gamma)*ef.*conj(df);% Sxd = filter(freqSm, [1 -(1-freqSm)], Sxd);% Sxd(end:1) = filter(freqSm, [1 -(1-freqSm)], Sxd(end:1));% Sed = filter(freqSm, [1 -(1-freqSm)], Sed);% Sed(end:1) = filter(freqSm, [1 -(1-freqSm)], Sed(end:1)); cohed = real(Sed.*conj(Sed))./(Se.*Sd + 1e-10); cohedAvg(kk) = mean(cohed(echoBandRange)); %cohedAvg(kk) = cohed(6); %cohedAvg(kk) = min(cohed); cohxd = real(Sxd.*conj(Sxd))./(Sx.*Sd + 1e-10); freqSm = 0.6; cohxd(2:end) = filter(freqSm, [1 -(1-freqSm)], cohxd(2:end)); cohxd(end:2) = filter(freqSm, [1 -(1-freqSm)], cohxd(end:2)); cohxdAvg(kk) = mean(cohxd(echoBandRange)); %cohxdAvg(kk) = (cohxd(32)); %cohxdAvg(kk) = max(cohxd); %xf = xfm(:,dIdx); %SxBad = gamma*SxBad + (1 - gamma)*real(xf.*conj(xf)); %SxdBad = gamma*SxdBad + (1 - gamma)*xf.*conj(df); %cohxdBad = real(SxdBad.*conj(SxdBad))./(SxBad.*Sd + 0.01); %cohxdAvgBad(kk) = mean(cohxdBad); %for j=1:N+1 % mutInf(j) = 0.9*mutInf(j) + 0.1*information(abs(xfm(j,:)), abs(dfm(j,:))); %end %mutInfAvg(kk) = mean(mutInf); %hnled = cohedFast; %xIdx = find(cohxd > 1 - cohed); %hnled(xIdx) = 1 - cohxd(xIdx); %hnled = 1 - max(cohxd, 1-cohedFast); hnled = min(1 - cohxd, cohed); %hnled = 1 - cohxd; %hnled = max(1 - (cohxd + (1-cohedFast)), 0); %hnled = 1 - max(cohxd, 1-cohed); if kk > 1 cohxdSlow(kk,:) = 0.99*cohxdSlow(kk-1,:) + 0.01*cohxd'; cohedSlow(kk,:) = 0.99*cohedSlow(kk-1,:) + 0.01*(1-cohed)'; end if 0 %if kk > 50 %idx = find(hnled > 0.3); hnlMax = hnlMax*0.9999; %hnlMax(idx) = max(hnlMax(idx), hnled(idx)); hnlMax = max(hnlMax, hnled); %overdrive(idx) = max(log(hnlMax(idx))/log(0.99), 1); avgHnl = mean(hnlMax(echoBandRange)); if avgHnl > 0.3 overdrive = max(log(avgHnl)/log(0.99), 1); end weight(4:end) = max(hnlMax) - hnlMax(4:end); end %[hg, gidx] = max(hnled); %fnrg = Sx(gidx) / (Sd(gidx) + 1e-10); %[tmp, bidx] = find((Sx / Sd + 1e-10) > fnrg); %hnled(bidx) = hg; %cohed1 = mean(cohed(cohRange)); % range depends on bandwidth %cohed1 = cohed1^2; %echoBands(kk) = length(find(cohed(echoBandRange) < 0.25))/length(echoBandRange); %if (fbSupp == 0) % if (echoBands(kk) > 0.8) % fbSupp = 1; % end %else % if (echoBands(kk) < 0.6) % fbSupp = 0; % end %end %overdrive(kk) = 7.5*echoBands(kk) + 0.5;% Factor by which to weight other bands%if (cohed1 < 0.1)% w = 0.8 - cohed1*10*0.4;%else% w = 0.4;%end% Weight coherence subbands%hnled = w*cohed1 + (1 - w)*cohed;%hnled = (hnled).^2;%cohed(floor(N/2):end) = cohed(floor(N/2):end).^2; %if fbSupp == 1 % cohed = zeros(size(cohed)); %end %cohed = cohed.^overdrive(kk); %hnled = gamma*hnled + (1 - gamma)*cohed;% Additional hf suppression%hnledp = [hnledp ; mean(hnled)];%hnled(floor(N/2):end) = hnled(floor(N/2):end).^2;%ef = ef.*((weight*(min(1 - hnled)).^2 + (1 - weight).*(1 - hnled)).^2); cohedMean = mean(cohed(echoBandRange)); %aggrFact = 4*(1-mean(hnled(echoBandRange))) + 1; %[hnlSort, hnlSortIdx] = sort(hnled(echoBandRange)); [hnlSort, hnlSortIdx] = sort(1-cohxd(echoBandRange)); [xSort, xSortIdx] = sort(Sx); %aggrFact = (1-mean(hnled(echoBandRange))); %hnlSortQ = hnlSort(qIdx); hnlSortQ = mean(1 - cohxd(echoBandRange)); %hnlSortQ = mean(1 - cohxd); [hnlSort2, hnlSortIdx2] = sort(hnled(echoBandRange)); %[hnlSort2, hnlSortIdx2] = sort(hnled); hnlQuant = 0.75; hnlQuantLow = 0.5; qIdx = floor(hnlQuant*length(hnlSort2)); qIdxLow = floor(hnlQuantLow*length(hnlSort2)); hnlPrefAvg = hnlSort2(qIdx); hnlPrefAvgLow = hnlSort2(qIdxLow); %hnlPrefAvgLow = mean(hnled); %hnlPrefAvg = max(hnlSort2); %hnlPrefAvgLow = min(hnlSort2); %hnlPref = hnled(echoBandRange); %hnlPrefAvg = mean(hnlPref(xSortIdx((0.5*length(xSortIdx)):end))); %hnlPrefAvg = min(hnlPrefAvg, hnlSortQ); %hnlSortQIdx = hnlSortIdx(qIdx); %SeQ = Se(qIdx + echoBandRange(1) - 1); %SdQ = Sd(qIdx + echoBandRange(1) - 1); %SeQ = Se(qIdxLow + echoBandRange(1) - 1); %SdQ = Sd(qIdxLow + echoBandRange(1) - 1); %propLow = length(find(hnlSort < 0.1))/length(hnlSort); %aggrFact = min((1 - hnlSortQ)/2, 0.5); %aggrTerm = 1/aggrFact; %hnlg = mean(hnled(echoBandRange)); %hnlg = hnlSortQ; %if suppState == 0 % if hnlg < 0.05 % suppState = 2; % transCtr = 0; % elseif hnlg < 0.75 % suppState = 1; % transCtr = 0; % end %elseif suppState == 1 % if hnlg > 0.8 % suppState = 0; % transCtr = 0; % elseif hnlg < 0.05 % suppState = 2; % transCtr = 0; % end %else % if hnlg > 0.8 % suppState = 0; % transCtr = 0; % elseif hnlg > 0.25 % suppState = 1; % transCtr = 0; % end %end %if kk > 50 if cohedMean > 0.98 & hnlSortQ > 0.9 %if suppState == 1 % hnled = 0.5*hnled + 0.5*cohed; % %hnlSortQ = 0.5*hnlSortQ + 0.5*cohedMean; % hnlPrefAvg = 0.5*hnlPrefAvg + 0.5*cohedMean; %else % hnled = cohed; % %hnlSortQ = cohedMean; % hnlPrefAvg = cohedMean; %end suppState = 0; elseif cohedMean < 0.95 | hnlSortQ < 0.8 %if suppState == 0 % hnled = 0.5*hnled + 0.5*cohed; % %hnlSortQ = 0.5*hnlSortQ + 0.5*cohedMean; % hnlPrefAvg = 0.5*hnlPrefAvg + 0.5*cohedMean; %end suppState = 1; end if hnlSortQ < cohxdLocalMin & hnlSortQ < 0.75 cohxdLocalMin = hnlSortQ; end if cohxdLocalMin == 1 ovrd = 3; hnled = 1-cohxd; hnlPrefAvg = hnlSortQ; hnlPrefAvgLow = hnlSortQ; end if suppState == 0 hnled = cohed; hnlPrefAvg = cohedMean; hnlPrefAvgLow = cohedMean; end %if hnlPrefAvg < hnlLocalMin & hnlPrefAvg < 0.6 if hnlPrefAvgLow < hnlLocalMin & hnlPrefAvgLow < 0.6 %hnlLocalMin = hnlPrefAvg; %hnlMin = hnlPrefAvg; hnlLocalMin = hnlPrefAvgLow; hnlMin = hnlPrefAvgLow; hnlNewMin = 1; hnlMinCtr = 0; if hnlMinCtr == 0 hnlMinCtr = hnlMinCtr + 1; else hnlMinCtr = 0; hnlMin = hnlLocalMin; SeLocalMin = SeQ; SdLocalMin = SdQ; SeLocalAvg = 0; minCtr = 0; ovrd = max(log(0.0001)/log(hnlMin), 2); divergeFact = hnlLocalMin; end end if hnlNewMin == 1 hnlMinCtr = hnlMinCtr + 1; end if hnlMinCtr == 2 hnlNewMin = 0; hnlMinCtr = 0; %ovrd = max(log(0.0001)/log(hnlMin), 2);% ovrd = max(log(0.00001)/(log(hnlMin + 1e-10) + 1e-10), 3); ovrd = max(log(0.00000001)/(log(hnlMin + 1e-10) + 1e-10), 5); %ovrd = max(log(0.0001)/log(hnlPrefAvg), 2); %ovrd = max(log(0.001)/log(hnlMin), 2); end hnlLocalMin = min(hnlLocalMin + 0.0008/mult, 1); cohxdLocalMin = min(cohxdLocalMin + 0.0004/mult, 1); %divergeFact = hnlSortQ; %if minCtr > 0 & hnlLocalMin < 1 % hnlMin = hnlLocalMin; % %SeMin = 0.9*SeMin + 0.1*sqrt(SeLocalMin); % SdMin = sqrt(SdLocalMin); % %SeMin = sqrt(SeLocalMin)*hnlSortQ; % SeMin = sqrt(SeLocalMin); % %ovrd = log(100/SeMin)/log(hnlSortQ); % %ovrd = log(100/SeMin)/log(hnlSortQ); % ovrd = log(0.01)/log(hnlMin); % ovrd = max(ovrd, 2); % ovrdPos = hnlSortQIdx; % %ovrd = max(ovrd, 1); % %SeMin = sqrt(SeLocalAvg/5); % minCtr = 0; %else % %SeLocalMin = 0.9*SeLocalMin +0.1*SeQ; % SeLocalAvg = SeLocalAvg + SeQ; % minCtr = minCtr + 1; %end if ovrd < ovrdSm ovrdSm = 0.99*ovrdSm + 0.01*ovrd; else ovrdSm = 0.9*ovrdSm + 0.1*ovrd; end %end% ekEn = sum(real(ekfb.^2));% dkEn = sum(real(dk.^2)); ekEn = sum(Se); dkEn = sum(Sd); if divergeState == 0 if ekEn > dkEn ef = df; divergeState = 1; %hnlPrefAvg = hnlSortQ; %hnled = (1 - cohxd); end else %if ekEn*1.1 < dkEn %if ekEn*1.26 < dkEn if ekEn*1.05 < dkEn divergeState = 0; else ef = df; end end if ekEn > dkEn*19.95 WFb=zeros(N+1,M); % Block-based FD NLMS end ekEnV(kk) = ekEn; dkEnV(kk) = dkEn; hnlLocalMinV(kk) = hnlLocalMin; cohxdLocalMinV(kk) = cohxdLocalMin; hnlMinV(kk) = hnlMin; %cohxdMaxLocal = max(cohxdSlow(kk,:)); %if kk > 50 %cohxdMaxLocal = 1-hnlSortQ; %if cohxdMaxLocal > 0.5 % %if cohxdMaxLocal > cohxdMax % odScale = max(log(cohxdMaxLocal)/log(0.95), 1); % %overdrive(7:end) = max(log(cohxdSlow(kk,7:end))/log(0.9), 1); % cohxdMax = cohxdMaxLocal; % end %end %end %cohxdMax = cohxdMax*0.999; %overdriveM(kk,:) = max(overdrive, 1); %aggrFact = 0.25; aggrFact = 0.3; %aggrFact = 0.5*propLow; %if fs == 8000 % wCurve = [0 ; 0 ; aggrFact*sqrt(linspace(0,1,N-1))' + 0.1]; %else % wCurve = [0; 0; 0; aggrFact*sqrt(linspace(0,1,N-2))' + 0.1]; %end wCurve = [0; aggrFact*sqrt(linspace(0,1,N))' + 0.1]; % For sync with C %if fs == 8000 % wCurve = wCurve(2:end); %else % wCurve = wCurve(1:end-1); %end %weight = aggrFact*(sqrt(linspace(0,1,N+1)')); %weight = aggrFact*wCurve; weight = wCurve; %weight = aggrFact*ones(N+1,1); %weight = zeros(N+1,1); %hnled = weight.*min(hnled) + (1 - weight).*hnled; %hnled = weight.*min(mean(hnled(echoBandRange)), hnled) + (1 - weight).*hnled; %hnled = weight.*min(hnlSortQ, hnled) + (1 - weight).*hnled; %hnlSortQV(kk) = mean(hnled); %hnlPrefAvgV(kk) = mean(hnled(echoBandRange)); hnled = weight.*min(hnlPrefAvg, hnled) + (1 - weight).*hnled; %od = aggrFact*(sqrt(linspace(0,1,N+1)') + aggrTerm); %od = 4*(sqrt(linspace(0,1,N+1)') + 1/4); %ovrdFact = (ovrdSm - 1) / sqrt(ovrdPos/(N+1)); %ovrdFact = ovrdSm / sqrt(echoBandRange(floor(length(echoBandRange)/2))/(N+1)); %od = ovrdFact*sqrt(linspace(0,1,N+1))' + 1; %od = ovrdSm*ones(N+1,1).*abs(WFb(:,dIdx))/(max(abs(WFb(:,dIdx)))+1e-10); %od = ovrdSm*ones(N+1,1); %od = ovrdSm*WFbD.*(sqrt(linspace(0,1,N+1))' + 1); od = ovrdSm*(sqrt(linspace(0,1,N+1))' + 1); %od = 4*(sqrt(linspace(0,1,N+1))' + 1); %od = 2*ones(N+1,1); %od = 2*ones(N+1,1); %sshift = ((1-hnled)*2-1).^3+1; sshift = ones(N+1,1); hnled = hnled.^(od.*sshift); %if hnlg > 0.75 %if (suppState ~= 0) % transCtr = 0; %end % suppState = 0; %elseif hnlg < 0.6 & hnlg > 0.2 % suppState = 1; %elseif hnlg < 0.1 %hnled = zeros(N+1, 1); %if (suppState ~= 2) % transCtr = 0; %end % suppState = 2; %else % if (suppState ~= 2) % transCtr = 0; % end % suppState = 2; %end %if suppState == 0 % hnled = ones(N+1, 1); %elseif suppState == 2 % hnled = zeros(N+1, 1); %end %hnled(find(hnled < 0.1)) = 0; %hnled = hnled.^2; %if transCtr < 5 %hnl = 0.75*hnl + 0.25*hnled; % transCtr = transCtr + 1; %else hnl = hnled; %end %hnled(find(hnled < 0.05)) = 0; ef = ef.*(hnl); %ef = ef.*(min(1 - cohxd, cohed).^2); %ef = ef.*((1-cohxd).^2); ovrdV(kk) = ovrdSm; %ovrdV(kk) = dIdx; %ovrdV(kk) = divergeFact; %hnledAvg(kk) = 1-mean(1-cohedFast(echoBandRange)); hnledAvg(kk) = 1-mean(1-cohed(echoBandRange)); hnlxdAvg(kk) = 1-mean(cohxd(echoBandRange)); %hnlxdAvg(kk) = cohxd(5); %hnlSortQV(kk) = mean(hnled); hnlSortQV(kk) = hnlPrefAvgLow; hnlPrefAvgV(kk) = hnlPrefAvg; %hnlAvg(kk) = propLow; %ef(N/2:end) = 0; %ner = (sum(Sd) ./ (sum(Se.*(hnl.^2)) + 1e-10)); % Comfort noise if (CNon) snn=sqrt(Sym); snn(1)=0; % Reject LF noise Un=snn.*exp(j*2*pi.*[0;rand(N-1,1);0]); % Weight comfort noise by suppression Un = sqrt(1-hnled.^2).*Un; Fmix = ef + Un; else Fmix = ef; end % Overlap and add in time domain for smoothness tmp = [Fmix ; flipud(conj(Fmix(2:N)))]; mixw = wins.*real(ifft(tmp)); mola = mbuf(end-N+1:end) + mixw(1:N); mbuf = mixw; ercn(pos:pos+N-1) = mola;%%%%%-------------you can hear the effect by sound(10*ercn,16000),add by Shichaog end % NLPon % Filter update % Ek2 = Ek ./(12*pn + 0.001); % Normalized error % Ek2 = Ek2 * divergeFact; Ek2 = Ek ./(pn + 0.001); % Normalized error %Ek2 = Ek ./(100*pn + 0.001); % Normalized error %divergeIdx = find(abs(Ek) > abs(DD)); %divergeIdx = find(Se > Sd); %threshMod = threshold*ones(N+1,1); %if length(divergeIdx) > 0 %if sum(abs(Ek)) > sum(abs(DD)) %WFb(divergeIdx,:) = WFb(divergeIdx,:) .* repmat(sqrt(Sd(divergeIdx)./(Se(divergeIdx)+1e-10))),1,M); %Ek2(divergeIdx) = Ek2(divergeIdx) .* sqrt(Sd(divergeIdx)./(Se(divergeIdx)+1e-10)); %Ek2(divergeIdx) = Ek2(divergeIdx) .* abs(DD(divergeIdx))./(abs(Ek(divergeIdx))+1e-10); %WFb(divergeIdx,:) = WFbOld(divergeIdx,:); %WFb = WFbOld; %threshMod(divergeIdx) = threshMod(divergeIdx) .* abs(DD(divergeIdx))./(abs(Ek(divergeIdx))+1e-10); % threshMod(divergeIdx) = threshMod(divergeIdx) .* sqrt(Sd(divergeIdx)./(Se(divergeIdx)+1e-10)); %end%absEf = max(abs(Ek2), threshold);%absEf = ones(N+1,1)*threshold./absEf;%absEf = max(abs(Ek2), threshMod);%absEf = threshMod./absEf;%Ek2 = Ek2.*absEf; %if sum(Se) <= sum(Sd) % mEk = mufb.*Ek2; % PP = conj(XFm).*(ones(M,1) * mEk')'; % tmp = [PP ; flipud(conj(PP(2:N,:)))]; % IFPP = real(ifft(tmp)); % PH = IFPP(1:N,:); % tmp = fft([PH;zeros(N,M)]); % FPH = tmp(1:N+1,:); % %WFbOld = WFb; % WFb = WFb + FPH; %else % WF = WFbOld; %end% Shift old FFTs XFm(:,2:end) = XFm(:,1:end-1); YFm(:,2:end) = YFm(:,1:end-1); xfwm(:,2:end) = xfwm(:,1:end-1); dfm(:,2:end) = dfm(:,1:end-1);%if mod(kk, floor(Nb/50)) == 0 % fprintf(1, '.');%endif mod(kk, floor(Nb/100)) == 0%if mod(kk, floor(Nb/500)) == 0 %progressbar(kk/Nb); %figure(5) %plot(abs(WFb)); %legend('1','2','3','4','5','6','7','8','9','10','11','12'); %title(kk*N/fs); %figure(6) %plot(WFbD); %figure(6) %plot(threshMod) %if length(divergeIdx) > 0 % plot(abs(DD)) % hold on % plot(abs(Ek), 'r') % hold off %plot(min(sqrt(Sd./(Se+1e-10)),1)) %axis([0 N 0 1]); %end %figure(6) %plot(cohedFast); %axis([1 N+1 0 1]); %plot(WFbEn); %figure(7) %plot(weight); %plot([cohxd 1-cohed]); %plot([cohxd 1-cohed 1-cohedFast hnled]); %plot([cohxd cohxdFast/max(cohxdFast)]); %legend('cohxd', '1-cohed', '1-cohedFast'); %axis([1 65 0 1]); %pause(0.5); %overdrive endend%progressbar(1);%figure(2);%plot([feat(:,1) feat(:,2)+1 feat(:,3)+2 mfeat+3]);%plot([feat(:,1) mfeat+1]);%figure(3);%plot(10*log10([dri erifb erifb3 ericn]));%legend('Near-end','Error','Post NLP','Final',4);% Compensate for delay%ercn=[ercn(N+1:end);zeros(N,1)];%ercn_=[ercn_(N+1:end);zeros(N,1)];%figure(11);%plot(cohxdSlow);%figure(12);%surf(cohxdSlow);%shading interp;%figure(13);%plot(overdriveM);%figure(14);%surf(overdriveM);%shading interp;figure(10);t = (0:Nb)*N/fs;rrinSubSamp = rrin(N*(1:(Nb+1)));plot(t, rrinSubSamp/max(abs(rrinSubSamp)),'b');hold onplot(t, hnledAvg, 'r');plot(t, hnlxdAvg, 'g');plot(t, hnlSortQV, 'y');plot(t, hnlLocalMinV, 'k');plot(t, cohxdLocalMinV, 'c');plot(t, hnlPrefAvgV, 'm');%plot(t, cohxdAvg, 'r');%plot(cohxdFastAvg, 'r');%plot(cohxdAvgBad, 'k');%plot(t, cohedAvg, 'k');%plot(t, 1-cohedFastAvg, 'k');%plot(ssin(N*(1:floor(length(ssin)/N)))/max(abs(ssin)));%plot(echoBands,'r');%plot(overdrive, 'g');%plot(erfb(N*(1:floor(length(erfb)/N)))/max(abs(erfb)));hold off%tight x;% figure(11)% plot(t, ovrdV);%tightx;%plot(mfeat,'r');%plot(1-cohxyp_,'r');%plot(Hnlxydp,'y');%plot(hnledp,'k');%plot(Hnlxydp, 'c');%plot(ccohpd_,'k');%plot(supplot_, 'g');%plot(ones(length(mfeat),1)*rr1_, 'k');%plot(ones(length(mfeat),1)*rr2_, 'k');%plot(N*(1:length(feat)), feat);%plot(Sep_,'r');%axis([1 floor(length(erfb)/N) -1 1])%hold off%plot(10*log10([Se_, Sx_, Seu_, real(sf_.*conj(sf_))]));%legend('Se','Sx','Seu','S');%figure(5)%plot([ercn ercn_]);% figure(12)% plot(t, dIdxV);%plot(t, SLxV);%tightx;%figure(13)%plot(t, [ekEnV dkEnV]);%plot(t, dkEnV./(ekEnV+1e-10));%tightx;%close(hh);%spclab(fs,ssin,erfb,ercn,'outxd.pcm');%spclab(fs,rrin,ssin,erfb,1.78*ercn,'vqeOut-1.pcm');%spclab(fs,erfb,'aecOutLp.pcm');%spclab(fs,rrin,ssin,erfb,1.78*ercn,'aecOut25.pcm','vqeOut-1.pcm');%spclab(fs,rrin,ssin,erfb,ercn,'aecOut-mba.pcm');%spclab(fs,rrin,ssin,erfb,ercn,'aecOut.pcm');%spclab(fs, ssin, erfb, ercn, 'out0.pcm');
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
- 57
- 58
- 59
- 60
- 61
- 62
- 63
- 64
- 65
- 66
- 67
- 68
- 69
- 70
- 71
- 72
- 73
- 74
- 75
- 76
- 77
- 78
- 79
- 80
- 81
- 82
- 83
- 84
- 85
- 86
- 87
- 88
- 89
- 90
- 91
- 92
- 93
- 94
- 95
- 96
- 97
- 98
- 99
- 100
- 101
- 102
- 103
- 104
- 105
- 106
- 107
- 108
- 109
- 110
- 111
- 112
- 113
- 114
- 115
- 116
- 117
- 118
- 119
- 120
- 121
- 122
- 123
- 124
- 125
- 126
- 127
- 128
- 129
- 130
- 131
- 132
- 133
- 134
- 135
- 136
- 137
- 138
- 139
- 140
- 141
- 142
- 143
- 144
- 145
- 146
- 147
- 148
- 149
- 150
- 151
- 152
- 153
- 154
- 155
- 156
- 157
- 158
- 159
- 160
- 161
- 162
- 163
- 164
- 165
- 166
- 167
- 168
- 169
- 170
- 171
- 172
- 173
- 174
- 175
- 176
- 177
- 178
- 179
- 180
- 181
- 182
- 183
- 184
- 185
- 186
- 187
- 188
- 189
- 190
- 191
- 192
- 193
- 194
- 195
- 196
- 197
- 198
- 199
- 200
- 201
- 202
- 203
- 204
- 205
- 206
- 207
- 208
- 209
- 210
- 211
- 212
- 213
- 214
- 215
- 216
- 217
- 218
- 219
- 220
- 221
- 222
- 223
- 224
- 225
- 226
- 227
- 228
- 229
- 230
- 231
- 232
- 233
- 234
- 235
- 236
- 237
- 238
- 239
- 240
- 241
- 242
- 243
- 244
- 245
- 246
- 247
- 248
- 249
- 250
- 251
- 252
- 253
- 254
- 255
- 256
- 257
- 258
- 259
- 260
- 261
- 262
- 263
- 264
- 265
- 266
- 267
- 268
- 269
- 270
- 271
- 272
- 273
- 274
- 275
- 276
- 277
- 278
- 279
- 280
- 281
- 282
- 283
- 284
- 285
- 286
- 287
- 288
- 289
- 290
- 291
- 292
- 293
- 294
- 295
- 296
- 297
- 298
- 299
- 300
- 301
- 302
- 303
- 304
- 305
- 306
- 307
- 308
- 309
- 310
- 311
- 312
- 313
- 314
- 315
- 316
- 317
- 318
- 319
- 320
- 321
- 322
- 323
- 324
- 325
- 326
- 327
- 328
- 329
- 330
- 331
- 332
- 333
- 334
- 335
- 336
- 337
- 338
- 339
- 340
- 341
- 342
- 343
- 344
- 345
- 346
- 347
- 348
- 349
- 350
- 351
- 352
- 353
- 354
- 355
- 356
- 357
- 358
- 359
- 360
- 361
- 362
- 363
- 364
- 365
- 366
- 367
- 368
- 369
- 370
- 371
- 372
- 373
- 374
- 375
- 376
- 377
- 378
- 379
- 380
- 381
- 382
- 383
- 384
- 385
- 386
- 387
- 388
- 389
- 390
- 391
- 392
- 393
- 394
- 395
- 396
- 397
- 398
- 399
- 400
- 401
- 402
- 403
- 404
- 405
- 406
- 407
- 408
- 409
- 410
- 411
- 412
- 413
- 414
- 415
- 416
- 417
- 418
- 419
- 420
- 421
- 422
- 423
- 424
- 425
- 426
- 427
- 428
- 429
- 430
- 431
- 432
- 433
- 434
- 435
- 436
- 437
- 438
- 439
- 440
- 441
- 442
- 443
- 444
- 445
- 446
- 447
- 448
- 449
- 450
- 451
- 452
- 453
- 454
- 455
- 456
- 457
- 458
- 459
- 460
- 461
- 462
- 463
- 464
- 465
- 466
- 467
- 468
- 469
- 470
- 471
- 472
- 473
- 474
- 475
- 476
- 477
- 478
- 479
- 480
- 481
- 482
- 483
- 484
- 485
- 486
- 487
- 488
- 489
- 490
- 491
- 492
- 493
- 494
- 495
- 496
- 497
- 498
- 499
- 500
- 501
- 502
- 503
- 504
- 505
- 506
- 507
- 508
- 509
- 510
- 511
- 512
- 513
- 514
- 515
- 516
- 517
- 518
- 519
- 520
- 521
- 522
- 523
- 524
- 525
- 526
- 527
- 528
- 529
- 530
- 531
- 532
- 533
- 534
- 535
- 536
- 537
- 538
- 539
- 540
- 541
- 542
- 543
- 544
- 545
- 546
- 547
- 548
- 549
- 550
- 551
- 552
- 553
- 554
- 555
- 556
- 557
- 558
- 559
- 560
- 561
- 562
- 563
- 564
- 565
- 566
- 567
- 568
- 569
- 570
- 571
- 572
- 573
- 574
- 575
- 576
- 577
- 578
- 579
- 580
- 581
- 582
- 583
- 584
- 585
- 586
- 587
- 588
- 589
- 590
- 591
- 592
- 593
- 594
- 595
- 596
- 597
- 598
- 599
- 600
- 601
- 602
- 603
- 604
- 605
- 606
- 607
- 608
- 609
- 610
- 611
- 612
- 613
- 614
- 615
- 616
- 617
- 618
- 619
- 620
- 621
- 622
- 623
- 624
- 625
- 626
- 627
- 628
- 629
- 630
- 631
- 632
- 633
- 634
- 635
- 636
- 637
- 638
- 639
- 640
- 641
- 642
- 643
- 644
- 645
- 646
- 647
- 648
- 649
- 650
- 651
- 652
- 653
- 654
- 655
- 656
- 657
- 658
- 659
- 660
- 661
- 662
- 663
- 664
- 665
- 666
- 667
- 668
- 669
- 670
- 671
- 672
- 673
- 674
- 675
- 676
- 677
- 678
- 679
- 680
- 681
- 682
- 683
- 684
- 685
- 686
- 687
- 688
- 689
- 690
- 691
- 692
- 693
- 694
- 695
- 696
- 697
- 698
- 699
- 700
- 701
- 702
- 703
- 704
- 705
- 706
- 707
- 708
- 709
- 710
- 711
- 712
- 713
- 714
- 715
- 716
- 717
- 718
- 719
- 720
- 721
- 722
- 723
- 724
- 725
- 726
- 727
- 728
- 729
- 730
- 731
- 732
- 733
- 734
- 735
- 736
- 737
- 738
- 739
- 740
- 741
- 742
- 743
- 744
- 745
- 746
- 747
- 748
- 749
- 750
- 751
- 752
- 753
- 754
- 755
- 756
- 757
- 758
- 759
- 760
- 761
- 762
- 763
- 764
- 765
- 766
- 767
- 768
- 769
- 770
- 771
- 772
- 773
- 774
- 775
- 776
- 777
- 778
- 779
- 780
- 781
- 782
- 783
- 784
- 785
- 786
- 787
- 788
- 789
- 790
- 791
- 792
- 793
- 794
- 795
- 796
- 797
- 798
- 799
- 800
- 801
- 802
- 803
- 804
- 805
- 806
- 807
- 808
- 809
- 810
- 811
- 812
- 813
- 814
- 815
- 816
- 817
- 818
- 819
- 820
- 821
- 822
- 823
- 824
- 825
- 826
- 827
- 828
- 829
- 830
- 831
- 832
- 833
- 834
- 835
- 836
- 837
- 838
- 839
- 840
- 841
- 842
- 843
- 844
- 845
- 846
- 847
- 848
- 849
- 850
- 851
- 852
- 853
- 854
- 855
- 856
- 857
- 858
- 859
- 860
- 861
- 862
- 863
- 864
- 865
- 866
- 867
- 868
- 869
- 870
- 871
- 872
- 873
- 874
- 875
- 876
- 877
- 878
- 879
- 880
- 881
- 882
- 883
- 884
- 885
- 886
- 887
- 888
- 889
- 890
- 891
- 892
- 893
- 894
- 895
- 896
- 897
- 898
- 899
- 900
- 901
- 902
- 903
- 904
- 905
- 906
- 907
- 908
- 909
- 910
- 911
- 912
- 913
- 914
- 915
- 916
- 917
- 918
- 919
- 920
- 921
- 922
- 923
- 924
- 925
- 926
- 927
- 928
- 929
- 930
speex AEC算法
和WebRTC一样也是采用频域分块自适应滤波方法,不同的是权重调整的方式变化,我这边测试效果是计算量比WebRTC的大,且效果调节的没有WebRTC的好。这里也给出speex的源代码和测试方法。
% Copyright (C) 2012 Waves Audio LTD% Copyright (C) 2003-2008 Jean-Marc Valin%% File: speex_mdf.m% Echo canceller based on the MDF algorithm (see below)% % Redistribution and use in source and binary forms, with or without% modification, are permitted provided that the following conditions are% met:% % 1. Redistributions of source code must retain the above copyright notice,% this list of conditions and the following disclaimer.% % 2. Redistributions in binary form must reproduce the above copyright% notice, this list of conditions and the following disclaimer in the% documentation and/or other materials provided with the distribution.% % 3. The name of the author may not be used to endorse or promote products% derived from this software without specific prior written permission.% % THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR% IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES% OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE% DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,% INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES% (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR% SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)% HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,% STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN% ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE% POSSIBILITY OF SUCH DAMAGE.%% Notes from original mdf.c:%% The echo canceller is based on the MDF algorithm described in:% % J. S. Soo, K. K. Pang Multidelay block frequency adaptive filter, % IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-38, No. 2, % February 1990.% % We use the Alternatively Updated MDF (AUMDF) variant. Robustness to % double-talk is achieved using a variable learning rate as described in:% % Valin, J.-M., On Adjusting the Learning Rate in Frequency Domain Echo % Cancellation With Double-Talk. IEEE Transactions on Audio,% Speech and Language Processing, Vol. 15, No. 3, pp. 1030-1034, 2007.% http://people.xiph.org/~jm/papers/valin_taslp2006.pdf% % There is no explicit double-talk detection, but a continuous variation% in the learning rate based on residual echo, double-talk and background% noise.% % Another kludge that seems to work good: when performing the weight% update, we only move half the way toward the "goal" this seems to% reduce the effect of quantization noise in the update phase. This% can be seen as applying a gradient descent on a "soft constraint"% instead of having a hard constraint.% % Notes for this file:%% Usage: %% speex_mdf_out = speex_mdf(Fs, u, d, filter_length, frame_size, dbg_var_name);% % Fs sample rate% u speaker signal, column vector in range [-1; 1]% d microphone signal, column vector in range [-1; 1]% filter_length typically 250ms, i.e. 4096 @ 16k FS % must be a power of 2% frame_size typically 8ms, i.e. 128 @ 16k Fs % must be a power of 2% dbg_var_name internal state variable name to trace. % Default: 'st.leak_estimate'.%% Jonathan Rouach <jonr@waves.com>% function speex_mdf_out = speex_mdf(Fs, u, d, filter_length, frame_size, dbg_var_name)fprintf('Starting Speex MDF (PBFDAF) algorithm.\n');st = speex_echo_state_init_mc_mdf(frame_size, filter_length, 1, 1, Fs);% which variable to traceif nargin<6 dbg_var_name = 'st.leak_estimate';enddbg = init_dbg(st, length(u));[e, dbg] = main_loop(st, float_to_short(u), float_to_short(d), dbg);speex_mdf_out.e = e/32768.0;speex_mdf_out.var1 = dbg.var1; function x = float_to_short(x) x = x*32768.0; x(x< -32767.5) = -32768; x(x> 32766.5) = 32767; x = floor(0.5+x); end function [e, dbg] = main_loop(st, u, d, dbg) e = zeros(size(u)); y = zeros(size(u)); % prepare waitbar try h_wb = waitbar(0, 'Processing...'); catch; end end_point = length(u); for n = 1:st.frame_size:end_point nStep = floor(n/st.frame_size)+1; if mod(nStep, 128)==0 && update_waitbar_check_wasclosed(h_wb, n, end_point, st.sampling_rate) break; end u_frame = u(n:n+st.frame_size-1); d_frame = d(n:n+st.frame_size-1); [out, st] = speex_echo_cancellation_mdf(st, d_frame, u_frame); e(n:n+st.frame_size-1) = out*2; y(n:n+st.frame_size-1) = d_frame - out; dbg.var1(:, nStep) = reshape( eval(dbg_var_name), numel(eval(dbg_var_name)), 1); end try close(h_wb); catch; end end function st = speex_echo_state_init_mc_mdf(frame_size, filter_length, nb_mic, nb_speakers, sample_rate) st.K = nb_speakers; st.C = nb_mic; C=st.C; K=st.K; st.frame_size = frame_size; st.window_size = 2*frame_size; N = st.window_size; st.M = fix((filter_length+st.frame_size-1)/frame_size); M = st.M; st.cancel_count=0; st.sum_adapt = 0; st.saturated = 0; st.screwed_up = 0; % /* This is the default sampling rate */ st.sampling_rate = sample_rate; st.spec_average = (st.frame_size)/( st.sampling_rate); st.beta0 = (2.0*st.frame_size)/st.sampling_rate; st.beta_max = (.5*st.frame_size)/st.sampling_rate; st.leak_estimate = 0; st.e = zeros(N, C); st.x = zeros(N, K); st.input = zeros(st.frame_size, C); st.y = zeros(N, C); st.last_y = zeros(N, C); st.Yf = zeros(st.frame_size+1, 1); st.Rf = zeros(st.frame_size+1, 1); st.Xf = zeros(st.frame_size+1, 1); st.Yh = zeros(st.frame_size+1, 1); st.Eh = zeros(st.frame_size+1, 1); st.X = zeros(N, K, M+1); st.Y = zeros(N, C); st.E = zeros(N, C); st.W = zeros(N, K, M, C); st.foreground = zeros(N, K, M, C); st.PHI = zeros(frame_size+1, 1); st.power = zeros(frame_size+1, 1); st.power_1 = ones((frame_size+1), 1); st.window = zeros(N, 1); st.prop = zeros(M, 1); st.wtmp = zeros(N, 1); st.window = .5-.5*cos(2*pi*((1:N)'-1)/N); % /* Ratio of ~10 between adaptation rate of first and last block */ decay = exp(-1/M); st.prop(1, 1) = .7; for i=2:M st.prop(i, 1) = st.prop(i-1, 1) * decay; end st.prop = (.8 * st.prop)./sum(st.prop); st.memX = zeros(K, 1); st.memD = zeros(C, 1); st.memE = zeros(C, 1); st.preemph = .98; if (st.sampling_rate<12000) st.notch_radius = .9; elseif (st.sampling_rate<24000) st.notch_radius = .982; else st.notch_radius = .992; end st.notch_mem = zeros(2*C, 1); st.adapted = 0; st.Pey = 1; st.Pyy = 1; st.Davg1 = 0; st.Davg2 = 0; st.Dvar1 = 0; st.Dvar2 = 0; end function dbg = init_dbg(st, len) dbg.var1 = zeros(numel(eval(dbg_var_name)), fix(len/st.frame_size)); end function [out, st] = speex_echo_cancellation_mdf(st, in, far_end) N = st.window_size; M = st.M; C = st.C; K = st.K; Pey_cur = 1; Pyy_cur = 1; out = zeros(st.frame_size, C); st.cancel_count = st.cancel_count + 1; %ss=.35/M; ss = 0.5/M; ss_1 = 1-ss; for chan = 1:C % Apply a notch filter to make sure DC doesn't end up causing problems [st.input(:, chan), st.notch_mem(:, chan)] = filter_dc_notch16(in(:, chan), st.notch_radius, st.frame_size, st.notch_mem(:, chan)); % Copy input data to buffer and apply pre-emphasis for i=1:st.frame_size tmp32 = st.input(i, chan)- (st.preemph* st.memD(chan)); st.memD(chan) = st.input(i, chan); st.input(i, chan) = tmp32; end end for speak = 1:K for i =1:st.frame_size st.x(i, speak) = st.x(i+st.frame_size, speak); tmp32 = far_end(i, speak) - st.preemph * st.memX(speak); st.x(i+st.frame_size, speak) = tmp32; st.memX(speak) = far_end(i, speak); end end % Shift memory st.X = circshift(st.X, [0, 0, 1]); for speak = 1:K % Convert x (echo input) to frequency domain % MATLAB_MATCH: we divide by N to get values as in speex st.X(:, speak, 1) = fft(st.x(:, speak)) /N; end Sxx = 0; for speak = 1:K Sxx = Sxx + sum(st.x(st.frame_size+1:end, speak).^2); st.Xf = abs(st.X(1:st.frame_size+1, speak, 1)).^2; end Sff = 0; for chan = 1:C % Compute foreground filter st.Y(:, chan) = 0; for speak=1:K for j=1:M st.Y(:, chan) = st.Y(:, chan) + st.X(:, speak, j) .* st.foreground(:, speak, j, chan); end end % MATLAB_MATCH: we multiply by N to get values as in speex st.e(:, chan) = ifft(st.Y(:, chan)) * N; st.e(1:st.frame_size, chan) = st.input(:, chan) - st.e(st.frame_size+1:end, chan); % st.e : [out foreground | leak foreground ] Sff = Sff + sum(abs(st.e(1:st.frame_size, chan)).^2); end % Adjust proportional adaption rate */ if (st.adapted) st.prop = mdf_adjust_prop (st.W, N, M, C, K); end % Compute weight gradient */ if (st.saturated == 0) for chan = 1:C for speak = 1:K for j=M:-1:1 st.PHI = [st.power_1; st.power_1(end-1:-1:2)] .* st.prop(j) .* conj(st.X(:, speak, (j+1))) .* st.E(:, chan); st.W(:, j) = st.W(:, j) + st.PHI; end end end else st.saturated = st.saturated -1; end %FIXME: MC conversion required */ % Update weight to prevent circular convolution (MDF / AUMDF) for chan = 1:C for speak = 1:K for j = 1:M % This is a variant of the Alternatively Updated MDF (AUMDF) */ % Remove the "if" to make this an MDF filter */ if (j==1 || mod(2+st.cancel_count,(M-1)) == j) st.wtmp = ifft(st.W(:, speak, j, chan)); st.wtmp(st.frame_size+1:N) = 0; st.W(:, speak, j, chan) = fft(st.wtmp); end end end end % So we can use power_spectrum_accum */ st.Yf = zeros(st.frame_size+1, 1); st.Rf = zeros(st.frame_size+1, 1); st.Xf = zeros(st.frame_size+1, 1); Dbf = 0; for chan = 1:C st.Y(:, chan) = 0; for speak=1:K for j=1:M st.Y(:, chan) = st.Y(:, chan) + st.X(:, speak, j) .* st.W(:, speak, j, chan); end end % MATLAB_MATCH: we multiply by N to get values as in speex st.y(:,chan) = ifft(st.Y(:,chan)) * N; % st.y : [ ~ | leak background ] end See = 0; % Difference in response, this is used to estimate the variance of our residual power estimate */ for chan = 1:C st.e(1:st.frame_size, chan) = st.e(st.frame_size+1:N, chan) - st.y(st.frame_size+1:N, chan); Dbf = Dbf + 10 + sum(abs(st.e(1:st.frame_size, chan)).^2); st.e(1:st.frame_size, chan) = st.input(:, chan) - st.y(st.frame_size+1:N, chan); % st.e : [ out background | leak foreground ] See = See + sum(abs(st.e(1:st.frame_size, chan)).^2); end % Logic for updating the foreground filter */ % For two time windows, compute the mean of the energy difference, as well as the variance */ VAR1_UPDATE = .5; VAR2_UPDATE = .25; VAR_BACKTRACK = 4; MIN_LEAK = .005; st.Davg1 = .6*st.Davg1 + .4*(Sff-See); st.Davg2 = .85*st.Davg2 + .15*(Sff-See); st.Dvar1 = .36*st.Dvar1 + .16*Sff*Dbf; st.Dvar2 = .7225*st.Dvar2 + .0225*Sff*Dbf; update_foreground = 0; % Check if we have a statistically significant reduction in the residual echo */ % Note that this is *not* Gaussian, so we need to be careful about the longer tail */ if (Sff-See)*abs(Sff-See) > (Sff*Dbf) update_foreground = 1; elseif (st.Davg1* abs(st.Davg1) > (VAR1_UPDATE*st.Dvar1)) update_foreground = 1; elseif (st.Davg2* abs(st.Davg2) > (VAR2_UPDATE*(st.Dvar2))) update_foreground = 1; end % Do we update? */ if (update_foreground) st.Davg1 = 0; st.Davg2 = 0; st.Dvar1 = 0; st.Dvar2 = 0; st.foreground = st.W; % Apply a smooth transition so as to not introduce blocking artifacts */ for chan = 1:C st.e(st.frame_size+1:N, chan) = (st.window(st.frame_size+1:N) .* st.e(st.frame_size+1:N, chan)) + (st.window(1:st.frame_size) .* st.y(st.frame_size+1:N, chan)); end else reset_background=0; % Otherwise, check if the background filter is significantly worse */ if (-(Sff-See)*abs(Sff-See)> VAR_BACKTRACK*(Sff*Dbf)) reset_background = 1; end if ((-st.Davg1 * abs(st.Davg1))> (VAR_BACKTRACK*st.Dvar1)) reset_background = 1; end if ((-st.Davg2* abs(st.Davg2))> (VAR_BACKTRACK*st.Dvar2)) reset_background = 1; end if (reset_background) % Copy foreground filter to background filter */ st.W = st.foreground; % We also need to copy the output so as to get correct adaptation */ for chan = 1:C st.y(st.frame_size+1:N, chan) = st.e(st.frame_size+1:N, chan); st.e(1:st.frame_size, chan) = st.input(:, chan) - st.y(st.frame_size+1:N, chan); end See = Sff; st.Davg1 = 0; st.Davg2 = 0; st.Dvar1 = 0; st.Dvar2 = 0; end end Sey = 0; Syy = 0; Sdd = 0; for chan = 1:C % Compute error signal (for the output with de-emphasis) */ for i=1:st.frame_size tmp_out = st.input(i, chan)- st.e(i+st.frame_size, chan); tmp_out = tmp_out + st.preemph * st.memE(chan); % This is an arbitrary test for saturation in the microphone signal */ if (in(i,chan) <= -32000 || in(i,chan) >= 32000) if (st.saturated == 0) st.saturated = 1; end end out(i, chan) = tmp_out; st.memE(chan) = tmp_out; end % Compute error signal (filter update version) */ st.e(st.frame_size+1:N, chan) = st.e(1:st.frame_size, chan); st.e(1:st.frame_size, chan) = 0; % st.e : [ zeros | out background ] % Compute a bunch of correlations */ % FIXME: bad merge */ Sey = Sey + sum(st.e(st.frame_size+1:N, chan) .* st.y(st.frame_size+1:N, chan)); Syy = Syy + sum(st.y(st.frame_size+1:N, chan).^2); Sdd = Sdd + sum(st.input.^2); % Convert error to frequency domain */ % MATLAB_MATCH: we divide by N to get values as in speex st.E = fft(st.e) / N; st.y(1:st.frame_size, chan) = 0; % MATLAB_MATCH: we divide by N to get values as in speex st.Y = fft(st.y) / N; % Compute power spectrum of echo (X), error (E) and filter response (Y) */ st.Rf = abs(st.E(1:st.frame_size+1,chan)).^2; st.Yf = abs(st.Y(1:st.frame_size+1,chan)).^2; end % Do some sanity check */ if (~(Syy>=0 && Sxx>=0 && See >= 0)) % Things have gone really bad */ st.screwed_up = st.screwed_up + 50; out = out*0; elseif Sff > Sdd+ N*10000 % AEC seems to add lots of echo instead of removing it, let's see if it will improve */ st.screwed_up = st.screwed_up + 1; else % Everything's fine */ st.screwed_up=0; end if (st.screwed_up>=50) disp('Screwed up, full reset'); st = speex_echo_state_reset_mdf(st); end % Add a small noise floor to make sure not to have problems when dividing */ See = max(See, N* 100); for speak = 1:K Sxx = Sxx + sum(st.x(st.frame_size+1:end, speak).^2); st.Xf = abs(st.X(1:st.frame_size+1, speak, 1)).^2; end % Smooth far end energy estimate over time */ st.power = ss_1*st.power+ 1 + ss*st.Xf; % Compute filtered spectra and (cross-)correlations */ Eh_cur = st.Rf - st.Eh; Yh_cur = st.Yf - st.Yh; Pey_cur = Pey_cur + sum(Eh_cur.*Yh_cur) ; Pyy_cur = Pyy_cur + sum(Yh_cur.^2); st.Eh = (1-st.spec_average)*st.Eh + st.spec_average*st.Rf; st.Yh = (1-st.spec_average)*st.Yh + st.spec_average*st.Yf; Pyy = sqrt(Pyy_cur); Pey = Pey_cur/Pyy; % Compute correlation updatete rate */ tmp32 = st.beta0*Syy; if (tmp32 > st.beta_max*See) tmp32 = st.beta_max*See; end alpha = tmp32/ See; alpha_1 = 1- alpha; % Update correlations (recursive average) */ st.Pey = alpha_1*st.Pey + alpha*Pey; st.Pyy = alpha_1*st.Pyy + alpha*Pyy; if st.Pyy<1 st.Pyy =1; end % We don't really hope to get better than 33 dB (MIN_LEAK-3dB) attenuation anyway */ if st.Pey< MIN_LEAK * st.Pyy st.Pey = MIN_LEAK * st.Pyy; end if (st.Pey> st.Pyy) st.Pey = st.Pyy; end % leak_estimate is the linear regression result */ st.leak_estimate = st.Pey/st.Pyy; % This looks like a stupid bug, but it's right (because we convert from Q14 to Q15) */ if (st.leak_estimate > 16383) st.leak_estimate = 32767; end % Compute Residual to Error Ratio */ RER = (.0001*Sxx + 3.*st.leak_estimate*Syy) / See; % Check for y in e (lower bound on RER) */ if (RER < Sey*Sey/(1+See*Syy)) RER = Sey*Sey/(1+See*Syy); end if (RER > .5) RER = .5; end % We consider that the filter has had minimal adaptation if the following is true*/ if (~st.adapted && st.sum_adapt > M && st.leak_estimate*Syy > .03*Syy) st.adapted = 1; end if (st.adapted) % Normal learning rate calculation once we're past the minimal adaptation phase */ for i=1:st.frame_size+1 % Compute frequency-domain adaptation mask */ r = st.leak_estimate*st.Yf(i); e = st.Rf(i)+1; if (r>.5*e) r = .5*e; end r = 0.7*r + 0.3*(RER*e); %st.power_1[i] = adapt_rate*r/(e*(1+st.power[i]));*/ st.power_1(i) = (r/(e*st.power(i)+10)); end else % Temporary adaption rate if filter is not yet adapted enough */ adapt_rate=0; if (Sxx > N* 1000) tmp32 = 0.25* Sxx; if (tmp32 > .25*See) tmp32 = .25*See; end adapt_rate = tmp32/ See; end st.power_1 = adapt_rate./(st.power+10); % How much have we adapted so far? */ st.sum_adapt = st.sum_adapt+adapt_rate; end % FIXME: MC conversion required */ st.last_y(1:st.frame_size) = st.last_y(st.frame_size+1:N); if (st.adapted) % If the filter is adapted, take the filtered echo */ st.last_y(st.frame_size+1:N) = in-out; end end function [out,mem] = filter_dc_notch16(in, radius, len, mem) out = zeros(size(in)); den2 = radius*radius + .7*(1-radius)*(1-radius); for i=1:len vin = in(i); vout = mem(1) + vin; mem(1) = mem(2) + 2*(-vin + radius*vout); mem(2) = vin - (den2*vout); out(i) = radius*vout; end end function prop = mdf_adjust_prop(W, N, M, C, K) prop = zeros(M,1); for i=1:M tmp = 1; for chan=1:C for speak=1:K tmp = tmp + sum(abs(W(1:N/2+1, K, i, C)).^2); end end prop(i) = sqrt(tmp); end max_sum = max(prop, 1); prop = prop + .1*max_sum; prop_sum = 1+sum(prop); prop = .99*prop / prop_sum; end % Resets echo canceller state */ function st = speex_echo_state_reset_mdf(st) st.cancel_count=0; st.screwed_up = 0; N = st.window_size; M = st.M; C=st.C; K=st.K; st.e = zeros(N, C); st.x = zeros(N, K); st.input = zeros(st.frame_size, C); st.y = zeros(N, C); st.last_y = zeros(N, C); st.Yf = zeros(st.frame_size+1, 1); st.Rf = zeros(st.frame_size+1, 1); st.Xf = zeros(st.frame_size+1, 1); st.Yh = zeros(st.frame_size+1, 1); st.Eh = zeros(st.frame_size+1, 1); st.X = zeros(N, K, M+1); st.Y = zeros(N, C); st.E = zeros(N, C); st.W = zeros(N, K, M, C); st.foreground = zeros(N, K, M, C); st.PHI = zeros(N, 1); st.power = zeros(st.frame_size+1, 1); st.power_1 = ones((st.frame_size+1), 1); st.window = zeros(N, 1); st.prop = zeros(M, 1); st.wtmp = zeros(N, 1); st.memX = zeros(K, 1); st.memD = zeros(C, 1); st.memE = zeros(C, 1); st.saturated = 0; st.adapted = 0; st.sum_adapt = 0; st.Pey = 1; st.Pyy = 1; st.Davg1 = 0; st.Davg2 = 0; st.Dvar1 = 0; st.Dvar2 = 0; end function was_closed = update_waitbar_check_wasclosed(h, n, end_point, Fs) was_closed = 0; % update waitbar try waitbar(n/end_point, h, ['Processing... ', num2str(n/Fs, '%.2f'), 's / ', num2str(end_point/Fs, '%.2f'), 's' ]); catch ME % if it's no longer there (closed by user) if (strcmp(ME.identifier(1:length('MATLAB:waitbar:')), 'MATLAB:waitbar:')) was_closed = 1; % then get out of the loop end end endend
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
- 57
- 58
- 59
- 60
- 61
- 62
- 63
- 64
- 65
- 66
- 67
- 68
- 69
- 70
- 71
- 72
- 73
- 74
- 75
- 76
- 77
- 78
- 79
- 80
- 81
- 82
- 83
- 84
- 85
- 86
- 87
- 88
- 89
- 90
- 91
- 92
- 93
- 94
- 95
- 96
- 97
- 98
- 99
- 100
- 101
- 102
- 103
- 104
- 105
- 106
- 107
- 108
- 109
- 110
- 111
- 112
- 113
- 114
- 115
- 116
- 117
- 118
- 119
- 120
- 121
- 122
- 123
- 124
- 125
- 126
- 127
- 128
- 129
- 130
- 131
- 132
- 133
- 134
- 135
- 136
- 137
- 138
- 139
- 140
- 141
- 142
- 143
- 144
- 145
- 146
- 147
- 148
- 149
- 150
- 151
- 152
- 153
- 154
- 155
- 156
- 157
- 158
- 159
- 160
- 161
- 162
- 163
- 164
- 165
- 166
- 167
- 168
- 169
- 170
- 171
- 172
- 173
- 174
- 175
- 176
- 177
- 178
- 179
- 180
- 181
- 182
- 183
- 184
- 185
- 186
- 187
- 188
- 189
- 190
- 191
- 192
- 193
- 194
- 195
- 196
- 197
- 198
- 199
- 200
- 201
- 202
- 203
- 204
- 205
- 206
- 207
- 208
- 209
- 210
- 211
- 212
- 213
- 214
- 215
- 216
- 217
- 218
- 219
- 220
- 221
- 222
- 223
- 224
- 225
- 226
- 227
- 228
- 229
- 230
- 231
- 232
- 233
- 234
- 235
- 236
- 237
- 238
- 239
- 240
- 241
- 242
- 243
- 244
- 245
- 246
- 247
- 248
- 249
- 250
- 251
- 252
- 253
- 254
- 255
- 256
- 257
- 258
- 259
- 260
- 261
- 262
- 263
- 264
- 265
- 266
- 267
- 268
- 269
- 270
- 271
- 272
- 273
- 274
- 275
- 276
- 277
- 278
- 279
- 280
- 281
- 282
- 283
- 284
- 285
- 286
- 287
- 288
- 289
- 290
- 291
- 292
- 293
- 294
- 295
- 296
- 297
- 298
- 299
- 300
- 301
- 302
- 303
- 304
- 305
- 306
- 307
- 308
- 309
- 310
- 311
- 312
- 313
- 314
- 315
- 316
- 317
- 318
- 319
- 320
- 321
- 322
- 323
- 324
- 325
- 326
- 327
- 328
- 329
- 330
- 331
- 332
- 333
- 334
- 335
- 336
- 337
- 338
- 339
- 340
- 341
- 342
- 343
- 344
- 345
- 346
- 347
- 348
- 349
- 350
- 351
- 352
- 353
- 354
- 355
- 356
- 357
- 358
- 359
- 360
- 361
- 362
- 363
- 364
- 365
- 366
- 367
- 368
- 369
- 370
- 371
- 372
- 373
- 374
- 375
- 376
- 377
- 378
- 379
- 380
- 381
- 382
- 383
- 384
- 385
- 386
- 387
- 388
- 389
- 390
- 391
- 392
- 393
- 394
- 395
- 396
- 397
- 398
- 399
- 400
- 401
- 402
- 403
- 404
- 405
- 406
- 407
- 408
- 409
- 410
- 411
- 412
- 413
- 414
- 415
- 416
- 417
- 418
- 419
- 420
- 421
- 422
- 423
- 424
- 425
- 426
- 427
- 428
- 429
- 430
- 431
- 432
- 433
- 434
- 435
- 436
- 437
- 438
- 439
- 440
- 441
- 442
- 443
- 444
- 445
- 446
- 447
- 448
- 449
- 450
- 451
- 452
- 453
- 454
- 455
- 456
- 457
- 458
- 459
- 460
- 461
- 462
- 463
- 464
- 465
- 466
- 467
- 468
- 469
- 470
- 471
- 472
- 473
- 474
- 475
- 476
- 477
- 478
- 479
- 480
- 481
- 482
- 483
- 484
- 485
- 486
- 487
- 488
- 489
- 490
- 491
- 492
- 493
- 494
- 495
- 496
- 497
- 498
- 499
- 500
- 501
- 502
- 503
- 504
- 505
- 506
- 507
- 508
- 509
- 510
- 511
- 512
- 513
- 514
- 515
- 516
- 517
- 518
- 519
- 520
- 521
- 522
- 523
- 524
- 525
- 526
- 527
- 528
- 529
- 530
- 531
- 532
- 533
- 534
- 535
- 536
- 537
- 538
- 539
- 540
- 541
- 542
- 543
- 544
- 545
- 546
- 547
- 548
- 549
- 550
- 551
- 552
- 553
- 554
- 555
- 556
- 557
- 558
- 559
- 560
- 561
- 562
- 563
- 564
- 565
- 566
- 567
- 568
- 569
- 570
- 571
- 572
- 573
- 574
- 575
- 576
- 577
- 578
- 579
- 580
- 581
- 582
- 583
- 584
- 585
- 586
- 587
- 588
- 589
- 590
- 591
- 592
- 593
- 594
- 595
- 596
- 597
- 598
- 599
- 600
- 601
- 602
- 603
- 604
- 605
- 606
- 607
- 608
- 609
- 610
- 611
- 612
- 613
- 614
- 615
- 616
- 617
- 618
- 619
- 620
- 621
- 622
- 623
- 624
- 625
- 626
- 627
- 628
- 629
- 630
- 631
- 632
- 633
- 634
- 635
- 636
- 637
- 638
- 639
- 640
- 641
- 642
- 643
- 644
- 645
- 646
- 647
- 648
- 649
- 650
- 651
- 652
- 653
- 654
- 655
- 656
- 657
- 658
- 659
- 660
- 661
- 662
- 663
- 664
- 665
- 666
- 667
- 668
- 669
- 670
- 671
- 672
- 673
- 674
- 675
- 676
- 677
- 678
- 679
- 680
- 681
- 682
- 683
- 684
- 685
- 686
- 687
- 688
- 689
- 690
- 691
测试方法
首先需要自己读取文件并设置相关的初始值
给出自己的一个样例
fid=fopen('near.pcm', 'rb'); % Load far endssin=fread(fid,inf,'float32');fid=fopen('far.pcm', 'rb'); % Load fnear endrrin=fread(fid,inf,'float32');ssin=ssin(1:4096*200);rrin=rrin(1:4096*200);Fs=16000;filter_length=4096;frame_size=128;speex_mdf_out = speex_mdf(Fs, rrin, ssin, filter_length, frame_size);
执行完之后,需要播放出来听:
sound(speex_mdf_out.e,16000)
代码里名词术语
RERL:ERL+ERLERERL:residual_echo_return_lossERL:echo_return_lossERLE:echo_return_loss_enhancementpsd:power spectral density 功率谱密度x: far endd: near ende: errors: psdnlp:non-linear processin