Content-Based Audio Scene Segmentation


If you repost this article, please credit the source!


In terms of segmentation complexity, the hardest case in audio scene segmentation is separating speech+song from song.

speech+song: a person speaking in the foreground, with a sung song in the background

Both speech+song and song can be viewed as consisting basically of speech+instrument, so the two scenes are very similar, which makes segmenting them much harder.


The experimental results are listed below:


1:music  0:speech


21.952000, 0 -> 1
25.632000, 1 -> 0
54.400000, 0 -> 1
59.488000, 1 -> 0
114.848000, 0 -> 1
121.984000, 1 -> 0
167.456000, 0 -> 1
173.216000, 1 -> 0
197.568000, 0 -> 1
202.912000, 1 -> 0
229.728000, 0 -> 1
233.024000, 1 -> 0
256.384000, 0 -> 1
263.200000, 1 -> 0
308.000000, 0 -> 1
315.712000, 1 -> 0
332.480000, 0 -> 1
336.160000, 1 -> 0
360.512000, 0 -> 1
368.640000, 1 -> 0
397.184000, 0 -> 1
399.584000, 1 -> 0
404.928000, 0 -> 1
408.000000, 1 -> 0
441.984000, 0 -> 1
450.688000, 1 -> 0
468.608000, 0 -> 1
475.232000, 1 -> 0
506.112000, 0 -> 1
511.008000, 1 -> 0
544.608000, 0 -> 1
546.496000, 1 -> 0
548.480000, 0 -> 1
552.000000, 1 -> 0
592.288000, 0 -> 1
601.504000, 1 -> 0
653.472000, 0 -> 1
660.032000, 1 -> 0



File location:

http://pan.baidu.com/s/1sjzfVE9
