Introduction to openSMILE



-- This content is reposted from another author and reorganized as study notes.

-- Original link: http://blog.sina.com.cn/s/blog_8d351dfc0102w85j.html

I. Introduction

1. About openSMILE

openSMILE is a command-line tool rather than a GUI application. It extracts features from audio according to a config file, and it is now widely used by researchers and companies around the world.

openSMILE is applied in speech recognition (feature extraction front-end, keyword spotting, etc.), affective computing (emotion recognition, affect-sensitive virtual agents, etc.), and Music Information Retrieval (chord labeling, beat tracking, onset detection, etc.). The 2.0 open-source release targets the wider multimedia community by including the popular openCV library for video processing and video feature extraction.


Figure 1. Basic block diagram of a speech recognition system and where openSMILE is applied

2. Input and output file formats

Data input: openSMILE can read data from the following file formats:

– RIFF-WAVE (PCM) (for MP3, MP4, OGG, etc. a converter needs to be used)

– Comma Separated Value (CSV)

– HTK parameter files (as produced by the HTK toolkit)

– WEKA's ARFF format

– Video streams via openCV

Data output: For writing data to files, the following formats are supported. They largely mirror the input side, with an additional LibSVM feature format and a binary matrix format:

– RIFF-WAVE (PCM uncompressed audio)

– Comma Separated Value (CSV)

– HTK parameter file

– WEKA ARFF file

– LibSVM feature file format

– Binary float matrix format
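
Because the audio input must be uncompressed RIFF-WAVE (PCM), it can help to see what such a file contains. The following is a minimal sketch that uses Python's standard wave module to write a short 16-bit mono test tone and read its header back. The function names, the 16 kHz sample rate, and the 440 Hz tone are arbitrary choices for illustration, not openSMILE requirements.

```python
import math
import struct
import wave


def write_test_tone(path, freq_hz=440.0, sample_rate=16000, seconds=1.0):
    """Write a mono, 16-bit PCM sine tone to `path` as a RIFF-WAVE file."""
    n_samples = int(sample_rate * seconds)
    frames = bytearray()
    for n in range(n_samples):
        # Half of full scale, to stay clear of clipping.
        value = 0.5 * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        frames += struct.pack("<h", int(value * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_samples


def describe_wav(path):
    """Return (channels, sample width in bytes, sample rate, frame count)."""
    with wave.open(path, "rb") as wav:
        return (wav.getnchannels(), wav.getsampwidth(),
                wav.getframerate(), wav.getnframes())
```

A file produced this way is plain PCM, so no converter step should be needed before feeding it to a feature extractor.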

3. openSMILE supports the following four categories of feature extraction operations:

1) Signal Processing: The following functionality is provided for general signal processing or signal pre-processing (prior to feature extraction):

– Windowing-functions (Rectangular, Hamming, Hann (raised cosine), Gauss, Sine, Triangular, Bartlett, Bartlett-Hann, Blackman, Blackman-Harris, Lanczos)

– Pre-/De-emphasis (i.e. 1st order high/low-pass)

– Re-sampling (spectral domain algorithm)

– FFT (magnitude, phase, complex) and inverse

– Scaling of spectral axis via spline interpolation (open-source version only)

– dBA weighting of magnitude spectrum

– Autocorrelation function (ACF) (via IFFT of power spectrum)

– Average magnitude difference function (AMDF)
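
A few of the operations listed above can be sketched in plain Python to make the definitions concrete. This is an illustrative sketch using textbook formulas, not openSMILE's implementation: the function names are hypothetical, and the autocorrelation here is computed by a direct time-domain sum rather than via the IFFT of the power spectrum mentioned in the list.

```python
import math


def preemphasis(samples, k=0.97):
    """First-order high-pass: y[n] = x[n] - k * x[n-1], with y[0] = x[0]."""
    out = [samples[0]] if samples else []
    for n in range(1, len(samples)):
        out.append(samples[n] - k * samples[n - 1])
    return out


def hamming(n_points):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def autocorrelation(frame):
    """Direct-form ACF: r[lag] = sum over n of x[n] * x[n + lag]."""
    size = len(frame)
    return [sum(frame[n] * frame[n + lag] for n in range(size - lag))
            for lag in range(size)]


def amdf(frame):
    """AMDF: d[lag] = mean over n of |x[n] - x[n + lag]|."""
    size = len(frame)
    return [sum(abs(frame[n] - frame[n + lag]) for n in range(size - lag))
            / (size - lag)
            for lag in range(size)]
```

For a periodic frame, the ACF peaks and the AMDF dips at lags equal to the period, which is why both appear in pitch estimation.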

2) Data Processing: openSMILE can perform a number of operations for feature normalization, modification, and differentiation:

– Mean-Variance normalization (off-line and on-line)

– Range normalization (off-line and on-line)

– Delta-Regression coefficients (and simple differential)

– Weighted Differential

– Various vector operations: length, element-wise addition, multiplication, logarithm, and power

– Moving average filter for smoothing of contour over time
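
Two of these data-processing steps are sketched below in plain Python: off-line mean-variance normalization and delta regression coefficients. The delta formula is the common regression form d[t] = sum_i i * (c[t+i] - c[t-i]) / (2 * sum_i i^2); replicating the first and last values at the edges is an assumption of this sketch, and openSMILE's own edge handling may differ. The function names are hypothetical.

```python
import math


def mean_variance_normalize(values):
    """Off-line MVN: subtract the mean, then divide by the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        # A constant contour has no spread; map it to all zeros.
        return [0.0] * n
    return [(v - mean) / std for v in values]


def delta_regression(values, half_window=2):
    """Delta coefficients over a window of +/- half_window frames."""
    denom = 2.0 * sum(i * i for i in range(1, half_window + 1))
    last = len(values) - 1
    deltas = []
    for t in range(len(values)):
        acc = 0.0
        for i in range(1, half_window + 1):
            ahead = values[min(t + i, last)]   # replicate the last value
            behind = values[max(t - i, 0)]     # replicate the first value
            acc += i * (ahead - behind)
        deltas.append(acc / denom)
    return deltas
```

On a contour that rises by one unit per frame, the interior delta values come out as 1, which matches the intuition that delta coefficients estimate the local slope.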

3) Audio features (low-level): The following (audio specific) low-level descriptors can be computed by openSMILE:

– Frame Energy

– Frame Intensity / Loudness (approximation)

– Critical Band spectra (Mel/Bark/Octave, triangular masking filters)

– Mel-/Bark-Frequency-Cepstral Coefficients (MFCC)

– Auditory Spectra

– Loudness approximated from auditory spectra

– Perceptual Linear Predictive (PLP) Coefficients

– Perceptual Linear Predictive Cepstral Coefficients (PLP-CC)

– Linear Predictive Coefficients (LPC)

– Line Spectral Pairs (LSP, a.k.a. LSF)

– Fundamental Frequency (via ACF/Cepstrum method and via Subharmonic-Summation (SHS))

– Probability of Voicing from ACF and SHS spectrum peak

– Voice-Quality: Jitter and Shimmer

– Formant frequencies and bandwidths

– Zero- and Mean-Crossing rate

– Spectral features (arbitrary band energies, roll-off points, centroid, entropy, maxpos, minpos, variance (=spread), skewness, kurtosis, slope)

– Psychoacoustic sharpness, spectral harmonicity

– CHROMA (octave warped semitone spectra) and CENS features (energy normalised and smoothed CHROMA)

– CHROMA-derived Features for Chord and Key recognition
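
To give a feel for what a low-level descriptor is, the sketch below computes three of the simplest ones for a single frame: energy, zero-crossing rate, and mean-crossing rate. The definitions are common textbook ones chosen for illustration (mean of squares, and sign changes divided by the number of adjacent sample pairs). openSMILE's exact definitions and scaling may differ, and the function names are hypothetical.

```python
def frame_energy(frame):
    """Mean of squared sample values in the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def mean_crossing_rate(frame):
    """Zero-crossing rate of the frame after removing its mean."""
    mean = sum(frame) / len(frame)
    return zero_crossing_rate([x - mean for x in frame])
```

Each function maps one frame to one number, so applying it frame by frame yields a contour over time.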

4) Functionals: In order to map contours of audio and video low-level descriptors onto a vector of fixed dimensionality, the following functionals can be applied:

– Extreme values and positions

– Means (arithmetic, quadratic, geometric)

– Moments (standard deviation, variance, kurtosis, skewness)

– Percentiles and percentile ranges

– Regression (linear and quadratic approximation, regression error)

– Centroid

– Peaks

– Segments

– Sample values

– Times/durations

– Onsets/Offsets

– Discrete Cosine Transformation (DCT)

– Zero-Crossings

– Linear Predictive Coding (LPC) coefficients and gain
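
The point of functionals is that a contour of any length is summarized by a vector of fixed size. The sketch below applies a small, arbitrary selection of them (extremes and their positions, arithmetic mean, standard deviation, and the slope of a least-squares line over the frame index). The function name and the chosen set are illustrative assumptions, not openSMILE's component interface.

```python
import math


def functionals(contour):
    """Summarize a variable-length contour as a fixed set of statistics."""
    n = len(contour)
    mean = sum(contour) / n
    variance = sum((v - mean) ** 2 for v in contour) / n
    # Least-squares line fit of value against frame index 0..n-1.
    t_mean = (n - 1) / 2.0
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = (sum((t - t_mean) * (v - mean) for t, v in enumerate(contour))
             / denom) if denom else 0.0
    highest, lowest = max(contour), min(contour)
    return {
        "max": highest,
        "max_pos": contour.index(highest),   # first frame index of the maximum
        "min": lowest,
        "min_pos": contour.index(lowest),    # first frame index of the minimum
        "mean": mean,
        "stddev": math.sqrt(variance),
        "linear_slope": slope,
    }
```

Whether the input has three frames or three thousand, the result has the same seven entries, which is what a classifier with a fixed input size needs.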

4. Config file format and running openSMILE

1) Config file format

Figure 2. Overview of openSMILE's component types and basic architecture

Figure 2 shows the overall data-flow architecture of openSMILE, where the data memory is the central link between all dataSource, dataProcessor, and dataSink components.

Figure 3. Incremental processing with ring-buffers. Partially filled buffers (left) and filled buffers with wrapped read/write pointers (right).

The ring-buffer based incremental processing is illustrated in Figure 3. Three levels are present in this setup: wave, frames, and pitch. A cWaveSource component writes samples to the 'wave' level. The write positions in the levels are indicated by a red arrow. A cFramer produces non-overlapping frames of size 3 from the wave samples and writes these frames to the 'frames' level. A cPitch component extracts pitch features from the frames and writes them to the 'pitch' level (no component with this name exists; it was chosen here only for illustration). In Figure 3 (right) the buffers have been filled and the write pointers have wrapped around. Data that lies more than 'buffersize' frames in the past has been overwritten.
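
The ring-buffer behaviour described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not openSMILE's data memory: RingLevel and frame_new_samples are hypothetical names, and entries are addressed by their absolute index so that reads of overwritten data fail loudly.

```python
class RingLevel:
    """Fixed-size level that overwrites its oldest entries once full."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.n_written = 0  # total number of entries ever written

    def write(self, item):
        # The write pointer wraps around via the modulo.
        self.slots[self.n_written % self.size] = item
        self.n_written += 1

    def read(self, index):
        """Read by absolute index; fail if unwritten or already overwritten."""
        if index < 0 or index >= self.n_written:
            raise IndexError("entry has not been written")
        if index < self.n_written - self.size:
            raise IndexError("entry has been overwritten")
        return self.slots[index % self.size]


def frame_new_samples(wave_level, frames_level, next_start, frame_size=3):
    """Emit every complete non-overlapping frame starting at next_start.

    Returns the absolute sample index where the next frame will begin.
    """
    while next_start + frame_size <= wave_level.n_written:
        frame = [wave_level.read(next_start + i) for i in range(frame_size)]
        frames_level.write(frame)
        next_start += frame_size
    return next_start
```

Calling frame_new_samples after each batch of written samples gives incremental processing: only frames that have become complete since the last call are produced, and a reader that falls more than one buffer length behind gets an error instead of stale data.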

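2) Running openSMILE

openSMILE is run from the command line to extract audio features. The command has the following form:

SMILExtract -C config/demo/demo1_energy.conf -I wav_samples/speech01.wav -O speech01.energy.csv

Here -C specifies the config file used for feature extraction, -I the input data source, and -O the output feature file. Running SMILExtract -h prints openSMILE's usage information and exits.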
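
When many files need processing, the same command can be issued from a script. The sketch below is a minimal Python wrapper that builds the argument list with only the -C, -I, and -O options shown above. The function names and the paths in the usage note are hypothetical, it assumes SMILExtract is on the PATH, and it assumes the chosen config file actually accepts -I and -O.

```python
import subprocess


def build_smilextract_command(config_path, input_path, output_path,
                              executable="SMILExtract"):
    """Return the argument list for one feature-extraction run."""
    return [executable, "-C", config_path, "-I", input_path, "-O", output_path]


def run_smilextract(config_path, input_path, output_path):
    """Run SMILExtract once; raises CalledProcessError on a non-zero exit."""
    command = build_smilextract_command(config_path, input_path, output_path)
    return subprocess.run(command, check=True)
```

For example, run_smilextract("my.conf", "input.wav", "output.csv") would launch one extraction. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.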

3) Config file example

An example openSMILE config file is shown below:

; don't change this section header
[componentInstances:cComponentManager]
; configure the default data memory:
instance[dataMemory].type=cDataMemory
; configure an example data source (name = source1):
instance[source1].type=cWaveSource
instance[frame].type=cFramer
instance[pe].type=cVectorPreemphasis
……
/////////////// component configuration ////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////////////////////
; the following sections configure the components listed above
[source1:cWaveSource]
; the following sets the level this component writes to
; the level will be created by this component
; no other components may write to a level having the same name
writer.dmLevel=wave
filename=input.wav

[frame:cFramer]
reader.dmLevel=wave
writer.dmLevel=frames
frameSize=0.0250
frameStep=0.010

[pe:cVectorPreemphasis]
reader.dmLevel=frames
writer.dmLevel=framespe
k = 0.97
de = 0
……
////////////////data output configuration //////////////////////
// ----- you might need to customize the arff output to suit your needs: ------
[arffsink:cArffSink]
reader.dmLevel=framespe
; do not print "frameIndex" attribute to ARFF file
frameIndex=0
frameTime=1
; name of output file as commandline option
filename=\cm[arffout(O){output.arff}:name of WEKA Arff output file]
; name of @relation in the ARFF file
relation=\cm[corpus{SMILEfeatures}:corpus name, arff relation]

; name of the current instance (usually file name of input wave file)
instanceName=\cm[instname(N){noname}:name of arff instance]
;; name of class label
class[0].name = emotion
class[0].type = \cm[classes{unknown}:all classes for arff file attribute]
target[0].all = \cm[classlabel(a){unknown}:instance class label]
; append to an existing file, so multiple calls of SMILExtract on different
; input files append to the same output ARFF file
append=1

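This simple example shows how a config file is written: edit the config file according to the audio features you want, and openSMILE extracts them. The parameters of each feature extractor can likewise be adjusted as needed.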
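
The cm[...] values in the example tie a config setting to a command-line option. Read from the example and its comments, each entry consists of a long option name, an optional one-letter short option in parentheses, a default value in braces, and a description after the colon. The sketch below parses that shape with a regular expression. It is an illustration of the syntax only, not openSMILE's own parser, and the names CM_PATTERN and parse_cm are hypothetical.

```python
import re

# Matches cm[long(short){default}:description], with an optional leading
# backslash and an optional one-letter short option.
CM_PATTERN = re.compile(
    r"\\?cm\["
    r"(?P<long>\w+)"
    r"(?:\((?P<short>\w)\))?"
    r"\{(?P<default>[^}]*)\}"
    r":(?P<description>[^\]]*)"
    r"\]"
)


def parse_cm(value):
    """Return the parts of a cm[...] entry as a dict, or None if not one."""
    match = CM_PATTERN.fullmatch(value.strip())
    if match is None:
        return None
    return match.groupdict()
```

Under this reading, the filename entry of the ARFF sink defaults to output.arff and can be overridden with the O option on the command line, which is how the -O flag in the command shown earlier reaches the config.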

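5. Further notes

openSMILE is an open-source toolkit written entirely in C++, and it can be used to analyze many kinds of time-series data. To handle your own data, you can modify the openSMILE source code and build your own executable.

openSMILE is an effective tool for audio feature extraction. With tools like this we can concentrate on finding our own point of innovation instead of spending effort on writing yet another feature extractor, and quickly identify what deserves closer study. In any field, making good use of existing tools and standing on the shoulders of giants is more likely to lead to success than working in isolation.

Note: for more information about openSMILE, download openSMILE_book_2.0-rc1.pdf from the official site http://openSMILE.sourceforge.net/.

openSMILE development site: http://audeering.com/research/opensmile/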

0 0