Training Sphinx Acoustic Models on Your Own Extracted Features: A Discussion

Source: Internet | Editor: 程序博客网 | Time: 2024/05/19 18:39

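CMUSphinx stores feature parameters in files with the .mfc extension, so the first step in acoustic model training is feature extraction. Note that training and decoding use not only the static mel-cepstral parameters but also their deltas and delta-deltas. Only the static parameters are saved in the file; the dynamic parameters are computed at run time. How they are computed is set with the -feat option. For example, -feat 1s_c_d_dd means: read the vector, compute the deltas and delta-deltas, and combine everything into a single-stream (1-stream) feature vector. Other types work differently: 's2_4x', for example, computes deltas, delta-deltas, and second-order delta-deltas and combines them into a special 4-stream feature vector. If you need a special feature arrangement rather than MFCC, you have to add code to sphinxbase that implements it. If you set -feat 1s_c, the original vectors are read and used as is, without deltas or delta-deltas.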
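To make the -feat 1s_c_d_dd description concrete, here is a small C sketch that turns a matrix of static cepstra into [static, delta, delta-delta] frames. The formulas (a delta over frames t+2 and t-2, and a delta-delta taken as the difference of the neighboring deltas) and the edge handling (repeating the first/last frame) are assumptions made for illustration; sphinxbase's own feature code is the reference and may differ in detail. compute_c_d_dd is a hypothetical name, not a sphinxbase function.

```c
#include <assert.h>
#include <stdlib.h>

/* Clamp a frame index into [0, n_frames - 1]: frames outside the utterance
 * are replaced by the first/last frame (an assumed edge policy). */
static int clamp_frame(int t, int n_frames)
{
    if (t < 0) return 0;
    if (t >= n_frames) return n_frames - 1;
    return t;
}

/* Hypothetical 1s_c_d_dd-style expansion.
 * cep: n_frames x m static features, stored frame after frame (as in .mfc).
 * Output row t = [c(t), d(t), dd(t)], 3*m values, where
 *   d(t)  = c(t+2) - c(t-2)
 *   dd(t) = d(t+1) - d(t-1) = (c(t+3) - c(t-1)) - (c(t+1) - c(t-3))
 * Returns a malloc'd n_frames x (3*m) array, or NULL on allocation failure. */
float *compute_c_d_dd(const float *cep, int n_frames, int m)
{
    assert(cep != NULL && n_frames > 0 && m > 0);
    float *out = malloc((size_t)n_frames * 3 * m * sizeof *out);
    if (out == NULL) return NULL;

    for (int t = 0; t < n_frames; t++) {
        const float *cur   = cep + (size_t)t * m;
        const float *next1 = cep + (size_t)clamp_frame(t + 1, n_frames) * m;
        const float *next2 = cep + (size_t)clamp_frame(t + 2, n_frames) * m;
        const float *next3 = cep + (size_t)clamp_frame(t + 3, n_frames) * m;
        const float *prev1 = cep + (size_t)clamp_frame(t - 1, n_frames) * m;
        const float *prev2 = cep + (size_t)clamp_frame(t - 2, n_frames) * m;
        const float *prev3 = cep + (size_t)clamp_frame(t - 3, n_frames) * m;
        float *row = out + (size_t)t * 3 * m;

        for (int k = 0; k < m; k++) {
            row[k]         = cur[k];                                   /* static  */
            row[m + k]     = next2[k] - prev2[k];                      /* delta   */
            row[2 * m + k] = (next3[k] - prev1[k]) - (next1[k] - prev3[k]); /* delta-delta */
        }
    }
    return out;
}
```

With m static values per frame the combined vector has 3*m values, which is why only the static part needs to be stored on disk.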

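An .mfc file is binary, and each value is stored as a 4-byte float. The first 4 bytes form the header, which holds the total number of values in the file. For example, with N frames of M-dimensional feature vectors, the total is N×M. Note that this header is an int32, not a float. The feature vectors are stored frame by frame: the M values of the first frame come first, then the second frame, and so on.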

The .mfc file layout is:

header (int32 length)
features of frame 1 (13 floats, i.e. 13 * 4 bytes)
features of frame 2 (13 floats, i.e. 13 * 4 bytes)
features of frame ... (13 floats, i.e. 13 * 4 bytes)
features of frame N (13 floats, i.e. 13 * 4 bytes)
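The layout above can be written with a few lines of C. This is a minimal sketch: write_mfc is a hypothetical helper, float is assumed to be a 32-bit IEEE-754 value, and the byte order is an assumption because the text does not state it. The sketch writes big-endian, which we believe is what sphinx_fe produces by default; compare against an .mfc file from your own setup before relying on it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

_Static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");

/* Write one 32-bit word, most significant byte first (big-endian). */
static int put_be32(FILE *fp, uint32_t w)
{
    unsigned char b[4] = {
        (unsigned char)(w >> 24), (unsigned char)(w >> 16),
        (unsigned char)(w >> 8),  (unsigned char)w
    };
    return fwrite(b, 1, 4, fp) == 4 ? 0 : -1;
}

/* Hypothetical writer: feat holds n_frames x veclen floats, frame after frame.
 * Emits the int32 header (= n_frames * veclen) followed by the float values.
 * Returns 0 on success, -1 on a write error. */
int write_mfc(FILE *fp, const float *feat, int32_t n_frames, int32_t veclen)
{
    assert(fp != NULL && feat != NULL && n_frames >= 0 && veclen > 0);
    int32_t total = n_frames * veclen;          /* total number of floats */
    if (put_be32(fp, (uint32_t)total) != 0) return -1;
    for (int32_t i = 0; i < total; i++) {
        uint32_t bits;
        memcpy(&bits, &feat[i], sizeof bits);   /* reinterpret the float's bits */
        if (put_be32(fp, bits) != 0) return -1;
    }
    return 0;
}
```

For 2 frames of 3 values the file is 4 + 6 * 4 = 28 bytes, and the header holds 6.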

To configure the feature vector length in the trainer, use the -veclen option, which almost all the executables accept, or set it in sphinx_train.cfg.
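Because the header records only the total number of floats, the frame count can be recovered only once the vector length is known, which is why the configured vector length has to match the data. The sketch below (read_mfc is a hypothetical helper, and big-endian byte order is an assumption, since the text does not specify it) reads a file and rejects a vector length that does not divide the header. A length that does divide it can still be wrong, so treat this only as a sanity check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one big-endian 32-bit word; returns 0 on success, -1 on short read. */
static int get_be32(FILE *fp, uint32_t *w)
{
    unsigned char b[4];
    if (fread(b, 1, 4, fp) != 4) return -1;
    *w = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
         ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 0;
}

/* Hypothetical reader: returns a malloc'd array of *n_frames x veclen floats,
 * or NULL if the file is truncated, empty, or its header is not a multiple
 * of veclen (in which case veclen cannot be right for this file). */
float *read_mfc(FILE *fp, int32_t veclen, int32_t *n_frames)
{
    assert(fp != NULL && veclen > 0 && n_frames != NULL);
    uint32_t hdr;
    if (get_be32(fp, &hdr) != 0) return NULL;
    int32_t total = (int32_t)hdr;
    if (total <= 0 || total % veclen != 0) return NULL;  /* veclen mismatch */

    float *feat = malloc((size_t)total * sizeof *feat);
    if (feat == NULL) return NULL;
    for (int32_t i = 0; i < total; i++) {
        uint32_t bits;
        if (get_be32(fp, &bits) != 0) { free(feat); return NULL; }
        memcpy(&feat[i], &bits, sizeof bits);            /* bits back to float */
    }
    *n_frames = total / veclen;
    return feat;
}
```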

The above is based on the text quoted below.

Now let's look at how to add our own feature parameters.

First, assume we have settled on our own feature extraction method and implemented it in C, e.g. float *myfeatures = get_features();
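The author's get_features() is not shown. As a purely hypothetical stand-in, the C sketch below frames 16-bit PCM and computes two toy features per frame (mean-square energy and zero-crossing rate), returning them frame after frame so they can go straight into the .mfc layout with a vector length of 2. The signature, the feature choice, and MY_VECLEN are all assumptions; substitute your own extractor.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MY_VECLEN 2   /* hypothetical: [mean-square energy, zero-crossing rate] */

/* Hypothetical extractor: splits pcm into frames of frame_len samples spaced
 * frame_shift apart, and returns *n_frames x MY_VECLEN floats, frame after
 * frame. Returns NULL if not even one full frame fits or malloc fails. */
float *get_features(const int16_t *pcm, int n_samples,
                    int frame_len, int frame_shift, int *n_frames)
{
    assert(pcm != NULL && frame_len > 1 && frame_shift > 0 && n_frames != NULL);
    if (n_samples < frame_len) return NULL;
    int n = 1 + (n_samples - frame_len) / frame_shift;
    float *feat = malloc((size_t)n * MY_VECLEN * sizeof *feat);
    if (feat == NULL) return NULL;

    for (int fr = 0; fr < n; fr++) {
        const int16_t *x = pcm + (size_t)fr * frame_shift;
        double energy = 0.0;
        int crossings = 0;
        for (int i = 0; i < frame_len; i++) {
            energy += (double)x[i] * x[i];
            /* count a sign change between neighbors (zero counts as positive) */
            if (i > 0 && ((x[i - 1] < 0) != (x[i] < 0)))
                crossings++;
        }
        feat[fr * MY_VECLEN + 0] = (float)(energy / frame_len);
        feat[fr * MY_VECLEN + 1] = (float)crossings / (float)(frame_len - 1);
    }
    *n_frames = n;
    return feat;
}
```

Whatever the real extractor computes, the number of values it returns per frame is the vector length that training and decoding have to be told about.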

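Then this function would normally be added to sphinxbase to perform the feature extraction. I did not take that route; instead I modified continuous.c directly. The change in vector length can then be controlled with -i N when running pocketsphinx_continuous.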

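Next, we need to train an acoustic model for these features. Extract the features and save them in the format described above. Training then works the same way as for MFCC, but the change in vector dimension needs special attention (set it with -veclen or in sphinx_train.cfg); otherwise training will not succeed.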

Finally, test whether the new features meet your requirements.

References:

MFC files

CMUSphinx stores features in files with the extension .mfc. Please note that during training and decoding we usually use static features like mel-cepstrum and dynamic features like mel-cepstrum deltas or delta-deltas. On the filesystem we store only static features, and dynamic features are computed on the fly. The computation is configured with the -feat option. For example, -feat 1s_c_d_dd means to read the vector, compute deltas and delta-deltas, and combine them into a 1-stream feature vector. There are different types like 's2_4x', which means to compute deltas, delta-deltas, and delta-deltas of the second order and combine them in a special 4-stream feature vector. If you need a specific feature arrangement, you can implement your own feature type in sphinxbase. If you want to use features as is, use the 1s_c feature type, which means to read the vector unmodified.

The file stores the values in binary format, and each value is stored as a float, i.e. 4 bytes. The first 4 bytes in the file are the header, which is nothing but the total number of values stored. For example, suppose we have N frames and each frame is represented as an M-dimensional vector (M is usually 13). The header, which is the total number of values in the feature matrix, is then N*M. Note that it is stored as an int32. The feature vectors are stored by column number, i.e. by frame number: the 1st frame (M values in total) is stored first, followed by the second, and so on.

It looks like

header (int32 length)
features of frame 1 (13 floats, i.e. 13 * 4 bytes)
features of frame 2 (13 floats, i.e. 13 * 4 bytes)
features of frame ... (13 floats, i.e. 13 * 4 bytes)
features of frame N (13 floats, i.e. 13 * 4 bytes)

If you need to configure the vector length in trainer you can use -veclen option which works with almost all binaries. It's also possible to configure this globally for the training database in sphinx_train.cfg file.

If you want to implement your own feature type, you need to write a tool to dump features in the format above. You can easily do that if you know the basics of programming.