Acoustic Model Training (Acoustic Modeling)
Source: Internet. Editor: 程序博客网. Time: 2024/05/16 01:49
General Framework for AM:
Building ASR system incrementally
Context-independent ➔ Context-dependent modeling
Monophone ➔ Triphone HMM
Single Gaussian per state ➔ Multiple Gaussian mixture components per state
Data Preparation:
Acoustic Unit Selection:
Criteria
Accurate: accurately represents the acoustic realizations that appear in different contexts
Trainable: enough data is available to estimate the unit's parameters
Generalizable: any new word can be derived from a predefined unit inventory, enabling task-independent speech recognition
Units available
- Word
- Syllable
- Initial/Final (Chinese-specific)
- Phoneme
Number of units:
- Word: 60,000
- Syllable: 420 (1,200+ with tone)
- Initial/Final: 60 (22 + 38)
- Phoneme: 39
Note: 97 tone-dependent phonemes are designed by Microsoft
What are Phonemes and Phones:
Phoneme (phn)
Denotes the minimal unit of speech sound in a language
Serves to distinguish one word from another
pat vs. bat
Phone
Denotes a phoneme's acoustic realization
Depends on gender, speech rate, context, accent, etc.
sat and meter (the t is realized as distinct phones)
Context-independent Modeling:
Context-dependent Modeling:
Why context-dependent?
Co-articulation
The process by which neighboring sounds influence one another is called co-articulation
Why triphone?
Only the immediately preceding and succeeding phonemes are taken into account
A compromise between performance, complexity, and available data
The most widely used acoustic unit for CD modeling
Some issues:
How to obtain a triphone-based transcription?
Start from the monophone-based transcription
Take the neighboring phonemes into account
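The expansion from a monophone transcription to a triphone transcription can be sketched as follows. This is a minimal illustration, not the procedure of any particular toolkit: the function name `to_triphones`, the `sil` boundary symbol, and the choice to leave silence context-independent are assumptions. Labels use the left-center+right notation that appears later in these notes (e.g. sh-ang+s).

```python
def to_triphones(phones, boundary="sil"):
    """Expand a monophone sequence into left-center+right triphone labels.

    Assumption: the boundary symbol (silence) stays context-independent and
    does not act as a context for its neighbors.
    """
    labels = []
    for i, center in enumerate(phones):
        if center == boundary:
            labels.append(center)
            continue
        left = phones[i - 1] if i > 0 else boundary
        right = phones[i + 1] if i + 1 < len(phones) else boundary
        label = center
        if left != boundary:
            label = f"{left}-{label}"    # add left context
        if right != boundary:
            label = f"{label}+{right}"   # add right context
        labels.append(label)
    return labels


example = to_triphones(["sil", "sh", "ang", "s", "sil"])
print(example)
```

Phones next to a boundary end up with only one context, so they are written as biphones (for instance sh+ang).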
How to deal with data insufficiency?
Possible triphone combinations:
Some have sufficient occurrences and can be robustly modeled
Some have few occurrences and may be poorly modeled
Some may never occur in the training data
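To make the three cases concrete, the sketch below counts triphone occurrences in a toy corpus and compares them with the number of combinations that are possible in principle (the cube of the phone-set size). The function name `triphone_coverage`, the `min_count` threshold, and the toy data are illustrative assumptions, and every phone in the corpus is assumed to belong to `phone_set`.

```python
from collections import Counter


def triphone_coverage(utterances, phone_set, min_count=3):
    """Classify triphones as well seen, rare, or unseen.

    utterances: list of phone sequences (lists of strings).
    min_count:  hypothetical threshold for "sufficient occurrences".
    """
    counts = Counter()
    for phones in utterances:
        # Slide a window of three phones over each utterance.
        for left, center, right in zip(phones, phones[1:], phones[2:]):
            counts[(left, center, right)] += 1

    possible = len(phone_set) ** 3
    well_seen = sum(1 for n in counts.values() if n >= min_count)
    rare = len(counts) - well_seen
    unseen = possible - len(counts)
    return {"possible": possible, "well_seen": well_seen,
            "rare": rare, "unseen": unseen}


toy_corpus = [["a", "b", "c", "a", "b", "c"], ["a", "b", "c"]]
print(triphone_coverage(toy_corpus, phone_set=["a", "b", "c"]))
```

Even in this toy case most of the possible combinations never occur, which is the situation the sharing strategies are meant to handle.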
How to deal with data insufficiency?
Sharing (tying) strategy
Model-level sharing
State-level sharing
Mixture-level sharing
Transition matrix sharing
Mean/variance sharing
How to increase the number of Gaussian mixture components per state?
Mixture splitting, starting from a single Gaussian per state
Re-estimation iterations are necessary after each split
Transcription and HMM list:
At this stage, every triphone in the HMM list appears in the training data
Sharing Strategies:
Transition Matrix Sharing
Assumption
The transition matrix plays a less significant role in performance
Solution
All transition matrices of triphones with an identical central unit are shared
TI T_ah {*-ah+*.transP}
That is, the number of transition matrices equals the number of monophones
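The grouping behind transition matrix sharing can be sketched in a few lines: every triphone is assigned to a shared matrix named after its central phone. The helper names `central_phone` and `tie_transitions` and the `T_` macro prefix (borrowed from the T_ah example above) are assumptions made for illustration.

```python
from collections import defaultdict


def central_phone(label):
    """Return the central phone of a left-center+right label."""
    return label.split("-")[-1].split("+")[0]


def tie_transitions(triphones):
    """Group triphone labels by the shared transition matrix they would use."""
    groups = defaultdict(list)
    for label in triphones:
        groups[f"T_{central_phone(label)}"].append(label)
    return dict(groups)


tied = tie_transitions(["sh-ang+s", "b-ang+k", "ang-s+sil", "s"])
print(tied)
print(len(tied))  # one shared matrix per distinct central phone
```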
State Sharing
Classification method
Decision tree: a classification method
Goal: merge the similar states of related models while keeping dissimilar states distinct
Solution
Phonetic-related
State-dependent
Language-dependent
Question set
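One step of decision-tree state clustering can be sketched as choosing, from the question set, the question whose yes/no split most increases the likelihood of the data. The notes do not state which splitting criterion is used, so the likelihood-gain criterion with one diagonal Gaussian per cluster is an assumption here, as are the function names, the variance floor, and the toy statistics. A full implementation would also apply a minimum-gain threshold and a minimum-occupancy check and recurse on each side.

```python
import math
import numpy as np


def pooled_loglik(stats, var_floor=1e-6):
    """Log-likelihood of pooled frames under a single diagonal Gaussian.

    stats: list of (occupancy, mean, variance) tuples, one per state.
    """
    n = sum(occ for occ, _, _ in stats)
    mean = sum(occ * m for occ, m, _ in stats) / n
    second = sum(occ * (v + m ** 2) for occ, m, v in stats) / n
    var = np.maximum(second - mean ** 2, var_floor)  # floor is an assumption
    dim = len(mean)
    return -0.5 * n * (dim * (1.0 + math.log(2.0 * math.pi)) + np.sum(np.log(var)))


def best_question(states, questions):
    """Return (question name, likelihood gain) of the best single split."""
    base = pooled_loglik(list(states.values()))
    best = None
    for name, predicate in questions.items():
        yes = [s for label, s in states.items() if predicate(label)]
        no = [s for label, s in states.items() if not predicate(label)]
        if not yes or not no:
            continue  # the question does not split this cluster
        gain = pooled_loglik(yes) + pooled_loglik(no) - base
        if best is None or gain > best[1]:
            best = (name, gain)
    return best


def left_context(label):
    return label.split("-")[0] if "-" in label else None


def right_context(label):
    return label.split("+")[1] if "+" in label else None


# Toy statistics for the same state of several "ah" triphones (made-up numbers).
states = {
    "n-ah+t": (10.0, np.array([0.0]), np.array([1.0])),
    "m-ah+t": (10.0, np.array([0.2]), np.array([1.0])),
    "s-ah+t": (10.0, np.array([3.0]), np.array([1.0])),
    "s-ah+n": (10.0, np.array([3.2]), np.array([1.0])),
}
questions = {
    "L_Nasal": lambda label: left_context(label) in {"m", "n"},
    "R_Nasal": lambda label: right_context(label) in {"m", "n"},
}
print(best_question(states, questions))
```

In the toy data the states with a nasal left context have similar means, so the left-nasal question gives the larger gain and would be chosen for the first split.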
Decision-tree based state clustering:
Model Sharing
To deal with triphone combinations that never occur in training data
Descend the previously constructed trees for that phone, answering the question at each node according to the new, unseen context
Example: sh-ang+s (not in the training data) is mapped to sh-ang+sh (in the training data)
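Descending a tree for an unseen context can be sketched as below. The `Node` class, the single "right context is a fricative" question, and the leaf names are all hypothetical; a real tree would have been built during state clustering and would contain many more questions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    # Internal nodes hold a question on (left, right); leaves hold a tied state.
    question: Optional[Callable[[str, str], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    tied_state: Optional[str] = None


def find_tied_state(node, left, right):
    """Walk from the root to a leaf by answering each node's question."""
    while node.tied_state is None:
        node = node.yes if node.question(left, right) else node.no
    return node.tied_state


# Hypothetical tree for one state of the phone "ang".
FRICATIVES = {"s", "sh", "f"}
tree = Node(
    question=lambda left, right: right in FRICATIVES,
    yes=Node(tied_state="ang_state_A"),
    no=Node(tied_state="ang_state_B"),
)

unseen = find_tied_state(tree, left="sh", right="s")   # sh-ang+s
seen = find_tied_state(tree, left="sh", right="sh")    # sh-ang+sh
print(unseen, seen)
```

Both contexts answer the question the same way and reach the same leaf, so the unseen triphone reuses the state trained for the seen one.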
Mixture Increment:
LVCSR systems typically use context-dependent HMMs with multiple mixture components per state
Up to this point, the triphone HMMs used in the sharing strategies had a single Gaussian per state
A mechanism called mixture splitting can increase the number of mixture components within a state
Mixture splitting is very flexible: the number of components can be increased repeatedly until the desired performance is reached
Several re-estimation iterations are necessary after each split
For triphone HMMs, 8 to 14 Gaussian components per state is ideal
Too many Gaussian components in a state may leave too little training data for each one
Autotrain uses 12 Gaussian components per state on average (range 8 to 14)
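A single mixture-splitting step can be sketched as follows. The notes do not describe how a component is split, so this uses one common heuristic as an assumption: copy the heaviest component, halve its weight, and shift the two means in opposite directions by a fraction of the standard deviation. The function name `split_heaviest` and the `perturb` value of 0.2 are illustrative, and the re-estimation iterations that must follow each split are not shown.

```python
import numpy as np


def split_heaviest(weights, means, variances, perturb=0.2):
    """Split the heaviest diagonal-Gaussian component into two.

    weights:   shape (K,)
    means:     shape (K, D)
    variances: shape (K, D), diagonal covariances
    Returns new arrays with K + 1 components; the inputs are not modified.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    k = int(np.argmax(weights))
    offset = perturb * np.sqrt(variances[k])

    new_weights = np.append(weights, weights[k] / 2.0)
    new_weights[k] = weights[k] / 2.0

    new_means = np.vstack([means, means[k] - offset])
    new_means[k] = means[k] + offset

    new_variances = np.vstack([variances, variances[k]])
    return new_weights, new_means, new_variances


# Start from a single Gaussian per state and split once.
w, m, v = split_heaviest([1.0], [[0.0, 2.0]], [[1.0, 4.0]])
print(w)
print(m)
print(v)
```

Calling the function repeatedly, with re-estimation in between, grows a state from one component toward the target number of components.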
Flowchart for CD Modeling: