Reading "Semi-supervised Training of a Voice Conversion Mapping Function using a Joint-Autoencoder"
Source: Internet · Editor: 程序博客网 · Date: 2024/06/05 05:32
- Stacked Joint Autoencoder (SJAE) architecture: finds a common encoding of the source and target features in a parallel corpus.
- The final DNN is assembled from the SJAE's source encoder and target decoder.
- In the unsupervised stage, using large amounts of data from unrelated speakers reduces the amount of parallel data needed for supervised training. Building on this, data from speakers similar to the source and target speakers is used for the unsupervised stage.
- The semi-supervised system consists of the following parts:
- Train a multi-frame Stacked Autoencoder (SAE) on a general-purpose database
- Build a Stacked Joint Autoencoder (SJAE) from the SAE, using speech similar to the source and target speakers, to reconstruct the joint feature vector while keeping the intermediate-layer codes relatively independent
- Build the DNN from the SJAE
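The final assembly step above can be sketched in plain numpy. This is purely illustrative: the weight matrices are random stand-ins for the trained SJAE parameters, and the layer sizes are assumptions, not the paper's exact topology.

```python
import numpy as np

rng = np.random.RandomState(0)
n_in, n_hid = 360, 500  # illustrative sizes, not the paper's exact topology

# Stand-ins for trained SJAE parameters: source-side encoder, target-side decoder.
W_enc_src, b_enc_src = rng.randn(n_in, n_hid) * 0.01, np.zeros(n_hid)
W_dec_tgt, b_dec_tgt = rng.randn(n_hid, n_in) * 0.01, np.zeros(n_in)

def convert(x):
    """Map a source feature vector to the target domain:
    source encoder -> shared code -> target decoder."""
    h = np.tanh(x @ W_enc_src + b_enc_src)  # shared (joint) code
    return h @ W_dec_tgt + b_dec_tgt        # linear output layer

y = convert(rng.randn(n_in))
print(y.shape)  # (360,)
```

The point of the construction is that only the two halves that touch the shared code are kept; the source decoder and target encoder are discarded.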
- ANN:
  $h_{k+1} = f_k(W_k h_k + b_k)$, where $h_k$, $h_{k+1}$ are the input and output, $W_k$ the weights, $b_k$ the bias, and $f_k$ the activation function. The first layer is the input layer ($h_1 = x$), the last layer is the output layer ($y = h_{K+1}$), and the layers in between are hidden layers. The goal is to minimize a cost function such as $E = \|y - y'\|^2$. An ANN with more than 3 layers is called a DNN.
- Autoencoder: unsupervised learning; only the input is needed, since the output is required to equal the input, so training becomes a reconstruction problem. An AE can learn a low-dimensional encoding of the data. This technique can be used to initialize network parameters before supervised training of a DNN. Dropout-style corruption is applied during this process to avoid overfitting. Deeper models can be built by stacking AEs into a stacked AE (SAE). See Figure 1 of the paper: SAE, SJAE, and DNN architectures.
- Joint Autoencoder: unsupervised learning; train the source and target AEs together with a new loss function
  $E = \|x - x'\|^2 + \|y - y'\|^2 + \alpha \|h_x - h_y\|^2$, where each AE can itself be a multi-layer stack.
- DNN: trained using the source and target JAE parameters from the previous step as initialization.
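The joint loss can be computed directly. A minimal sketch with single-layer, tied-weight AEs and random parameters (all sizes and names are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.RandomState(1)
d, k, alpha = 24, 8, 1.0  # feature dim, code dim, coupling weight (illustrative)

Wx, Wy = rng.randn(d, k) * 0.1, rng.randn(d, k) * 0.1  # source / target encoders
x, y = rng.randn(d), rng.randn(d)                      # a parallel frame pair

hx, hy = np.tanh(x @ Wx), np.tanh(y @ Wy)  # intermediate codes
x_rec, y_rec = hx @ Wx.T, hy @ Wy.T        # tied-weight decoders

# E = ||x - x'||^2 + ||y - y'||^2 + alpha * ||h_x - h_y||^2
E = (np.sum((x - x_rec) ** 2)
     + np.sum((y - y_rec) ** 2)
     + alpha * np.sum((hx - hy) ** 2))
print(E >= 0)  # True
```

The third term is what distinguishes the joint AE from two independent AEs: it pulls the two codes toward each other, so parallel source and target frames share a representation.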
- Experimental procedure:
- Conversions considered: CLB→SLT, RMS→BDL
- 24th-order MCEPs with the 0th coefficient removed, extracted with SPTK using a 10 ms frame shift and a 25 ms frame size
- Each training sample uses 15 frames (the current frame plus the 7 before and the 7 after), so each sample has 15×24 = 360 features.
- Randomly select 80% of the 630 speakers for training; of the remainder, 10% are used for validation and 10% for testing.
- SAE: dropout corruption level: 0.1
- Activation function: tangent hyperbolic for all AE layers except the first; the first AE layer uses g of Equation 3 in the paper.
- From TIMIT, use a speaker-recognition method to select the 10 speakers most similar to the source and to the target for training; each speaker pair has two parallel utterances.
- 4-layer DNN, [360N 1000N 500N 1000N 360L], where N denotes a nonlinear activation function and L a linear one.
Experiments are implemented in Theano.
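The 15-frame context stacking described above (current frame plus 7 neighbours on each side, 15×24 = 360 features) can be sketched as follows. How the original code handles utterance edges is not shown; here they are padded by repeating the first and last frames, which is an assumption.

```python
import numpy as np

def stack_context(frames, left=7, right=7):
    """Stack each frame with `left` preceding and `right` following frames.
    Edge frames are handled by repeating the first/last frame."""
    T, d = frames.shape
    padded = np.vstack([np.repeat(frames[:1], left, axis=0),
                        frames,
                        np.repeat(frames[-1:], right, axis=0)])
    # Column block i holds the frame at offset i - left from the centre frame.
    return np.hstack([padded[i:i + T] for i in range(left + right + 1)])

mcep = np.random.randn(100, 24)  # 100 frames of 24-dim MCEP (dummy data)
X = stack_context(mcep)
print(X.shape)  # (100, 360)
```

The centre block of each stacked row (columns 168–191) is the current frame itself, which is a quick sanity check on the windowing.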
- test_wav(feature_type='MCEP', order=24, delta=False, neighbours=7, emphasis=0.9, frame_size=0.020, frame_rate=0.010)
train all TIMIT AE: for the source and target speakers, find the 10 speakers in TIMIT closest to each, and use them for network initialization (pretraining), with 10 utterances per speaker.
```python
# Module-level dependencies assumed from the original project: numpy as np, os,
# and the helpers compute_normalization_factors, normalize_data,
# unnormalize_data, melCD, defined elsewhere in the same module.
import os
import numpy as np

def ae_all(out_file, hidden_layers_sizes=None, corruption_levels=None,
           pretrain_lr=None, batch_size=None, training_epochs=None):
    # ae all on TIMIT
    print '... loading the data'
    data = load_vc_all_speakers()
    print '... loaded data with dimensions', str(data.shape[0]), 'x', str(data.shape[1])
    print '... normalizing the data'
    mins, ranges = compute_normalization_factors(data)
    import pickle
    f = open('norm_male.pkl', 'w+')
    pickle.dump(mins, f)
    pickle.dump(ranges, f)
    f.flush()
    f.close()
    new_data = normalize_data(data, mins, ranges)
    numpy_rng = np.random.RandomState(89677)
    import theano
    n_train_batches = int(0.9 * new_data.shape[0])
    n_train_batches /= batch_size
    train_set = theano.shared(new_data[:int(0.9 * new_data.shape[0]), :])
    test_set = theano.shared(new_data[int(0.9 * new_data.shape[0]):, :])
    test_set_unnormalized = unnormalize_data(
        new_data[int(0.9 * new_data.shape[0]):, :],
        mins, ranges)[:, 24 * 7:24 * 7 + 24]
    print '... building the model'
    from ae_stacked import SdA
    sda = SdA(numpy_rng=numpy_rng,
              n_ins=new_data.shape[1],
              hidden_layers_sizes=hidden_layers_sizes)
    print '... getting the pretraining functions'
    pretraining_fns = sda.pretraining_functions(train_set_x=train_set,
                                                batch_size=batch_size)
    print '... training the model'
    import time
    start_time = time.clock()
    # Reconstruction of the test set through the first dA layer,
    # used to track mel cepstral distortion during pretraining.
    reconstruct = theano.function(
        inputs=[],
        outputs=sda.dA_layers[0].xrec,
        givens={sda.dA_layers[0].x: test_set})
    lr = pretrain_lr
    for i in xrange(sda.n_layers):
        # go through pretraining epochs
        for epoch in xrange(training_epochs):
            # go through the training set
            c = []
            for batch_index in xrange(n_train_batches):
                c.append(pretraining_fns[i](index=batch_index,
                                            corruption=corruption_levels[i],
                                            lr=pretrain_lr))
            if i == 0:
                XH = reconstruct()
                XH = unnormalize_data(XH, mins, ranges)[:, 24 * 7:24 * 7 + 24]
                print 'melCD', melCD(XH, test_set_unnormalized)
            # Decay the learning rate, with a floor of 0.01.
            lr *= 0.99
            if lr < 0.01:
                lr = 0.01
            # Checkpoint the model and normalization factors every epoch.
            f = open(out_file, 'w+')
            pickle.dump(sda, f)
            pickle.dump(mins, f)
            pickle.dump(ranges, f)
            f.flush()
            f.close()
            print 'Pre-training layer %i, epoch %d, cost ' % (i, epoch),
            print np.mean(c)
    end_time = time.clock()
    print 'The pretraining code for file ' + \
        os.path.split(__file__)[1] + \
        ' ran for %.2fm' % ((end_time - start_time) / 60.)
```
Helper function
```python
def load_vc_all_speakers():
    from glob import iglob
    from os.path import exists
    import pickle
    # Preallocate for up to 630 speakers x ~3500 frames of 15-frame, 24-dim features.
    data = np.zeros((630 * 3500, 24 * 15), dtype=np.float32)
    st = 0
    cnt = 0
    if exists('../TIMIT_code/spk_wav/'):
        iter_directory = iglob('../TIMIT_code/spk_wav/M*.pkl')
    else:
        iter_directory = iglob('../gitlab/voice-conversion/src/spk_wav/M*.pkl')
    for fid in iter_directory:
        print 'read_TIMIT_append_all: reading file ' + fid
        f = open(fid, 'r')
        cur_fx = pickle.load(f)
        f.close()
        data[st:st + cur_fx.shape[0], :] = cur_fx
        st += cur_fx.shape[0]
        cnt += 1
    # Trim the unused tail of the preallocated buffer.
    data = data[:st, :]
    return data
```
Helper function 2
- load norm
- load xy20
- load xy