The speaker-adaptation part of GMM-UBM in sidekit


---- Updated 2016.12.18 ----
sidekit is quite good and simple to use; its documentation essentially hands you the source code. If you can get the environment set up and have some background, getting it running within a day should be no problem.
Below is a detailed analysis of the speaker adaptation and score computation in GMM-UBM, including some rewritten code, because staring at all those h5 files gets tiresome. Before reading on, make sure you are already familiar with sidekit and with the format of its h5 files; otherwise there is no point continuing.

Below is the source code for the adaptation part. utils is my own module; please ignore the gmm_score and EER parts for now, they will come up later. Focus on the MAP part:

    import sidekit
    import numpy as np
    from utils import EER, gmm_score
    import h5py

    '''this stand version can run the predicted result'''
    enroll_idmap = sidekit.IdMap('task/enroll_spks2utt.h5')
    ubm = sidekit.Mixture()
    ubm.read("task/ubm.h5")
    nj = 10
    server_eval = sidekit.FeaturesServer(feature_filename_structure="./mfcc_eval/{}.h5",
                                         dataset_list=["energy", "cep", "vad"],
                                         mask=None,
                                         feat_norm="cmvn",
                                         keep_all_features=False,
                                         delta=True,
                                         double_delta=True,
                                         rasta=True,
                                         context=None)

    print('Compute the sufficient statistics')
    enroll_stat = sidekit.StatServer(enroll_idmap, ubm)
    enroll_stat.accumulate_stat(ubm=ubm, feature_server=server_eval,
                                seg_indices=range(enroll_stat.segset.shape[0]), num_thread=nj)
    enroll_stat.write('task/stat_enroll_stand.h5')

    print('MAP adaptation of the speaker models')
    regulation_factor = 3  # MAP regulation factor
    enroll_sv = enroll_stat.adapt_mean_map_multisession(ubm, regulation_factor)
    enroll_sv.write('task/map_enroll_stand.h5')

    print('Compute trial scores')
    enroll = sidekit.StatServer('task/map_enroll_stand.h5')
    s = np.zeros((59, 1024))
    gscore = gmm_score(ubm, enroll, server_eval, s)
    scores = gscore.compute_scores()
    eer = EER(scores)
    eer.compute_eer()
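
For orientation only (the scoring part is treated in detail later, and utils is not shown here), the following is a hedged sketch of what a GMM-UBM trial score typically looks like; it is not the author's utils module. It reuses compute_log_posterior_probabilities() with the mu override shown further down, and averages the per-frame log-likelihood ratio of the adapted model against the UBM:

    import numpy as np

    def llr_score(ubm, spk_mu, cep):
        """Average per-frame LLR: log p(X | MAP-adapted means) - log p(X | UBM).
        spk_mu is assumed to be one row of the adapted StatServer's stat1
        (a mean super-vector); this is an illustrative sketch, not sidekit code."""
        def log_lk(lp):
            # log-sum-exp over components, as in sum_log_probabilities()
            m = lp.max(axis=1)
            return m + np.log(np.exp(lp - m[:, None]).sum(axis=1))
        lp_spk = ubm.compute_log_posterior_probabilities(cep, mu=spk_mu)
        lp_ubm = ubm.compute_log_posterior_probabilities(cep)
        return (log_lk(lp_spk) - log_lk(lp_ubm)).mean()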

The script above mainly involves two methods: accumulate_stat(), which computes the sufficient statistics, and adapt_mean_map_multisession(), which performs the MAP update of those statistics. Let's look at them one by one. Some of the parameter passing below differs slightly from the library source; I originally meant to rewrite it, but my version was not as good as the original.
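
For reference, these two methods implement the standard MAP mean-adaptation equations of the GMM-UBM recipe (Reynolds et al.), where r is the regulation factor (3 in the script above):

    n_i = \sum_{t=1}^{T} \Pr(i \mid x_t)
    E_i(x) = \frac{1}{n_i} \sum_{t=1}^{T} \Pr(i \mid x_t)\, x_t
    \hat{\mu}_i = \alpha_i E_i(x) + (1 - \alpha_i)\,\mu_i, \qquad \alpha_i = \frac{n_i}{n_i + r}

accumulate_stat() produces n_i (stat0) and n_i E_i(x) (stat1) per utterance; adapt_mean_map_multisession() then sums them per speaker and applies the third equation. First, accumulate_stat():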

    def accumulate_stat(self, feature_server):
        '''
        result: enroll_stat.write('task/stat_enroll.h5')
        stat0.shape = (228, 1024)
        stat1.shape = (228, 64512)
        start.shape = (228, )  # discarded; start/stop do not affect the result
        stop.shape = (228, )   # discarded
        segset.shape = (228, )
        modelset.shape = (228, )
        Statistics are computed per utterance here; the per-speaker statistics
        are accumulated later in the MAP step. Apart from the commented parts
        and sum_log_probabilities(), the function is fairly easy to follow,
        and it matches the paper closely.
        '''
        for idx in range(self.segset.shape[0]):
            print('Compute statistics for {}'.format(self.segset[idx]))
            show = self.segset[idx]
            cep, vad = feature_server.load(show)
            # Verify that frame dimension is equal to gmm dimension
            lp = self.ubm.compute_log_posterior_probabilities(cep)
            pp, foo = self.sum_log_probabilities(lp)
            # Compute 0th-order statistics:
            # stat0_i = n_i = \sum_{t=1}^{T} Pr(\lambda_i | x_t)
            self.stat0[idx, :] = pp.sum(0)
            # Compute 1st-order statistics; 1024 is num_components, 63 is the feature dimension
            self.stat1[idx, :] = np.reshape(np.transpose(np.dot(cep.transpose(), pp)), 1024 * 63)

Here is how lp is computed:

    def compute_log_posterior_probabilities(self, cep, mu=None):
        """Compute log posterior probabilities for a set of feature frames.

        :param cep: a set of feature frames in an ndarray, one feature per row
        :param mu: a mean super-vector to replace the UBM's one. If it is an
               empty vector, use the UBM
        :return: an ndarray of log-posterior probabilities corresponding to
               the input feature set.
        """
        if cep.ndim == 1:
            cep = cep[numpy.newaxis, :]
        A = self.A
        if mu is None:
            mu = self.mu
        else:
            # for MAP, compute the data-independent term
            A = (numpy.square(mu.reshape(self.mu.shape)) * self.invcov).sum(1) \
                - 2.0 * (numpy.log(self.w) + numpy.log(self.cst))
        # Compute the data-dependent term
        B = numpy.dot(numpy.square(cep), self.invcov.T) \
            - 2.0 * numpy.dot(cep, numpy.transpose(mu.reshape(self.mu.shape) * self.invcov))
        # Compute the exponential term
        lp = -0.5 * (B + A)
        return lp

Q1: The lp matrix here should be the (nframes, num_components) matrix of w_i * P(x_t | \lambda_i); I did not understand why the log is taken. This is only the statistics computation, not yet the scoring. (Emailed the author, awaiting a reply.) I just went over the questions I asked the author; after asking Prof. Ke yesterday, I think they are clear now:

A: The values of w, \mu, \Sigma are all very small, so computing P(x_t | \lambda_i) directly may underflow. In the Bayes formula the denominator is \sum_{i=1}^{M} w_i * P(x_t | \lambda_i); some terms can be tiny while others are large (0.1 + 10^-10 = 0.1), so a lot of precision would be lost. Losing precision in the middle of the computation is unacceptable, hence the log-domain arithmetic.

After lp is computed, Bayes' rule converts it into Pr(\lambda_i | x_t); the pp matrix should be exactly that:

    def sum_log_probabilities(self, lp):
        '''
        Args:
            lp: (nframes, num_components), log of w_i * P(x_t | \lambda_i); not yet n_i
        Returns:
            pp and the per-frame log-likelihood
        '''
        pp_max = np.max(lp, axis=1)  # row-wise maximum
        log_lk = pp_max + np.log(np.sum(np.exp((lp.transpose() - pp_max).transpose()), axis=1))
        ind = ~np.isfinite(pp_max)
        # print("ind: ", ind)  # all False here
        if sum(ind) != 0:
            log_lk[ind] = pp_max[ind]
        pp = np.exp((lp.transpose() - log_lk).transpose())
        return pp, log_lk

Q2: I did not see the justification for this pp computation. (Emailed the author, awaiting a reply.)

A: Once Q1 above is understood, this follows; work it through carefully on paper. Everything is moved into the log domain, which is not intuitive to read, but the result is indeed Pr(\lambda_i | x_t).
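
To make the underflow argument concrete, here is a tiny numpy demonstration (not part of sidekit): with log-probabilities around -800, the naive posterior computation collapses to 0/0, while the log-sum-exp form used in sum_log_probabilities() recovers valid posteriors.

    import numpy as np

    # Toy log(w_i * P(x_t | lambda_i)) values for one frame, three components.
    lp = np.array([[-800.0, -810.0, -805.0]])

    # Naive Bayes posterior: exp(-800) underflows to 0.0, giving 0/0 = nan.
    naive = np.exp(lp) / np.exp(lp).sum(axis=1, keepdims=True)

    # Log-sum-exp, as in sum_log_probabilities(): subtract the row maximum first.
    m = lp.max(axis=1)
    log_lk = m + np.log(np.exp(lp - m[:, None]).sum(axis=1))
    pp = np.exp(lp - log_lk[:, None])

    print(naive)  # [[nan nan nan]]
    print(pp)     # [[9.93e-01 4.51e-05 6.69e-03]] -- valid posteriors, sum to 1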

Some of the numpy usage in the second method is quite clever: everything is turned into matrix operations, which gives a big speedup. Once you understand that small trick, there is nothing hard to follow here; see the inline comments for details:

    def adapt_mean_map_multisession(self, regulation_factor):
        gsv_statserver = enroll_stat()
        gsv_statserver.modelset = np.unique(self.modelset)
        gsv_statserver.segset = np.unique(self.modelset)
        gsv_statserver.stat0 = np.ones((np.unique(self.modelset).shape[0], 1))

        num_components = 1024
        dim_feature = 63
        index_map = np.repeat(np.arange(num_components), dim_feature)

        # Sum the statistics per model
        # modelStat = self.sum_stat_per_model()[0]
        modelStat = self.sum_stat_per_model()

        # Adapt mean vectors
        alpha = modelStat.stat0 / (modelStat.stat0 + regulation_factor)  # stat0 holds the n_i
        '''
        a = np.array([['a','b','c'],['d','e','f']])
        >>> a[:, [1,1,2,2]]
        array([['b', 'b', 'c', 'c'],
               ['e', 'e', 'f', 'f']], dtype='|S1')

        Everything here is replaced by matrix operations, which is not very
        intuitive at first sight:
        modelStat.stat0 is (59, 1024); in the formula, n_i = \sum_{t=1}^{T} Pr(i | x_t).
        modelStat.stat1 is (59, 1024*63); in the formula,
            E_i(x) = \sum_{t=1}^{T} Pr(i | x_t) x_t / n_i,
        so every model's mean super-vector in stat1 has to be divided by its n_i.
        modelStat.stat0[:, index_map] replicates each of the 1024 coefficients
        n_i 63 times, expanding them to 1024*63 dimensions, so that all 63 mean
        dimensions of the first component are divided by the first coefficient,
        and so on. This implementation is really efficient.

        >>> c
        array(['d', 'e', 'f'], dtype='|S1')
        >>> np.tile(c, (3,1))
        array([['d', 'e', 'f'],
               ['d', 'e', 'f'],
               ['d', 'e', 'f']], dtype='|S1')
        >>> np.tile(c, 3)
        array(['d', 'e', 'f', 'd', 'e', 'f', 'd', 'e', 'f'], dtype='|S1')
        (Note the behavior when the second argument is a tuple.)

        With that, the line updating M below is clear.
        NOTE: this way the whole adaptation is performed in a single update,
        and there is no sequential dependence between utterances.
        '''
        M = modelStat.stat1 / modelStat.stat0[:, index_map]  # (59, 1024*63)
        M[np.isnan(M)] = 0  # Replace NaN due to division by zero
        M = alpha[:, index_map] * M \
            + (1 - alpha[:, index_map]) * np.tile(self.ubm.mu.flatten(), (M.shape[0], 1))
        gsv_statserver.stat1 = M
        return gsv_statserver

    def sum_stat_per_model(self):
        """Sum the zero- and first-order statistics per model and store them
        in a new StatServer.

        :return: a StatServer with the statistics summed per model
        """
        sts_per_model = enroll_stat()
        sts_per_model.modelset = np.unique(self.modelset)
        sts_per_model.segset = sts_per_model.modelset
        sts_per_model.stat0 = np.zeros((sts_per_model.modelset.shape[0], self.stat0.shape[1]))  # (59, 1024)
        sts_per_model.stat1 = np.zeros((sts_per_model.modelset.shape[0], self.stat1.shape[1]))  # (59, 1024*63)
        # session_per_model = np.zeros(np.unique(self.modelset).shape[0])
        '''
        print("idx is ", idx, "model is ", model)
        print("stat0 --> ", self.stat0.shape)  # (228, 1024): enrollment has 228 utterances; statistics were computed per utterance
        print("how to sum", self.stat0[self.modelset == model, :].shape)  # (4, 1024): pick the four utterances whose speaker is model, then sum
        print("sum to sts_per_model", sts_per_model.stat0)  # the sum goes into row idx of sts_per_model.stat0
        This makes the logic much clearer.
        '''
        for idx, model in enumerate(sts_per_model.modelset):
            sts_per_model.stat0[idx, :] = self.stat0[self.modelset == model, :].sum(axis=0)
            sts_per_model.stat1[idx, :] = self.stat1[self.modelset == model, :].sum(axis=0)
            # session_per_model[idx] += self.stat1[self.modelset == model, :].shape[0]
        # return sts_per_model, session_per_model
        return sts_per_model
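
To see that the index_map trick really computes the per-component update \hat{\mu}_i = \alpha_i E_i(x) + (1 - \alpha_i) \mu_i, here is a small self-contained check with toy sizes (2 components, 3 dims instead of the real 1024 x 63), comparing it against an explicit per-component loop:

    import numpy as np

    # Toy sizes; r is the MAP regulation factor (3 in the script above).
    num_components, dim_feature, r = 2, 3, 3.0
    n = np.array([[10.0, 0.5]])                # stat0: n_i per component, 1 model
    s1 = np.arange(1.0, 7.0).reshape(1, -1)    # stat1: \sum_t Pr(i|x_t) x_t, flattened
    ubm_mu = np.full((num_components, dim_feature), 0.1)

    # Vectorized update, as in adapt_mean_map_multisession()
    index_map = np.repeat(np.arange(num_components), dim_feature)  # [0 0 0 1 1 1]
    alpha = n / (n + r)
    M = alpha[:, index_map] * (s1 / n[:, index_map]) \
        + (1 - alpha[:, index_map]) * np.tile(ubm_mu.flatten(), (n.shape[0], 1))

    # Same update, one component at a time
    M_loop = np.empty_like(M)
    for i in range(num_components):
        sl = slice(i * dim_feature, (i + 1) * dim_feature)
        a = n[0, i] / (n[0, i] + r)
        M_loop[0, sl] = a * s1[0, sl] / n[0, i] + (1 - a) * ubm_mu[i]

    assert np.allclose(M, M_loop)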

The speaker model here is updated only once, yet the results are quite good. I don't know how other frameworks implement this, or whether there is a more standard GMM-UBM implementation; if anyone knows one, please leave a comment or send a private message. If there are mistakes in this post, please point them out as well. Many thanks!

