Theano Tutorial: Restricted Boltzmann Machines (RBM) (translation)

Source: 程序博客网, 2024/05/29 08:29

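Feel free to fork my GitHub repository: https://github.com/zhaoyu611/DeepLearningTutorialForChinese

I have been learning Git recently, so this is a good chance to put it into practice. After reading up on the theory of deep learning I had a general picture, but for the Theano code, typing it out myself leaves a much deeper impression. So I rewrote the code from deeplearning.net; the comments are translations of the original text plus my own understanding. Anyone interested is welcome to help finish this work. Questions are welcome. Email: zhaoyuafeu@gmail.com QQ: 3062984605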

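Energy-Based Models (EBM)

Energy-based models associate a scalar energy with each configuration of the variables of interest. Learning modifies the energy function so that its shape has desirable properties; for example, we want desirable configurations to have low energy. Energy-based probabilistic models define a probability distribution through an energy function as follows:

$p(x) = \frac{e^{-E(x)}}{Z}$ (1)

The normalizing factor $Z$ is called the partition function:

$Z = \sum_x e^{-E(x)}$

An energy-based model can be trained by (stochastic) gradient descent on the empirical negative log-likelihood of the training data. As for logistic regression, we first define the log-likelihood and then take the loss function to be the negative log-likelihood.

$\mathcal{L}(\theta, \mathcal{D}) = \frac{1}{N} \sum_{x^{(i)} \in \mathcal{D}} \log p(x^{(i)}), \quad \ell(\theta, \mathcal{D}) = -\mathcal{L}(\theta, \mathcal{D})$

The stochastic gradient is $-\frac{\partial \log p(x^{(i)})}{\partial \theta}$, where $\theta$ are the parameters of the model.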
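To make Eq. (1) and the loss concrete, here is a minimal sketch in plain Python (not Theano). It assumes a hypothetical linear energy E(x) = -w·x over three binary variables; the energy function, the weights and the toy data set are illustrative assumptions, not part of the original tutorial. Because the model is tiny, Z can be computed by enumerating all configurations.

```python
import itertools
import math


def energy(x, weights):
    """Hypothetical linear energy E(x) = -sum_i w_i * x_i (illustrative assumption)."""
    return -sum(w * xi for w, xi in zip(weights, x))


def partition_function(weights):
    """Z = sum over all binary configurations x of exp(-E(x))."""
    n = len(weights)
    return sum(math.exp(-energy(x, weights))
               for x in itertools.product([0, 1], repeat=n))


def probability(x, weights):
    """p(x) = exp(-E(x)) / Z, Eq. (1)."""
    return math.exp(-energy(x, weights)) / partition_function(weights)


def negative_log_likelihood(data, weights):
    """l(theta, D) = -(1/N) * sum_i log p(x_i)."""
    return -sum(math.log(probability(x, weights)) for x in data) / len(data)


weights = [1.0, -0.5, 2.0]                       # made-up parameters
states = list(itertools.product([0, 1], repeat=3))
total = sum(probability(x, weights) for x in states)
data = [(1, 0, 1), (1, 1, 1), (0, 0, 1)]         # made-up training set
nll = negative_log_likelihood(data, weights)
```

Enumeration only works for a handful of variables; the rest of the tutorial is about avoiding it.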

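EBMs with Hidden Units

In many cases of interest we do not observe the example x fully, or we want to introduce some non-observed variables to increase the expressive power of the model. So we consider an observed part (still denoted x) and a hidden part h, and write:
$P(x) = \sum_h P(x, h) = \sum_h \frac{e^{-E(x,h)}}{Z}$ (2)
To bring this into a form similar to Eq. (1), we introduce the notion of free energy (inspired by physics), defined as follows:

$\mathcal{F}(x) = -\log \sum_h e^{-E(x,h)}$ (3)
which allows us to write:
$P(x) = \frac{e^{-\mathcal{F}(x)}}{Z}$ with $Z = \sum_x e^{-\mathcal{F}(x)}$
The gradient of the negative log-likelihood of the data then has a particularly interesting form:

$-\frac{\partial \log p(x)}{\partial \theta} = \frac{\partial \mathcal{F}(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \frac{\partial \mathcal{F}(\tilde{x})}{\partial \theta}$ (4)
Notice that this gradient contains two terms, referred to as the positive phase and the negative phase. "Positive" and "negative" do not refer to the sign of each term in the equation, but to their effect on the probability density defined by the model. The first term increases the probability of the training data (by reducing the corresponding free energy), while the second term decreases the probability of samples generated by the model.
It is usually difficult to compute this gradient analytically, because it involves $E_P\left[\frac{\partial \mathcal{F}(x)}{\partial \theta}\right]$, which is an expectation over all possible configurations of the input x under the distribution P defined by the model.
The first step in making this computation tractable is to estimate the expectation using a fixed number of model samples. The samples used to estimate the negative phase of the gradient are called negative particles and are denoted $\mathcal{N}$. The gradient can then be written as:

$-\frac{\partial \log p(x)}{\partial \theta} \approx \frac{\partial \mathcal{F}(x)}{\partial \theta} - \frac{1}{|\mathcal{N}|} \sum_{\tilde{x} \in \mathcal{N}} \frac{\partial \mathcal{F}(\tilde{x})}{\partial \theta}$ (5)
Ideally, the elements $\tilde{x}$ of $\mathcal{N}$ are sampled according to P (i.e. we are doing Monte Carlo). With the above formula, we almost have a practical, stochastic algorithm for learning an EBM. The only missing ingredient is how to obtain the negative particles $\mathcal{N}$.
While the statistical literature abounds with sampling methods, Markov chain Monte Carlo methods are especially well suited to models such as the Restricted Boltzmann Machine (RBM), a specific type of EBM.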
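The relation between the exact gradient of Eq. (4) and its sample-based estimate in Eq. (5) can be checked on a toy model. The sketch below is plain Python and rests on several assumptions made only for illustration: there are no hidden units (so the free energy equals the energy), the energy is the hypothetical linear form F(x) = -θ·x, and the negative particles are drawn exactly from P, which is only possible because the model has four states.

```python
import itertools
import math
import random

theta = [0.5, -1.0]                                  # made-up parameters
states = list(itertools.product([0, 1], repeat=2))


def free_energy(x):
    # No hidden units here, so F(x) = E(x) = -theta . x (illustrative assumption)
    return -sum(t * xi for t, xi in zip(theta, x))


def dF_dtheta(x):
    # For this linear energy, dF/dtheta_i = -x_i
    return [-float(xi) for xi in x]


unnorm = [math.exp(-free_energy(x)) for x in states]
Z = sum(unnorm)
probs = [u / Z for u in unnorm]

# Exact negative phase of Eq. (4): sum over x~ of p(x~) * dF(x~)/dtheta
exact_negative = [sum(p * dF_dtheta(x)[i] for p, x in zip(probs, states))
                  for i in range(len(theta))]

# Monte Carlo negative phase of Eq. (5), using a finite set of negative particles
rng = random.Random(0)
particles = rng.choices(states, weights=probs, k=20000)
mc_negative = [sum(dF_dtheta(x)[i] for x in particles) / len(particles)
               for i in range(len(theta))]


def nll_gradient(x, negative_phase):
    # Positive phase minus negative phase
    return [pos - neg for pos, neg in zip(dF_dtheta(x), negative_phase)]


grad_exact = nll_gradient((1, 0), exact_negative)
grad_mc = nll_gradient((1, 0), mc_negative)
```

In a real RBM the particles cannot be drawn exactly like this, which is why the following sections turn to Gibbs sampling.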

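Restricted Boltzmann Machines (RBM)

Boltzmann Machines (BMs) are a particular form of log-linear Markov Random Field (MRF), i.e. one for which the energy function is linear in its free parameters. To make them powerful enough to represent complicated distributions (i.e. to go from the limited parametric setting to a non-parametric one), we consider that some of the variables are never observed (they are called hidden). By having more hidden variables (also called hidden units), we can increase the modeling capacity of the Boltzmann Machine. Restricted Boltzmann Machines further restrict BMs to those without visible-visible and hidden-hidden connections. A graphical depiction of an RBM is shown below:
[Figure: graphical depiction of an RBM]
The energy function E(v,h) of an RBM is defined as:
$E(v,h) = -b'v - c'h - h'Wv$ (6)
where W holds the weights connecting the hidden and visible units, and b, c are the biases of the visible and hidden layers respectively.
This translates directly into the following free energy formula:
$\mathcal{F}(v) = -b'v - \sum_i \log \sum_{h_i} e^{h_i (c_i + W_i v)}$
Because of the specific structure of RBMs, the visible and hidden units are conditionally independent given one another. Using this property, we can write:
$p(h|v) = \prod_i p(h_i|v)$, $p(v|h) = \prod_j p(v_j|h)$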

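RBMs with Binary Units

In the commonly studied case of binary units (where $v_j$ and $h_i \in \{0,1\}$), Eq. (6) and (2) give a probabilistic version of the usual neuron activation function:
$P(h_i=1|v) = sigm(c_i + W_i v)$ (7)
$P(v_j=1|h) = sigm(b_j + W'_j h)$ (8)
The free energy of an RBM with binary units further simplifies to:
$\mathcal{F}(v) = -b'v - \sum_i \log(1 + e^{(c_i + W_i v)})$ (9)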
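As a sanity check on Eqs. (3), (6), (7) and (9), the following plain-Python sketch builds a hypothetical RBM with 3 visible and 2 hidden units (the weights and biases are made up) and compares the brute-force free energy, obtained by enumerating every hidden configuration, with the closed form. Note that it follows the indexing of the equations, with W[i][j] linking hidden unit i to visible unit j; the Theano code later in this tutorial stores W with shape (n_visible, n_hidden) instead.

```python
import itertools
import math

# Hypothetical tiny RBM: 3 visible units, 2 hidden units (made-up values).
W = [[0.2, -0.4, 0.1],     # W[i][j]: weight between hidden unit i and visible unit j
     [-0.3, 0.5, 0.7]]
b = [0.1, -0.2, 0.0]       # visible biases
c = [0.05, -0.1]           # hidden biases


def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h):
    """E(v, h) = -b'v - c'h - h'Wv, Eq. (6)."""
    bv = sum(bj * vj for bj, vj in zip(b, v))
    ch = sum(ci * hi for ci, hi in zip(c, h))
    hWv = sum(h[i] * W[i][j] * v[j]
              for i in range(len(h)) for j in range(len(v)))
    return -bv - ch - hWv


def free_energy_brute_force(v):
    """F(v) = -log sum_h exp(-E(v, h)), Eq. (3), enumerating all h."""
    return -math.log(sum(math.exp(-energy(v, h))
                         for h in itertools.product([0, 1], repeat=len(c))))


def free_energy_closed_form(v):
    """F(v) = -b'v - sum_i log(1 + exp(c_i + W_i v)), Eq. (9)."""
    bv = sum(bj * vj for bj, vj in zip(b, v))
    pre = [c[i] + sum(W[i][j] * v[j] for j in range(len(v)))
           for i in range(len(c))]
    return -bv - sum(math.log(1.0 + math.exp(a)) for a in pre)


def p_h_given_v(i, v):
    """P(h_i = 1 | v) = sigm(c_i + W_i v), Eq. (7)."""
    return sigm(c[i] + sum(W[i][j] * v[j] for j in range(len(v))))


v = (1, 0, 1)
gap = abs(free_energy_brute_force(v) - free_energy_closed_form(v))
```

The two free energies agree to rounding error, which is what allows the Theano code to use the cheap closed form.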

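Update Equations with Binary Units

Combining Eq. (5) with Eq. (9), we obtain the following log-likelihood gradients for an RBM with binary units:

$-\frac{\partial \log p(v)}{\partial W_{ij}} = E_v[p(h_i|v) \cdot v_j] - v^{(i)}_j \cdot sigm(W_i \cdot v^{(i)} + c_i)$
$-\frac{\partial \log p(v)}{\partial c_i} = E_v[p(h_i|v)] - sigm(W_i \cdot v^{(i)} + c_i)$
$-\frac{\partial \log p(v)}{\partial b_j} = E_v[p(v_j|h)] - v^{(i)}_j$ (10)
For a more detailed derivation of these equations, we refer the reader to the following page, or to section 5 of Learning Deep Architectures for AI. We will not use these formulas directly; instead we obtain the gradient from Eq. (4) with Theano's T.grad.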
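The hidden-bias line of Eq. (10) can be verified numerically on a model small enough to enumerate. The sketch below (plain Python, made-up parameters, equation-style indexing W[i][j] for hidden unit i and visible unit j) compares the analytic gradient with a central finite difference of -log p(v). It is only an illustration of the formula, not the method the tutorial uses for training.

```python
import itertools
import math


def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))


def free_energy(v, W, b, c):
    """Eq. (9) for a binary RBM."""
    bv = sum(bj * vj for bj, vj in zip(b, v))
    pre = [c[i] + sum(W[i][j] * v[j] for j in range(len(v)))
           for i in range(len(c))]
    return -bv - sum(math.log(1.0 + math.exp(a)) for a in pre)


def neg_log_p(v, W, b, c):
    """-log p(v) = F(v) + log Z, with Z summed over all visible configurations."""
    log_z = math.log(sum(math.exp(-free_energy(u, W, b, c))
                         for u in itertools.product([0, 1], repeat=len(b))))
    return free_energy(v, W, b, c) + log_z


def grad_c_analytic(v, W, b, c, i):
    """Eq. (10): E_v[p(h_i | v)] - sigm(c_i + W_i v), expectation under the model."""
    configs = list(itertools.product([0, 1], repeat=len(b)))
    unnorm = [math.exp(-free_energy(u, W, b, c)) for u in configs]
    z = sum(unnorm)

    def p_hi(u):
        return sigm(c[i] + sum(W[i][j] * u[j] for j in range(len(u))))

    expected = sum(w / z * p_hi(u) for w, u in zip(unnorm, configs))
    return expected - p_hi(v)


def grad_c_numeric(v, W, b, c, i, eps=1e-6):
    """Central finite difference of -log p(v) with respect to c_i."""
    up = list(c)
    up[i] += eps
    down = list(c)
    down[i] -= eps
    return (neg_log_p(v, W, b, up) - neg_log_p(v, W, b, down)) / (2 * eps)


W = [[0.2, -0.4, 0.1], [-0.3, 0.5, 0.7]]   # made-up values
b = [0.1, -0.2, 0.0]
c = [0.05, -0.1]
v = (1, 0, 1)
analytic = [grad_c_analytic(v, W, b, c, i) for i in range(len(c))]
numeric = [grad_c_numeric(v, W, b, c, i) for i in range(len(c))]
```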

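Sampling in an RBM

Samples of p(x) can be obtained by running a Markov chain to convergence, using Gibbs sampling as the transition operator.
Gibbs sampling of the joint of N random variables $S = (S_1, ..., S_N)$ is done through a sequence of N sampling sub-steps of the form $S_i \sim p(S_i | S_{-i})$, where $S_{-i}$ contains the N−1 other random variables in S, excluding $S_i$.
For RBMs, S consists of the set of visible and hidden units. However, since they are conditionally independent, one can perform block Gibbs sampling. In this setting, the visible units are sampled simultaneously given fixed values of the hidden units. Similarly, the hidden units are sampled simultaneously given the visible units. One step of the Markov chain is thus:
$h^{(n+1)} \sim sigm(W v^{(n)} + c)$, $v^{(n+1)} \sim sigm(W' h^{(n+1)} + b)$
where $h^{(n)}$ refers to the set of all hidden units at the n-th step of the Markov chain. This means, for example, that $h^{(n+1)}_i$ is randomly chosen to be 1 (versus 0) with probability $sigm(W_i v^{(n)} + c_i)$. Similarly, $v^{(n+1)}_j$ is randomly chosen to be 1 (versus 0) with probability $sigm(W'_j h^{(n+1)} + b_j)$.
The figure below illustrates this:

As $t \to \infty$, the samples $(v^{(t)}, h^{(t)})$ are guaranteed to be accurate samples of $p(v,h)$.
In theory, each parameter update in the learning process would require running one such chain to convergence, which is prohibitively expensive. For this reason, several algorithms have been devised for RBMs in order to sample efficiently from $p(v,h)$ during learning.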
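One block Gibbs step as described above might be sketched as follows. This is plain Python with made-up parameters and equation-style indexing (W[i][j] for hidden unit i and visible unit j); the function names mirror the Theano methods defined later in the tutorial but are otherwise an independent illustration.

```python
import math
import random


def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_h_given_v(v, W, c, rng):
    """Sample every h_i ~ Bernoulli(sigm(c_i + W_i v)) in one block."""
    probs = [sigm(c[i] + sum(W[i][j] * v[j] for j in range(len(v))))
             for i in range(len(c))]
    return [1 if rng.random() < p else 0 for p in probs]


def sample_v_given_h(h, W, b, rng):
    """Sample every v_j ~ Bernoulli(sigm(b_j + W'_j h)) in one block."""
    probs = [sigm(b[j] + sum(W[i][j] * h[i] for i in range(len(h))))
             for j in range(len(b))]
    return [1 if rng.random() < p else 0 for p in probs]


def gibbs_vhv(v, W, b, c, rng):
    """One step of the chain: v -> h -> v."""
    h = sample_h_given_v(v, W, c, rng)
    return h, sample_v_given_h(h, W, b, rng)


rng = random.Random(123)
W = [[0.2, -0.4, 0.1], [-0.3, 0.5, 0.7]]   # made-up values
b = [0.1, -0.2, 0.0]
c = [0.05, -0.1]

v = [1, 0, 1]
chain = [v]
for _ in range(100):
    _, v = gibbs_vhv(v, W, b, c, rng)
    chain.append(v)
```

All units of a layer are sampled from probabilities computed before any of them is updated; that is what makes the step a block update.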

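Contrastive Divergence (CD-k)

Contrastive Divergence uses two tricks to speed up the sampling process:

Since we eventually want $p(v) \approx p_{train}(v)$ (the true, underlying distribution of the data), we initialize the Markov chain with a training example (i.e. from a distribution that is expected to be close to p, so that the chain is already close to having converged to its final distribution p).
CD does not wait for the chain to converge. Samples are obtained after only k steps of Gibbs sampling. In practice, k=1 has been shown to work surprisingly well.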
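The two tricks can be put together in a short plain-Python sketch of one CD-k update for a single training vector. It descends the gradient of F(v0) - F(vk) with vk treated as a constant, which is the same cost the Theano implementation below differentiates; the helper names, the toy sizes and the all-ones training vector are assumptions made for illustration.

```python
import math
import random


def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    # P(h_i = 1 | v) = sigm(c_i + W_i v), Eq. (7)
    return [sigm(c[i] + sum(W[i][j] * v[j] for j in range(len(v))))
            for i in range(len(c))]


def visible_probs(h, W, b):
    # P(v_j = 1 | h) = sigm(b_j + W'_j h), Eq. (8)
    return [sigm(b[j] + sum(W[i][j] * h[i] for i in range(len(h))))
            for j in range(len(b))]


def bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def cd_k_update(v0, W, b, c, rng, k=1, lr=0.1):
    """One CD-k update (k >= 1) for training vector v0; W, b, c change in place."""
    # Trick 1: start the chain at the training example
    ph0 = hidden_probs(v0, W, c)
    h = bernoulli(ph0, rng)
    # Trick 2: run only k Gibbs steps instead of waiting for convergence
    for _ in range(k):
        vk = bernoulli(visible_probs(h, W, b), rng)
        phk = hidden_probs(vk, W, c)
        h = bernoulli(phk, rng)
    # Gradient descent on F(v0) - F(vk), with vk held constant
    for i in range(len(c)):
        for j in range(len(b)):
            W[i][j] += lr * (ph0[i] * v0[j] - phk[i] * vk[j])
        c[i] += lr * (ph0[i] - phk[i])
    for j in range(len(b)):
        b[j] += lr * (v0[j] - vk[j])
    return vk


rng = random.Random(0)
W = [[0.0] * 3 for _ in range(2)]
b = [0.0] * 3
c = [0.0] * 2
for _ in range(50):
    v_end = cd_k_update([1, 1, 1], W, b, c, rng, k=1)
```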

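Persistent CD

Persistent CD [Tieleman08] uses another approximation for sampling from p(v,h). It relies on a single Markov chain with a persistent state (i.e. the chain is not restarted for each observed example). For each parameter update, new samples are extracted by simply running the chain for k steps. The state of the chain is then preserved for subsequent updates.
The general intuition is that if the parameter updates are small enough compared to the mixing rate of the chain, the Markov chain should be able to "catch up" with changes in the model.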
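The only structural difference from CD is where the negative chain starts. A minimal plain-Python sketch, under the same illustrative assumptions as before (toy sizes, made-up data, helper names of my own choosing), keeps the hidden state of the chain in a variable that is handed from one update to the next:

```python
import math
import random


def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    return [sigm(c[i] + sum(W[i][j] * v[j] for j in range(len(v))))
            for i in range(len(c))]


def visible_probs(h, W, b):
    return [sigm(b[j] + sum(W[i][j] * h[i] for i in range(len(h))))
            for j in range(len(b))]


def bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def pcd_update(v0, chain_h, W, b, c, rng, k=1, lr=0.1):
    """One PCD-k update (k >= 1); returns the new hidden state of the chain."""
    ph0 = hidden_probs(v0, W, c)      # positive phase still uses the data
    h = chain_h                       # negative chain resumes from the stored state
    for _ in range(k):
        vk = bernoulli(visible_probs(h, W, b), rng)
        phk = hidden_probs(vk, W, c)
        h = bernoulli(phk, rng)
    for i in range(len(c)):
        for j in range(len(b)):
            W[i][j] += lr * (ph0[i] * v0[j] - phk[i] * vk[j])
        c[i] += lr * (ph0[i] - phk[i])
    for j in range(len(b)):
        b[j] += lr * (v0[j] - vk[j])
    return h


rng = random.Random(1)
W = [[0.0] * 4 for _ in range(2)]
b = [0.0] * 4
c = [0.0] * 2
data = [[1, 1, 0, 0], [0, 0, 1, 1]]   # made-up training set
chain_h = [0, 1]                      # arbitrary initial chain state
for _ in range(20):
    for v0 in data:
        chain_h = pcd_update(v0, chain_h, W, b, c, rng, k=1, lr=0.05)
```

The small learning rate reflects the intuition above: the chain can only track the model if the parameters move slowly relative to its mixing rate.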

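Implementation

We construct an RBM class. The parameters of the network can either be initialized by the constructor or be passed in as arguments. This option is useful when an RBM is used as a building block of a deep network, in which case the weight matrix and the hidden layer bias are shared with the corresponding sigmoid layer of an MLP network.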

import numpy
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams


class RBM(object):
    """Restricted Boltzmann Machine (RBM)"""
    def __init__(
        self,
        input=None,
        n_visible=784,
        n_hidden=500,
        W=None,
        hbias=None,
        vbias=None,
        numpy_rng=None,
        theano_rng=None
    ):
        """
        RBM constructor. Defines the parameters of the model along with
        basic operations for inferring hidden from visible (and vice-versa),
        as well as for performing CD updates.

        :param input: None for standalone RBMs or symbolic variable if RBM is
        part of a larger graph.

        :param n_visible: number of visible units

        :param n_hidden: number of hidden units

        :param W: None for standalone RBMs or symbolic variable pointing to a
        shared weight matrix in case RBM is part of a DBN network; in a DBN,
        the weights are shared between RBMs and layers of a MLP

        :param hbias: None for standalone RBMs or symbolic variable pointing
        to a shared hidden units bias vector in case RBM is part of a
        different network

        :param vbias: None for standalone RBMs or a symbolic variable
        pointing to a shared visible units bias
        """

        self.n_visible = n_visible
        self.n_hidden = n_hidden

        if numpy_rng is None:
            # create a number generator
            numpy_rng = numpy.random.RandomState(1234)

        if theano_rng is None:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        if W is None:
            # W is initialized with `initial_W`, which is uniformly
            # sampled between -4*sqrt(6./(n_visible+n_hidden)) and
            # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
            # converted using asarray to dtype theano.config.floatX so
            # that the code is runnable on GPU
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            # theano shared variables for weights and biases
            W = theano.shared(value=initial_W, name='W', borrow=True)

        if hbias is None:
            # create shared variable for hidden units bias
            hbias = theano.shared(
                value=numpy.zeros(
                    n_hidden,
                    dtype=theano.config.floatX
                ),
                name='hbias',
                borrow=True
            )

        if vbias is None:
            # create shared variable for visible units bias
            vbias = theano.shared(
                value=numpy.zeros(
                    n_visible,
                    dtype=theano.config.floatX
                ),
                name='vbias',
                borrow=True
            )

        # initialize input layer for standalone RBM or layer0 of DBN
        self.input = input
        if input is None:
            self.input = T.matrix('input')

        self.W = W
        self.hbias = hbias
        self.vbias = vbias
        self.theano_rng = theano_rng
        # **** WARNING: It is not a good idea to put things in this list
        # other than shared variables created in this function.
        self.params = [self.W, self.hbias, self.vbias]

The next step is to define the functions that implement Eqs. (7)-(8). The code is as follows:

    def propup(self, vis):
        '''This function propagates the visible units activation upwards to
        the hidden units

        Note that we return also the pre-sigmoid activation of the
        layer. As it will turn out later, due to how Theano deals with
        optimizations, this symbolic variable will be needed to write
        down a more stable computational graph (see details in the
        reconstruction cost function)

        '''
        pre_sigmoid_activation = T.dot(vis, self.W) + self.hbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

    def sample_h_given_v(self, v0_sample):
        ''' This function infers state of hidden units given visible units '''
        # compute the activation of the hidden units given a sample of
        # the visibles
        pre_sigmoid_h1, h1_mean = self.propup(v0_sample)
        # get a sample of the hiddens given their activation
        # Note that theano_rng.binomial returns a symbolic sample of dtype
        # int64 by default. If we want to keep our computations in floatX
        # for the GPU we need to specify to return the dtype floatX
        h1_sample = self.theano_rng.binomial(size=h1_mean.shape,
                                             n=1, p=h1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_h1, h1_mean, h1_sample]

    def propdown(self, hid):
        '''This function propagates the hidden units activation downwards to
        the visible units

        Note that we return also the pre_sigmoid_activation of the
        layer. As it will turn out later, due to how Theano deals with
        optimizations, this symbolic variable will be needed to write
        down a more stable computational graph (see details in the
        reconstruction cost function)

        '''
        pre_sigmoid_activation = T.dot(hid, self.W.T) + self.vbias
        return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

    def sample_v_given_h(self, h0_sample):
        ''' This function infers state of visible units given hidden units '''
        # compute the activation of the visible given the hidden sample
        pre_sigmoid_v1, v1_mean = self.propdown(h0_sample)
        # get a sample of the visible given their activation
        # Note that theano_rng.binomial returns a symbolic sample of dtype
        # int64 by default. If we want to keep our computations in floatX
        # for the GPU we need to specify to return the dtype floatX
        v1_sample = self.theano_rng.binomial(size=v1_mean.shape,
                                             n=1, p=v1_mean,
                                             dtype=theano.config.floatX)
        return [pre_sigmoid_v1, v1_mean, v1_sample]

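We can then use these functions to define a Gibbs sampling step. We define two functions:

gibbs_vhv performs one step of Gibbs sampling starting from the visible units. This is useful for sampling from the RBM.
gibbs_hvh performs one step of Gibbs sampling starting from the hidden units. This is useful for performing CD and PCD updates.
The code is as follows: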

    def gibbs_hvh(self, h0_sample):
        ''' This function implements one step of Gibbs sampling,
            starting from the hidden state'''
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h0_sample)
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v1_sample)
        return [pre_sigmoid_v1, v1_mean, v1_sample,
                pre_sigmoid_h1, h1_mean, h1_sample]

    def gibbs_vhv(self, v0_sample):
        ''' This function implements one step of Gibbs sampling,
            starting from the visible state'''
        pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v0_sample)
        pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h1_sample)
        return [pre_sigmoid_h1, h1_mean, h1_sample,
                pre_sigmoid_v1, v1_mean, v1_sample]

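Note that these functions also return the pre-sigmoid activations, because the reconstruction cost needs them as input. To understand why, you need to know a bit about how Theano works. Whenever you compile a Theano function, the computational graph you pass in is optimized for speed and stability, which is done by replacing several parts of the subgraphs with others. One such optimization expresses terms of the form log(sigmoid(x)) in terms of softplus. We need this optimization for the cross-entropy: the sigmoid of numbers larger than 30 (or even less) rounds to 1, and the sigmoid of numbers smaller than -30 rounds to 0, which forces Theano to compute log(0), so the cost ends up as -inf or NaN. When the value is expressed in terms of softplus, this undesirable behaviour does not occur. The optimization usually works fine, but here we have a special case: the sigmoid is applied inside the scan op, while the log is outside. Theano therefore only sees log(scan(…)) instead of log(sigmoid(…)) and does not apply the optimization. We also cannot replace the sigmoid inside scan with something else, because this only needs to be done on the last step. The easiest and most efficient approach is therefore to output the pre-sigmoid activation from scan as well, and to apply both the log and the sigmoid outside scan.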
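The numerical issue can be reproduced outside Theano. The following plain-Python sketch works in double precision, where the sigmoid saturates a little later than in float32 (so 40 is used instead of 30). It compares a naive log(1 - sigmoid(x)) with the softplus form, using the identities log(sigmoid(x)) = -softplus(-x) and log(1 - sigmoid(x)) = -softplus(x). The function names are my own and are not Theano APIs.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_sigmoid(x):
    """log(sigmoid(x)) rewritten as -softplus(-x)."""
    return -softplus(-x)


def naive_log_one_minus_sigmoid(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return math.log(1.0 - s)          # breaks once s rounds to exactly 1.0


def stable_cross_entropy(target, pre_sigmoid):
    """-[t*log(sigm(a)) + (1-t)*log(1-sigm(a))], computed from the pre-sigmoid a."""
    return target * softplus(-pre_sigmoid) + (1 - target) * softplus(pre_sigmoid)


try:
    naive_value = naive_log_one_minus_sigmoid(40.0)
except ValueError:
    # sigmoid(40.0) rounds to 1.0 in double precision, so this is log(0.0),
    # which Python's math.log rejects (array libraries typically return -inf)
    naive_value = None

stable_value = -softplus(40.0)        # log(1 - sigmoid(40)) without saturation
```

This is why the cost functions take the pre-sigmoid value: the stable form can only be built when the argument of the sigmoid is still available.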
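The RBM class also has a function that computes the free energy of the model, which is needed to compute the gradient of the parameters (see Eq. (4)). Note that wx_b in this function is the same pre-sigmoid hidden activation that propup computes.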

    def free_energy(self, v_sample):
        ''' Function to compute the free energy '''
        wx_b = T.dot(v_sample, self.W) + self.hbias
        vbias_term = T.dot(v_sample, self.vbias)
        hidden_term = T.sum(T.log(1 + T.exp(wx_b)), axis=1)
        return -hidden_term - vbias_term

We then add a get_cost_updates method, whose purpose is to generate the symbolic gradients for CD-k and PCD-k updates.

    def get_cost_updates(self, lr=0.1, persistent=None, k=1):
        """This functions implements one step of CD-k or PCD-k

        :param lr: learning rate used to train the RBM

        :param persistent: None for CD. For PCD, shared variable
            containing old state of Gibbs chain. This must be a shared
            variable of size (batch size, number of hidden units).

        :param k: number of Gibbs steps to do in CD-k/PCD-k

        Returns a proxy for the cost and the updates dictionary. The
        dictionary contains the update rules for weights and biases but
        also an update of the shared variable used to store the persistent
        chain, if one is used.

        """

        # compute positive phase
        pre_sigmoid_ph, ph_mean, ph_sample = self.sample_h_given_v(self.input)

        # decide how to initialize persistent chain:
        # for CD, we use the newly generate hidden sample
        # for PCD, we initialize from the old state of the chain
        if persistent is None:
            chain_start = ph_sample
        else:
            chain_start = persistent

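Note that get_cost_updates takes an argument called persistent. This lets us use the same code for both CD and PCD. To use PCD, persistent should be a shared variable containing the state of the Gibbs chain from the previous iteration.
If persistent is None, we initialize the Gibbs chain with the hidden sample generated during the positive phase, which implements CD. Once the starting point of the chain is fixed, we can compute the sample at the end of the Gibbs chain, which is the sample needed for the gradient (see Eq. (4)). To do so, we use the scan op provided by Theano; readers are encouraged to read its documentation, linked in the code comment below.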

        # perform actual negative phase
        # in order to implement CD-k/PCD-k we need to scan over the
        # function that implements one gibbs step k times.
        # Read Theano tutorial on scan for more information :
        # http://deeplearning.net/software/theano/library/scan.html
        # the scan will return the entire Gibbs chain
        (
            [
                pre_sigmoid_nvs,
                nv_means,
                nv_samples,
                pre_sigmoid_nhs,
                nh_means,
                nh_samples
            ],
            updates
        ) = theano.scan(
            self.gibbs_hvh,
            # the None are place holders, saying that
            # chain_start is the initial state corresponding to the
            # 6th output
            outputs_info=[None, None, None, None, None, chain_start],
            n_steps=k,
            name="gibbs_hvh"
        )

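Once we have generated the chain, we take the sample at its end to get the free energy of the negative phase. Note that chain_end is a symbolic Theano variable expressed in terms of the model parameters. If we applied T.grad naively, it would try to go through the Gibbs chain to compute the gradients. This is not what we want (it would corrupt the gradients), so we need to tell T.grad that chain_end is a constant. We do this with the consider_constant argument of T.grad.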

        # determine gradients on RBM parameters
        # note that we only need the sample at the end of the chain
        chain_end = nv_samples[-1]

        cost = T.mean(self.free_energy(self.input)) - T.mean(
            self.free_energy(chain_end))
        # We must not compute the gradient through the gibbs sampling
        gparams = T.grad(cost, self.params, consider_constant=[chain_end])

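Finally, we add the parameter updates to the updates dictionary returned by scan (which already contains the update rules for the random states of theano_rng). In the case of PCD, the dictionary should also update the shared variable that holds the state of the Gibbs chain.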

        # constructs the update dictionary
        for gparam, param in zip(gparams, self.params):
            # make sure that the learning rate is of the right dtype
            updates[param] = param - gparam * T.cast(
                lr,
                dtype=theano.config.floatX
            )
        if persistent is not None:
            # Note that this works only if persistent is a shared variable
            updates[persistent] = nh_samples[-1]
            # pseudo-likelihood is a better proxy for PCD
            monitoring_cost = self.get_pseudo_likelihood_cost(updates)
        else:
            # reconstruction cross-entropy is a better proxy for CD
            monitoring_cost = self.get_reconstruction_cost(updates,
                                                           pre_sigmoid_nvs[-1])

        return monitoring_cost, updates

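Tracking Progress

RBMs are particularly tricky to train. Because of the partition function Z in Eq. (1), we cannot estimate the log-likelihood log(P(x)) during training. We therefore have no direct metric for choosing the hyperparameters.

Inspection of Negative Samples

The negative samples obtained during training can be visualized. As training progresses, the model defined by the RBM gets closer to the true underlying distribution $p_{train}(x)$, so negative samples should look like samples from the training set. Obviously bad hyperparameters can be discarded in this fashion.

Visual Inspection of Filters

The filters learnt by the model can be visualized. This amounts to plotting the weights of each unit as a gray-scale image (after reshaping to a square matrix). Filters should pick out strong features in the data. While it is not clear what these features should look like for an arbitrary dataset, training on MNIST usually results in filters that act as stroke detectors, while training on natural images leads to Gabor-like filters if trained in conjunction with a sparsity criterion.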

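Proxies to Likelihood

Other, more tractable functions can be used as a proxy for the likelihood. When training an RBM with PCD, one can use the pseudo-likelihood as the proxy. The pseudo-likelihood (PL) is much less expensive to compute, because it assumes that all bits are independent. Therefore:
$PL(x) = \prod_i P(x_i | x_{-i})$ and $\log PL(x) = \sum_i \log P(x_i | x_{-i})$
Here $x_{-i}$ denotes the set of all bits except bit i. The log-PL is therefore the sum of the log-probabilities of each bit $x_i$, conditioned on the state of all other bits. For MNIST, this would involve summing over the 784 input dimensions, which is still rather expensive. For this reason, we use the following stochastic approximation to the log-PL:
$g = N \cdot \log P(x_i | x_{-i})$, where $i \sim U(0, N)$, and $E[g] = \log PL(x)$
The expectation is taken over the uniform random choice of the index i, and N is the number of visible units. To work with binary units, we further introduce the notation $\tilde{x}_i$ for x with bit i flipped (1->0, 0->1). The log-PL for an RBM with binary units is then written as:
$\log PL(x) \approx N \cdot \log \frac{e^{-\mathcal{F}(x)}}{e^{-\mathcal{F}(x)} + e^{-\mathcal{F}(\tilde{x}_i)}} \approx N \cdot \log[sigm(\mathcal{F}(\tilde{x}_i) - \mathcal{F}(x))]$
We therefore return this cost, along with the RBM updates, from the get_cost_updates function of the RBM class. Notice that the updates dictionary is modified to increment the index of bit i, so that i cycles over all possible values {0,1,...,N-1} from one update to the next.
Note that for CD training, the cross-entropy cost between the input and the reconstruction (the same as the one used for the denoising autoencoder) is more reliable than the pseudo-log-likelihood. Here is the code that computes the pseudo-likelihood: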

    def get_pseudo_likelihood_cost(self, updates):
        """Stochastic approximation to the pseudo-likelihood"""

        # index of bit i in expression p(x_i | x_{\i})
        bit_i_idx = theano.shared(value=0, name='bit_i_idx')

        # binarize the input image by rounding to nearest integer
        xi = T.round(self.input)

        # calculate free energy for the given bit configuration
        fe_xi = self.free_energy(xi)

        # flip bit x_i of matrix xi and preserve all other bits x_{\i}
        # Equivalent to xi[:,bit_i_idx] = 1-xi[:, bit_i_idx], but assigns
        # the result to xi_flip, instead of working in place on xi.
        xi_flip = T.set_subtensor(xi[:, bit_i_idx], 1 - xi[:, bit_i_idx])

        # calculate free energy with bit flipped
        fe_xi_flip = self.free_energy(xi_flip)

        # equivalent to e^(-FE(x_i)) / (e^(-FE(x_i)) + e^(-FE(x_{\i})))
        cost = T.mean(self.n_visible * T.log(T.nnet.sigmoid(fe_xi_flip -
                                                            fe_xi)))

        # increment bit_i_idx % number as part of updates
        updates[bit_i_idx] = (bit_i_idx + 1) % self.n_visible

        return cost
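The identity behind this code, P(x_i | x_{-i}) = sigm(F(x̃_i) - F(x)), and the claim that the single-bit estimate averages to the full log-PL can both be checked on a small example. The sketch below is plain Python with a hypothetical 3-visible, 2-hidden RBM (made-up parameters, equation-style indexing W[i][j] for hidden unit i and visible unit j); it illustrates the formulas and is not the tutorial's implementation.

```python
import math


def free_energy(v, W, b, c):
    """Eq. (9) for a binary RBM."""
    bv = sum(bj * vj for bj, vj in zip(b, v))
    pre = [c[i] + sum(W[i][j] * v[j] for j in range(len(v)))
           for i in range(len(c))]
    return -bv - sum(math.log(1.0 + math.exp(a)) for a in pre)


def flip(x, i):
    """Return a copy of x with bit i flipped (1 -> 0, 0 -> 1)."""
    y = list(x)
    y[i] = 1 - y[i]
    return tuple(y)


def log_sigm(z):
    """Stable log(sigm(z)) = -softplus(-z)."""
    return -(max(-z, 0.0) + math.log1p(math.exp(-abs(z))))


def log_conditional(x, i, W, b, c):
    """log P(x_i | x_-i) from the definition e^-F(x) / (e^-F(x) + e^-F(x~_i))."""
    fx = free_energy(x, W, b, c)
    f_flip = free_energy(flip(x, i), W, b, c)
    return math.log(math.exp(-fx) / (math.exp(-fx) + math.exp(-f_flip)))


def log_pl_exact(x, W, b, c):
    """Full log-PL: sum over every bit i."""
    return sum(log_conditional(x, i, W, b, c) for i in range(len(x)))


def log_pl_single_bit(x, i, W, b, c):
    """Stochastic estimate for one bit index i: N * log sigm(F(x~_i) - F(x))."""
    diff = free_energy(flip(x, i), W, b, c) - free_energy(x, W, b, c)
    return len(x) * log_sigm(diff)


W = [[0.2, -0.4, 0.1], [-0.3, 0.5, 0.7]]   # made-up values
b = [0.1, -0.2, 0.0]
c = [0.05, -0.1]
x = (1, 0, 1)

exact = log_pl_exact(x, W, b, c)
estimates = [log_pl_single_bit(x, i, W, b, c) for i in range(len(x))]
average_estimate = sum(estimates) / len(estimates)
```

Cycling the bit index deterministically, as the Theano code does, visits every i equally often, so the monitored cost averages to the same quantity over N updates.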

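Main Loop

We now have all the ingredients needed to train the network.
Before going over the training loop, however, the reader should become familiar with the function tile_raster_images (see Plotting Samples and Filters). Since RBMs are generative models, we are interested in sampling from them and visualizing the samples. We also want to plot the filters (weights) learnt by the RBM, to gain insight into what the RBM is actually doing. Bear in mind, however, that this does not tell the whole story, since we neglect the biases and plot the weights only up to a multiplicative constant (the weights are converted to values between 0 and 1).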
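To illustrate what "converted to values between 0 and 1" means for a single filter, here is a small plain-Python sketch. The helper names and the eps constant are assumptions chosen for illustration; the real tile_raster_images utility additionally tiles many filters into one image, which this sketch does not attempt.

```python
def scale_to_unit_interval(values, eps=1e-8):
    """Shift and scale a list of numbers so that they lie in [0, 1]."""
    lowest = min(values)
    shifted = [v - lowest for v in values]
    peak = max(shifted)
    # eps avoids a division by zero when all values are equal
    return [s / (peak + eps) for s in shifted]


def filter_to_tile(weights, shape):
    """Turn one unit's weights (length rows*cols) into gray levels 0..255."""
    rows, cols = shape
    scaled = scale_to_unit_interval(weights)
    return [[int(255 * scaled[r * cols + col]) for col in range(cols)]
            for r in range(rows)]


tile = filter_to_tile([-1.0, 0.0, 1.0, 3.0], (2, 2))   # made-up weights
```

Because each filter is rescaled independently, brightness can be compared within a tile but not between tiles.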
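With these utility functions in place, we can start training the RBM and save the filter plot after each training epoch. We train the RBM using PCD, as it has been shown to lead to a better generative model ([Tieleman08]).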

    # Excerpt from the training function: index, x, cost, updates, rbm,
    # train_set_x, batch_size, training_epochs and n_train_batches are
    # assumed to be defined earlier, and timeit, numpy, PIL's Image and
    # tile_raster_images are assumed to be imported.

    # train_rbm returns the monitoring cost; its main purpose is to
    # update the RBM parameters
    train_rbm = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size]
        },
        name='train_rbm'
    )

    plotting_time = 0.
    start_time = timeit.default_timer()

    # go through training epochs
    for epoch in range(training_epochs):

        # go through the training set
        mean_cost = []
        for batch_index in range(n_train_batches):
            mean_cost += [train_rbm(batch_index)]

        print('Training epoch %d, cost is ' % epoch, numpy.mean(mean_cost))

        # Plot filters after each training epoch
        plotting_start = timeit.default_timer()
        # Construct image from the weight matrix
        image = Image.fromarray(
            tile_raster_images(
                X=rbm.W.get_value(borrow=True).T,
                img_shape=(28, 28),
                tile_shape=(10, 10),
                tile_spacing=(1, 1)
            )
        )
        image.save('filters_at_epoch_%i.png' % epoch)
        plotting_stop = timeit.default_timer()
        plotting_time += (plotting_stop - plotting_start)

    end_time = timeit.default_timer()

    pretraining_time = (end_time - start_time) - plotting_time

    print('Training took %f minutes' % (pretraining_time / 60.))

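Once the RBM is trained, we can use the gibbs_vhv function to implement the Gibbs chain required for sampling. Rather than initializing randomly, we start the Gibbs chain from test examples (we could just as well pick them from the training set) in order to speed up convergence. We again use Theano's scan op to run 1000 steps before each plot.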

    #################################
    #     Sampling from the RBM     #
    #################################
    # find out the number of test samples
    number_of_test_samples = test_set_x.get_value(borrow=True).shape[0]

    # pick random test examples, with which to initialize the persistent chain
    test_idx = rng.randint(number_of_test_samples - n_chains)
    persistent_vis_chain = theano.shared(
        numpy.asarray(
            test_set_x.get_value(borrow=True)[test_idx:test_idx + n_chains],
            dtype=theano.config.floatX
        )
    )

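Next we create the 20 persistent chains in parallel to get our samples. To do so, we compile a Theano function that runs 1000 Gibbs steps with scan and updates the state of the persistent chain with the new visible sample. We call this function repeatedly, plotting the samples after every 1000 steps.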

    plot_every = 1000
    # define one step of Gibbs sampling (mf = mean-field) define a
    # function that does `plot_every` steps before returning the
    # sample for plotting
    (
        [
            presig_hids,
            hid_mfs,
            hid_samples,
            presig_vis,
            vis_mfs,
            vis_samples
        ],
        updates
    ) = theano.scan(
        rbm.gibbs_vhv,
        outputs_info=[None, None, None, None, None, persistent_vis_chain],
        n_steps=plot_every,
        name="gibbs_vhv"
    )

    # add to updates the shared variable that takes care of our persistent
    # chain :.
    updates.update({persistent_vis_chain: vis_samples[-1]})
    # construct the function that implements our persistent chain.
    # we generate the "mean field" activations for plotting and the actual
    # samples for reinitializing the state of our persistent chain
    sample_fn = theano.function(
        [],
        [
            vis_mfs[-1],
            vis_samples[-1]
        ],
        updates=updates,
        name='sample_fn'
    )

    # create a space to store the image for plotting ( we need to leave
    # room for the tile_spacing as well)
    image_data = numpy.zeros(
        (29 * n_samples + 1, 29 * n_chains - 1),
        dtype='uint8'
    )
    for idx in range(n_samples):
        # generate `plot_every` intermediate samples that we discard,
        # because successive samples in the chain are too correlated
        vis_mf, vis_sample = sample_fn()
        print(' ... plotting sample %d' % idx)
        image_data[29 * idx:29 * idx + 28, :] = tile_raster_images(
            X=vis_mf,
            img_shape=(28, 28),
            tile_shape=(1, n_chains),
            tile_spacing=(1, 1)
        )

    # construct image
    image = Image.fromarray(image_data)
    image.save('samples.png')

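Results

Settings: PCD-15, learning rate 0.1, batch size 20, 15 epochs. Training the model took 122.466 minutes on an Intel Xeon E5430 @ 2.66GHz CPU with a single-threaded GotoBLAS.
The output was as follows: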

... loading data
Training epoch 0, cost is  -90.6507246003
Training epoch 1, cost is  -81.235857373
Training epoch 2, cost is  -74.9120966945
Training epoch 3, cost is  -73.0213216101
Training epoch 4, cost is  -68.4098570497
Training epoch 5, cost is  -63.2693021647
Training epoch 6, cost is  -65.99578971
Training epoch 7, cost is  -68.1236650015
Training epoch 8, cost is  -68.3207365087
Training epoch 9, cost is  -64.2949797113
Training epoch 10, cost is  -61.5194867893
Training epoch 11, cost is  -61.6539369402
Training epoch 12, cost is  -63.5465278086
Training epoch 13, cost is  -63.3787093527
Training epoch 14, cost is  -62.755739271
Training took 122.466000 minutes
 ... plotting sample  0
 ... plotting sample  1
 ... plotting sample  2
 ... plotting sample  3
 ... plotting sample  4
 ... plotting sample  5
 ... plotting sample  6
 ... plotting sample  7
 ... plotting sample  8
 ... plotting sample  9

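The picture below shows the filters after 15 epochs:
[Figure: filters obtained after 15 epochs]
Below are the samples generated by the RBM after training. Each row represents a mini-batch of negative particles (samples from independent Gibbs chains). 1000 steps of Gibbs sampling were taken between each of those rows.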

0 0