RestrictedBoltzmannMachines.py源码剖析
来源:互联网 发布:luaeditor mac 编辑:程序博客网 时间:2024/06/09 17:58
RestrictedBoltzmannMachines.py源码剖析
这篇博客是承接我的上一篇Restricted Boltzmann Machines的一个实现RBM的例子,代码详见我的github:RestrictedBoltzmannMachines.py.
RBM.class
We construct an RBM class. The parameters of the network can either be initialized by the constructor or can be passed as arguments. This option is useful when an RBM is used as the building block of a deep network, in which case the weight matrix and the hidden layer bias is shared with the corresponding sigmoidal layer of an MLP network.
class RBM(object): """Restricted Bolzmann Machines. (RBM)""" def __init__(self, input=None, n_visible=28 * 28, n_hidden=500, w=None, hbias=None, vbias=None, numpy_rng=None, theano_rng=None): """ RBM constructor. Define the parameters of the model along with basic operations for inferring hidden from visible (and vice-versa), as well as for performing CD updates. :param input: None for standalone RBMs or symbolic variable if RBM is part of a larger graph. :param n_visible: number of visible units :param n_hidden: number of hidden units :param w: None for standalone RBMs or symbolic variable pointing to a shared weights matrix in case RBM is part of a DBN network; in a DBN , the weights are shared between RBMs and layers of a MLP :param hbias: None for standalone RBMs or symbolic variable pointing to a shared hidden units bias vector in case RBM is part of a different network :param vbias: None for standalone RBMs or a symbolic variable pointing to a shared visible units bias :param numpy_rng: :param theano_rng: """ self.n_visible = n_visible self.n_hidden = n_hidden # create a number generator if numpy_rng is None: numpy_rng = numpy.random.RandomState(1234) if theano_rng is None: theano_rng = RandomStreams(numpy_rng.randint(2 ** 30)) # initial w and hbias, vbias, create theano shared variables and bias if w is None: initial_w = numpy.asarray(numpy_rng.uniform( low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)), high=4 * numpy.sqrt(6. / (n_hidden + n_visible)), size=(n_visible, n_hidden) ), dtype=theano.config.floatX) w = theano.shared(value=initial_w, name='w', borrow=True) if hbias is None: hbias = theano.shared(value=numpy.zeros(n_hidden, dtype=theano.config.floatX), name='hbias', borrow=True) if vbias is None: vbias = theano.shared(value=numpy.zeros(n_visible, dtype=theano.config.floatX), name='vbias', borrow=True) # initialize input layer for standalone RBMs or layer0 for DBN self.input = input if not input: self.input = T.matrix('input') self.w = w self.vbias = vbias self.hbias = hbias self.theano_rng = theano_rng self.params = [self.w, self.hbias, self.vbias]
Next step is to define functions which construct the symbolic graph associated with Eqs.(12)-(13),the code is as follows:
def propup(self, vis): """ This function propagates the visible units activation upwards to the hidden units. :param vis: :return: """ pre_sigmoid_activation = T.dot(vis, self.w) + self.hbias return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]
def sample_h_given_v(self, v0_example): """ This function infers state of hidden units given visible units. :param v0_example: :return: """ # p(h|v), compute the activation of the hidden units given a sample of the visibles. pre_sigmoid_h1, h1_mean = self.propup(v0_example) # get a sample of the hiddens given their activation # 利用p(h|v)采样h h1_sample = self.theano_rng.binomial(size=h1_mean.shape, n=1, p=h1_mean, dtype=theano.config.floatX) return [pre_sigmoid_h1, h1_mean, h1_sample]
def propdown(self, hid): """ This function propagates the hidden units activation down to the visible units. :param hid: :return: """ pre_sigmoid_activation = T.dot(hid, self.w.T) + self.vbias return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]
def sample_v_given_h(self, h0_sample): """ This function infers state of visible units given hidden units. :param h0_sample: :return: """ # p(v|h), compute the activation of the visible units given the hidden sample pre_sigmoid_v1, v1_mean = self.propdown(h0_sample) # get a sample of the visible given their activation v1_example = self.theano_rng.binomial(size=v1_mean.shape, n=1, p=v1_mean, dtype=theano.config.floatX) return [pre_sigmoid_v1, v1_mean, v1_example]
We can then use these functions to define the symbolic graph for a Gibbs sampling step. We define two functions:
- gibbs_vhv which performs a step of Gibbs sampling starting from the visible units. As we shall see, this will be useful for sampling from the RBM.
- gibbs_hvh which performs a step of Gibbs sampling starting from the hidden units. This function will be useful for performing CD and PCD updates.
The code is as follows:
def gibbs_hvh(self, h0_sample): """ This function implements one step of Gibbs sampling, starting from the hidden state. :param h0_sample: :return: """ pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h0_sample) pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v1_sample) return [pre_sigmoid_v1, v1_mean, v1_sample, pre_sigmoid_h1, h1_mean, h1_sample]
def gibbs_vhv(self, v0_sample): """ This function implements one step of Gibbs sampling, starting from the visible state. :param v0_sample: :return: """ pre_sigmoid_h1, h1_mean, h1_sample = self.sample_h_given_v(v0_sample) pre_sigmoid_v1, v1_mean, v1_sample = self.sample_v_given_h(h1_sample) return [pre_sigmoid_h1, h1_mean, h1_sample, pre_sigmoid_v1, v1_mean, v1_sample]
The class also has a function that computes the free energy of the model, needed for computing the gradient of the parameters (see Eq. (14) (6)). Note that we also return the pre-sigmoid.
The code is as follows:
def free_energy(self, v_sample): """ Function to compute the free energy. :param v_sample: :return: """ wx_b = T.dot(v_sample, self.w) + self.hbias vbias_term = T.dot(v_sample, self.vbias) hidden_term = T.sum(T.log(1 + T.exp(wx_b)), axis=1) return -hidden_term - vbias_term
We then add a get_cost_updates method, whose purpose is to generate the symbolic gradients for CD-k and PCD-k updates.
def get_cost_updates(self, learning_rate=0.1, persistent=None, k=1): """ This function implements one step of CD-k or PCD-k. :param learning_rate: learning rate used to train the RBM. :param persistent: None for CD. For PCD, shared variables containing old state of Gibbs chain. This must be a shared variable of size (batch_size, number of hidden units). :param k: number of Gibbs steps to do in CD-k / PCD-k :return: a proxy for the cost and the updates dictionary.The dictionary contains the update rules for weights and biases but also an update of the shared variable used to store the persistent chain, if one is used. """ # compute positive phase pre_sigmoid_ph, ph_mean, ph_sample = self.sample_h_given_v(self.input) # decide how to initialize persistent chain: for CD, we use the newly generate hidden sample, # for PCD, we initialize from the old state of the chain if persistent is None: chain_start = ph_sample else: chain_start = persistent # perform actual negative phase # use "scan" to scan over the function that implements on gibbs step k times # the scan will return the entire Gibbs chain # the None are place holders, saying that chain_start is the initial state corresponding to the 6th ([pre_sigmoid_nvs, nv_means, nv_samples, pre_sigmoid_nhs, nh_means, nh_samples], updates) = \ theano.scan(self.gibbs_hvh, outputs_info=[None, None, None, None, None, chain_start], n_steps=k, name="gibbs_hvh") # determine gradients on RBM parameters # note that we only need the sample at the end of the chain chain_end = nv_samples[-1] cost = T.mean(self.free_energy(self.input)) - T.mean(self.free_energy(chain_end)) # we must not compute the gradient through the gibbs sampling gparams = T.grad(cost, self.params, consider_constant=[chain_end]) # construct the update dictionary for gparam, param in zip(gparams, self.params): updates[param] = param - gparam * T.cast(learning_rate, theano.config.floatX) if persistent: updates[persistent] = nh_samples[-1] # pseudo-likelihood cross-entropy is a better proxy for PCD monitoring_cost = self.get_pseudo_likelihood_cost(updates) else: # reconstruction cross-entropy is a better proxy for CD monitoring_cost = self.get_reconstruction_cost(updates, pre_sigmoid_nvs[-1]) return monitoring_cost, updates
如果使用CD-k sampling,那么monitoring_cost 直接使用reconstruction cross-entropy,代码如下:
def get_reconstruction_cost(self, updates, pre_sigmoid_nv): """ Approximation to the reconstruction error. :param updates: :param pre_sigmoid_nv: :return: """ cross_entropy = T.mean(T.sum(self.input * T.log(T.nnet.sigmoid(pre_sigmoid_nv)) + (1 - self.input) * T.log(1 - T.nnet.sigmoid(pre_sigmoid_nv)), axis=1)) return cross_entropy
注意这里就是一个pre_sigmoid_nv被返回的原因,是为了theano能够计算优化。
如果使用PCD-k sampling,那么monitoring_cost使用pseudo-likelihood。
Pseudo-likelihood (PL) is much less expensive to compute, as it assumes that all bits are independent. Therefore,
Here
where the expectation is taken over the uniform random choice of index i, and N is the number of visible units. In order to work with binary units, we further introduce the notation
def get_pseudo_likelihood_cost(self, updates): """ Stochastic approximation to the pseudo-likelihood. :param updates: :return: """ # index of bit i in expression p(xi|x-i) bit_i_idx = theano.shared(value=0, name='bit_i_idx') # binarize the input image by rounding to nearest integer xi = T.round(self.input) # calculate free energy for the given bit configuration fe_xi = self.free_energy(xi) # flip bit x_i of matrix xi and preserve all other bits x-i xi_flip = T.set_subtensor(xi[:, bit_i_idx], 1 - xi[:, bit_i_idx]) # calculate free energy for given bit flipped fe_xi_flip = self.free_energy(xi_flip) cost = T.mean(self.n_visible * T.log(T.nnet.sigmoid(fe_xi_flip - fe_xi))) # increment bit_i_idx % number as part of updates updates[bit_i_idx] = (bit_i_idx + 1) % self.n_visible return cost
Main Loop
We now have all the necessary ingredients to start training our network.
Before going over the training loop however, the reader should familiarize himself with the function tile_raster_images (see tile_raster_images). Since RBMs are generative models, we are interested in sampling from them and plotting/visualizing these samples. We also want to visualize the filters (weights) learnt by the RBM, to gain insights into what the RBM is actually doing. Bear in mind however, that this does not provide the entire story, since we neglect the biases and plot the weights up to a multiplicative constant (weights are converted to values between 0 and 1).
Having these utility functions, we can start training the RBM and plot/save the filters after each training epoch. We train the RBM using PCD, as it has been shown to lead to a better generative model ([Tieleman08]).
def test_rbm(learning_rate=0.1, training_epochs=15, dataset='mnist.pkl.gz', batch_size=20, n_chains=20, n_samples=10, output_folder='rbm_plots', n_hidden=600): """ Demonstrate how to train and afterwards sample from it using Theano. :param learning_rate: :param training_epochs: :param dataset: :param batch_size: :param n_chains: number of parallel Gibbs chains used to train the RBM :param n_samples: number of samples to plot for each chain :param output_folder: :param n_hidden: :return: """ datasets = load_data(dataset) train_set_x, train_set_y = datasets[0] test_set_x, test_set_y = datasets[2] n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size # allocate symbolic variables for the data index = T.lscalar('index') # index to a minibatch x = T.matrix('x') # the data is presented as rasterized images rng = numpy.random.RandomState(123) theano_rng = RandomStreams(rng.randint(2 ** 30)) # initialize storage for the persistent chain (state = hidden layer of chain) persistent_chain = theano.shared(numpy.zeros((batch_size, n_hidden), dtype=theano.config.floatX), borrow=True) # construct the RBM class rbm = RBM(input=x, n_visible=28 * 28, n_hidden=n_hidden, numpy_rng=rng, theano_rng=theano_rng) # get the cost and gradient corresponding to one step of CD-15 cost, updates = rbm.get_cost_updates(learning_rate=learning_rate, persistent=persistent_chain, k=15) """ Train the RBM """ if not os.path.isdir(output_folder): os.makedirs(output_folder) os.chdir(output_folder) train_rbm = theano.function(inputs=[index], outputs=cost, updates=updates, givens={ x: train_set_x[index * batch_size: (index + 1) * batch_size] }, name='train_rbm') plotting_time = 0. start_time = timeit.default_timer() # go through training epochs for epoch in range(training_epochs): mean_cost = [] for batch_index in range(n_train_batches): mean_cost += [train_rbm(batch_index)] print "Training epoch %d, cost is " % epoch, numpy.mean(mean_cost) # plot filters after each training epoch plotting_start = timeit.default_timer() # construct image from the weight matrix image = Image.fromarray(tile_raster_images(X=rbm.w.get_value(borrow=True).T, img_shape=(28, 28), tile_shape=(10, 10), tile_spacing=(1, 1))) image.save('filters_at_epoch_%i.png' % epoch) plotting_stop = timeit.default_timer() plotting_time += (plotting_stop - plotting_start) end_time = timeit.default_timer() pretraining_time = (end_time - start_time) - plotting_time print "Training took %f minutes " % (pretraining_time / 60.)
visualize the filters (weights) learnt by the RBM
经过15次迭代后我们得到了‘filters_at_epoch_14.png’,如下图所示:
Sampling from the RBM
Once the RBM is trained, we can then use the gibbs_vhv function to implement the Gibbs chain required for sampling. We initialize the Gibbs chain starting from test examples (although we could as well pick it from the training set) in order to speed up convergence and avoid problems with random initialization. We again use Theano’s scan op to do 1000 steps before each plotting.
""" Sampling from the RBM """ # find out the number of test samples number_of_test_samples = test_set_x.get_value(borrow=True).shape[0] # pick random test examples, with which to initialize the persistent chain test_idx = rng.randint(number_of_test_samples - n_chains) persistent_vis_chain = theano.shared(numpy.asarray(test_set_x.get_value(borrow=True)[test_idx: test_idx + n_chains], dtype=theano.config.floatX))
Next we create the 20 persistent chains in parallel to get our samples. To do so, we compile a theano function which performs one Gibbs step and updates the state of the persistent chain with the new visible sample. We apply this function iteratively for a large number of steps, plotting the samples at every 1000 steps.
plot_every = 1000 # compile a theano function which performs one Gibbs step and updates the state of the persistent # chain with the new visible sample. # define one step of Gibbs sampling (mf = mean - field) # define a function that does 'plot_every' steps before returning the sample for plotting # gibbs_vhv return: [pre_sigmoid_h1, h1_mean, h1_sample, pre_sigmoid_v1, v1_mean, v1_sample] ([presig_hids, hid_mfs, hid_samples, presig_vis, vis_mfs, vis_samples], updates) = \ theano.scan(rbm.gibbs_vhv, outputs_info=[None, None, None, None, None, persistent_vis_chain], n_steps=plot_every, name='gibbs_vhv') # add to updates the shared variable that takes care of our persistent chain updates.update({persistent_vis_chain: vis_samples[-1]}) # construct the function that implements our persistent chain. # we generate the "mean field" activations for plotting and the actual # samples for reinitializing the state of our persistent chain sample_fn = theano.function(inputs=[], outputs=[vis_mfs[-1], vis_samples[-1]], updates=updates, name='sample_fn') # create a space to store the image for plotting (we need to leave room for the tile_spacing as well) image_data = numpy.zeros((29 * n_samples + 1, 29 * n_chains - 1), dtype='uint8') # vis_mfs(n_chains, nv) for idx in range(n_samples): # generate `plot_every` intermediate samples that we discard, # because successive samples in the chain are too correlated vis_mf, vis_sample = sample_fn() print(' ... plotting sample %d' % idx) image_data[29 * idx:29 * idx + 28, :] = tile_raster_images( X=vis_mf, img_shape=(28, 28), tile_shape=(1, n_chains), tile_spacing=(1, 1) ) # construct image image = Image.fromarray(image_data) image.save('samples.png') # end-snippet-7 os.chdir('../')
将每次sample之后的vis_mf(n_chains,n_v)可视化显示,思路如下:
结果如下:
- RestrictedBoltzmannMachines.py源码剖析
- urllister.py源码分析
- 《stl源码剖析》剖析
- 【源码】ArrayList源码剖析
- 【源码】LinkedList源码剖析
- 【源码】HashMap源码剖析
- 【源码】HashMap源码剖析
- 【源码】Hashtable源码剖析
- 【源码】LinkedHashMap源码剖析
- 【源码】LruCache源码剖析
- 【源码】TreeMap源码剖析
- 【源码】LinkedHashMap源码剖析
- 【源码】LruCache源码剖析
- 【源码】Hashtable源码剖析
- Web.py源码分析--总览
- web.py源码分析: application
- web.py源码分析: application
- web.py源码分析: application
- 网络是怎样连接的学习笔记(五)
- Eclipse+Jrebel实现web项目热部署
- Mybatis 框架使用的最核心内容(二):mapper.xml中常用的标签详解
- Spring10种常见异常解决方法
- 深入理解Java:SimpleDateFormat安全的时间格式化
- RestrictedBoltzmannMachines.py源码剖析
- 揭秘:华为竟然搞了3年自动驾驶!
- 对话 | 微软人工智能CTO古卓伦:联合创企开发AI芯片 解密四大AI集群
- [kernel 启动流程]系列
- 表头固定,内容可滚动表格的3种实现方法
- 人工智能如何从娃娃抓起 听一位大学教授的心得
- 阿里将推汽车自动贩售机 首汽月底上线自动驾驶网约车
- mysql 中游标使用
- 太阳系