Theano-Deep Learning Tutorials Notes: Getting Started
Source: Internet | Editor: 程序博客网 | Time: 2024/05/20 18:43
Tutorial URL: http://www.deeplearning.net/tutorial/gettingstarted.html
Datasets
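(1) MNIST handwritten digits: each image is a 784-dimensional vector (28×28) of float pixel values between 0 and 1, and represents one digit from 0 to 9. There are 50,000 images in the training set, 10,000 in the validation set (used to select hyperparameters such as the learning rate and model size), and 10,000 in the testing set.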
For convenience we pickled the dataset to make it easier to use in python.
```python
import cPickle, gzip, numpy

# Load the dataset
f = gzip.open('mnist.pkl.gz', 'rb')
train_set, valid_set, test_set = cPickle.load(f)
f.close()
```
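cPickle exists only in Python 2. A minimal Python 3 sketch, assuming the same mnist.pkl.gz file: that file was pickled under Python 2, and the standard pickle module can read it if you pass encoding='latin1'. The helper name load_mnist is hypothetical.

```python
import gzip
import pickle


def load_mnist(path="mnist.pkl.gz"):
    """Return (train_set, valid_set, test_set); each is an (inputs, labels) pair.

    encoding="latin1" lets Python 3 unpickle a file that was written by Python 2.
    """
    with gzip.open(path, "rb") as f:
        train_set, valid_set, test_set = pickle.load(f, encoding="latin1")
    return train_set, valid_set, test_set
```

For the real file, train_set[0] holds the 50,000 × 784 inputs and train_set[1] the 50,000 labels.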
Note: cPickle has almost the same functionality and usage as the pickle module, but it is implemented in C and is much faster.
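(2) We encourage you to store the dataset into shared variables and access it based on the minibatch index, given a fixed and known batch size (i.e. batch_size = 500 in the code below).
The reason: when training on a GPU, copying data to the GPU over and over is inefficient, so use Theano shared variables wherever possible to improve performance. It is recommended to create six separate shared variables: three for the data (training, validation, and testing sets) and three for the corresponding labels.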
```python
import numpy
import theano
import theano.tensor as T

def shared_dataset(data_xy):
    """ Function that loads the dataset into shared variables """
    data_x, data_y = data_xy
    shared_x = theano.shared(numpy.asarray(data_x, dtype=theano.config.floatX))
    shared_y = theano.shared(numpy.asarray(data_y, dtype=theano.config.floatX))
    # On the GPU the data is stored as floats, but the labels y should be
    # integers, so cast them to int32 on return.
    return shared_x, T.cast(shared_y, 'int32')

test_set_x, test_set_y = shared_dataset(test_set)
valid_set_x, valid_set_y = shared_dataset(valid_set)
train_set_x, train_set_y = shared_dataset(train_set)

batch_size = 500    # size of the minibatch

# accessing the third minibatch of the training set
data  = train_set_x[2 * batch_size: 3 * batch_size]
label = train_set_y[2 * batch_size: 3 * batch_size]
```
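The last two lines are all there is to minibatch access: slice the stored data by index. A plain-Python sketch, with lists standing in for the shared variables (an assumption so it runs without Theano or a GPU); n_batches and get_minibatch are hypothetical helper names.

```python
def n_batches(n_examples, batch_size):
    """Number of full minibatches; a trailing partial batch is dropped."""
    return n_examples // batch_size


def get_minibatch(data_x, data_y, index, batch_size):
    """Return the inputs and labels of minibatch number `index` (0-based)."""
    start = index * batch_size
    stop = (index + 1) * batch_size
    return data_x[start:stop], data_y[start:stop]


# Toy data: 1500 "examples", each just an integer, labelled example % 10.
xs = list(range(1500))
ys = [x % 10 for x in xs]
x_batch, y_batch = get_minibatch(xs, ys, 2, 500)  # third minibatch
print(n_batches(len(xs), 500), x_batch[0], x_batch[-1])  # 3 1000 1499
```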
If you run out of memory:
You can store a sufficiently small chunk of your data (several minibatches) in a shared variable and use that during training. Once you get through the chunk, update the values it stores.
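A sketch of that chunking scheme under stated assumptions: a Python list named buffer plays the role of the shared variable, and the hypothetical generator iterate_in_chunks refills it once per chunk. With Theano you would instead refill the real shared variable, e.g. with shared_x.set_value(chunk).

```python
def iterate_in_chunks(data, batch_size, batches_per_chunk):
    """Yield minibatches while holding only one chunk "on the device" at a time.

    `buffer` stands in for a shared variable: it is refilled once per chunk,
    and minibatches are then sliced from it by index.
    """
    chunk_size = batch_size * batches_per_chunk
    for chunk_start in range(0, len(data), chunk_size):
        buffer = data[chunk_start:chunk_start + chunk_size]  # the "update" step
        for i in range(len(buffer) // batch_size):
            yield buffer[i * batch_size:(i + 1) * batch_size]


batches = list(iterate_in_chunks(list(range(12)), batch_size=2, batches_per_chunk=3))
print(len(batches), batches[3])  # 6 [6, 7]
```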
Learning a Classifier
Zero-One Loss
The loss is 0 for a correctly predicted example and 1 for a misclassified one; the losses are summed over all examples.
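If $f: \mathbb{R}^D \rightarrow \{0, \ldots, L\}$ is the prediction function, then this loss can be written as:
$$\ell_{0,1} = \sum_{i=1}^{|\mathcal{D}|} I_{f(x^{(i)}) \neq y^{(i)}}$$
where $\mathcal{D}$ is either the training set (during training) or a set with $\mathcal{D} \cap \mathcal{D}_{train} = \emptyset$ (to avoid biasing the evaluation of validation or test error). $I$ is the indicator function defined as:
$$I_x = \begin{cases} 1 & \text{if } x \text{ is True} \\ 0 & \text{otherwise} \end{cases}$$
In this tutorial, $f$ is defined as:
$$f(x) = \operatorname{argmax}_k P(Y = k \mid x, \theta)$$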
```python
# zero_one_loss is a Theano variable representing a symbolic
# expression of the zero one loss; to get the actual value this
# symbolic expression has to be compiled into a Theano function (see
# the Theano tutorial for more details).
# axis=1 takes the argmax of each row, i.e. the predicted class of each example.
zero_one_loss = T.sum(T.neq(T.argmax(p_y_given_x, axis=1), y))
```
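To see what the symbolic expression computes, here is the same zero-one loss evaluated eagerly in plain Python. This is a sketch: p_y_given_x is assumed to be a list of per-example class-probability rows and y a list of integer labels, mirroring the Theano names.

```python
def argmax(row):
    """Index of the largest entry (the first one wins on ties)."""
    return max(range(len(row)), key=lambda k: row[k])


def zero_one_loss(p_y_given_x, y):
    """Count the examples whose most probable class differs from the label."""
    return sum(1 for row, label in zip(p_y_given_x, y) if argmax(row) != label)


p = [[0.1, 0.7, 0.2],   # predicts 1
     [0.6, 0.3, 0.1],   # predicts 0
     [0.2, 0.2, 0.6]]   # predicts 2
print(zero_one_loss(p, [1, 1, 2]))  # 1 (only the second example is wrong)
```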
Negative Log-Likelihood Loss
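This follows the same principle as maximum likelihood estimation.
Because the zero-one loss is not differentiable, it is impractical to optimize directly for large models; instead we maximize the likelihood of the data, i.e. minimize the negative log-likelihood (NLL), defined as:
$$NLL(\theta, \mathcal{D}) = -\sum_{i=1}^{|\mathcal{D}|} \log P(Y = y^{(i)} \mid x^{(i)}, \theta)$$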
```python
# NLL is a symbolic variable; to get the actual value of NLL, this symbolic
# expression has to be compiled into a Theano function (see the Theano
# tutorial for more details)
NLL = -T.sum(T.log(p_y_given_x)[T.arange(y.shape[0]), y])
# note on syntax: T.arange(y.shape[0]) is a vector of integers [0,1,2,...,len(y)-1].
# Indexing a matrix M by the two vectors [0,1,...,K], [a,b,...,k] returns the
# elements M[0,a], M[1,b], ..., M[K,k] as a vector. Here, we use this
# syntax to retrieve the log-probability of the correct labels, y.
```
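The same NLL evaluated eagerly in plain Python, as a sketch under the same assumptions as the Theano names (p_y_given_x[i][k] is P(Y = k | x^(i)), y holds integer labels). The list comprehension does the job of the paired-index trick: it picks entry y[i] from row i.

```python
import math


def negative_log_likelihood(p_y_given_x, y):
    """-sum_i log P(Y = y[i] | x[i]), where p_y_given_x[i][k] is P(Y = k | x[i])."""
    # Equivalent of p_y_given_x[arange(len(y)), y]: take row i, column y[i].
    picked = [p_y_given_x[i][y[i]] for i in range(len(y))]
    # Taking the log after picking gives the same sum and never evaluates log(0)
    # for classes that are not the true label.
    return -sum(math.log(q) for q in picked)


p = [[0.5, 0.5],
     [0.25, 0.75]]
print(negative_log_likelihood(p, [0, 1]))  # -(log 0.5 + log 0.75), about 0.9808
```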
Stochastic Gradient Descent
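Stochastic gradient descent improves on ordinary gradient descent:
gradient descent uses the mean loss over all samples, so every iteration has to process every sample, which is computationally expensive and slow to converge. Instead, we randomly draw a small subset of samples (a minibatch) and adjust the parameters using the mean loss over that minibatch.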
Choosing the minibatch size: both large and small values have their pros and cons.
An optimal minibatch size B is model-, dataset-, and hardware-dependent, and can be anywhere from 1 to maybe several hundreds. In the tutorial we set it to 20, but this choice is almost arbitrary (though harmless).
If you are training for a fixed number of epochs, the minibatch size becomes important because it controls the number of updates done to your parameters. Training the same model for 10 epochs using a batch size of 1 yields completely different results compared to training for the same 10 epochs but with a batch size of 20.
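The point about the number of updates is simple arithmetic: with N training examples and minibatch size B, one epoch performs ceil(N / B) parameter updates. A small sketch (num_updates is a hypothetical helper, and it counts a final partial minibatch as one update; use integer division instead if partial batches are dropped) for the 50,000-image MNIST training set:

```python
import math


def num_updates(n_examples, batch_size, n_epochs):
    """Parameter updates performed in n_epochs passes over n_examples."""
    return n_epochs * math.ceil(n_examples / batch_size)


print(num_updates(50000, 1, 10))   # 500000
print(num_updates(50000, 20, 10))  # 25000
```

So at batch size 20 the same 10 epochs perform 20 times fewer updates than at batch size 1.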
```python
# Minibatch Stochastic Gradient Descent

# assume loss is a symbolic description of the loss function given
# the symbolic variables params (shared variable), x_batch, y_batch;
# learning_rate, train_batches and stopping_condition_is_met are
# likewise assumed to be defined elsewhere

# compute gradient of loss with respect to params
d_loss_wrt_params = T.grad(loss, params)

# compile the MSGD step into a theano function
updates = [(params, params - learning_rate * d_loss_wrt_params)]
MSGD = theano.function([x_batch, y_batch], loss, updates=updates)

for (x_val, y_val) in train_batches:
    # here x_val and y_val are elements of train_batches and
    # therefore numpy arrays; function MSGD also updates the params
    print('Current loss is ', MSGD(x_val, y_val))
    if stopping_condition_is_met:
        break  # params now holds the trained values
```
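The Theano snippet leaves loss, params and train_batches abstract. A self-contained sketch of the same minibatch SGD loop in plain Python, under assumptions of our own: a toy 1-D linear model y ≈ w·x + b fitted with a mean squared error loss whose gradient is written out by hand (in Theano, T.grad would derive it). The function name msgd and all hyperparameter values are illustrative.

```python
import random


def msgd(xs, ys, batch_size=10, learning_rate=0.5, n_epochs=100, seed=0):
    """Fit y ~ w*x + b by minibatch SGD on 0.5 * mean((w*x + b - y)**2)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(n_epochs):
        rng.shuffle(indices)  # visit the examples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # residuals of the current model on this minibatch only
            errors = [w * xs[i] + b - ys[i] for i in batch]
            grad_w = sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = sum(errors) / len(batch)
            # one update per minibatch: params <- params - lr * gradient
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


xs = [i / 99 for i in range(100)]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free data generated with w = 2, b = 1
w, b = msgd(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Each minibatch gradient is a noisy estimate of the full-data gradient, which is exactly the trade-off the batch-size discussion above is about.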
Regularization
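Regularization is everywhere in machine learning; its main purpose is to prevent overfitting.
Intuitively: a norm of the model parameters is added to the loss function, so the optimization also tries to keep the parameters small (close to 0). This keeps the model as simple as possible, and in machine learning theory simpler models are less prone to overfitting. (Simplicity should not be pursued blindly, though: a simpler model does not necessarily generalize better.)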
L1 and L2 regularization
That is, the L1 norm or the (squared) L2 norm of the parameter vector is added to the loss function.
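Formally, if our loss function is:
$$NLL(\theta, \mathcal{D}) = -\sum_{i=1}^{|\mathcal{D}|} \log P(Y = y^{(i)} \mid x^{(i)}, \theta)$$
then the regularized loss will be:
$$E(\theta, \mathcal{D}) = NLL(\theta, \mathcal{D}) + \lambda R(\theta)$$
or, in our case
$$E(\theta, \mathcal{D}) = NLL(\theta, \mathcal{D}) + \lambda \|\theta\|_p^p$$
where
$$\|\theta\|_p = \left( \sum_{j=1}^{|\theta|} |\theta_j|^p \right)^{1/p}$$
is the $L_p$ norm of $\theta$, $\lambda$ is a hyperparameter that controls the relative importance of the regularization term, and $p$ is 1 or 2 (hence the names L1 and L2 regularization).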
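To make the "keep the parameters small" intuition concrete, here is the gradient-descent update on $E(\theta, \mathcal{D})$ for $p = 2$ and $p = 1$. This assumes plain gradient descent with a learning rate $\eta$, a symbol introduced here that corresponds to learning_rate in the code.

```latex
% p = 2 (squared L2 penalty)
\frac{\partial}{\partial \theta_j} \lambda \|\theta\|_2^2 = 2\lambda\theta_j
\quad\Rightarrow\quad
\theta_j \leftarrow \theta_j - \eta\left(\frac{\partial NLL}{\partial \theta_j} + 2\lambda\theta_j\right)
= (1 - 2\eta\lambda)\,\theta_j - \eta\,\frac{\partial NLL}{\partial \theta_j}

% p = 1 (L1 penalty), for \theta_j \neq 0
\frac{\partial}{\partial \theta_j} \lambda \|\theta\|_1 = \lambda\,\operatorname{sign}(\theta_j)
\quad\Rightarrow\quad
\theta_j \leftarrow \theta_j - \eta\,\frac{\partial NLL}{\partial \theta_j} - \eta\lambda\,\operatorname{sign}(\theta_j)
```

With $p = 2$ every step first shrinks each weight by the factor $(1 - 2\eta\lambda)$, which is why this penalty is also called weight decay; with $p = 1$ every nonzero weight is pulled toward 0 by the constant amount $\eta\lambda$, which tends to push small weights to (near) zero.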
A more detailed explanation of regularization:
In principle, adding a regularization term to the loss will encourage smooth network mappings in a neural network (by penalizing large values of the parameters, which decreases the amount of nonlinearity that the network models). More intuitively, the two terms (NLL and $R(\theta)$) correspond to modelling the data well (NLL) and having “simple” or “smooth” solutions ($R(\theta)$). Thus, minimizing the sum of both will, in theory, correspond to finding the right trade-off between the fit to the training data and the “generality” of the solution that is found. To follow Occam’s razor principle, this minimization should find us the simplest solution (as measured by our simplicity criterion) that fits the training data.
Note that the fact that a solution is “simple” does not mean that it will generalize well. Empirically, it was found that performing such regularization in the context of neural networks helps with generalization, especially on small datasets. The code block below shows how to compute the loss in Python when it contains both an L1 regularization term weighted by $\lambda_1$ and an L2 regularization term weighted by $\lambda_2$.
```python
# symbolic Theano variable that represents the L1 regularization term
L1 = T.sum(abs(param))

# symbolic Theano variable that represents the squared L2 term
L2 = T.sum(param ** 2)

# the loss
loss = NLL + lambda_1 * L1 + lambda_2 * L2
```
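A plain-Python sketch of the same formula, with a flat list standing in for the parameter tensor param (an assumption so it runs without Theano). In the Theano version abs and ** act element-wise and T.sum does the summing; regularized_loss is a hypothetical helper name.

```python
def regularized_loss(nll, params, lambda_1, lambda_2):
    """NLL + lambda_1 * ||params||_1 + lambda_2 * ||params||_2 ** 2."""
    l1 = sum(abs(p) for p in params)  # L1 norm
    l2 = sum(p ** 2 for p in params)  # squared L2 norm
    return nll + lambda_1 * l1 + lambda_2 * l2


params = [1.0, -2.0, 3.0]
print(regularized_loss(0.5, params, lambda_1=0.1, lambda_2=0.01))  # about 1.24
```

Here L1 = 6 and L2 = 14, so the penalty adds 0.1 · 6 + 0.01 · 14 = 0.74 to the NLL of 0.5.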
Early-Stopping
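Early stopping combats overfitting by monitoring the model's performance on the validation set: when performance on the validation set stops improving significantly, or even starts to get worse, the optimization stops.
The choice of when to stop is a judgement call and a few heuristics exist, but these tutorials will make use of a strategy based on a geometrically increasing amount of patience (i.e. a simulated level of “patience” decides when to stop).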
```python
import copy
import time

import numpy

# early-stopping parameters
patience = 5000  # look at this many iterations (minibatches) regardless
patience_increase = 2  # wait this much longer when a new best is found
improvement_threshold = 0.995  # a relative improvement of this much is
                               # considered significant
validation_frequency = min(n_train_batches, patience // 2)
                              # go through this many minibatches before
                              # checking the network on the validation set;
                              # in this case we check every epoch (here
                              # n_train_batches < patience / 2, so validating
                              # every n_train_batches minibatches means
                              # validating once per epoch)

best_params = None
best_validation_loss = numpy.inf
test_score = 0.
start_time = time.clock()

done_looping = False
epoch = 0
while (epoch < n_epochs) and (not done_looping):
    # Report "1" for first epoch, "n_epochs" for last epoch
    epoch = epoch + 1
    for minibatch_index in xrange(n_train_batches):

        d_loss_wrt_params = ...  # compute gradient
        params -= learning_rate * d_loss_wrt_params  # gradient descent

        # iteration number. We want it to start at 0.
        iter = (epoch - 1) * n_train_batches + minibatch_index
        # note that if we do `iter % validation_frequency` it will be
        # true for iter = 0 which we do not want. We want it true for
        # iter = validation_frequency - 1.
        if (iter + 1) % validation_frequency == 0:

            this_validation_loss = ...  # compute zero-one loss on validation set

            if this_validation_loss < best_validation_loss:

                # improve patience if loss improvement is good enough
                if this_validation_loss < best_validation_loss * improvement_threshold:
                    patience = max(patience, iter * patience_increase)

                best_params = copy.deepcopy(params)
                best_validation_loss = this_validation_loss

        if patience <= iter:
            done_looping = True
            break

# POSTCONDITION:
# best_params refers to the best out-of-sample parameters observed during the optimization
```
If we run out of batches of training data before running out of patience, then we just go back to the beginning of the training set and repeat.
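The procedure in the code is:
(1) The parameters are updated continuously, and iter keeps increasing.
(2) Every validation_frequency iterations, the model is evaluated on the validation set.
(3) If the validation loss drops significantly (below best_validation_loss * improvement_threshold) and iter * patience_increase > patience, patience grows: patience = max(patience, iter * patience_increase). Note that patience_increase is 2, so the larger iter is when the improvement happens, the more patience grows.
(4) iter and patience both keep increasing; training stops as soon as iter >= patience.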
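The four steps can be exercised without any model. A sketch, assuming a hypothetical validation_loss(it) callback in place of evaluating a real network; the gradient step is omitted, and the loop variable is called it rather than iter so it does not shadow Python's built-in.

```python
import math


def early_stopping(validation_loss, n_train_batches, n_epochs=1000,
                   patience=5000, patience_increase=2,
                   improvement_threshold=0.995):
    """Apply the patience rule; return (best_loss, best_iter, stop_iter).

    validation_loss(it) is a hypothetical callback returning the validation
    error after iteration `it`; no actual training happens here.
    """
    validation_frequency = min(n_train_batches, patience // 2)
    best_validation_loss = math.inf
    best_iter = None
    it = 0
    for epoch in range(n_epochs):
        for minibatch_index in range(n_train_batches):
            it = epoch * n_train_batches + minibatch_index  # step (1): iter grows
            # (a real loop would take a gradient step on this minibatch here)
            if (it + 1) % validation_frequency == 0:      # step (2): validate
                this_loss = validation_loss(it)
                if this_loss < best_validation_loss:
                    # step (3): extend patience only on a significant improvement
                    if this_loss < best_validation_loss * improvement_threshold:
                        patience = max(patience, it * patience_increase)
                    best_validation_loss = this_loss
                    best_iter = it
            if patience <= it:                            # step (4): out of patience
                return best_validation_loss, best_iter, it
    return best_validation_loss, best_iter, it


def toy_curve(it):
    """Validation loss that improves until iteration 999, then gets worse."""
    return 1.0 / (it + 1) if it < 1000 else 1.0


print(early_stopping(toy_curve, n_train_batches=100, patience=500))  # (0.001, 999, 1998)
```

With this curve the last significant improvement happens at iteration 999, which raises patience to 2 × 999 = 1998, so the loop stops at iteration 1998 even though the initial patience was only 500.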
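Note: validation_frequency = min(n_train_batches, patience/2)
This line guarantees that the model is validated at least twice, whatever happens: even if patience never grows and validation_frequency equals patience/2, validation runs at iter = patience/2 - 1 and again at iter = patience - 1, just before the stopping test patience <= iter ends training. A smaller validation_frequency only adds more checks.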
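The at-least-twice claim is easy to check by listing the iterations at which (iter + 1) % validation_frequency == 0 before the stopping test fires. A sketch, where validation_iterations is a hypothetical helper that assumes patience never grows, patience >= 2, and n_epochs is large enough not to end training first:

```python
def validation_iterations(n_train_batches, patience):
    """Iterations at which the validation set is checked if patience never grows.

    Mirrors the tutorial's loop: validate when (it + 1) % validation_frequency == 0,
    then stop as soon as patience <= it.
    """
    validation_frequency = min(n_train_batches, patience // 2)
    checks = []
    it = 0
    while True:
        if (it + 1) % validation_frequency == 0:
            checks.append(it)
        if patience <= it:
            return checks
        it += 1


print(validation_iterations(n_train_batches=10000, patience=5000))  # [2499, 4999]
print(len(validation_iterations(n_train_batches=1000, patience=5000)))  # 5
```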
Note: This algorithm could possibly be improved by using a test of statistical significance rather than the simple comparison, when deciding whether to increase the patience.
Theano/Python Tips
Loading and Saving Models
After all that training and testing, the best parameters found need to be saved. MATLAB makes this very easy; in Python, use cPickle.
Read more about serialization in Theano, or Python’s pickling.
Pickle the numpy ndarrays from your shared variables
If your parameters are in shared variables w, v, u, then your save command should look something like:
```python
import cPickle

save_file = open('path', 'wb')  # this will overwrite current contents
cPickle.dump(w.get_value(borrow=True), save_file, -1)  # the -1 is for HIGHEST_PROTOCOL
cPickle.dump(v.get_value(borrow=True), save_file, -1)  # .. and it triggers much more efficient
cPickle.dump(u.get_value(borrow=True), save_file, -1)  # .. storage than numpy's default
save_file.close()
```
Then later, you can load your data back like this:
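```python
save_file = open('path', 'rb')
# load in the same order the values were dumped: w, then v, then u
w.set_value(cPickle.load(save_file), borrow=True)
v.set_value(cPickle.load(save_file), borrow=True)
u.set_value(cPickle.load(save_file), borrow=True)
save_file.close()
```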
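The same save/load pattern in Python 3, where cPickle was folded into the standard pickle module. This is a sketch: save_params and load_params are hypothetical helpers, and plain lists stand in for the NumPy arrays that w.get_value(borrow=True) would return. Because several objects go into one file, they must be read back in exactly the order they were written.

```python
import pickle


def save_params(path, values):
    """Pickle each parameter value in turn into a single file."""
    with open(path, "wb") as f:  # "wb" overwrites any existing contents
        for value in values:
            pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_params(path, count):
    """Read back the first `count` values in the order they were written."""
    with open(path, "rb") as f:
        return [pickle.load(f) for _ in range(count)]
```

With Theano you would pass [w.get_value(borrow=True), v.get_value(borrow=True), u.get_value(borrow=True)] to save_params and call set_value on each loaded array.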
Do not pickle your training or test functions for long-term storage
Theano functions are compatible with Python’s deepcopy and pickle mechanisms, but you should not necessarily pickle a Theano function. If you update your Theano folder and one of the internals changes, then you may not be able to un-pickle your model.
Plotting Intermediate Results
Visualization can be done with the PIL and matplotlib libraries.