Theano Study Notes 2: MLP

Source: Internet. Posted under: unity for linux. Editor: 程序博客网. Time: 2024/06/06 00:49

The code comes from http://deeplearning.net/tutorial/mlp.html#mlp

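1

Background on the model and the dataset

An MLP (multilayer perceptron) is the simplest kind of neural network model. The dataset is MNIST (handwritten digit recognition). A normal run should end with an error rate of about 1.6% to 1.7%.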

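2

Code structure

The distinguishing feature of this code is that each layer of the network is abstracted as a class, and the whole network is a class as well. The network class contains a hidden layer class and an output layer class.

Theano code works entirely with symbolic variables, so writing it feels like writing this formula on a blackboard: f(x) = G( b^{(2)} + W^{(2)}( s( b^{(1)} + W^{(1)} x)))

W^{(1)} is the weight matrix of the first layer

b^{(1)} is the bias of the first layer

and so on for the second layer. In this code s is the hidden activation (tanh) and G is the softmax of the output layer.

One more thing to note: all of this code trains on minibatches.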
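To make the formula concrete, here is a minimal sketch of the same forward pass in plain Python, without Theano. It follows the convention of the tutorial code, where each row of the input is one sample and the weights have shape (n_in, n_out). The helper names and the toy weights are made up for illustration.

```python
import math

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def affine(batch, W, b):
    # batch: list of rows (one sample per row); W: n_in x n_out; b: length n_out.
    return [[sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
             for j in range(len(b))]
            for x in batch]

def mlp_forward(batch, W1, b1, W2, b2):
    # Hidden layer: s = tanh. Output layer: G = softmax.
    hidden = [[math.tanh(v) for v in row] for row in affine(batch, W1, b1)]
    return [softmax(row) for row in affine(hidden, W2, b2)]

# Hypothetical toy sizes: 2 inputs, 3 hidden units, 2 classes, batch of 2.
W1 = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
b1 = [0.0, 0.0, 0.0]
W2 = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]
b2 = [0.0, 0.0]
probs = mlp_forward([[1.0, 2.0], [0.5, -1.0]], W1, b1, W2, b2)
```

Each row of probs is a probability distribution over the classes for one sample in the batch.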

3

Detailed code walkthrough

First, the HiddenLayer class

import numpy
import theano
import theano.tensor as T


class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None,
                 activation=T.tanh):

        self.input = input
        if W is None:
            # uniform initialization in +/- sqrt(6 / (n_in + n_out))
            W_values = numpy.asarray(rng.uniform(
                low=-numpy.sqrt(6. / (n_in + n_out)),
                high=numpy.sqrt(6. / (n_in + n_out)),
                size=(n_in, n_out)), dtype=theano.config.floatX)
            # use a 4 times larger range for sigmoid activations
            if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4

            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        self.W = W
        self.b = b

        lin_output = T.dot(input, self.W) + self.b
        self.output = (lin_output if activation is None
                       else activation(lin_output))
        # parameters of the model
        self.params = [self.W, self.b]

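This code initializes W and b. Both are shared variables, which can be thought of as symbolic variables that carry an initial value. The input passed in is a symbolic variable; because training is done on minibatches, input is a matrix in which each row is one training sample.

The class also computes the output. Of course, this only writes down a formula that tells Theano how the output depends on the input; nothing is evaluated at this point.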
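As a rough illustration of the weight initialization above, the following plain-Python sketch draws weights uniformly from the same interval, using the standard library random module instead of numpy. The function name init_weights and the sigmoid flag are assumptions made for this example; scaling the bound by 4 is equivalent to multiplying the sampled values by 4 as the class does.

```python
import math
import random

def init_weights(rng, n_in, n_out, sigmoid=False):
    # Interval half-width used by HiddenLayer: sqrt(6 / (n_in + n_out)).
    bound = math.sqrt(6.0 / (n_in + n_out))
    if sigmoid:
        # The tutorial code uses a 4 times larger range for sigmoid units.
        bound *= 4
    return [[rng.uniform(-bound, bound) for _ in range(n_out)]
            for _ in range(n_in)]

rng = random.Random(1234)
# 784 inputs (28 * 28 pixels) and 500 hidden units, as in test_mlp.
W = init_weights(rng, n_in=784, n_out=500)
bound = math.sqrt(6.0 / (784 + 500))  # roughly 0.068
```

With these layer sizes the initial weights are small, which keeps tanh away from its saturated region at the start of training.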

The output layer class

class LogisticRegression(object):

    def __init__(self, input, n_in, n_out):

        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
        self.W = theano.shared(value=numpy.zeros((n_in, n_out),
                                                 dtype=theano.config.floatX),
                               name='W', borrow=True)
        # initialize the biases b as a vector of n_out 0s
        self.b = theano.shared(value=numpy.zeros((n_out,),
                                                 dtype=theano.config.floatX),
                               name='b', borrow=True)

        # compute vector of class-membership probabilities in symbolic form
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # compute prediction as class whose probability is maximal in
        # symbolic form
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)

        # parameters of the model
        self.params = [self.W, self.b]

    def negative_log_likelihood(self, y):
        # mean over the minibatch of -log p(correct class | x)
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

    def errors(self, y):
        # check if y has the same dimension as y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError('y should have the same shape as self.y_pred',
                            ('y', y.type, 'y_pred', self.y_pred.type))
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()

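The constructor initializes W and b, and builds the expressions for the output (the class probabilities) and for the final predicted class.

negative_log_likelihood computes the cross-entropy cost function, that is, the mean negative log-probability of the correct class.

errors computes the error rate.

Everything operates on a whole minibatch at once.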
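The two methods can be sketched on concrete numbers in plain Python. This is only an illustration of what the symbolic expressions compute; the probabilities and labels below are made up.

```python
import math

def negative_log_likelihood(p_y_given_x, y):
    # Mean over the batch of -log p(correct class).
    return -sum(math.log(p[label]) for p, label in zip(p_y_given_x, y)) / len(y)

def errors(p_y_given_x, y):
    # Predicted class = index of the largest probability in each row.
    y_pred = [max(range(len(p)), key=p.__getitem__) for p in p_y_given_x]
    wrong = sum(1 for pred, label in zip(y_pred, y) if pred != label)
    return wrong / float(len(y))

# Hypothetical batch of 4 samples and 3 classes.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.3, 0.3, 0.4],
         [0.5, 0.25, 0.25]]
labels = [0, 1, 0, 2]

nll = negative_log_likelihood(probs, labels)
err = errors(probs, labels)  # two of the four predictions are wrong
```

Indexing each row with its label plays the role of the [T.arange(y.shape[0]), y] indexing in the Theano code.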

The MLP class

class MLP(object):

    def __init__(self, rng, input, n_in, n_hidden, n_out):

        # hidden layer with tanh activation
        self.hiddenLayer = HiddenLayer(rng=rng, input=input,
                                       n_in=n_in, n_out=n_hidden,
                                       activation=T.tanh)

        # output layer takes the hidden layer's output as its input
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer.output,
            n_in=n_hidden,
            n_out=n_out)

        # L1 norm of the weights
        self.L1 = abs(self.hiddenLayer.W).sum() \
            + abs(self.logRegressionLayer.W).sum()

        # squared L2 norm of the weights
        self.L2_sqr = (self.hiddenLayer.W ** 2).sum() \
            + (self.logRegressionLayer.W ** 2).sum()

        self.negative_log_likelihood = self.logRegressionLayer.negative_log_likelihood
        # same holds for the function computing the number of errors
        self.errors = self.logRegressionLayer.errors

        self.params = self.hiddenLayer.params + self.logRegressionLayer.params

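The MLP class has only a constructor, which instantiates the hidden layer class and the output layer class.

It defines the L1 and L2 regularization terms as symbolic expressions. Only the weights W are regularized, not the biases.

negative_log_likelihood builds the symbolic cost of the network. It is taken directly from the output layer, and the regularization terms are added to it later.

errors builds the symbolic error rate.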
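For intuition, the two regularization terms can be computed by hand on small matrices. The sketch below is plain Python with made-up weights, and the helper names are not part of the tutorial.

```python
def l1_norm(*weight_matrices):
    # Sum of absolute values over every weight in every matrix.
    return sum(abs(w) for W in weight_matrices for row in W for w in row)

def l2_sqr(*weight_matrices):
    # Sum of squares over every weight in every matrix.
    return sum(w * w for W in weight_matrices for row in W for w in row)

# Hypothetical weights: a 2x2 hidden layer and a 2x1 output layer.
W_hidden = [[1.0, -2.0], [0.5, 0.0]]
W_out = [[-1.0], [3.0]]

L1 = l1_norm(W_hidden, W_out)     # 1 + 2 + 0.5 + 0 + 1 + 3 = 7.5
L2_sqr = l2_sqr(W_hidden, W_out)  # 1 + 4 + 0.25 + 0 + 1 + 9 = 15.25
```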

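Defining the training and test functions

Training and testing both happen inside the following function

def test_mlp(learning_rate=0.01, L1_reg=0.00, L2_reg=0.0001, n_epochs=1000,
             dataset='../data/mnist.pkl.gz', batch_size=20, n_hidden=500):

Read the data and work out how many minibatches there are

# datasets holds the (x, y) shared-variable pairs loaded from `dataset`
# (the loading code is not shown in these notes)
train_set_x, train_set_y = datasets[0]
valid_set_x, valid_set_y = datasets[1]
test_set_x, test_set_y = datasets[2]

# integer division: a trailing partial minibatch is dropped
n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size
n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] // batch_size
n_test_batches = test_set_x.get_value(borrow=True).shape[0] // batch_size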
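As a quick worked example of the minibatch counts, assume the usual split of this MNIST file into 50000 training, 10000 validation and 10000 test samples (these sizes are an assumption here, since the notes do not state them), with the default batch_size of 20:

```python
batch_size = 20
# Assumed split sizes; they are not stated in the notes above.
n_examples = {'train': 50000, 'valid': 10000, 'test': 10000}

# Floor division, so any trailing partial minibatch is ignored.
n_batches = dict((name, n // batch_size) for name, n in n_examples.items())
```

Under that assumption one epoch is 2500 calls to the training function, and each validation or test pass averages over 500 minibatches.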

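Start building the model

print('... building the model')

index = T.lscalar()

x = T.matrix('x')

y = T.ivector('y')

All three of these are symbolic variables.

index is the number of the minibatch being processed; x and y are the inputs and the labels. Because processing is done per minibatch, x is a matrix and y is a vector.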

Seeded random number generator

rng = numpy.random.RandomState(1234)

Instantiate the MLP

classifier = MLP(rng=rng, input=x, n_in=28 * 28,
                 n_hidden=n_hidden, n_out=10)

Build the symbolic cost

cost = classifier.negative_log_likelihood(y) \
    + L1_reg * classifier.L1 \
    + L2_reg * classifier.L2_sqr
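Written out as a formula, the cost built above looks roughly like this for a minibatch B. The symbols \lambda_1 and \lambda_2 stand for L1_reg and L2_reg, and p(y_i | x_i) is the softmax output for the correct class; this notation is introduced here for illustration only.

```latex
\mathrm{cost}
  = -\frac{1}{|B|} \sum_{i \in B} \log p(y_i \mid x_i)
  + \lambda_1 \left( \lVert W^{(1)} \rVert_1 + \lVert W^{(2)} \rVert_1 \right)
  + \lambda_2 \left( \lVert W^{(1)} \rVert_2^2 + \lVert W^{(2)} \rVert_2^2 \right)
```

With the defaults of test_mlp (L1_reg=0.00, L2_reg=0.0001), the L1 term vanishes and only a small L2 penalty is applied.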

The functions below return the error rate on the test set and on the validation set respectively.

The input is a minibatch index and the output is the error rate on that minibatch.

test_model = theano.function(inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size:(index + 1) * batch_size],
            y: test_set_y[index * batch_size:(index + 1) * batch_size]})

validate_model = theano.function(inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size:(index + 1) * batch_size],
            y: valid_set_y[index * batch_size:(index + 1) * batch_size]})

The values of x and y change with index. In effect x and y are themselves functions of index: a different index selects a different slice of the data for x and y.
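The slicing used in givens can be tried on an ordinary Python list. This is only a sketch of the indexing arithmetic; the helper name minibatch is made up for this example.

```python
def minibatch(data, index, batch_size):
    # Same slice expression as in the givens dictionaries above.
    return data[index * batch_size:(index + 1) * batch_size]

samples = list(range(10))
first = minibatch(samples, 0, 4)   # samples 0..3
second = minibatch(samples, 1, 4)  # samples 4..7
```

Consecutive indices select consecutive, non-overlapping slices of the dataset.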

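Next, compute the derivative of cost with respect to each of the params

gparams = []
for param in classifier.params:
    gparam = T.grad(cost, param)
    gparams.append(gparam)

updates = []

for param, gparam in zip(classifier.params, gparams):
    updates.append((param, param - learning_rate * gparam))

With the gradients gparam (symbolic expressions) in hand, we can build the updates list, which defines how each parameter is updated: one step of gradient descent, param - learning_rate * gparam.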
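The update rule can be seen in action on a toy problem in plain Python. The cost used here, sum((p - 3)^2), and its hand-written gradient are assumptions for illustration; in the tutorial the gradients come from T.grad.

```python
def sgd_step(params, grads, learning_rate):
    # Same rule as the updates list: param - learning_rate * gparam.
    return [p - learning_rate * g for p, g in zip(params, grads)]

def grad(params):
    # Gradient of the toy cost sum((p - 3)^2) with respect to each p.
    return [2.0 * (p - 3.0) for p in params]

params = [0.0, 10.0]
for _ in range(200):
    params = sgd_step(params, grad(params), learning_rate=0.1)
```

After enough steps both parameters approach 3, the minimum of the toy cost.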

And here is the final training function!

train_model = theano.function(inputs=[index], outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size:(index + 1) * batch_size],
            y: train_set_y[index * batch_size:(index + 1) * batch_size]})

The input is index and the output is cost. More importantly, each call also updates the parameters.

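Start training and testing

Training uses the following mechanism: after each epoch, check the error on the validation set. If the current validation error beats the best validation error so far by more than a threshold (improvement_threshold), we consider the model to have improved significantly. If the validation result shows no significant improvement for a long time, we can assume the parameters are about as good as they will get and stop training.

In this experiment, training stops once it has run for twice as long as the point at which the validation result last changed significantly. For example, if the model improves significantly at epoch 300 and shows no significant improvement between epochs 300 and 600, training can stop at epoch 600. In the code this is tracked in minibatch iterations (iter and patience) rather than in whole epochs.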
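The patience rule can be simulated on a list of validation losses in plain Python. This is a simplified sketch that validates at every iteration; the starting patience of 4 and the loss values are toy numbers, and the defaults for patience_increase and improvement_threshold are assumed values, since the notes do not show where those variables are set.

```python
def run_early_stopping(validation_losses, patience=4, patience_increase=2,
                       improvement_threshold=0.995):
    best_loss = float('inf')
    best_iter = 0
    for it, loss in enumerate(validation_losses):
        if loss < best_loss:
            # Extend patience only when the improvement is large enough.
            if loss < best_loss * improvement_threshold:
                patience = max(patience, it * patience_increase)
            best_loss = loss
            best_iter = it
        if patience <= it:
            break
    return best_iter, it, best_loss

# The loss improves until iteration 2 and then stays flat.
losses = [0.10, 0.08, 0.07, 0.07, 0.07, 0.07, 0.07, 0.07]
best_iter, stopped_at, best_loss = run_early_stopping(losses)
```

In this run the last significant improvement is at iteration 2 and the loop stops at iteration 4, twice that iteration, without looking at the remaining losses.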

Training is a doubly nested loop

The outer loop is over epochs

The inner loop is over minibatches

# The early-stopping state (patience, patience_increase,
# improvement_threshold, validation_frequency, best_validation_loss,
# epoch, done_looping) is assumed to be initialised before this loop.
while (epoch < n_epochs) and (not done_looping):
    epoch = epoch + 1
    for minibatch_index in xrange(n_train_batches):

        minibatch_avg_cost = train_model(minibatch_index)
        # iteration number
        iter = (epoch - 1) * n_train_batches + minibatch_index

        if (iter + 1) % validation_frequency == 0:
            # compute zero-one loss on validation set
            validation_losses = [validate_model(i) for i
                                 in xrange(n_valid_batches)]
            this_validation_loss = numpy.mean(validation_losses)

            print('epoch %i, minibatch %i/%i, validation error %f %%' %
                  (epoch, minibatch_index + 1, n_train_batches,
                   this_validation_loss * 100.))

            # if we got the best validation score until now
            if this_validation_loss < best_validation_loss:
                # improve patience if loss improvement is good enough
                if this_validation_loss < best_validation_loss * \
                        improvement_threshold:
                    patience = max(patience, iter * patience_increase)

                best_validation_loss = this_validation_loss
                best_iter = iter

                # test it on the test set
                test_losses = [test_model(i) for i
                               in xrange(n_test_batches)]
                test_score = numpy.mean(test_losses)

                print(('     epoch %i, minibatch %i/%i, test error of '
                       'best model %f %%') %
                      (epoch, minibatch_index + 1, n_train_batches,
                       test_score * 100.))

        if patience <= iter:
            done_looping = True
            break

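This part computes the error rate on the validation set, averaged over all validation minibatches

validation_losses = [validate_model(i) for i
                     in xrange(n_valid_batches)]
this_validation_loss = numpy.mean(validation_losses)

This part computes the error rate on the test set, averaged over all test minibatches

test_losses = [test_model(i) for i
               in xrange(n_test_batches)]
test_score = numpy.mean(test_losses)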

0 0