TensorFlow (0.12.0) LeNet-2 CNN in Practice: Problem Analysis

Source: Internet  Published by: python thread sleep  Editor: 程序博客网  Time: 2024/04/30 15:39

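I came across an article online that implements a CNN with TensorFlow, "Playing with convolutions in TensorFlow", so I downloaded the source code right away to study it. It uses some of the frameworks under tf.contrib, which I had not learned yet. I ran it anyway, and it ended badly: a pile of warnings followed by an Error. My guess is that my machine runs the latest TensorFlow, version 0.12, and the API has changed since the author wrote the code. There was nothing for it but to dig into the code.

The original code is not long, so I paste it here first: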

import sklearn.metrics as metrics
import tensorflow as tf
from tensorflow.contrib import learn

from utils import LOG_DIR, dense_layer, flatten_convolution

IMAGE_SIZE = 28
mnist = learn.datasets.load_dataset('mnist')


def lenet_layer(tensor_in, n_filters, kernel_size, pool_size, activation_fn=tf.nn.tanh,
                padding='SAME'):
    conv = tf.contrib.layers.convolution2d(tensor_in,
                                           num_outputs=n_filters,
                                           kernel_size=kernel_size,
                                           activation_fn=activation_fn,
                                           padding=padding)
    pool = tf.nn.max_pool(conv, ksize=pool_size, strides=pool_size, padding=padding)
    return pool


def lenet_model(X, y, image_size=(-1, IMAGE_SIZE, IMAGE_SIZE, 1), pool_size=(1, 2, 2, 1)):
    y = tf.one_hot(y, 10, 1, 0)
    X = tf.reshape(X, image_size)

    with tf.variable_scope('layer1'):
        """
        Valid:
         * input: (?, 28, 28, 1)
         * filter: (5, 5, 1, 4)
         * pool: (1, 2, 2, 1)
         * output: (?, 12, 12, 4)
        Same:
         * input: (?, 28, 28, 1)
         * filter: (5, 5, 1, 4)
         * pool: (1, 2, 2, 1)
         * output: (?, 14, 14, 4)
        """
        layer1 = lenet_layer(X, 4, [5, 5], pool_size)

    with tf.variable_scope('layer2'):
        """
        VALID:
         * input: (?, 12, 12, 4)
         * filter: (5, 5, 4, 6)
         * pool: (1, 2, 2, 1)
         * output: (?, 4, 4, 6)
         * flat_output: (?, 4 * 4 * 6)
        SAME:
         * input: (?, 14, 14, 4)
         * filter: (5, 5, 4, 6)
         * pool: (1, 2, 2, 1)
         * output: (?, 7, 7, 6)
         * flat_output: (?, 7 * 7 * 6)
        """
        layer2 = lenet_layer(layer1, 6, [5, 5], pool_size)
        layer2_flat = flatten_convolution(layer2)

    result = dense_layer(layer2_flat, [1024], activation_fn=tf.nn.tanh, keep_prob=0.5)
    prediction, loss = tf.contrib.learn.models.logistic_regression_zero_init(result, y)
    train_op = tf.contrib.layers.optimize_loss(
        loss, tf.contrib.framework.get_global_step(), optimizer='Adagrad',
        learning_rate=0.1)

    return {'class': tf.argmax(prediction, 1), 'prob': prediction}, loss, train_op


classifier = learn.Estimator(model_fn=lenet_model, model_dir=LOG_DIR)
classifier.fit(mnist.train.images, mnist.train.labels, steps=10000, batch_size=300,
               monitors=[learn.monitors.ValidationMonitor(mnist.validation.images,
                                                          mnist.validation.labels)])

score = metrics.accuracy_score(mnist.test.labels, classifier.predict(mnist.test.images))
print('Accuracy: {0:f}'.format(score))
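The shape comments in the two docstrings of the code can be checked with a few lines of arithmetic. The sketch below is plain Python, not TensorFlow. The helper names conv_out and lenet_layer_out are hypothetical, and it assumes the usual TensorFlow padding rules: 'SAME' gives ceil(input / stride), 'VALID' gives floor((input - window) / stride) + 1. The convolution stride is taken to be 1, and the pooling window doubles as the pooling stride, as in lenet_layer.

```python
import math


def conv_out(size, window, stride, padding):
    """Spatial output size of a conv/pool op along one dimension."""
    if padding == 'SAME':
        return math.ceil(size / stride)
    if padding == 'VALID':
        return (size - window) // stride + 1
    raise ValueError('unknown padding: %r' % padding)


def lenet_layer_out(size, kernel=5, pool=2, padding='SAME'):
    """Output size after one convolution (stride 1) followed by one max-pool."""
    conv = conv_out(size, kernel, 1, padding)
    return conv_out(conv, pool, pool, padding)


for padding in ('VALID', 'SAME'):
    layer1 = lenet_layer_out(28, padding=padding)
    layer2 = lenet_layer_out(layer1, padding=padding)
    print(padding, layer1, layer2, layer2 * layer2 * 6)
```

With 'VALID' the sizes go 28, 12, 4 and with 'SAME' they go 28, 14, 7, which matches the docstrings.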

There are quite a few warnings. Below is some analysis and the changes I made:

Related to Estimator.fit

WARNING:tensorflow:From C:\Python35\lib\site-packages\tensorflow\contrib\learn\python\learn\monitors.py:710 in every_n_step_end.: calling BaseEstimator.evaluate (from tensorflow.contrib.learn.python.learn.estimators.estimator) with x is deprecated and will be removed after 2016-12-01.
Instructions for updating:
Estimator is decoupled from Scikit Learn interface by moving into
separate class SKCompat. Arguments x, y and batch_size are only
available in the SKCompat class, Estimator will only accept input_fn.
Example conversion:
  est = Estimator(...) -> est = SKCompat(Estimator(...))

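The fit method now takes input_fn in place of the old x. To keep passing x, the Estimator has to be wrapped in SKCompat. Judging from the description of input_fn, it seems to be meant to work together with feed_fn. A CNN usually has a lot of data, which has to be fed to the model in batches: input_fn specifies the whole data set, and at each training step feed_fn pulls batch_size samples from it. What puzzles me is that fit only has an input_fn parameter and no feed_fn, whereas if input_fn is not given and x, y are passed the traditional way, the system wraps them in a DataFeeder object, which produces both functions. So which form of arguments is better for a CNN? I do not quite see what Google intends. Of course my understanding may simply be lacking; if you know, please enlighten me.
So for now I only change it to: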

classifier = SKCompat(learn.Estimator(model_fn=lenet_model, model_dir=LOG_DIR))
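The batching idea discussed in this section (the full data set goes in, batch_size samples come out at each training step) can be illustrated without TensorFlow. This is only a conceptual sketch in plain Python, not how DataFeeder is implemented. The function iterate_batches and the toy data are hypothetical.

```python
import random


def iterate_batches(x, y, batch_size, steps, seed=0):
    """Yield `steps` mini-batches, each drawn at random from the full data."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    for _ in range(steps):
        chosen = rng.sample(indices, batch_size)
        yield [x[i] for i in chosen], [y[i] for i in chosen]


# Toy data standing in for mnist.train.images / mnist.train.labels.
images = [[i, i + 1] for i in range(10)]
labels = [i % 2 for i in range(10)]

for batch_x, batch_y in iterate_batches(images, labels, batch_size=3, steps=2):
    print(batch_x, batch_y)
```

Each step sees only batch_size samples, while the caller hands over the whole data set once, which is the division of labor the x, y form of fit appears to offer.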

The model_fn parameter of Estimator.__init__

According to the documentation, the input parameters of model_fn have changed considerably:

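features -- equivalent to the former x
labels -- equivalent to the former y
mode -- the mode: TRAIN/INFER/EVAL
params -- other parameters, as a dict

So model_fn has to be changed accordingly. Only the first few lines of the function need to change: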

def lenet_model(features, labels, mode, params):
    image_size = params['image_size']
    pool_size = params['pool_size']
    y = tf.one_hot(labels, 10, 1, 0)
    X = tf.reshape(features, image_size)

The place where the function is used has to change as well:

classifier = SKCompat(learn.Estimator(model_fn=lenet_model, model_dir=LOG_DIR, params={'image_size': (-1, IMAGE_SIZE, IMAGE_SIZE, 1), 'pool_size': (1, 2, 2, 1)}))
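To make the calling convention concrete, here is a toy stand-in that only mimics how the params dict reaches model_fn. ToyEstimator, its run method, and toy_model are hypothetical names for illustration, not part of tf.contrib.learn, and the mode is passed as a plain string here, which is an assumption made to keep the sketch free of TensorFlow.

```python
class ToyEstimator:
    """Hypothetical stand-in: stores params and forwards them to model_fn."""

    def __init__(self, model_fn, params=None):
        self._model_fn = model_fn
        self._params = params or {}

    def run(self, features, labels, mode):
        # Same argument order as the new model_fn signature.
        return self._model_fn(features, labels, mode, self._params)


def toy_model(features, labels, mode, params):
    # Values that used to be keyword defaults are now read from params.
    image_size = params['image_size']
    pool_size = params['pool_size']
    return {'mode': mode, 'image_size': image_size,
            'pool_size': pool_size, 'n_samples': len(features)}


IMAGE_SIZE = 28
est = ToyEstimator(toy_model,
                   params={'image_size': (-1, IMAGE_SIZE, IMAGE_SIZE, 1),
                           'pool_size': (1, 2, 2, 1)})
result = est.run(features=[[0.0] * 784], labels=[7], mode='TRAIN')
print(result)
```

The point of the design is that model_fn no longer carries its own default arguments; whatever is configurable travels through the single params dict given at construction time.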

Handling the error message

The log is as follows:

Traceback (most recent call last):
  File "D:/PythonProjects/tensorflow-convolution-models/lenet.py", line 75, in <module>
    score = metrics.accuracy_score(mnist.test.labels, classifier.predict(mnist.test.images))
  File "C:\Python35\lib\site-packages\sklearn\metrics\classification.py", line 172, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "C:\Python35\lib\site-packages\sklearn\metrics\classification.py", line 72, in _check_targets
    check_consistent_length(y_true, y_pred)
  File "C:\Python35\lib\site-packages\sklearn\utils\validation.py", line 181, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [10000, 2]

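It looks like the last statement of the code is what fails, so I looked into what predict returns. In principle it should be an array of shape [10000, 1]. According to the documentation, however, the method actually returns an iterable object. There is an example online: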

# Print out predictions
y = regressor.predict(input_fn=lambda: input_fn(prediction_set))
# .predict() returns an iterator; convert to a list and print predictions
predictions = list(itertools.islice(y, 6))
print("Predictions: {}".format(str(predictions)))

So I changed the code and stepped through it in the debugger to see the output:
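The islice pattern in that snippet works on any iterator. A minimal sketch follows, with a generator standing in for predict; fake_predict is a hypothetical name and its values are made up.

```python
import itertools


def fake_predict(n):
    """Hypothetical stand-in for a predict() that returns an iterator."""
    for i in range(n):
        yield i * 0.5


y = fake_predict(10)
first_six = list(itertools.islice(y, 6))  # take the first 6 items only
rest = list(y)                            # whatever the iterator has left
print(first_six)
print(rest)
```

Note that islice consumes the iterator it reads from, so a second pass over y only sees the remaining items.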

predictions = classifier.predict(mnist.test.images)

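The result is a dict object with two keys: class and prob. Looking at the code again, these are exactly the keys defined in the return value of model_fn. The 2 in the error message is then presumably the length of this dict (its two keys) rather than a number of samples. SKCompat appears to do the iteration and collect the values automatically.
With the code changed as follows, the error is gone: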

predictions = classifier.predict(mnist.test.images)
score = metrics.accuracy_score(mnist.test.labels, predictions['class'])

There are still some warnings that I have not resolved yet; I will keep analyzing them later.
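The mismatch reported in the traceback can be reproduced on toy data. In the sketch below, accuracy is a hypothetical, simplified stand-in for sklearn.metrics.accuracy_score that only compares lengths and counts matches, and the labels and predictions are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of matching labels; rejects inputs of different length."""
    if len(y_true) != len(y_pred):
        raise ValueError('inconsistent numbers of samples: %r'
                         % [len(y_true), len(y_pred)])
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


labels = [3, 1, 4, 1]
predictions = {'class': [3, 1, 4, 0], 'prob': [[0.1, 0.9]] * 4}

print(len(predictions))                        # number of keys, not samples
print(accuracy(labels, predictions['class']))  # compare against 'class' only

try:
    accuracy(labels, predictions)              # passing the whole dict fails
except ValueError as err:
    print(err)
```

Passing the dict itself compares 4 labels against a length of 2, which is the same kind of mismatch as [10000, 2] in the log; indexing with 'class' restores one prediction per sample.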

2 0