Deep Learning for Beginners — TensorFlow (4): The CIFAR-10 Example
I recently worked through the example at https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10. It uses a large number of tf library functions, and it took me a long while to mostly understand it, so I am writing these study notes to record what the individual functions do and how the whole CNN is put together.
1. Reading the data
I covered this in an earlier post (see http://blog.csdn.net/margretwg/article/details/70168256), so I won't repeat it here.
2. Building the model
Global parameters
import os
import re
import sys
import tarfile
import tensorflow as tf
import CIFAR10.CIFAR_input as input

FLAGS = tf.app.flags.FLAGS

# Model parameters
tf.app.flags.DEFINE_integer('batch_size', 128,
                            """Number of images to process in a batch.""")
tf.app.flags.DEFINE_string('data_dir', 'E:/Python/tensorflow/CIFAR10',
                           """Path to the CIFAR-10 data directory.""")
tf.app.flags.DEFINE_boolean('use_fp16', False,
                            """Train the model using fp16.""")

# Global constants describing the data set
IMAGE_SIZE = input.IMAGE_SIZE
NUM_CLASSES = input.NUM_CLASSES
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = input.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = input.NUM_EXAMPLES_PER_EPOCH_FOR_EVAL

# Constants for the training process
MOVING_AVERAGE_DECAY = 0.9999
NUM_EPOCH_PER_DECAY = 350.0       # epochs after which the learning rate decays
LEARNING_RATE_DECAY_FACTOR = 0.1  # learning rate decay factor
INITIAL_LEARNING_RATE = 0.1
2.1 Model prediction: inference()
The pipeline is: conv1 --> pool1 --> norm1 --> conv2 --> norm2 --> pool2 --> local3 --> local4 --> softmax_linear
This module returns a (128, 10) tensor, i.e. (batch_size, NUM_CLASSES).
def inference(images):
    """Build the CIFAR-10 model.

    Args:
      images: images from distorted_inputs() or inputs().
    Returns:
      Logits.
    """
    # conv1
    with tf.variable_scope('conv1') as scope:
        kernel = _variable_with_weight_decay('weights', shape=[5, 5, 3, 64],
                                             stddev=5e-2, wd=0.0)
        conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
        biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
        pre_activation = tf.nn.bias_add(conv, biases)  # WX + b
        conv1 = tf.nn.relu(pre_activation, name=scope.name)
        _activation_summary(conv1)

    # pool1
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                           padding='SAME', name='pool1')
    # norm1
    norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                      name='norm1')

    # conv2
    with tf.variable_scope('conv2') as scope:
        kernel = _variable_with_weight_decay('weights', shape=[5, 5, 64, 64],
                                             stddev=5e-2, wd=0.0)
        conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
        pre_activation = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(pre_activation, name=scope.name)
        _activation_summary(conv2)

    # norm2
    norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                      name='norm2')
    # pool2
    pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                           padding='SAME', name='pool2')

    # local3
    with tf.variable_scope('local3') as scope:
        # Move everything into depth so we can perform a single matrix multiply.
        reshape = tf.reshape(pool2, [FLAGS.batch_size, -1])
        dim = reshape.get_shape()[1].value
        weights = _variable_with_weight_decay('weights', shape=[dim, 384],
                                              stddev=0.04, wd=0.004)
        biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
        local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
        _activation_summary(local3)

    # local4
    with tf.variable_scope('local4') as scope:
        weights = _variable_with_weight_decay('weights', shape=[384, 192],
                                              stddev=0.04, wd=0.004)
        biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
        local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
        _activation_summary(local4)

    # softmax_linear: a linear layer; the softmax itself is applied in the loss
    with tf.variable_scope('softmax_linear') as scope:
        weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                              stddev=1 / 192.0, wd=0.0)
        biases = _variable_on_cpu('biases', [NUM_CLASSES],
                                  tf.constant_initializer(0.0))
        softmax_linear = tf.add(tf.matmul(local4, weights), biases,
                                name=scope.name)
        _activation_summary(softmax_linear)

    return softmax_linear
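As a quick sanity check on that (128, 10) shape, here is a minimal sketch of how the shapes flow through the layers, assuming the 24×24×3 distorted inputs produced by the official tutorial's input pipeline (the pool2 stand-in below is purely for illustration):

import tensorflow as tf

batch_size = 128
# conv1 (padding='SAME', stride 1) keeps 24x24; pool1 (stride 2) halves it to
# 12x12; conv2 keeps 12x12; pool2 halves it again to 6x6 with depth 64.
pool2 = tf.zeros([batch_size, 6, 6, 64])      # stand-in for the real pool2
reshape = tf.reshape(pool2, [batch_size, -1])
print(reshape.get_shape())                    # (128, 2304), since 6*6*64 = 2304
# local3: (128, 2304) x (2304, 384) -> (128, 384)
# local4: (128, 384)  x (384, 192)  -> (128, 192)
# softmax_linear: (128, 192) x (192, 10) -> (128, 10)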
Here, _variable_with_weight_decay() initializes the weights from a truncated normal distribution, and its decay coefficient wd is used to compute an L2 weight-decay loss that is added to a collection, so that total_loss can be assembled at the end.
def _variable_with_weight_decay(name, shape, stddev, wd):
    """Helper to create an initialized Variable with weight decay.

    The Variable is initialized with a truncated normal distribution.
    Args:
      stddev: standard deviation of the truncated normal.
      wd: add L2 loss weight decay multiplied by this float. If None,
          weight decay is not added for this Variable.
    Returns:
      Variable tensor.
    """
    dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
    var = _variable_on_cpu(
        name, shape,
        tf.truncated_normal_initializer(stddev=stddev, dtype=dtype))
    if wd is not None:
        weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss')
        tf.add_to_collection('losses', weight_decay)
    return var
def _variable_on_cpu(name, shape, initializer):
    """Helper to create a Variable stored on CPU memory.

    Args:
      name: name of the variable.
      shape: list of ints.
      initializer: initializer for the Variable.
    Returns:
      Variable tensor.
    """
    with tf.device('/cpu:0'):
        dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
        var = tf.get_variable(name, shape, initializer=initializer, dtype=dtype)
    return var
[Note 1 — collections]:
TensorFlow's collections provide a global storage mechanism that is unaffected by variable name scopes: save a value in one place and retrieve it anywhere.
(1) tf.Graph.add_to_collection(name, value) stores a value in the collection.
A collection is not a set, so many values can be stored under a single name; tf.add_to_collection(name, value) is the module-level shortcut that operates on the default graph.
(2) tf.Graph.get_collection(name, scope=None)
Returns the list of values stored in the collection called name. When scope is not None, the resulting list is filtered to include only items whose name attribute matches it using re.match; items without a name attribute are never returned. This example never passes scope, so I am not entirely sure what it is for in practice; I will add to this when I run into it.
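Here is a minimal runnable sketch of the mechanism (the tensor names are made up for illustration):

import tensorflow as tf

# Store two values under the same name; a collection is a list, not a set.
tf.add_to_collection('losses', tf.constant(1.0, name='loss_a'))
tf.add_to_collection('losses', tf.constant(2.0, name='loss_b'))

# Retrieve everything stored under 'losses' from the default graph.
losses = tf.get_collection('losses')
print([l.op.name for l in losses])    # ['loss_a', 'loss_b']

with tf.Session() as sess:
    print(sess.run(losses))           # [1.0, 2.0]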
2.2 Computing the loss
Weight-decay loss is applied to all learnable variables. The model's objective function is the sum of the cross-entropy loss and all of the weight-decay terms.
def loss(logits, labels):
    """Add L2 loss to all the trainable variables.

    Adds summaries for "loss" and "loss/avg".
    Args:
      logits: logits from inference().
      labels: labels from distorted_inputs() or inputs(),
              1-D tensor of shape [batch_size].
    Returns:
      Loss tensor of type float.
    """
    # Calculate the average cross-entropy loss across the batch.
    labels = tf.cast(labels, tf.int64)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits, name='cross_entropy_per_example')
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    tf.add_to_collection('losses', cross_entropy_mean)

    # The total loss is the cross-entropy loss plus all of the weight-decay
    # terms (L2 losses). The L2 norms of the weights were already added to
    # the 'losses' collection above, so tf.add_n() simply sums the mean loss
    # and those weight-decay values.
    return tf.add_n(tf.get_collection('losses'), name='total_loss')
tf.nn.sparse_softmax_cross_entropy_with_logits() computes the sparse softmax cross entropy between labels and logits. It is meant for tasks where each example belongs to exactly one of a set of mutually exclusive classes, as in CIFAR-10. Soft classes are not allowed here: the label vector must give a single concrete class index for each row (example) of logits. For soft (probability-distribution) labels, use tf.nn.softmax_cross_entropy_with_logits() instead.
It returns a tensor of the same shape as 'labels', containing the loss for each example.
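A small sketch of the difference, with made-up numbers: the sparse variant takes one integer class index per example, while tf.nn.softmax_cross_entropy_with_logits() takes a full (possibly soft) distribution per example:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])       # (batch=2, classes=3)

# Hard labels: one integer index per example -> per-example loss of shape (2,)
hard = tf.constant([0, 1])
loss_sparse = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=hard, logits=logits)

# Soft labels: one full distribution per example
soft = tf.constant([[0.9, 0.1, 0.0],
                    [0.0, 0.8, 0.2]])
loss_soft = tf.nn.softmax_cross_entropy_with_logits(
    labels=soft, logits=logits)

with tf.Session() as sess:
    print(sess.run(loss_sparse))   # two scalars, one loss per example
    print(sess.run(loss_soft))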
==================================================================================================
[Note 3]
tf.add_n(inputs, name=None)
Adds all input tensors element-wise.
Returns a tensor of the same shape as the elements of inputs.
Here it sums the list of elements stored in the 'losses' collection: the just-computed average cross-entropy loss plus the L2 norms of the weights of the different layers, which yields total_loss.
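A minimal sketch of tf.add_n(), mirroring how total_loss is assembled from the collection:

import tensorflow as tf

a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])
c = tf.constant([5.0, 6.0])

# tf.add_n sums a list of same-shaped tensors element-wise.
total = tf.add_n([a, b, c])

with tf.Session() as sess:
    print(sess.run(total))    # [ 9. 12.]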
2.3 Updating the parameters / train_op
This adds the operations that minimize the objective function, including computing the gradients and updating the learnable variables. The function returns a single op (train_op) that performs all of the computation needed to train on one batch of images and update the model.
def train(total_loss, global_step):
    """Train the CIFAR-10 model.

    Creates an optimizer and applies moving averages to all trainable
    variables.
    Args:
      total_loss: total loss from loss().
      global_step: integer Variable counting the number of training steps
                   processed.
    Returns:
      train_op: op for training.
    """
    # Variables that affect the learning rate.
    num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / FLAGS.batch_size
    decay_steps = int(num_batches_per_epoch * NUM_EPOCH_PER_DECAY)

    # Decay the learning rate exponentially based on the number of steps.
    lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE, global_step,
                                    decay_steps, LEARNING_RATE_DECAY_FACTOR,
                                    staircase=True)
    tf.summary.scalar('learning_rate', lr)

    # Generate moving averages of all losses and associated summaries.
    loss_averages_op = _add_loss_summaries(total_loss)

    # Compute gradients.
    with tf.control_dependencies([loss_averages_op]):
        opt = tf.train.GradientDescentOptimizer(lr)
        grads = opt.compute_gradients(total_loss)

    # Apply gradients. This is the second part of `minimize()`; it returns an
    # `Operation` that applies the gradients.
    apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)

    # Add histograms for gradients.
    for grad, var in grads:
        if grad is not None:
            tf.summary.histogram(var.op.name + '/gradients', grad)

    # Track the moving averages of all trainable variables.
    variable_averages = tf.train.ExponentialMovingAverage(
        MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())

    with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
        train_op = tf.no_op(name='train')

    return train_op
The learning rate is set up first; here it decays as training proceeds.
[Note 4] tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)
Applies exponential decay to the learning rate, following the formula: decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
Arguments:
learning_rate: the initial learning rate, a float.
global_step: must be non-negative; used for the decay computation. Here it is the integer Variable counting the number of training steps processed.
decay_steps: must be positive; here it is the number of batches per epoch times the number of epochs between decays.
staircase: if True, global_step / decay_steps is an integer division, so the learning rate decays in discrete jumps (a staircase function) rather than continuously.
It returns the decayed learning rate, which is then registered with tf.summary.scalar() under the tag 'learning_rate' so it can be monitored.
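As a worked example with this model's constants (a plain-Python re-derivation of the formula, not TensorFlow itself):

NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
batch_size = 128
num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / batch_size  # ~390.6
decay_steps = int(num_batches_per_epoch * 350.0)                       # 136718

def decayed_lr(global_step, initial_lr=0.1, decay_rate=0.1,
               decay_steps=136718):
    # staircase=True means integer division: the rate drops in discrete jumps.
    return initial_lr * decay_rate ** (global_step // decay_steps)

print(decayed_lr(0))        # 0.1
print(decayed_lr(136717))   # 0.1  (still on the first "stair")
print(decayed_lr(136718))   # ~0.01 (decayed once by factor 0.1)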
============================================================================================
_add_loss_summaries()
This computes a moving average for every loss in the 'losses' collection (plus total_loss) and records both the raw and the averaged values as scalar summaries.
It returns an op that, when run, updates the moving averages of the losses.
def _add_loss_summaries(total_loss):
    """Add summaries for losses in the CIFAR-10 model.

    Generates a moving average for all losses and associated summaries for
    visualizing the performance of the network.
    Args:
      total_loss: total loss from loss().
    Returns:
      loss_averages_op: op for generating moving averages of losses.
    """
    # Compute the moving average of all individual losses and the total loss.
    # A moving average: given a sequence and a fixed window size k, average
    # items 1..k, then 2..k+1, then 3..k+2, and so on. Here an exponential
    # moving average with decay 0.9 is used instead of a fixed window.
    loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
    losses = tf.get_collection('losses')
    loss_averages_op = loss_averages.apply(losses + [total_loss])

    # Attach a scalar summary to each individual loss and to the total loss;
    # do the same for the averaged version of the losses.
    for l in losses + [total_loss]:
        tf.summary.scalar(l.op.name + ' (raw)', l)
        tf.summary.scalar(l.op.name, loss_averages.average(l))

    return loss_averages_op
[Note 6] loss_averages = tf.train.ExponentialMovingAverage()
This creates an ExponentialMovingAverage object.
When training a model, it helps to maintain moving averages of the trained parameters; evaluating with the averaged values can give better results. This code mainly uses the apply() method, so that is what I describe here.
- __init__(self, decay, num_updates=None, zero_debias=False, name='ExponentialMovingAverage')
- apply(self, var_list=None)
apply() maintains moving averages of the given variables. It adds a shadow copy of each trained variable, plus an op that keeps the variable's moving average in the shadow copy; this op is typically run after each training step.
It returns that op. Note that apply() can be called multiple times, each time with a different list of variables.
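A minimal sketch of apply() and average(), with a small decay so the effect is easy to see:

import tensorflow as tf

v = tf.Variable(0.0)
ema = tf.train.ExponentialMovingAverage(decay=0.9)
maintain_op = ema.apply([v])          # creates the shadow variable for v

update = tf.assign(v, 10.0)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(update)
    sess.run(maintain_op)             # shadow = 0.9 * 0.0 + 0.1 * 10.0 = 1.0
    print(sess.run(ema.average(v)))   # 1.0 -- the shadow (averaged) value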
[Note 7] with tf.control_dependencies(control_inputs):
control_inputs is a list of op or tensor objects that must be executed or computed before the operations defined inside the context run; this establishes a dependency.
In this example, once the moving-average op (loss_averages_op) has been obtained, a dependency is created with the gradient computation: the moving-average update of the loss runs first, and only then is gradient descent performed with that loss as the objective function.
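A minimal sketch of the dependency mechanism:

import tensorflow as tf

v = tf.Variable(0.0)
inc = tf.assign_add(v, 1.0)

# read_v will not run until inc has run, mirroring how compute_gradients
# above waits on loss_averages_op.
with tf.control_dependencies([inc]):
    read_v = tf.identity(v)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(read_v))   # 1.0 -- the increment ran first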
[Note 8] tf.train.GradientDescentOptimizer()
Gradient descent here also goes through a GradientDescentOptimizer object; the methods used are:
- __init__(self, learning_rate, use_locking=False, name='GradientDescent')
- compute_gradients(self, loss, var_list=None, gate_gradients=1, aggregation_method=None, colocate_gradients_with_ops=False, grad_loss=None)
- apply_gradients(self, grads_and_vars, global_step=None, name=None)
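A minimal sketch of this compute_gradients()/apply_gradients() pair, which together do what minimize() does in one call (the toy loss is made up):

import tensorflow as tf

x = tf.Variable(3.0)
loss = tf.square(x)                    # d(loss)/dx = 2x

global_step = tf.Variable(0, trainable=False)
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

grads = opt.compute_gradients(loss)    # list of (gradient, variable) pairs
train_op = opt.apply_gradients(grads, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads[0][0]))       # 6.0 = 2 * 3
    sess.run(train_op)                 # x <- 3.0 - 0.1 * 6.0 = 2.4
    print(sess.run([x, global_step]))  # [2.4, 1]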
Global parameters for the training script:

from datetime import datetime
import time
import tensorflow as tf
from CIFAR10 import model_build

FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string('train_dir', 'E:/Python/tensorflow/CIFAR10',
                           """Directory where to write event logs and checkpoints.""")
tf.app.flags.DEFINE_integer('max_steps', 100000,
                            """Number of batches to run.""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
                            """Whether to log device placement.""")
tf.app.flags.DEFINE_integer('log_frequency', 10,
                            """How often to log results to the console.""")
The train function
def train1():
    with tf.Graph().as_default():
        # Use the process's default graph inside this context.
        global_step = tf.contrib.framework.get_or_create_global_step()
        # global_step = tf.Variable(0, name='global_step', trainable=False)

        # Get the images and labels.
        images, labels = model_build.distorted_inputs()

        # Build a graph that computes the logits predictions (forward pass).
        logits = model_build.inference(images)

        # Calculate the loss.
        loss = model_build.loss(logits, labels)

        # Build a graph that trains the model with one batch of examples and
        # updates the model parameters.
        train_op = model_build.train(loss, global_step)

        # A custom _LoggerHook class, registered with the monitored session
        # mon_sess below.
        class _LoggerHook(tf.train.SessionRunHook):
            """Logs loss and runtime."""

            def begin(self):
                self._step = -1
                self._start_time = time.time()

            def before_run(self, run_context):
                # Called before each call to run(). Returning a
                # 'SessionRunArgs' object adds ops or tensors to the upcoming
                # run(); they are executed together with the ops already
                # passed to run(), and the arguments can also include things
                # to feed. 'run_context' describes the upcoming run(): the
                # original ops and tensors. Once this function has run, the
                # graph is finalized and no more ops can be added.
                self._step += 1
                return tf.train.SessionRunArgs(loss)  # ask for the loss value

            def after_run(self, run_context, run_values):
                # Called after each call to run(). 'run_values' contains the
                # results of the ops/tensors requested in before_run();
                # 'run_context' is the same object passed to before_run().
                # run_context.request_stop() can be called to stop iterating.
                if self._step % FLAGS.log_frequency == 0:
                    # Every FLAGS.log_frequency batches:
                    current_time = time.time()
                    duration = current_time - self._start_time
                    self._start_time = current_time

                    loss_value = run_values.results
                    examples_per_sec = (FLAGS.log_frequency *
                                        FLAGS.batch_size / duration)
                    sec_per_batch = float(duration / FLAGS.log_frequency)

                    format_str = ('%s: step %d, loss = %.2f '
                                  '(%.1f examples/sec; %.3f sec/batch)')
                    print(format_str % (datetime.now(), self._step, loss_value,
                                        examples_per_sec, sec_per_batch))

        with tf.train.MonitoredTrainingSession(
                # Sets a proper session initializer/restorer; it also creates
                # hooks related to checkpoint and summary saving.
                checkpoint_dir=FLAGS.train_dir,
                hooks=[tf.train.StopAtStepHook(last_step=FLAGS.max_steps),
                       tf.train.NanTensorHook(loss),
                       _LoggerHook()],
                config=tf.ConfigProto(
                    log_device_placement=FLAGS.log_device_placement)) as mon_sess:
            while not mon_sess.should_stop():
                # Run train_op in a loop until a stop condition is reached,
                # updating the model parameters on each iteration.
                mon_sess.run(train_op)


def main(argv=None):
    train1()


if __name__ == '__main__':
    tf.app.run(main=main)
The _LoggerHook above subclasses tf.train.SessionRunHook, whose methods are:
- after_create_session(self, session, coord)
  - session: a TensorFlow Session that has been created.
  - coord: a Coordinator object which keeps track of all threads.
- after_run(self, run_context, run_values)
  - run_context: a `SessionRunContext` object.
  - run_values: a `SessionRunValues` object.
- before_run(self, run_context)
- begin(self)
The arguments of tf.train.MonitoredTrainingSession() include:
- master: `String`, the TensorFlow master to use.
- scaffold: a `Scaffold` used for gathering or building supportive ops. If not specified, a default one is created. It's used to finalize the graph.
- hooks: optional list of `SessionRunHook` objects. In this example:
  - tf.train.StopAtStepHook() is a hook that monitors training and requests a stop at a specified step;
  - tf.train.NanTensorHook() is a hook that monitors the loss tensor and stops training if it becomes NaN;
  - _LoggerHook() is the hook we defined ourselves to fetch the loss, track timing, print progress, and so on.
- chief_only_hooks: list of `SessionRunHook` objects. Activated if `is_chief == True`, ignored otherwise.
- save_summaries_steps: how often, in global steps, summaries are written.
- config: an instance of `tf.ConfigProto` used to configure the session. It is the `config` argument of the constructor of `tf.Session`.
Returns:
A `MonitoredSession` object.
Running the script prints:

Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
2017-04-16 20:04:10.826531: step 0, loss = 6.39 (25.3 examples/sec; 5.056 sec/batch)
2017-04-16 20:04:36.614833: step 10, loss = 6.22 (49.6 examples/sec; 2.579 sec/batch)
2017-04-16 20:05:01.745663: step 20, loss = 6.10 (50.9 examples/sec; 2.513 sec/batch)
2017-04-16 20:05:27.068144: step 30, loss = 6.01 (50.5 examples/sec; 2.532 sec/batch)

Since I ran this on a CPU, the speed is far below the GPU numbers in the official guide.