TensorFlow Study Notes (1)


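I have recently been using TensorFlow and TFLearn to implement the DDPG algorithm from reinforcement learning. Quite a few things about the two libraries confused me along the way, so I am recording them here for future reference.

Reference: TensorFlow Python API documentation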

1.Gradient Computation

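TensorFlow's optimizer classes can optimize a problem directly; they compute the derivatives in the graph automatically. Sometimes, however, we want to write our own optimizer, in which case we cannot rely on the built-in optimizers and instead call lower-level functions. This section shows how to use tf.gradients() to compute derivatives on a TensorFlow computation graph.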

tf.gradients(ys, xs, grad_ys=None, name='gradients', colocate_gradients_with_ops=False, gate_gradients=False, aggregation_method=None)

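The function builds a symbolic partial-derivative computation (symbolic because it lives inside the TensorFlow graph): the partial derivatives of ys with respect to each x in xs.

It returns a list of Tensors of length len(xs), where each Tensor is sum(dy/dx) for y in ys; that is, the derivatives of all the ys with respect to that x are added together.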
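
To make the "sum over ys" behaviour concrete, here is a small plain-Python sketch. It is not TensorFlow code: the functions y1 and y2 and the helpers numeric_derivative and summed_gradient are hypothetical names chosen for illustration, and the derivatives are estimated by finite differences. It mirrors what tf.gradients([y1, y2], [x]) is described as returning: one value per x, with the contributions of all ys added.

```python
# Plain-Python sketch (no TensorFlow): check the "sum over ys" rule numerically.

def y1(x):
    return x * x        # dy1/dx = 2x

def y2(x):
    return 3.0 * x      # dy2/dx = 3

def numeric_derivative(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

def summed_gradient(ys, x):
    """One value per x: the derivatives of all ys added together."""
    return sum(numeric_derivative(y, x) for y in ys)

x0 = 2.0
result = summed_gradient([y1, y2], x0)   # analytically 2 * x0 + 3
print(round(result, 4))
```

With x0 = 2 the analytic value is 2 * 2 + 3 = 7; there is a single entry because there is a single x, regardless of how many ys are passed.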

2.Processing gradients before applying them

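When using a TensorFlow optimizer we usually call its minimize() method, which both computes the gradients and applies them to the variables (that is, updates the variables). To process the gradients before the update, use the following three steps instead:

1) compute the gradients with compute_gradients();

2) process the gradients as needed;

3) apply the processed gradients to the variables with apply_gradients().

Example: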

    # Create an optimizer.
    opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

    # Compute the gradients for a list of variables.
    grads_and_vars = opt.compute_gradients(loss, <list of variables>)

    # grads_and_vars is a list of tuples (gradient, variable).  Do whatever you
    # need to the 'gradient' part, for example cap them, etc.
    capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]

    # Ask the optimizer to apply the capped gradients.
    opt.apply_gradients(capped_grads_and_vars)
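
The same compute, process, apply pattern can be followed end to end in a plain-Python sketch. This is only an illustration under stated assumptions: the toy quadratic loss, the functions compute_gradients, cap and apply_gradients, and the clipping limit of 1.0 are all hypothetical stand-ins written for this note, not the TensorFlow optimizer methods themselves.

```python
# Plain-Python sketch of compute -> process -> apply (no TensorFlow).
# Toy loss: L(w) = sum((w_i - target_i)^2), so dL/dw_i = 2 * (w_i - target_i).

def compute_gradients(weights, targets):
    """Return (gradient, weight_index) pairs, mirroring (gradient, variable)."""
    return [(2.0 * (w - t), i)
            for i, (w, t) in enumerate(zip(weights, targets))]

def cap(gradient, limit=1.0):
    """Clip a gradient to [-limit, limit]; stands in for the MyCapper placeholder."""
    return max(-limit, min(limit, gradient))

def apply_gradients(weights, grads_and_vars, learning_rate=0.1):
    """Plain gradient-descent update using the already processed gradients."""
    new_weights = list(weights)
    for g, i in grads_and_vars:
        new_weights[i] -= learning_rate * g
    return new_weights

weights = [0.0, 2.9]
targets = [3.0, 3.0]

grads_and_vars = compute_gradients(weights, targets)   # raw gradients: -6 and about -0.2
capped = [(cap(g), i) for g, i in grads_and_vars]      # -6 is clipped to -1
weights = apply_gradients(weights, capped)
print(weights)
```

The large gradient on the first weight is clipped before the update, so that weight moves by only learning_rate * 1.0, while the small gradient on the second weight passes through unchanged.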

3.tf.assign()

tf.assign(ref, value, validate_shape = None, use_locking = None, name = None)

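Definition: see TensorFlow variable operations.

This function updates ref by assigning value to it.

ref: a mutable tensor. It should come from a Variable node; the node may be uninitialized.

value: a tensor. It must have the same type as ref and is the value to assign to the variable.

validate_shape: an optional bool, defaulting to True. If True, the operation checks that the shape of value matches the shape of the tensor being assigned to; if False, ref takes on the shape of value.

use_locking: an optional bool, defaulting to True. If True, the assignment is protected by a lock; otherwise the behavior is undefined but may exhibit less contention.

name: a name for the operation (optional).

Why does this function exist? Why can't we simply write a=a+b to update a?

Because TensorFlow represents computation as a graph. Before sess.run() is called, every TensorFlow "assignment" only adds an Op node to the graph. Written as a=a+b, the statement would look like a node feeding into itself, and it is unclear what it is supposed to mean: in fact it just creates a new add node and rebinds the Python name a to it, without changing the value stored in the variable. assign() was introduced to make the intended operation explicit.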
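
The difference between rebinding a name and writing back into a variable can be seen in a toy deferred-execution graph. The classes Variable, Add and Assign below are hypothetical and written only for this note; they are a rough analogy, not how TensorFlow is implemented.

```python
# Toy deferred-execution graph (hypothetical classes, not TensorFlow internals).

class Variable:
    """A graph node that owns mutable storage."""
    def __init__(self, value):
        self.value = value

    def run(self):
        return self.value

    def __add__(self, other):
        return Add(self, other)


class Add:
    """A graph node that computes left + right when run; it stores nothing."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def run(self):
        return self.left.run() + self.right.run()


class Assign:
    """A graph node that, when run, writes a value into a Variable's storage."""
    def __init__(self, ref, value_node):
        self.ref, self.value_node = ref, value_node

    def run(self):
        self.ref.value = self.value_node.run()
        return self.ref.value


a = Variable(1)
b = Variable(2)

var_a = a          # keep a handle on the variable itself
a = a + b          # only rebinds the Python name `a` to a new Add node
first = a.run()    # evaluates 1 + 2
second = a.run()   # same result again: nothing was stored back
print(first, second, var_a.value)

update = Assign(var_a, var_a + b)   # an explicit "write back" operation
update.run()                        # var_a.value becomes 3
update.run()                        # var_a.value becomes 5
print(var_a.value)
```

Running the Add node twice gives the same value because the variable never changes, whereas each run of the Assign node actually advances the stored value. That is the role the note attributes to assign().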

4.A closer look at tf.gradients()

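When using tf.gradients(), I was unsure about one thing: does the returned sum(dy/dx) sum over the multiple output variables, over the batch, or over both?

With a single input sample (that is, ignoring the batch), updating a parameter requires summing the derivatives of all the output variables with respect to that parameter, and experiments confirm that this is what happens. Does the function then average over the batch? No: it sums over the batch rather than averaging. The examples follow: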

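    import tensorflow as tf

    sess = tf.InteractiveSession()
    # One sample with two features; w maps them to a single output.
    x = tf.placeholder(dtype=tf.float32, shape=[1, 2])
    w = tf.constant(value=[2, 3], dtype=tf.float32, shape=[2, 1])
    x_input = [[1, 5]]
    y = tf.matmul(x, w)
    grad_y = tf.gradients(y, w)
    sess.run(tf.global_variables_initializer())
    print(len(grad_y))
    print(grad_y[0].eval(feed_dict={x: x_input}))

Output: len(grad_y) is 1, and the gradient evaluates to [[1], [5]].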

Now introduce a batch, that is, feed several samples at once, as x_input shows:

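    import tensorflow as tf

    sess = tf.InteractiveSession()
    # Two samples in the batch, still a single output.
    x = tf.placeholder(dtype=tf.float32, shape=[2, 2])
    w = tf.constant(value=[2, 3], dtype=tf.float32, shape=[2, 1])
    x_input = [[1, 5], [3, 4]]
    y = tf.matmul(x, w)
    grad_y = tf.gradients(y, w)
    sess.run(tf.global_variables_initializer())
    print(len(grad_y))    # len is 1
    print(grad_y[0].eval(feed_dict={x: x_input}))

Output: len(grad_y) is 1, and the gradient evaluates to [[4], [9]]; each entry is the sum over the two samples (1 + 3 and 5 + 4).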

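Next, a multi-dimensional output (no batch):

    import tensorflow as tf

    sess = tf.InteractiveSession()
    # One sample; w now has two columns, so y has two outputs (shape [1, 2]).
    x = tf.placeholder(dtype=tf.float32, shape=[1, 2])
    w = tf.constant(value=[2, 3, 5, 6], dtype=tf.float32, shape=[2, 2])
    x_input = [[1, 5]]
    y = tf.matmul(x, w)
    grad_y = tf.gradients(y, w)
    sess.run(tf.global_variables_initializer())
    print(len(grad_y))
    print(grad_y[0].eval(feed_dict={x: x_input}))

Output: len(grad_y) is 1, and the gradient evaluates to [[1, 1], [5, 5]], which has the same shape as w.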

Finally, combine a multi-dimensional output with a batch:

    import tensorflow as tf

    sess = tf.InteractiveSession()
    # Two samples in the batch and two outputs.
    x = tf.placeholder(dtype=tf.float32, shape=[2, 2])
    w = tf.constant(value=[2, 3, 5, 6], dtype=tf.float32, shape=[2, 2])
    x_input = [[1, 5], [3, 4]]
    y = tf.matmul(x, w)
    grad_y = tf.gradients(y, w)
    sess.run(tf.global_variables_initializer())
    print(len(grad_y))
    print(grad_y[0].eval(feed_dict={x: x_input}))

Output: len(grad_y) is 1, and the gradient evaluates to [[4, 4], [9, 9]].

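Summary: as the results above show, the returned value sums the derivatives with respect to the parameter over both the batch samples and the output variables.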
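
This summation rule can be cross-checked without TensorFlow. The sketch below uses the same inputs ([[1, 5]] and [[1, 5], [3, 4]]) and the same weight values (2, 3 for one output and 2, 3, 5, 6 for two outputs) and estimates, by finite differences, the derivative of the sum of all entries of y = x @ w with respect to each entry of w. The helper names matmul, total and summed_grad_wrt_w are hypothetical, and the assumption is that summing every entry of y matches the default behaviour described for tf.gradients().

```python
# Plain-Python cross-check (no TensorFlow): gradient of sum(x @ w) w.r.t. w.

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))]
            for r in range(len(a))]

def total(x, w):
    """Sum of every entry of y = x @ w (all samples, all outputs)."""
    return sum(sum(row) for row in matmul(x, w))

def summed_grad_wrt_w(x, w, eps=1e-3):
    """Finite-difference gradient of total(x, w) for each w[i][j]."""
    grad = []
    for i in range(len(w)):
        grad_row = []
        for j in range(len(w[0])):
            w_plus = [r[:] for r in w]
            w_minus = [r[:] for r in w]
            w_plus[i][j] += eps
            w_minus[i][j] -= eps
            slope = (total(x, w_plus) - total(x, w_minus)) / (2 * eps)
            grad_row.append(round(slope, 6))
        grad.append(grad_row)
    return grad

w_single = [[2.0], [3.0]]              # one output
w_multi = [[2.0, 3.0], [5.0, 6.0]]     # two outputs
one_sample = [[1.0, 5.0]]
batch = [[1.0, 5.0], [3.0, 4.0]]

print(summed_grad_wrt_w(one_sample, w_single))  # single sample, single output
print(summed_grad_wrt_w(batch, w_single))       # batch, single output
print(summed_grad_wrt_w(one_sample, w_multi))   # single sample, two outputs
print(summed_grad_wrt_w(batch, w_multi))        # batch, two outputs
```

In every case the result has the shape of w, and the entries for the batch are the per-sample entries added together (1 + 3 = 4 and 5 + 4 = 9), not their mean.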

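One more point deserves special attention. Everything above differentiates with respect to the parameters of the function, not its inputs. When differentiating with respect to the inputs, tf.gradients() does not sum over the batch: a given output yi depends only on its own input xi and not on the other inputs, so there is nothing to sum across the batch. Here is an example: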

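    import tensorflow as tf

    sess = tf.InteractiveSession()
    x = tf.placeholder(dtype=tf.float32, shape=[2, 2])
    w = tf.constant(value=[2, 3], dtype=tf.float32, shape=[2, 1])
    x_input = [[1, 5], [3, 4]]
    y = tf.matmul(x, w)
    # Differentiate with respect to the input x this time, not w.
    grad_y = tf.gradients(y, x)
    sess.run(tf.global_variables_initializer())
    print(len(grad_y))
    print(grad_y[0].eval(feed_dict={x: x_input}))

Result: len(grad_y) is 1, and the gradient evaluates to [[2, 3], [2, 3]].

Clearly the result has shape [batch, x_dim]; that is, no summation over the batch has taken place.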
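
Strictly speaking the sum over ys still happens here; the cross-sample terms are simply zero. A short plain-Python sketch makes this visible by writing out d y[n] / d x[m][i] for y[n] = sum_i x[n][i] * w[i] with w = [2, 3]. The function dy_dx and the variable names are hypothetical, introduced only for this illustration.

```python
# Plain-Python sketch (no TensorFlow): why the input gradient keeps the batch axis.

w = [2.0, 3.0]
batch_size = 2

def dy_dx(n, m, i):
    """d y[n] / d x[m][i]: non-zero only when output n and input m are the same sample."""
    return w[i] if n == m else 0.0

# Sum over every output y[n], as the gradient rule prescribes.
grad_x = [[sum(dy_dx(n, m, i) for n in range(batch_size))
           for i in range(len(w))]
          for m in range(batch_size)]
print(grad_x)
```

Each row m only receives the contribution from its own output y[m], so every row is just w and the result keeps one row per sample.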

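Notes on how this applies in a DDPG implementation:

When using tf.gradients(), think carefully about whether you are differentiating with respect to parameters or with respect to inputs. In DDPG, for example, dQ/da differentiates the output Q with respect to its input a, so no summation over the batch takes place and the resulting derivative has shape [None, action_dim]. This matches the shape of the actor network's output, [None, action_dim], and therefore satisfies the requirement of tf.gradients() that grad_ys have the same shape as ys. The function multiplies dQ/da as a coefficient into d(mu)/d(theta). Because this second derivative is taken with respect to the parameters theta, it is summed over the input batch, so we then divide by N; equivalently, the factor 1/N can be folded into the learning rate.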
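
The chain-rule step can be sketched in plain Python. Everything here is an assumption made to keep the arithmetic visible: a linear actor mu(s) = s @ theta, made-up values for states and dq_da, and hypothetical variable names. In TensorFlow the corresponding step would pass dQ/da as grad_ys to tf.gradients(), as described above.

```python
# Plain-Python sketch of the DDPG actor gradient (no TensorFlow).
# Assumption: a linear actor mu(s) = s @ theta, used only for illustration.

states = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # shape [N, state_dim]
dq_da = [[0.5], [-1.0], [2.0]]                  # shape [N, action_dim], one row per sample

N = len(states)
state_dim = len(states[0])
action_dim = len(dq_da[0])

# d mu[n][k] / d theta[i][k] = states[n][i]. Weight it by dq_da[n][k]
# and sum over the batch index n, as happens for a parameter gradient.
summed = [[sum(states[n][i] * dq_da[n][k] for n in range(N))
           for k in range(action_dim)]
          for i in range(state_dim)]

# Divide by N to turn the batch sum into a batch mean.
averaged = [[g / N for g in row] for row in summed]

print(summed)
print(averaged)
```

The per-sample dq_da rows act as weights on each sample's contribution, the batch axis disappears in the sum, and dividing by N afterwards gives the mean gradient with the shape of theta.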

See you in the next post!
