Variable Updates and Control Dependencies


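Variable updates

So far we have used variables only to hold model weights, which are updated by an optimizer's training op (Adam, for example). But the optimizer is not the only way to update a variable: a whole set of other functions can do it too (and, once again, each of them is added to your graph as an operation).

The most basic custom update is tf.assign(). It takes a variable and a value and assigns the value to the variable. Plain and simple.

Let's look at an example: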

    import tensorflow as tf

    # We define a Variable
    x = tf.Variable(0, dtype=tf.int32)

    # We use a simple assign operation
    assign_op = tf.assign(x, x + 1)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(5):
            print('x:', sess.run(x))
            sess.run(assign_op)

    # outputs:
    # x: 0
    # x: 1
    # x: 2
    # x: 3
    # x: 4

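Nothing special here. It behaves like any other operation: you run it in a session, and running it guarantees that the variable is updated.

Compare this assign operation with the usual optimizer train_op. Both do the same thing: they update variables. The only difference is that the optimizer does a lot of calculus (it computes gradients) before it performs the update.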
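
To make that comparison concrete, here is a minimal sketch in plain Python (no TensorFlow involved). The helper names `plain_assign` and `sgd_step` and the loss `(w - 3) ** 2` are hypothetical, chosen only for illustration: the first overwrites a value directly, the second computes a gradient and then performs the same kind of overwrite.

```python
# Toy comparison in plain Python: a bare assignment versus an
# "optimizer" step that computes a gradient first and then assigns.
# Both helpers are hypothetical and are not TensorFlow functions.

def plain_assign(old_value, new_value):
    """Update with no calculus at all: just overwrite the value."""
    return new_value


def sgd_step(w, learning_rate=0.1):
    """One gradient-descent step on the hypothetical loss (w - 3) ** 2."""
    grad = 2.0 * (w - 3.0)            # the calculus part: d(loss)/dw
    return w - learning_rate * grad   # the assignment part


counter = plain_assign(0, 1)   # like assign: the new value is given directly

w = 0.0
for _ in range(50):
    w = sgd_step(w)            # like train_op: the new value is derived
print(counter, round(w, 3))
```

After enough steps `w` settles near the minimum of the toy loss, while `counter` simply holds whatever it was told to hold.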

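TF has many functions for updating variables by hand; you can find them in the TensorFlow API documentation. Many of them could be replaced by a few tensor operations followed by a call to tf.assign, but in some cases that would be very cumbersome. So TensorFlow provides two families of update operations:

Sparse updates (update only a subset of a variable): https://www.tensorflow.org/api_guides/python/state_ops#Sparse_Variable_Updates
Dense updates (update the whole variable at once): https://www.tensorflow.org/api_guides/python/state_ops#Variable_helper_functions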
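
The difference between the two families can be sketched in plain Python. The helpers `dense_add` and `sparse_update` below are hypothetical stand-ins that only mimic the semantics on a list; in TensorFlow 1.x the real ops include tf.assign_add on the dense side and tf.scatter_update on the sparse side.

```python
# Plain-Python emulation of the two update styles.
# These helpers are hypothetical; they are not TensorFlow functions.

def dense_add(values, delta):
    """Dense update: every element of the variable is touched."""
    return [v + d for v, d in zip(values, delta)]


def sparse_update(values, indices, updates):
    """Sparse update: only the listed positions are overwritten."""
    result = list(values)
    for i, u in zip(indices, updates):
        result[i] = u
    return result


weights = [1, 2, 3, 4]
dense = dense_add(weights, [10, 10, 10, 10])
sparse = sparse_update(weights, indices=[0, 2], updates=[-1, -3])
print(dense)   # [11, 12, 13, 14]
print(sparse)  # [-1, 2, -3, 4]
```

A sparse update pays off when a variable is large and only a few rows change per step, as with an embedding table.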

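I won't dig into what each of these functions does. Some of them may not make sense to you yet; my advice is to try them in a very simple script first and only then move them into your real model. This will save you a lot of debugging time.

One last word on variable updates: what if we want to change the shape of a variable, for example by adding a row or a column? So far we have only talked about assignment, never about a change of shape.

It can be done, but it is a little tricky:

tf.Variable has a validate_shape argument that defaults to True. It prevents you from changing the variable's shape, so we must set it to False.
tf.assign has the same argument, so we must turn it off there as well.

Let's look at an example: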

    import tensorflow as tf

    # We define a "shape-able" Variable
    x = tf.Variable(
        [],  # An empty list of scalars
        dtype=tf.int32,
        validate_shape=False,  # "Shape-able": the shape is not validated, so we can change it
        trainable=False
    )
    # I build a tensor with a new shape and assign it to x
    concat = tf.concat([x, [0]], 0)
    assign_op = tf.assign(x, concat, validate_shape=False)  # We force TF to skip the shape validation step

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(5):
            print('x:', sess.run(x), 'shape:', sess.run(tf.shape(x)))
            sess.run(assign_op)

    # outputs:
    # x: [] shape: [0]
    # x: [0] shape: [1]
    # x: [0 0] shape: [2]
    # x: [0 0 0] shape: [3]
    # x: [0 0 0 0] shape: [4]

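Not so hard, right? Let's move on.

Control dependencies

We can update variables, but if one variable has to be updated before another, we have a serious problem: we need many sess.run calls to enforce that order. This is impractical and inefficient. Remember: the more work we keep inside the graph, the more efficient it is.

So is there a way around it? Of course: control dependencies. TF provides a set of functions for ordering operations that do not fully depend on one another (that is, for deciding which operation runs first).

Let's start with the simplest example: a graph with one Variable and one placeholder that multiplies them together. Before each multiplication we want to update the Variable by adding one to it. How do we do that in practice?

If we take the naive approach and simply add a tf.assign call, this is what we get: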

    import tensorflow as tf

    # We define our Variables and placeholders
    x = tf.placeholder(tf.int32, shape=[], name='x')
    y = tf.Variable(2, dtype=tf.int32)

    # We set our assign op
    assign_op = tf.assign(y, y + 1)

    # We build our multiplication (this could be a more complicated graph)
    out = x * y

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(3):
            print('output:', sess.run(out, feed_dict={x: 1}))

    # outputs:
    # output: 2
    # output: 2
    # output: 2

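As the output shows, this does not work: our Variable is never incremented, and the output stays at 2.

If you read the code above carefully and build the graph in your head, you can see why: to compute the product of x and y, the graph does not need to compute assign_op. out depends only on x and y, and nothing on that path refers to the update, so TF never runs it.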
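
This pruning behavior can be imitated with a toy graph evaluator in plain Python. It is only an illustration of the idea, not how TensorFlow is implemented; the `Node` class and `run` function are hypothetical names. Fetching `out` visits only the nodes that `out` depends on, so the update node is never touched.

```python
# Toy graph evaluator (an illustration, not TensorFlow's implementation).

class Node:
    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = list(inputs)


def run(fetch, executed):
    """Evaluate `fetch`, visiting only the nodes it depends on."""
    args = [run(node, executed) for node in fetch.inputs]
    executed.append(fetch.name)
    return fetch.fn(*args)


state = {'y': 2}                      # stands in for the Variable's storage
x = Node('x', lambda: 1)              # stands in for the fed placeholder
y = Node('y', lambda: state['y'])


def increment_y(current):
    state['y'] = current + 1
    return state['y']


assign_op = Node('assign_op', increment_y, inputs=[y])
out = Node('out', lambda a, b: a * b, inputs=[x, y])

executed = []
result = run(out, executed)
print(result, executed)  # 2 ['x', 'y', 'out']
```

`assign_op` exists in the graph, but since `out` has no edge to it, it does not appear in `executed` and `state['y']` keeps its initial value.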

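To fix this so that y is actually updated, we need a way to force TF to run assign_op.

Such a mechanism does exist: we can add a control dependency. Like Graph and Variable scopes, it is used with Python's with statement.

Let's look at an example: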

    import tensorflow as tf

    # We define our Variables and placeholders
    x = tf.placeholder(tf.int32, shape=[], name='x')
    y = tf.Variable(2, dtype=tf.int32)

    # We set our assign op
    assign_op = tf.assign(y, y + 1)

    # We build our multiplication, but this time inside a control dependency scope!
    with tf.control_dependencies([assign_op]):
        # Now we are under the dependency scope:
        # all the operations created here will only run after
        # "assign_op" has been computed
        out = x * y

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(3):
            print('output:', sess.run(out, feed_dict={x: 1}))

    # outputs:
    # output: 3
    # output: 4
    # output: 5

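Everything now works as intended. TF sees the dependency we declared, so it runs assign_op before it runs any operation created inside the dependency scope. Here is a visualization of the two graphs:

In the upper graph, assign_op is never computed.
In the lower graph, the control dependency forces assign_op to be computed before the multiplication.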
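
The effect of the extra edge can be imitated in plain Python. The sketch below is a toy model under stated assumptions, not TensorFlow's scheduler: the hypothetical `Node` class carries a `control_inputs` list, and the hypothetical `run` function executes those nodes first and discards their results before evaluating the data inputs.

```python
# Toy evaluator with control inputs (an illustration only).

class Node:
    def __init__(self, name, fn, inputs=(), control_inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = list(inputs)
        self.control_inputs = list(control_inputs)


def run(fetch):
    """Run control inputs first (results discarded), then data inputs."""
    for dep in fetch.control_inputs:
        run(dep)
    args = [run(node) for node in fetch.inputs]
    return fetch.fn(*args)


state = {'y': 2}                      # stands in for the Variable's storage
x = Node('x', lambda: 1)              # stands in for the fed placeholder
y = Node('y', lambda: state['y'])


def increment_y(current):
    state['y'] = current + 1
    return state['y']


assign_op = Node('assign_op', increment_y, inputs=[y])
# The multiplication takes no data from assign_op, only an ordering constraint.
out = Node('out', lambda a, b: a * b, inputs=[x, y],
           control_inputs=[assign_op])

outputs = [run(out) for _ in range(3)]
print(outputs)  # [3, 4, 5]
```

The control input passes no value to the multiplication; it only guarantees that the increment has happened by the time `y` is read.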

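A pitfall

Earlier we saw how to change the shape of a variable. There is a catch, though: once we combine a control dependency with a shape-changing update, we run into behavior that depends on TF's internal, black-box optimizations.

For example, look at this code: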

    import tensorflow as tf

    # I define a "shape-able" Variable
    x = tf.Variable(
        [],
        dtype=tf.int32,
        validate_shape=False,  # "Shape-able": the shape is not validated
        trainable=False
    )
    # I build a tensor with a new shape and assign it to x
    concat = tf.concat([x, [0]], 0)
    assign_op = tf.assign(x, concat, validate_shape=False)

    with tf.control_dependencies([assign_op]):
        # I print x after the assignment.
        # With the commented-out version below, assign_op is called, but the
        # print seems to happen before the assignment, which is wrong:
        # print_op_dep = tf.Print(x, data=[x], message="print_op_dep:")
        new_x = x.read_value()
        print_op_dep = tf.Print(new_x, data=[new_x], message="print_op_dep:")

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(3):
            sess.run(print_op_dep)

    # outputs ("x" is the value printed when x is printed directly,
    # "x_read" is the value printed via x.read_value()):
    # x: [] , x_read: [0]
    # x: [0] , x_read: [0 0]
    # x: [0 0], x_read: [0 0 0]

Let's take a closer look at this code:
The print op depends on assign_op, so it can only be computed after x has been updated.
Yet when we print x directly, it looks as if it had not been updated.
In fact it has been: the special read_value function gives us the real, up-to-date value of x.
Explanation:
 (If you'd done tf.assign_add(x, x + 1) (or something else that preserved the shape of x) you would see things happen in the order you expected, because everything happens on the same device, and the "snapshot" remains an alias of the underlying buffer.) The tf.Print() op gets the old snapshot value, and prints that.
How can you avoid this? One way is to force an explicit x.read_value(), which forces a new snapshot to be taken, respecting the control dependencies.
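
The aliasing argument in that explanation can be sketched in plain Python. This is a toy model built on an assumption made only for illustration: a shape-preserving update writes into the existing buffer, while a shape-changing update allocates a new one. The `ToyVariable` class and its methods are hypothetical and say nothing definitive about TensorFlow internals.

```python
# Toy model of snapshot aliasing (hypothetical, not TensorFlow code).

class ToyVariable:
    def __init__(self, values):
        self.buffer = list(values)
        self.snapshot = self.buffer      # alias of the current buffer

    def add_in_place(self, delta):
        # Shape-preserving update: same buffer, so old snapshots stay valid.
        for i in range(len(self.buffer)):
            self.buffer[i] += delta

    def append_reallocating(self, value):
        # Shape-changing update: a new buffer is allocated, so an old
        # snapshot no longer aliases the variable's storage.
        self.buffer = self.buffer + [value]

    def read_value(self):
        # Take a fresh snapshot of whatever buffer is current.
        self.snapshot = self.buffer
        return self.snapshot


v = ToyVariable([1, 2])
stale = v.snapshot

v.add_in_place(1)
print(stale)            # [2, 3]

v.append_reallocating(0)
print(stale)            # [2, 3]
print(v.read_value())   # [2, 3, 0]
```

The first print reflects the update because the snapshot still aliases the buffer; the second does not, which matches the symptom seen with tf.Print; the explicit re-read returns the new contents.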

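Conclusion
So what can we do with these new capabilities? One idea that comes to mind is to use shape changes to handle sentences of varying length in NLP: when you work with word vectors and your sentences have different lengths, you would not need to add tokens such as <UNK>; you could change the shape directly.
Note: I am not sure this idea gives good results. If you run the experiment, I would love to hear about it. Thanks!
References: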
http://stackoverflow.com/questions/38994037/tensorflow-while-loop-for-training
https://github.com/tensorflow/tensorflow/issues/7782
