theano - scan - personal understanding


Theano's scan is something I have found hard to understand recently. Since a plain loop can already traverse elements and compute on them, why does Theano need a scan function at all?

 

The scan documentation page, http://deeplearning.net/software/theano/library/scan.html, gives this description:

The scan functions provides the basic functionality needed to do loops in Theano. Scan comes with many whistles and bells, which we will introduce by way of examples.

 

To understand why scan was introduced, I think we should start from Theano itself (see http://blog.csdn.net/wangjian1204/article/details/50518591):

In Theano programming, the graph is the only way to tell Theano how to operate on variables; Theano variables and Theano Ops (operations) are the two basic building blocks of a graph. A graph can consist only of Theano variables (including shared variables) and constants.

 

A graph is usually constructed as follows: first declare Theano variables, whose scope in a Python file is the same as that of ordinary Python variables; then use Theano Ops to establish relations between the variables, e.g. c = a + b; finally use theano.function to bind the variables and their relations together into a complete graph.
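To make these three steps concrete, here is a minimal sketch of my own (the names a, b, c and f are mine, not from the original post):

import theano
import theano.tensor as T

# Step 1: declare Theano variables (symbolic placeholders, no data yet)
a = T.dscalar("a")
b = T.dscalar("b")

# Step 2: relate the variables through Theano Ops
c = a + b

# Step 3: bind variables and their relations into a complete graph
f = theano.function(inputs=[a, b], outputs=c)

print(f(2, 3))  # prints 5.0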

 

Suppose a function has been created, call it fn, with fn = theano.function(...). The shared variables in the graph already contain the data fn needs when it is called, whereas an ordinary Theano variable is merely a placeholder: it must be listed as an input of the function, and a concrete value (e.g. a numpy array or a constant) must be supplied for it when fn is called.

 

Shared variables are worth a special mention here. First, the official explanation (http://deeplearning.net/software/theano/tutorial/examples.html?highlight=tanh#using-shared-variables):

Shared variables can be used in symbolic expressions just like the objects returned by dmatrices(...) but they also have an internal value that defines the value taken by this symbolic variable in all the functions that use it. It is called a shared variable because its value is shared between many functions. The value can be accessed and modified by the .get_value() and .set_value() methods.

 

In this explanation I think the key word is "internal" (which I understand as an intermediate result). Shared variables are typically the parameters a model has to learn: the parameters are updated during iteration, and the current value depends on the value at the previous step (the internal value), which is why they are defined as shared variables. In general, shared variables are not used for inputs, since inputs are fixed and have no notion of an internal value. That said, we do sometimes see inputs turned into shared variables (with dtype=theano.config.floatX); I suspect this is done for GPU speed. See the following tips (http://deeplearning.net/software/theano/tutorial/using_gpu.html):

Prefer constructors like matrix, vector and scalar to dmatrix, dvector and dscalar because the former will give you float32 variables when floatX=float32.

Ensure that your output variables have a float32 dtype and not float64. The more float32 variables are in your graph, the more work the GPU can do for you.

Minimize transfers to the GPU device by using shared float32 variables to store frequently-accessed data (see shared()). When using the GPU, float32 tensor shared variables are stored on the GPU by default to eliminate transfer time for GPU ops using those variables.

In addition, the borrow flag of a shared variable is roughly analogous to a reference in C++.
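As a quick sketch of my own (not from the original post): a shared variable keeps its internal value across calls, which can be read and written with .get_value()/.set_value(), and which theano.function can update through its updates argument; borrow=True hands over the underlying numpy array without copying, much like passing by reference:

import numpy as np
import theano
import theano.tensor as T

# A shared variable holding a learnable parameter; borrow=True avoids
# copying the initial numpy array (reference-like semantics).
w = theano.shared(np.asarray(0.0, dtype=theano.config.floatX), name="w", borrow=True)

x = T.scalar("x")

# Each call returns the current w, then adds x to it via updates.
accumulate = theano.function(inputs=[x], outputs=w, updates=[(w, w + x)])

accumulate(1.0)
accumulate(2.0)
print(w.get_value())  # 3.0
w.set_value(np.asarray(0.0, dtype=theano.config.floatX))  # reset the internal value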

 

Good, now let us formally start the discussion of scan, beginning with an example:


import numpy
import theano
import theano.tensor as T


coefficients = theano.tensor.vector("coefficients")
x = T.scalar("x")


max_coefficients_supported = 10000


# Generate the components of the polynomial
components, updates = theano.scan(fn=lambda coefficient, power, free_variable: coefficient * (free_variable ** power),
                                  sequences=[coefficients, theano.tensor.arange(max_coefficients_supported)],
                                  outputs_info=None,
                                  non_sequences=x)
# Sum them up
polynomial = components.sum()


# Compile a function
calculate_polynomial = theano.function(inputs=[coefficients, x], outputs=polynomial)


# Test
test_coefficients = numpy.asarray([1, 0, 2], dtype=numpy.float32)
test_value = 3
print(calculate_polynomial(test_coefficients, test_value))
print(1.0 * (3 ** 0) + 0.0 * (3 ** 1) + 2.0 * (3 ** 2))


Here, fn=lambda coefficient, power, free_variable: coefficient * (free_variable ** power) is the function that defines the details of a single step. Of course, fn can also be defined as a named function:

def accumulate_by_adding(arange_val, sum_to_date):
    return sum_to_date * 2 + arange_val

and then pass fn=accumulate_by_adding.
Next come fn's arguments. There are three kinds, and they must be written in the following order:

sequences (if any), prior result(s) (if needed), non-sequences (if any)
sequences is the list of objects from which one element is taken at each step, given by the sequences keyword;

prior result(s) are the results of the previous step, given by the outputs_info keyword. How outputs_info is set depends on fn's return value: if a single step of fn returns two results, then outputs_info=[result1, result2]. If only result 2 is needed by the next step, set outputs_info=[None, result2]; result 1 is then not passed back into fn, while result 2 is. Also, since outputs_info holds the previous step's result, it must be initialized for the first step, for example:

outputs_info=T.zeros_like(constant),


non_sequences are the constant variables used identically at every step, given by the non_sequences keyword; a sketch combining all three kinds of arguments follows below.
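Here is a small sketch of my own showing the argument order when all three kinds are present at once (an exponentially decayed running sum; the names step, xs and decay are mine, not from the original post):

import numpy as np
import theano
import theano.tensor as T

xs = T.vector("xs")        # a sequence
decay = T.scalar("decay")  # a non-sequence, constant across steps

# Argument order: sequence element, prior result, non-sequence.
def step(x_t, acc, decay):
    return acc * decay + x_t

results, updates = theano.scan(fn=step,
                               sequences=xs,
                               outputs_info=T.as_tensor_variable(np.asarray(0.0, dtype=theano.config.floatX)),
                               non_sequences=decay)

f = theano.function(inputs=[xs, decay], outputs=results[-1])
print(f(np.asarray([1.0, 2.0, 3.0], dtype=theano.config.floatX), 0.5))
# ((0*0.5 + 1)*0.5 + 2)*0.5 + 3 = 4.25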


Now work through an example:

import numpy as np
import theano
import theano.tensor as T

def accumulate_by_adding(arange_val, sum_to_date):
    return [sum_to_date * 2 + arange_val, sum_to_date + arange_val * 2]

up_to = T.iscalar("up_to")
seq = T.arange(up_to)
scan_result, scan_updates = theano.scan(fn=accumulate_by_adding,
                                        sequences=seq,
                                        outputs_info=[None, T.as_tensor_variable(np.asarray(0, seq.dtype))],
                                        non_sequences=None)
triangular_sequence = theano.function(inputs=[up_to], outputs=scan_result[1])

# test
some_num = 15
print(triangular_sequence(some_num))
Output: [  0   2   6  12  20  30  42  56  72  90 110 132 156 182 210]
The per-step computation: each element of seq becomes arange_val, and T.as_tensor_variable(np.asarray(0, seq.dtype)) is the initial sum_to_date. When a step finishes, the result sum_to_date + arange_val*2 is returned as an output (per outputs_info) and becomes the value of sum_to_date for the next step.
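To make the stepping explicit, here is the same computation written as a plain-Python loop (my own sketch of what scan does here):

def scan_equivalent(up_to):
    outputs_0, outputs_1 = [], []
    sum_to_date = 0  # initial value supplied via outputs_info
    for arange_val in range(up_to):
        out0 = sum_to_date * 2 + arange_val  # first result: not fed back (None in outputs_info)
        out1 = sum_to_date + arange_val * 2  # second result: fed back as sum_to_date
        outputs_0.append(out0)
        outputs_1.append(out1)
        sum_to_date = out1
    return outputs_0, outputs_1

print(scan_equivalent(15)[1])
# [0, 2, 6, 12, 20, 30, 42, 56, 72, 90, 110, 132, 156, 182, 210]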

That's all for now.