Four Representative Extensions of RNNs: Attention and Augmented Recurrent Neural Networks (Part 2)


This post covers the last two of the four extensions, continuing from Attention and Augmented Recurrent Neural Networks (Part 1).

Adaptive Computation Time

Standard RNNs do the same amount of computation each time step. This seems unintuitive. Surely, one should think more when things are hard? It also limits RNNs to doing O(n) operations for a list of length n.
Adaptive Computation Time (Graves, 2016), is a way for RNNs to do different amounts of computation each step. The big picture idea is simple: allow the RNN to do multiple steps of computation for each time step.
In order for the network to learn how many steps to do, we want the number of steps to be differentiable. We achieve this with the same trick we used before: instead of deciding to run for a discrete number of steps, we have an attention distribution over the number of steps to run. The output is a weighted combination of the outputs of each step.
[Figure: an attention distribution over the number of computation steps, whose weighted outputs form the result]
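In symbols (our notation, just restating the description above): if a single time step internally runs $N$ computation steps with states $s_1, \dots, s_N$ and attention weights $p_1, \dots, p_N$ summing to 1, the emitted state is

$$s_t = \sum_{n=1}^{N} p_n \, s_n$$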
There are a few more details, which were left out in the previous diagram. Here’s a complete diagram of a time step with three computation steps:
[Figure: complete diagram of one time step with three computation steps]
That’s a bit complicated, so let’s work through it step by step. At a high level, we’re still running the RNN and outputting a weighted combination of the states:
[Figure: the RNN’s output as a weighted combination of the computation-step states]
The weight for each step is determined by a “halting neuron”. It’s a sigmoid neuron that looks at the RNN state and gives a halting weight, which we can think of as the probability that we should stop at that step.
[Figure: a sigmoid halting neuron reads the RNN state and emits a halting weight]
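Concretely (our notation, not necessarily the paper’s exact parameterization), the halting weight at inner step $n$ is a sigmoid applied to the RNN state:

$$h_n = \sigma(W_h s_n + b_h) \in (0, 1)$$

The weights $p_n$ above are these halting weights, except for the final step, which receives the leftover budget, as described next.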
We have a total budget for the halting weights of 1, so we track that budget along the top. When it gets to less than epsilon, we stop.
[Figure: the halting budget of 1 is tracked along the top, and computation stops once it drops below epsilon]
When we stop, we might have some leftover halting budget, because we stop when it gets to less than epsilon. What should we do with it? Technically, it’s being given to future steps, but we don’t want to compute those, so we attribute it to the last step.
[Figure: the leftover halting budget is attributed to the last computation step]
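Putting the pieces together, here is a minimal NumPy sketch of the halting loop for a single input, assuming a generic `rnn_step` function and hypothetical halting parameters `w_h` and `b_h` (an illustration of the idea, not Graves’ implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def act_step(state, x, rnn_step, w_h, b_h, eps=0.01, max_steps=10):
    # One ACT time step: keep running rnn_step until the halting
    # budget of 1 is (almost) spent, then emit the weighted mean state.
    budget = 1.0                        # remaining halting budget
    weighted_state = np.zeros_like(state)
    for n in range(max_steps):
        state = rnn_step(state, x)
        h = float(sigmoid(np.dot(w_h, state) + b_h))   # halting weight
        if budget - h < eps or n == max_steps - 1:
            # Stop: attribute the leftover budget (the remainder) here.
            weighted_state += budget * state
            remainder = budget
            n_updates = n + 1
            break
        weighted_state += h * state
        budget -= h
    ponder = n_updates + remainder      # N(t) + R(t), see the ponder cost below
    return weighted_state, ponder
```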
When training Adaptive Computation Time models, one adds a “ponder cost” term to the cost function. This penalizes the model for the amount of computation it uses. The bigger you make this term, the more it will trade-off performance for lowering compute time.
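Roughly, Graves’ ponder cost for input step $t$ is the number of computation steps $N(t)$ plus the leftover remainder $R(t)$, weighted by a hyperparameter $\tau$ in the total loss:

$$\mathcal{L} = \mathcal{L}_{\text{task}} + \tau \sum_t \big( N(t) + R(t) \big)$$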
Adaptive Computation Time is a very new idea, but we believe that it, along with similar ideas, will be very important.
Code
The only open source implementation of Adaptive Computation Time at the moment seems to be Mark Neumann’s (TensorFlow).
https://github.com/DeNeutoy/act-tensorflow

Neural Programmer

Neural nets are excellent at many tasks, but they also struggle to do some basic things like arithmetic, which are trivial in normal approaches to computing. It would be really nice to have a way to fuse neural nets with normal programming, and get the best of both worlds.
The neural programmer (Neelakantan, et al., 2015) is one approach to this. It learns to create programs in order to solve a task. In fact, it learns to generate such programs without needing examples of correct programs. It discovers how to produce programs as a means to the end of accomplishing some task.
The actual model in the paper answers questions about tables by generating SQL-like programs to query the table. However, there are a number of details here that make it a bit complicated, so let’s start by imagining a slightly simpler model, which is given an arithmetic expression and generates a program to evaluate it.
The generated program is a sequence of operations. Each operation is defined to operate on the output of past operations. So an operation might be something like “add the output of the operation 2 steps ago and the output of the operation 1 step ago.” It’s more like a unix pipe than a program with variables being assigned to and read from.
[Figure: each operation acts on the outputs of past operations, like a pipeline]
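As a toy illustration of that pipe-like structure (everything here is hypothetical, purely to fix ideas), a “hard” program could be a list of operations, each naming how many steps back its arguments were produced:

```python
import operator

def run_program(program, inputs):
    # Each entry is (op, ref_a, ref_b): apply op to the outputs produced
    # ref_a and ref_b steps ago, pipe-style, with no named variables.
    outputs = list(inputs)
    for op, ref_a, ref_b in program:
        outputs.append(op(outputs[-ref_a], outputs[-ref_b]))
    return outputs[-1]

# Evaluate (2 + 3) * 4: add the outputs from 3 and 2 steps back,
# then multiply the newest output by the one from 2 steps back.
program = [(operator.add, 3, 2), (operator.mul, 1, 2)]
print(run_program(program, [2, 3, 4]))   # -> 20
```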
The program is generated one operation at a time by a controller RNN. At each step, the controller RNN outputs a probability distribution for what the next operation should be. For example, we might be pretty sure we want to perform addition at the first time step, then have a hard time deciding whether we should multiply or divide at the second step, and so on…
[Figure: the controller RNN emits a probability distribution over operations at each step]
The resulting distribution over operations can now be evaluated. Instead of running a single operation at each step, we do the usual attention trick of running all of them and then average the outputs together, weighted by the probability we ran that operation.
[Figure: all operations are run and their outputs averaged, weighted by the operation probabilities]
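Here is a minimal sketch of that soft execution (names are made up; the real model also tracks operation types, as discussed below):

```python
import numpy as np

def soft_step(history, probs, ops):
    # Run every candidate operation, then average the results,
    # weighted by the controller's probability for each operation.
    results = np.array([op(history) for op in ops])
    return float(np.dot(probs, results))

ops = [lambda h: h[-1] + h[-2],    # add
       lambda h: h[-1] * h[-2],    # multiply
       lambda h: h[-1] - h[-2]]    # subtract
print(soft_step([3.0, 4.0], np.array([0.8, 0.1, 0.1]), ops))
# 0.8*7 + 0.1*12 + 0.1*1 = 6.9
```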
As long as we can define derivatives through the operations, the program’s output is differentiable with respect to the probabilities. We can then define a loss, and train the neural net to produce programs that give the correct answer. In this way, the Neural Programmer learns to produce programs without examples of good programs. The only supervision is the answer the program should produce.
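For a scalar answer $y$, the supervision can be as simple as a squared error against the soft program output $\hat{y}$ (a schematic stand-in; the paper’s actual objective is more involved and handles several output types):

$$\mathcal{L} = (\hat{y} - y)^2$$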
That’s the core idea of Neural Programmer, but the version in the paper answers questions about tables, rather than arithmetic expressions. There are a few additional neat tricks:
1. Multiple Types: Many of the operations in the Neural Programmer deal with types other than scalar numbers. Some operations output selections of table columns or selections of cells. Only outputs of the same type get merged together.
2. Referencing Inputs: The neural programmer needs to answer questions like “How many cities have a population greater than 1,000,000?” given a table of cities with a population column. To facilitate this, some operations allow the network to reference constants in the question they’re answering, or the names of columns. This referencing happens by attention, in the style of pointer networks (Vinyals, et al., 2015).

The Neural Programmer isn’t the only approach to having neural networks generate programs. Another lovely approach is the Neural Programmer-Interpreter (Reed & de Freitas, 2015) which can accomplish a number of very interesting tasks, but requires supervision in the form of correct programs.
We think that this general space, of bridging the gap between more traditional programming and neural networks, is extremely important. While the Neural Programmer is clearly not the final solution, we think there are a lot of important lessons to be learned from it.
Code
There don’t seem to be any open source implementations of the Neural Programmer at present, but there is an implementation of the Neural Programmer-Interpreter by Ken Morishita (Keras):
https://github.com/mokemokechicken/keras_npi

Reposted from http://m.blog.csdn.net/article/details?id=53690186
