[paper] Hypernetworks
(ICLR 2017) Hypernetworks
Paper: https://openreview.net/pdf?id=rkpACe1lx
Code: https://github.com/hardmaru/supercell
Blog: http://blog.otoro.net/2016/09/28/hyper-networks/
Learn a dynamically updated recurrent network: a small network learns to produce the weights of a larger network, and the generated weights are specific to a given layer of the large network.
Hypernetworks provide a new form of weight sharing that sits between that of CNNs and RNNs, letting the model strike a good balance between parameter count, performance, and flexibility.
using one network, also known as a hypernetwork, to generate the weights for another network.
We apply hypernetworks to generate adaptive weights for recurrent networks.
hypernetworks can generate non-shared weights for LSTM.
Introduction
using a small network (called a “hypernetwork”) to generate the weights for a larger network (called a main network)
the hypernetwork takes a set of inputs that contain information about the structure of the weights and generates the weights for that layer.
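As a concrete illustration of this idea, here is a minimal sketch in pure Python (all sizes are hypothetical and the parameters are randomly initialized; in practice the hypernetwork and main network are trained jointly): a small linear hypernetwork maps a layer embedding z to the flattened weight matrix of a larger main layer.

```python
import random

random.seed(0)

def linear(W, b, x):
    """y = W x + b, with W stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

# Hypothetical sizes: a 4-d layer embedding z generates the weights of a
# main layer with 8 inputs and 8 outputs (64 generated parameters).
z_dim, n_in, n_out = 4, 8, 8

# Hypernetwork parameters (trained jointly in practice; random here).
W_hyper = [[random.gauss(0, 0.1) for _ in range(z_dim)]
           for _ in range(n_in * n_out)]
b_hyper = [0.0] * (n_in * n_out)

# z carries the "information about the structure of the weights".
z = [random.gauss(0, 1) for _ in range(z_dim)]

# Generate the main layer's weight matrix from z, then apply the layer.
flat_w = linear(W_hyper, b_hyper, z)
W_main = [flat_w[i * n_in:(i + 1) * n_in] for i in range(n_out)]
y = linear(W_main, [0.0] * n_out, [1.0] * n_in)
```

The hypernetwork here has z_dim × n_in × n_out parameters of its own; the point of the paper's later sections is to make this generation step far cheaper than materializing full weight tensors.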
The focus of this work is to use hypernetworks to generate weights for recurrent networks (RNN).
We perform experiments to investigate the behaviors of hypernetworks in a range of contexts and find that hypernetworks mix well with other techniques such as batch normalization and layer normalization.
Our main result is that hypernetworks can generate non-shared weights for LSTM that work better than the standard version of LSTM.
Related Work
Evolutionary methods are difficult to apply directly in large search spaces consisting of millions of weight parameters.
HyperNEAT framework: Compositional Pattern-Producing Networks (CPPNs) are evolved to define the weight structure of the much larger main network.
Differentiable Pattern Producing Networks (DPPNs): the structure is evolved but the weights are learned
ACDC-Networks: linear layers are compressed with DCT and the parameters are learned
Methods
when they are applied to recurrent networks, hypernetworks can be seen as a form of relaxed weight-sharing in the time dimension.
HyperRNN
When a hypernetwork is used to generate the weights for an RNN, we refer to it as the HyperRNN.
The standard formulation of a basic RNN is given by:
h_t = \phi(W_h h_{t - 1} + W_x x_t + b)
In the HyperRNN, we allow the weights and bias to be functions of embedding vectors z_h, z_x, z_b; for example, the bias becomes:
b(z_b) = W_{bz} z_b + b_0
Figure 1: An overview of HyperRNNs. Black connections and parameters are associated with basic RNNs. Orange connections and parameters are introduced in this work and associated with HyperRNNs. Dotted arrows denote parameter generation.
We use a recurrent hypernetwork, whose own hidden state \hat{h}_t produces the embedding z_h (and likewise z_x, z_b):
\hat{x}_t = \begin{pmatrix} h_{t - 1} \\ x_t \\ \end{pmatrix}
\hat{h}_t = \phi(W_{\hat{h}} \hat{h}_{t - 1} + W_{\hat{x}} \hat{x}_t + \hat{b})
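The two equations above can be sketched as a single untrained step (all sizes and parameter values here are hypothetical, and for simplicity the input x_t is given the same dimension as the hidden state):

```python
import math
import random

random.seed(2)
n_h, n_hat = 4, 3  # hypothetical sizes: main hidden units, hyper hidden units

def matvec(W, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in W]

def rand_mat(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Hyper-RNN parameters; names follow the equations above.
W_hhat = rand_mat(n_hat, n_hat)      # multiplies \hat{h}_{t-1}
W_xhat = rand_mat(n_hat, 2 * n_h)    # multiplies \hat{x}_t = [h_{t-1}; x_t]
b_hat = [0.0] * n_hat

h_prev = [0.1] * n_h        # main RNN's previous hidden state
x_t = [0.2] * n_h           # current input (same size as h for simplicity)
hhat_prev = [0.0] * n_hat   # hyper RNN's previous hidden state

# \hat{x}_t: concatenate the main network's state with the current input.
x_hat = h_prev + x_t
# \hat{h}_t = phi(W_{\hat{h}} \hat{h}_{t-1} + W_{\hat{x}} \hat{x}_t + \hat{b})
h_hat = [math.tanh(a + b + c) for a, b, c in
         zip(matvec(W_hhat, hhat_prev), matvec(W_xhat, x_hat), b_hat)]
```

The hyper RNN thus watches the same sequence as the main RNN and summarizes it into \hat{h}_t, from which the embeddings are read out.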
However, Equation 2 is not practical because the memory usage becomes too large for real problems: generating a full weight matrix from the embedding requires on the order of N_z × N_h × N_h parameters. Instead, we use an intermediate hidden vector d(z) to scale the rows of a shared weight matrix element-wise, so that we only use memory on the order of N_z × N_h.
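Under these assumptions, the memory-efficient version can be sketched as row-wise scaling of a shared weight matrix, with d(z) playing the role of the intermediate hidden vector (sizes and parameter values are hypothetical):

```python
import random

random.seed(1)
n_h, n_z = 6, 3  # hypothetical sizes: main hidden units, embedding size

def matvec(W, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in W]

def rand_mat(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

W_h = rand_mat(n_h, n_h)    # shared main-RNN weight matrix
W_hz = rand_mat(n_h, n_z)   # hypernetwork projection producing d(z)
z_h = [random.gauss(0, 1) for _ in range(n_z)]  # embedding from the hyper RNN

# d(z_h) = W_hz z_h gives one scale per row of W_h, so the generated
# parameters grow as n_h * n_z instead of n_h * n_h * n_z.
d = matvec(W_hz, z_h)

h_prev = [0.5] * n_h
# Row-wise scaling: d(z) ∘ (W_h h) equals (diag(d) W_h) h.
scaled = [di * si for di, si in zip(d, matvec(W_h, h_prev))]
full = matvec([[di * w for w in row] for di, row in zip(d, W_h)], h_prev)
assert all(abs(a - b) < 1e-9 for a, b in zip(scaled, full))
```

The assert checks the key identity: element-wise scaling of the pre-activation is exactly equivalent to using a dynamically generated matrix diag(d) W_h, without ever materializing it.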
Related Approaches
The formulation of the HyperRNN in Equation 5 has similarities to Recurrent Batch Normalization (Cooijmans et al., 2016) and Layer Normalization (Ba et al., 2016).
The central idea for the normalization techniques is to calculate the first two statistical moments of the inputs to the activation function, and to linearly scale the inputs to have zero mean and unit variance.
After the normalization, an additional set of fixed parameters are learned to unscale the inputs if required.
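As a reference point for this comparison, here is a minimal sketch of the layer-normalization step being described (not the paper's code; the gain/bias values are placeholders for learned parameters):

```python
import math

def layer_norm(x, gain, bias, eps=1e-5):
    """Normalize a pre-activation vector to zero mean / unit variance,
    then apply a learned element-wise gain and bias (Ba et al., 2016)."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [g * (xi - mean) / math.sqrt(var + eps) + b
            for xi, g, b in zip(x, gain, bias)]

pre_act = [1.0, 2.0, 3.0, 4.0]  # example pre-activation inputs
y = layer_norm(pre_act, gain=[1.0] * 4, bias=[0.0] * 4)
```

With unit gain and zero bias the output has (approximately) zero mean and unit variance; the learned gain and bias then "unscale" the inputs where the model needs it, which is the element-wise scaling the HyperRNN formulation generalizes.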
The element-wise operation also has similarities to the Multiplicative RNN and its extensions (mRNN, mLSTM) (Sutskever et al., 2011; Krause et al., 2016) and Multiplicative Integration RNN (MI-RNN) (Wu et al., 2016).
Experiments
Character-level Penn Treebank Language Modelling
Hutter Prize Wikipedia Language Modelling
Handwriting Sequence Generation
Neural Machine Translation
Conclusion
In this paper, we presented a method to use one network to generate weights for another neural network. Our hypernetworks are trained end-to-end with backpropagation and therefore are efficient and scalable. We focused on applying hypernetworks to generate weights for recurrent networks. On language modelling and handwriting generation, hypernetworks are competitive with or sometimes better than state-of-the-art models. On machine translation, hypernetworks achieve a significant gain on top of a state-of-the-art production-level model.