Neural Network Compression: Mimic (1) Do Deep Nets Really Need to be Deep
Do Deep Nets Really Need to be Deep?
Paper link: http://arxiv.org/abs/1312.6184
- Main idea
Shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models.
The paper uses a model compression approach: train a shallow network to mimic a deep network. The resulting shallow model reaches almost the same accuracy as the deep model. (A shallow network trained directly on the original data and labels, however, still cannot match the deep network's accuracy.)
Train Shallow Nets to Mimic Deep Nets
The shallow network is trained in two steps:
- train a state-of-the-art deep model
- train a shallow model to mimic the deep model
The shallow network is trained using the trained deep network together with unlabeled data. The shallow (mimic) model is not trained on the original data and its labels; instead, the data is fed through the deep network, and the mimic learns the function the deep network has already learned. In the paper's later experiments the unlabeled data is obtained by discarding the labels of the original data, but note two things: first, the unlabeled samples should preferably not just be the deep model's training set with the labels removed, because the deep model tends to overfit on those points; second, the number of unlabeled samples needs to be much larger than the deep model's training set, so that the mimic can approximate the deep model more closely; mimicking works best when the unlabeled set is larger than the train set.
The paper cites reference [3] (Model Compression, 2006), which used model compression to turn a complex ensemble into a single neural net of modest size, trained to mimic a much larger ensemble of models, and also proposed three ways to obtain unlabeled data.
When a complex model can be mimicked by a shallow model, the function the complex model learned is not truly complex. A model's structural complexity and the complexity of the functions it can express are two different things.
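The transfer-set construction described above can be sketched in a few lines of NumPy. Everything here is illustrative: `teacher_logits`, with its frozen random weights, is a hypothetical stand-in for a trained deep model, and all dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 8, 3                       # input dim, number of classes
W1 = rng.normal(size=(D, 16))     # frozen weights standing in for a
W2 = rng.normal(size=(16, C))     # trained deep teacher (toy example)

def teacher_logits(x):
    # Logits: the values *before* softmax, which the mimic will learn.
    return np.maximum(x @ W1, 0.0) @ W2

# Transfer set: unlabeled inputs paired with the teacher's logits.
X_unlabeled = rng.normal(size=(1000, D))
Z = teacher_logits(X_unlabeled)
print(Z.shape)  # (1000, 3)
```

The key point is that no original labels appear anywhere: the teacher's outputs are the only supervision the mimic ever sees.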
Mimic Learning
A normal network is trained with cross-entropy on the n probability values p_k = exp(z_k) / Σ_j exp(z_j), i.e. the softmax outputs. The mimic model is instead trained directly on the n log probability values z; the paper calls these pre-softmax values logits:
- By training the student model on the logits directly, the student is better able to learn the internal model learned by the teacher, without suffering from the information loss that occurs after passing through logits to probability space;
- In other words, supervision is applied at the layer just below the softmax of both the teacher and the student model; that is where the student learns from and where the loss is computed;
- Objective (squared loss on logits over the transfer set of size T): L(W, β) = 1/(2T) · Σ_t ‖g(x^(t); W, β) − z^(t)‖², where z^(t) is the teacher's logit vector for input x^(t), g is the student's logit output, W the input-to-hidden weights, and β the hidden-to-output weights;
- Note: normalizing the logits is not crucial, but it still helps;
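A minimal sketch of the logit-regression step, under simplifying assumptions: the teacher and student here are toy linear models (the paper's student has one large non-linear hidden layer), and all names and dimensions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, C = 500, 10, 4                 # transfer-set size, input dim, classes
X = rng.normal(size=(T, D))          # unlabeled inputs
Z = X @ rng.normal(size=(D, C))      # teacher logits (toy linear teacher)

# Student minimizes L(W) = 1/(2T) * sum_t ||g(x_t; W) - z_t||^2 on the
# logits, never on 0/1 labels.  With a linear student g(x) = x @ W the
# minimizer is plain least squares.
W_student, *_ = np.linalg.lstsq(X, Z, rcond=None)

loss = 0.5 * np.mean(np.sum((X @ W_student - Z) ** 2, axis=1))
print(loss < 1e-10)  # True: the student recovers the teacher's function
```

Because the targets are real-valued logits rather than class labels, the mimic step is a regression problem; in the toy case the teacher's function lies in the student's hypothesis class, so the loss is driven to zero.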
Speed-up
By introducing a linear layer: the mimic model has few layers but many hidden units, so the weight matrix between the input and hidden layers has O(HD) parameters, where D is the input feature dimension and H is the number of hidden units; this makes both computation and convergence slow. A linear layer with k linear hidden units is therefore inserted between the input and the hidden layer. Since a linear layer can be absorbed into the weight matrix, the new model has exactly the same expressive power as the original one, while the parameter count becomes O(k(H + D)), which is much smaller and speeds up convergence.
- Re-parameterizing the weight matrix this way not only speeds up convergence but also greatly reduces memory, which in turn makes it possible to train even larger shallow networks.
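The parameter-count argument can be checked directly. The dimensions below are arbitrary, chosen only to make the O(HD) vs O(k(H + D)) comparison concrete; `U` and `V` stand for the hypothetical input-to-linear-layer and linear-layer-to-hidden weight matrices.

```python
import numpy as np

D, H, k = 1000, 2000, 50      # input dim, hidden units, bottleneck width

params_direct = D * H               # O(HD): one full D x H weight matrix
params_factored = k * (D + H)       # O(k(H+D)): D x k followed by k x H
print(params_direct, params_factored)  # 2000000 150000

# The linear layer adds no expressive power: U (D x k) @ V (k x H) is just
# a rank-k D x H matrix, so it can be folded back into a single matrix.
rng = np.random.default_rng(0)
U = rng.normal(size=(D, k))
V = rng.normal(size=(k, H))
x = rng.normal(size=(1, D))
print(np.allclose((x @ U) @ V, x @ (U @ V)))  # True
```

The factorization constrains the effective input-to-hidden matrix to rank k, which is exactly why it saves parameters; the paper's claim is that this constraint does not hurt the mimic in practice while making training much faster.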
Results & Discussion
- For the detailed results, see the paper.
- Discussion: training the student model with the teacher model in this way filters out the influence of some mislabeled examples; the student model can learn soft labels; the mimic model sees non-zero targets for most outputs on most training cases, and learning this uncertainty is more meaningful than simply forcing hard 0/1 targets;
- The paper also discusses the learning and expressive capacity of shallow models; in short, the better the teacher model, the better the shallow model tends to be. Two takeaways: shallow models with a number of parameters comparable to deep models are likely capable of learning even more accurate functions if a more accurate teacher and/or more unlabeled data became available;
- The results suggest that it may be possible to devise better learning algorithms for training more accurate shallow feed-forward nets than those currently in use.