[Deep Learning Paper Notes][Optimization] Unit Tests for Stochastic Optimization
Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 14:39
Schaul, Tom, Ioannis Antonoglou, and David Silver. “Unit tests for stochastic optimization.” arXiv preprint arXiv:1312.6055 (2013). [Citations: 17].
1 Idea
[Motivation] There are many optimization algorithms, such as SGD, AdaGrad, AdaDelta, RMSprop, etc. Which is the best?
• Each unit test is a small-scale, isolated, and well-understood difficulty.
• This contrasts with real-world scenarios, where many such difficulties are entangled.
2 Prototypes
[Shape Prototypes]
• Convex bowls (e.g., the neighborhood of a local optimum).
• Long linear slopes.
• Non-convex.
• Non-differentiable (e.g., the l1 norm).
• Scales varying over multiple orders of magnitude.
• Steep cliffs (e.g., RNNs).
• Plateaus (e.g., ReLU).
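The listed shapes can be sketched as simple 1D gradient oracles. The stand-in formulas below are my own, chosen to exhibit each shape; the paper's exact test functions differ:

```python
import numpy as np

# Illustrative stand-ins for the shape prototypes above (not the
# paper's exact functions); each maps a point x to a gradient value.
shapes = {
    "bowl":    lambda x: 2.0 * x,                       # f(x) = x^2
    "slope":   lambda x: float(np.sign(x)),             # f(x) = |x|, non-differentiable at 0
    "cliff":   lambda x: 2.0 * x if x < 1.0 else 50.0,  # gradient jumps past the cliff at x = 1
    "plateau": lambda x: 2.0 * x * np.exp(-x * x),      # f(x) = 1 - exp(-x^2), flat far from 0
}

assert shapes["plateau"](10.0) < 1e-6  # nearly zero gradient far out on the plateau
```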
[Noise Prototypes]
• Additive Gaussian noise.
• Multiplicative noise (scale-proportional).
• Mask-out noise (dropout).
• Outliers.
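Each noise prototype can be sketched as a wrapper that perturbs a clean gradient oracle g. The forms and default parameters below are illustrative assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def additive(g, sigma=1.0):            # additive Gaussian noise
    return lambda x: g(x) + sigma * rng.standard_normal()

def multiplicative(g, sigma=0.5):      # scale-proportional noise
    return lambda x: g(x) * (1.0 + sigma * rng.standard_normal())

def mask_out(g, p=0.5):                # dropout-style zeroing
    return lambda x: g(x) * float(rng.random() >= p)

def outliers(g, p=0.01, scale=100.0):  # rare, huge gradient spikes
    return lambda x: g(x) * (scale if rng.random() < p else 1.0)
```

Note that multiplicative and mask-out noise leave a zero gradient unchanged, while additive noise does not, which is why they stress optimizers differently near an optimum.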
3 Combinations
[1D] Shape, scale, and noise can be varied independently.
[Higher Dimensions] Any of the 1D combinations can be composed across dimensions.
[Correlations/Conditioning] Rescaling or rotating dimensions yields ill-conditioned and correlated tests.
[Saddle Points] Combining a minimum along one dimension with a maximum along another yields saddle points.
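A minimal sketch of the combination step, assuming the simplest scheme of assigning one independent 1D prototype per coordinate (rotating the coordinates afterwards would then induce correlations):

```python
import numpy as np

def combine(grads_1d):
    """Build an n-D gradient oracle from per-coordinate 1D oracles."""
    return lambda x: np.array([g(xi) for g, xi in zip(grads_1d, x)])

# Bowl (f = x^2) along dim 0, |x| slope along dim 1.
g = combine([lambda x: 2.0 * x, np.sign])
```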
4 Results
• It is difficult to substantially beat well-tuned SGD in performance on most unit tests.
• Hyper-parameter tuning matters much less for the adaptive algorithms (AdaGrad, AdaDelta, RPROP, RMSprop) than for the non-adaptive SGD variants.
• Most algorithms saturate under high noise.
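A toy version of the kind of sweep behind these observations (the setup, hyper-parameter grid, and update rules below are my own, and a clean quadratic bowl is far simpler than the paper's full battery): sweep the learning rate for SGD and RMSprop and compare final losses.

```python
import numpy as np

def run(update, lr, steps=500):
    """Minimize f(x) = x^2 from x = 5 and return the final loss."""
    x, state = 5.0, 0.0
    for _ in range(steps):
        x, state = update(x, 2.0 * x, lr, state)  # gradient of x^2 is 2x
    return x * x

def sgd(x, g, lr, s):
    return x - lr * g, s

def rmsprop(x, g, lr, s, decay=0.9, eps=1e-8):
    s = decay * s + (1.0 - decay) * g * g         # running average of g^2
    return x - lr * g / (np.sqrt(s) + eps), s

for lr in (1e-3, 1e-2, 1e-1):
    print(f"lr={lr:g}  sgd={run(sgd, lr):.2e}  rmsprop={run(rmsprop, lr):.2e}")
```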
5 Reference
[1]. ICLR 2014 Talk. https://www.youtube.com/watch?v=9GF9UB6kcxs.