[深度学习论文笔记][Optimization] Unit Tests for Stochastic Optimization

Schaul, Tom, Ioannis Antonoglou, and David Silver. “Unit tests for stochastic optimization.” arXiv preprint arXiv:1312.6055 (2013). [Citations: 17].


1 Idea

[Motivation] There are many stochastic optimization algorithms, such as SGD, AdaGrad, AdaDelta, and RMSprop. Which one is best?


[Idea] Establish a collection of benchmark problems (unit tests) for evaluating these optimization algorithms; a minimal sketch of one such test is given below.
• Each unit test is a small-scale, isolated, and well-understood difficulty.
• This contrasts with real-world scenarios, where many such difficulties are entangled.
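To make this concrete, here is a minimal sketch of what one such unit test could look like, assuming a simple (loss, gradient) oracle interface. The names quadratic_bowl and run_sgd, the noise level, and the pass/fail threshold are illustrative choices, not taken from the paper.

```python
import numpy as np

def quadratic_bowl(x):
    """A small, well-understood 1D loss: f(x) = 0.5 * x**2, so f'(x) = x."""
    return 0.5 * x ** 2, x

def run_sgd(oracle, x0=1.0, lr=0.1, steps=200, noise_std=0.1, seed=0):
    """Run plain SGD against a noisy gradient oracle and return the final loss."""
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(steps):
        _, grad = oracle(x)
        x -= lr * (grad + noise_std * rng.standard_normal())  # additive Gaussian noise
    return oracle(x)[0]

# The "unit test": on this easy, isolated prototype the optimizer must reach a small loss.
final_loss = run_sgd(quadratic_bowl)
assert final_loss < 1e-2, f"failed the convex-bowl unit test: loss={final_loss:.4f}"
```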


2 Prototypes

[Shape Prototypes] (a code sketch follows this list)
• Convex bowls (the local shape of the loss around an optimum).
• Long linear slopes.
• Non-convex.

• Non-differentiable (e.g., the L1 / absolute-value loss).
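The four shape prototypes could be written as scalar (loss, gradient) pairs roughly as below; the concrete formulas are my own illustrative choices, not the paper's exact functions.

```python
import numpy as np

# Each prototype maps a scalar parameter x to (loss, gradient).

def convex_bowl(x):
    return 0.5 * x ** 2, x                  # single minimum at x = 0

def linear_slope(x):
    return 2.0 * x, 2.0                     # long constant slope, no curvature

def non_convex(x):
    s = 1.0 / (1.0 + np.exp(-x))            # sigmoid-shaped loss: curved near the
    return s, s * (1.0 - s)                 # origin, nearly flat far away from it

def non_differentiable(x):
    return abs(x), np.sign(x)               # L1 / absolute value, kink at x = 0
```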


[Scale Prototypes] (a code sketch follows this list)
• Multiple orders of magnitude.
• Steep cliffs (e.g., RNNs).
• Plateaus (e.g., ReLU).
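One way to realize these scale prototypes is to rescale, steepen, or flatten a base prototype. Below is a rough sketch under that assumption; the rescale, cliff, and plateau helpers and their constants are illustrative, not from the paper.

```python
def rescale(prototype, scale):
    """Wrap a (loss, gradient) prototype so the same shape appears at a very different
    scale (e.g. scale = 1e-3 or 1e3), covering several orders of magnitude."""
    def scaled(x):
        loss, grad = prototype(x)
        return scale * loss, scale * grad
    return scaled

def cliff(x, steepness=1e4):
    """Flat for x <= 0 but extremely steep for x > 0: a single gradient step taken on
    the cliff face overshoots violently, as with exploding gradients in RNNs."""
    return (steepness * x, steepness) if x > 0 else (0.0, 0.0)

def plateau(x):
    """Exactly zero gradient on the flat side (like an inactive ReLU unit): an optimizer
    stranded there gets no learning signal even though the loss is not at its minimum."""
    return (1.0, 0.0) if x < 0 else (1.0 / (1.0 + x), -1.0 / (1.0 + x) ** 2)
```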


[Noise Prototypes] (a code sketch follows this list)

• Additive Gaussian noise.
• Multiplicative noise (scale-proportional).
• Mask-out noise (dropout).
• Outliers.
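The noise prototypes can be viewed as wrappers that corrupt a clean gradient before the optimizer sees it. A sketch under that assumption; the wrapper names and default noise levels are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

def additive_gaussian(grad, std=1.0):
    """Gradient plus zero-mean Gaussian noise of a fixed scale."""
    return grad + std * rng.standard_normal()

def multiplicative(grad, std=0.5):
    """Noise whose magnitude is proportional to the gradient itself."""
    return grad * (1.0 + std * rng.standard_normal())

def mask_out(grad, p=0.5):
    """Dropout-style: with probability p the gradient is zeroed out entirely."""
    return 0.0 if rng.random() < p else grad

def with_outliers(grad, p=0.01, magnitude=100.0):
    """Rarely, replace the gradient with a huge spurious value."""
    return magnitude * rng.standard_normal() if rng.random() < p else grad
```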


3 Combinations

[1D] Shape/scale/noise can be varied independently.


[Higher Dimensions] Any of the 1D prototype combinations can be stacked together to form higher-dimensional test functions; a sketch follows below.
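Here is a sketch of how independent 1D prototypes might be stacked into a higher-dimensional test function by summing one prototype per coordinate; the combine helper and the concrete 3-D example are my own illustration of the idea.

```python
import numpy as np

def combine(prototypes):
    """Stack independent 1D (loss, gradient) prototypes, one per coordinate,
    into a single n-dimensional test function by summing their losses."""
    def loss_and_grad(x):
        losses, grads = zip(*(p(xi) for p, xi in zip(prototypes, x)))
        return sum(losses), np.array(grads)
    return loss_and_grad

# Example: a 3-D test mixing a convex bowl, an L1 kink, and a linear slope.
f = combine([lambda v: (0.5 * v ** 2, v),
             lambda v: (abs(v), np.sign(v)),
             lambda v: (2.0 * v, 2.0)])
loss, grad = f(np.array([1.0, -1.0, 0.5]))   # loss = 2.5, grad = [1., -1., 2.]
```

Correlated or ill-conditioned variants could then be obtained by, e.g., rotating or unevenly rescaling the coordinates before evaluating f, and a saddle point by pairing a bowl with an inverted bowl along different coordinates (not sketched here).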


[Correlations/Conditioning]


[Saddle Points]


4 Results
• It is difficult to substantially beat well-tuned SGD in performance on most unit tests.
• Hyper-parameter tuning matters much less for the adaptive algorithms (AdaGrad, AdaDelta, RPROP, RMSprop) than for the non-adaptive SGD variants (see the sketch after this list).
• Most algorithms saturate under high noise.
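As a reminder of why tuning matters less for the adaptive methods, here is a minimal sketch contrasting a plain SGD step with an AdaGrad step (standard update rules written from memory, not code from the paper): AdaGrad divides by the accumulated gradient magnitude, so its effective step size self-normalizes to the problem's scale.

```python
import numpy as np

def sgd_step(x, grad, lr=0.01):
    """Plain SGD: the step is lr * grad, so lr must be tuned to the problem's scale."""
    return x - lr * grad

def adagrad_step(x, grad, accum, lr=1.0, eps=1e-8):
    """AdaGrad: each coordinate's step is divided by the root of its accumulated squared
    gradients, so a single default lr works across widely different scales."""
    accum = accum + grad ** 2
    return x - lr * grad / (np.sqrt(accum) + eps), accum
```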


5 References

[1] ICLR 2014 talk: https://www.youtube.com/watch?v=9GF9UB6kcxs