Optimization: Deep Neural Network Tricks [Notes]
Slide: http://lamda.nju.edu.cn/weixs/slide/CNNTricks_slide.pdf
Blog post: http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html
1) data augmentation;
2) pre-processing on images;
3) initializations of networks;
4) some tips during training;
5) selections of activation functions;
6) diverse regularizations;
7) some insights found from figures; and finally
8) methods for ensembling multiple deep networks.
Sec. 1: Data Augmentation
During training the training set is limited; data augmentation can be used to enlarge it:
(1) Simple transformations: horizontal flipping, random crops, and color jittering.
(2) Combinations of the simple transformations in (1).
(3) Fancy PCA, proposed by Krizhevsky et al. [1]: alter the intensities of the RGB channels in training images along their principal components.
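A minimal NumPy sketch of fancy PCA, assuming `image` is an (H, W, 3) float array; for brevity the RGB covariance is computed per image, whereas Krizhevsky et al. [1] compute it once over the whole training set:

import numpy as np

def fancy_pca(image, alpha_std=0.1):
    pixels = image.reshape(-1, 3).astype(np.float64)
    pixels -= pixels.mean(axis=0)                 # zero-center the RGB values
    cov = np.cov(pixels, rowvar=False)            # 3x3 covariance of the channels
    eigvals, eigvecs = np.linalg.eigh(cov)        # PCA of the color distribution
    alphas = np.random.normal(0.0, alpha_std, 3)  # one set of random weights per image, as in [1]
    shift = eigvecs @ (alphas * eigvals)          # [p1 p2 p3][a1*l1, a2*l2, a3*l3]^T
    return image + shift                          # same color shift applied to every pixel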
Sec. 2: Pre-Processing
(1) Zero-center + normalize:
Python implementation:
>>> X -= np.mean(X, axis = 0) # zero-center
>>> X /= np.std(X, axis = 0) # normalize
(2) PCA whitening: zero-center → compute the covariance matrix (the correlation structure of the data) → decorrelate the data → whiten.
Python implementation:
>>> X -= np.mean(X, axis = 0) # zero-center
>>> cov = np.dot(X.T, X) / X.shape[0] # compute the covariance matrix
Decorrelate the data: project the (already zero-centered) data onto the eigenbasis:
>>> U,S,V = np.linalg.svd(cov) # compute the SVD factorization of the data covariance matrix
>>> Xrot = np.dot(X, U) # decorrelate the data
Whitening: divide every dimension of the eigenbasis-projected data by the corresponding eigenvalue to normalize the scale:
>>> Xwhite = Xrot / np.sqrt(S + 1e-5) # divide by the eigenvalues (which are square roots of the singular values)
Sec. 3: Initializations
(1) All-Zero Initialization
In the idealized view, after proper data normalization roughly half of the weights should end up positive and half negative, which makes all-zero weights look like a reasonable starting guess.
Drawback: there is no source of asymmetry between neurons; every neuron computes the same output and receives the same gradient update.
(2) Initialization with Small Random Numbers
Advantage: symmetry breaking.
Idea: the neurons are all random and unique in the beginning, so they compute distinct updates.
e.g. 1: weights ~ 0.001 × N(0, 1), where N(0, 1) is a zero-mean, unit standard deviation Gaussian.
e.g. 2: small numbers drawn from a uniform distribution.
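A short illustration of both examples; the layer sizes and the 0.001 scale are placeholders:

import numpy as np

n_in, n_out = 784, 100                                # assumed layer sizes
W = 0.001 * np.random.randn(n_in, n_out)              # e.g. 1: small Gaussian numbers
W = np.random.uniform(-0.001, 0.001, (n_in, n_out))   # e.g. 2: small uniform numbers
b = np.zeros(n_out)                                   # biases may safely start at zero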
(3) Calibrating the Variances
Idea: normalize the variance of each neuron's output to 1 by scaling its weights by the square root of its fan-in n; the derivation, however, does not take ReLUs into account.
Python implementation:
>>> w = np.random.randn(n) / np.sqrt(n) # calibrating the variances with 1/sqrt(n)
(4) Current Recommendation
He et al. [4] derive the initialization specifically for ReLUs: draw the weights from a zero-mean Gaussian with variance 2/n, where n is the number of inputs to the neuron.
Python implementation:
>>> w = np.random.randn(n) * np.sqrt(2.0/n) # current recommendation
Sec. 4: During Training
Filters and pooling size. Input images: prefer power-of-2 sizes (e.g., 32, 64, 224, 512); use small filters (e.g., 3×3) and small strides (e.g., 1) with zero-padding; pooling size: e.g., 2×2.
Learning rate. Tune it using the validation set; in addition, as Ilya Sutskever [2] suggests, divide the gradients by the mini-batch size, so that changing the batch size does not force a change of learning rate.
Fine-tune on pre-trained models. Two factors matter: the size of the new dataset, and its similarity to the dataset the pre-trained model was trained on (a code sketch follows the four cases below).
(1) If your data is similar to the pre-training data, simply train a linear classifier on features extracted from the top layers of the pre-trained model.
(2) If in addition you have plenty of data, fine-tune the top few layers of the pre-trained model with a small learning rate.
(3) If your dataset differs greatly from the pre-training dataset but you have many training images, fine-tune most of the layers on your own data with a small learning rate.
(4) If your dataset is both small and very different from the pre-training dataset, only train a linear classifier.
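A minimal PyTorch-style sketch of the four cases, assuming torchvision's resnet18 as the pre-trained model; the class count and learning rate are illustrative placeholders:

import torch
import torch.nn as nn
from torchvision import models

# Cases (1)/(4): freeze the backbone, train only a linear classifier.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                        # keep pre-trained weights fixed
model.fc = nn.Linear(model.fc.in_features, 10)     # new linear head (e.g., 10 classes)

# Cases (2)/(3): unfreeze some or most layers instead, then fine-tune
# everything that still requires grad with a small learning rate.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3, momentum=0.9)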
Sec. 5: Activation Functions (non-linearity)
Sigmoid
σ(x) = 1/(1 + e^(-x)) squashes real numbers into [0, 1]: large negative numbers become 0 and large positive numbers become 1.
(Cons) Sigmoids saturate and kill gradients.
(Cons) Sigmoid outputs are not zero-centered.
tanh(x)
Squashes real numbers into the range [-1, 1].
(Cons) Its activations saturate, like the sigmoid's.
(Pros) Its output is zero-centered.
Rectified Linear Unit: f(x) = max(0, x)
(Pros) Involves no expensive operations (exponentials, etc.); it is a simple threshold at zero.
(Pros) Does not suffer from saturation in the positive regime.
(Pros) Accelerates the convergence of stochastic gradient descent (e.g., by a factor of 6 in [1]) thanks to its linear, non-saturating form.
(Cons) Fragile during training: a ReLU unit can "die", i.e., stop activating for every input.
Leaky ReLU
Attempts to fix the "dying ReLU" problem:
f(x) = αx if x < 0 (α: a small constant, e.g., 0.01);
f(x) = x if x ≥ 0.
(Cons) The reported results are not always consistent.
Parametric ReLU
In PReLU the slope α of the negative part is learned from the data rather than pre-defined [4], whereas in Leaky ReLU it is fixed.
Randomized ReLU
In RReLU, α is a random variable drawn from a given range during training and fixed during testing [5].
(Pros) Its randomized nature can reduce overfitting.
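A NumPy sketch of the activations discussed in this section (PReLU is omitted since its slope is a trained parameter rather than a constant; the RReLU range l=3, u=8 follows [5]):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # (0, 1); saturates, not zero-centered

def tanh(x):
    return np.tanh(x)                      # (-1, 1); zero-centered but saturates

def relu(x):
    return np.maximum(0.0, x)              # f(x) = max(0, x)

def leaky_relu(x, alpha=0.01):             # alpha: small fixed constant
    return np.where(x >= 0, x, alpha * x)

def rrelu(x, l=3.0, u=8.0, train=True):
    # negative slope is 1/a, with a ~ U(l, u) during training and the
    # deterministic a = (l + u) / 2 during testing [5]
    a = np.random.uniform(l, u) if train else (l + u) / 2.0
    return np.where(x >= 0, x, x / a)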
Sec. 6: Regularizations
L2 regularization: add (1/2)λw² to the objective for every weight w, where λ is the regularization strength. (It heavily penalizes peaky weight vectors and prefers diffuse weight vectors.)
L1 regularization: add λ|w| to the objective; it can be combined with L2 as λ₁|w| + λ₂w² (elastic net regularization).
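A short sketch of the two penalties and their gradient contributions; the strengths lam1 and lam2 are assumed values:

import numpy as np

lam1, lam2 = 1e-5, 1e-4                   # assumed regularization strengths

def reg_loss(W):
    # elastic net: lam1*|w| + (1/2)*lam2*w^2, summed over all weights
    return lam1 * np.sum(np.abs(W)) + 0.5 * lam2 * np.sum(W ** 2)

def reg_grad(W):
    return lam1 * np.sign(W) + lam2 * W   # added to the data-loss gradient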
Max norm constraints. Enforce an absolute upper bound c on the magnitude of every neuron's incoming weight vector and use projected gradient descent to enforce the constraint (c is typically on the order of 3 or 4). Updates stay bounded, so the network cannot "explode" even with a high learning rate.
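A minimal sketch of the projection step, assuming each column of W holds one neuron's incoming weights:

import numpy as np

def max_norm_project(W, c=3.0):
    # after each gradient update, project each neuron's weight vector
    # back onto the ball ||w||_2 <= c
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    return W * np.minimum(1.0, c / (norms + 1e-12))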
Dropout [6]: during training, keep a neuron active with some probability p (a hyper-parameter) or set its output to zero otherwise, updating only the parameters of the sampled sub-network for each input.
Testing: no dropout; outputs are scaled so that their expectation matches training time.
A dropout ratio of p = 0.5 is a reasonable default.
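A minimal sketch of (inverted) dropout, which rescales at training time so the test-time forward pass needs no change:

import numpy as np

def dropout(x, p=0.5, train=True):
    if not train:
        return x                                  # testing: no dropout
    mask = (np.random.rand(*x.shape) < p) / p     # keep with probability p, rescale by 1/p
    return x * mask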
Sec. 7: Insights from Figures
Learning rate: the shape of the loss curve indicates whether the rate is too high (loss explodes or flattens at a high value) or too low (slow, nearly linear decrease).
Loss curve: the "width" (wiggle) of the curve is related to the batch size; smaller batches give noisier curves.
Accuracy curve: a large gap between training and validation accuracy indicates overfitting; a small gap together with low accuracy suggests the model's capacity is insufficient.
Sec. 8: Ensemble [8]
Same model, different initialization. Use cross-validation to determine the best hyperparameters, then train multiple models with those hyperparameters but different random initializations.
Top models discovered during cross-validation. Use cross-validation to determine the best hyperparameters, then pick the top-n models to ensemble. (Risk: the set may include sub-par models.)
Different checkpoints of a single model. When training is very expensive, ensemble different checkpoints of a single network taken at different points in time. (Lacks diversity, but cheap.)
Some practical examples. If your task concerns high-level image semantics, use multiple deep models trained on different data sources to extract different, complementary deep representations.
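All of the variants above combine predictions the same way at test time; a minimal sketch that averages class probabilities across models:

import numpy as np

def ensemble_predict(prob_list):
    # prob_list: per-model arrays of shape (num_samples, num_classes)
    return np.mean(np.stack(prob_list), axis=0).argmax(axis=1)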
Miscellaneous
Problems:
Class-imbalanced data: some classes have a large number of images/training instances, while others have very few.
Method 1: balance the training data by directly up-sampling and down-sampling the imbalanced classes [10] (a sketch follows this list).
Method 2: crop processing [7].
Method 3: adjust the fine-tuning strategy: fine-tune first on the classes with abundant training samples, then continue fine-tuning on the scarce ones.
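A minimal sketch of Method 1's up-sampling side: it returns indices that repeat minority-class samples until every class matches the majority-class count.

import numpy as np

def upsample_to_balance(labels):
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx = [np.random.choice(np.where(labels == c)[0], size=target,
                            replace=counts[i] < target)   # sample with repetition only when needed
           for i, c in enumerate(classes)]
    return np.concatenate(idx)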