Deep Learning in One Day (一天搞懂深度学习)
Lecture I: Introduction of Deep Learning
Three Steps for Deep Learning
- define a set of functions (Neural Network)
- goodness of function
- pick the best function
Softmax layer as the output layer.
FAQ: How many layers? How many neurons for each layer?
Trial and Error + Intuition
Gradient Descent
- pick an initial value for w
  - random (good enough)
  - RBM pre-train
- compute ∂L/∂w
- update w ← w − η ∂L/∂w, where η is called the "learning rate"
- repeat until ∂L/∂w is approximately zero
But gradient descent never guarantees a global minimum.
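The update rule above can be sketched numerically. A minimal example, assuming a made-up one-parameter loss L(w) = (w − 3)² whose minimum is at w = 3:

```python
# Gradient descent sketch on L(w) = (w - 3)^2; the function name and loss
# are illustrative, not from the slides.
def gradient_descent(w, eta=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (w - 3)      # dL/dw for this toy loss
        w = w - eta * grad      # w <- w - eta * dL/dw
        if abs(grad) < 1e-8:    # stop once dL/dw is approximately zero
            break
    return w

print(gradient_descent(w=0.0))  # converges toward 3
```

For this convex toy loss the global minimum is found; on a real network's non-convex loss the same loop can stall in a local minimum, which is the caveat above.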
Modularization
Deep
Each basic classifier can have sufficient training examples
Shared by the following classifiers as modules
The modularization is automatically learned from data
Lecture II: Tips for Training DNN
Do not always blame overfitting
- Reason for overfitting: Training data and testing data can be different
- Panacea for Overfitting: Have more training data or Create more training data
Different approaches for different problems
Choosing a proper loss
- square error (MSE): ∑ᵢ (yᵢ − ŷᵢ)²
- cross entropy (categorical_crossentropy): −∑ᵢ ŷᵢ ln yᵢ
  - When using a softmax output layer, choose cross entropy
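The two losses above can be computed directly. A minimal sketch on a made-up 3-class example, with ŷ the one-hot target and y the softmax output:

```python
import math

y_hat = [0.0, 1.0, 0.0]   # one-hot target (illustrative values)
y = [0.1, 0.7, 0.2]       # softmax output (illustrative values)

# sum (y_i - y_hat_i)^2
mse = sum((t - p) ** 2 for t, p in zip(y_hat, y))
# -sum y_hat_i * ln(y_i); skip zero-target terms to avoid log of unused entries
cross_entropy = -sum(t * math.log(p) for t, p in zip(y_hat, y) if t > 0)

print(mse, cross_entropy)
```

Near the correct answer, cross entropy gives a much larger gradient than MSE for a softmax output, which is why the notes recommend it.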
Mini-batch
- Mini-batch is Faster
- Randomly initialize network parameters
- Pick the 1st batch, update parameters once
- Pick the 2nd batch, update parameters once
- …
- Until all mini-batches have been picked (one epoch finished)
- Repeat the above process (steps 2-5)
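The epoch loop above can be sketched as follows; `run_epochs` and `update` are hypothetical names, with `update` standing in for one gradient-descent step on one batch:

```python
import random

def run_epochs(data, batch_size, epochs, update):
    random.shuffle(data)                       # randomize example order
    batches = [data[i:i + batch_size]
               for i in range(0, len(data), batch_size)]
    for _ in range(epochs):                    # repeat the process (steps 2-5)
        for batch in batches:                  # pick each batch, update once
            update(batch)                      # one parameter update per batch

updates = []
run_epochs(list(range(10)), batch_size=3, epochs=2, update=updates.append)
print(len(updates))  # 4 batches per epoch x 2 epochs = 8 updates
```

With one update per batch instead of one per epoch, mini-batching gives many more (noisier) updates for the same amount of data seen, which is why it trains faster in practice.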
New activation function
- Vanishing Gradient Problem
- RBM pre-training
- Rectified Linear Unit (ReLU)
- Fast to compute
- Biological reason
- An infinite number of sigmoids with different biases
- A thinner linear network
- A special case of Maxout
- Handles the vanishing gradient problem
- ReLU variants
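A minimal sketch of why ReLU helps with vanishing gradients: the sigmoid's derivative is at most 0.25, so a product of many layer gradients shrinks toward zero, while ReLU's gradient is exactly 1 for positive inputs. The 10-layer product below is illustrative, not from the slides:

```python
import math

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # gradient does not shrink for positive z

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)          # at most 0.25, so deep products vanish

# Product of gradients through 10 layers (all pre-activations at z = 2):
deep_relu = relu_grad(2.0) ** 10
deep_sigmoid = sigmoid_grad(2.0) ** 10
print(deep_relu, deep_sigmoid)    # 1.0 vs a vanishingly small number
```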
Adaptive Learning Rate
Popular & Simple Idea: Reduce the learning rate by some factor every few epochs
η_t = η / √(t + 1)
Adagrad
- Original: w ← w − η ∂L/∂w
- Adagrad: w ← w − η_w ∂L/∂w, where η_w = η / √(Σ_{i=0}^{t} (g_i)²)
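The Adagrad rule above divides the learning rate by the root of the accumulated squared gradients, so parameters with a history of large gradients take smaller steps. A minimal sketch with made-up gradient values (`adagrad_step` is a hypothetical name):

```python
import math

def adagrad_step(w, grad, history, eta=0.1):
    history.append(grad)
    # eta_w = eta / sqrt(sum of all past squared gradients g_i^2)
    eta_w = eta / math.sqrt(sum(g ** 2 for g in history))
    return w - eta_w * grad

history = []
w = 0.0
for g in [4.0, 3.0, 1.0]:   # pretend gradients from three successive steps
    w = adagrad_step(w, g, history)
print(w)                    # effective step size shrinks as history grows
```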
Momentum
- Movement = negative of ∂L/∂w + momentum (a fraction of the last movement)
- Adam = RMSProp (an advanced Adagrad) + Momentum
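The momentum rule above can be sketched on the toy loss L(w) = w², carrying a fraction of the previous movement into each step; the function name and the 0.9 factor are illustrative:

```python
def momentum_step(w, grad, prev_move, eta=0.1, lam=0.9):
    # movement = lam * previous movement - eta * gradient
    move = lam * prev_move - eta * grad
    return w + move, move

w, move = 1.0, 0.0
for _ in range(3):
    grad = 2 * w               # dL/dw for L(w) = w^2
    w, move = momentum_step(w, grad, move)
print(w)                       # momentum keeps w moving toward the minimum
```

The accumulated movement can carry the parameters across plateaus and small local minima where the raw gradient alone would stall.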
Early Stopping
Weight Decay
- Original: w ← w − η ∂L/∂w
- Weight Decay: w ← 0.99w − η ∂L/∂w
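The weight-decay rule above shrinks every weight slightly toward zero on each update. A minimal sketch, isolating the decay by using a zero gradient (the function name is hypothetical):

```python
def weight_decay_step(w, grad, eta=0.1, decay=0.99):
    # 0.99 * w - eta * dL/dw: the extra factor pulls w toward zero
    return decay * w - eta * grad

w = 5.0
for _ in range(100):
    w = weight_decay_step(w, grad=0.0)  # with no gradient, w decays geometrically
print(w)                                # 5 * 0.99**100, well below the start
```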
Dropout
- Training:
- Each neuron has a p% chance to drop out
- The structure of the network is changed.
- Using the new network for training
- Testing:
- If the dropout rate at training is p%, multiply all the weights by (1 - p%)
- Dropout is a kind of ensemble
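The train/test asymmetry above can be sketched directly: zero activations with probability p at training time, scale by (1 − p) at test time. Function names are hypothetical:

```python
import random

def dropout_train(activations, p, rng):
    # each neuron's activation is dropped (zeroed) with probability p
    return [0.0 if rng.random() < p else a for a in activations]

def dropout_test(weights, p):
    # at test time keep everything but scale by (1 - p)
    return [w * (1 - p) for w in weights]

rng = random.Random(0)
kept = dropout_train([1.0] * 1000, p=0.3, rng=rng)
print(sum(1 for a in kept if a == 0.0))  # roughly 300 of 1000 dropped
print(dropout_test([2.0, -4.0], p=0.3))  # each weight scaled by 0.7
```

The (1 − p) scaling keeps the expected activation the same at test time as during training, which is what makes dropout behave like an ensemble average over the thinned networks.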
Lecture III: Variants of Neural Networks
Convolutional Neural Network (CNN)
- The convolutional layer is not fully connected
- The convolutional layer shares weights
- Learning: gradient descent
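Weight sharing can be sketched with a 1-D convolution: one small filter slides over the input, so every output position reuses the same few weights instead of having its own full set (the function name and values are illustrative):

```python
def conv1d(x, kernel):
    # slide the kernel over x; every window uses the same shared weights
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

out = conv1d([1, 2, 3, 4, 5], kernel=[1, 0, -1])
print(out)  # three outputs, all produced by the same 3 shared weights
```

Three weights cover the whole input here; a fully connected layer mapping 5 inputs to 3 outputs would need 15, and the gap grows quickly with image-sized inputs.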
Recurrent Neural Network (RNN)
Long Short-term Memory (LSTM)
- Gated Recurrent Unit (GRU): simpler than LSTM
Lecture IV: Next Wave
Supervised Learning
Ultra Deep Network
Worry about training first!
These ultra deep networks have special structure
An ultra deep network is an ensemble of many networks with different depths
Ensemble: 6 layers, 4 layers, or 2 layers
FractalNet
Residual Network
Highway Network
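The common ingredient in the architectures above is a shortcut past each block. A minimal residual-style sketch (the names and the doubling transform are illustrative, not from any of these papers):

```python
def residual_block(x, transform):
    # output = input + F(input): the block can fall back to identity
    return [xi + ti for xi, ti in zip(x, transform(x))]

double = lambda x: [2 * xi for xi in x]   # hypothetical learned transform F
print(residual_block([1.0, 2.0], double)) # x + F(x)
```

Because gradients can flow through the identity path, stacking many such blocks stays trainable, and skipping blocks yields the shallower networks of the ensemble view above.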
Attention Model
Attention-based Model
Attention-based Model v2
Reinforcement Learning
- Agent learns to take actions to maximize expected reward.
- Difficulties of Reinforcement Learning
- It may be better to sacrifice immediate reward to gain more long-term reward
- Agent’s actions affect the subsequent data it receives
Unsupervised Learning
- Image: Realizing what the World Looks Like
- Text: Understanding the Meaning of Words
- Audio: Learning human language without supervision