一天搞懂深度学习


Lecture I: Introduction of Deep Learning

Three Steps for Deep Learning

  1. define a set of function(Neural Network)
  2. goodness of function
  3. pick the best function

Softmax layer as the output layer.

FAQ: How many layers? How many neurons for each layer?

Trial and Error + Intuition(试错+直觉)
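
A minimal sketch of the three steps using the Keras API (the layer sizes, sigmoid activations, and dummy data below are illustrative assumptions, not taken from the slides):

```python
import numpy as np
import tensorflow as tf

# Step 1: define a set of functions -- a neural network with a softmax output layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                      # e.g. flattened 28x28 images (assumption)
    tf.keras.layers.Dense(64, activation='sigmoid'),
    tf.keras.layers.Dense(64, activation='sigmoid'),
    tf.keras.layers.Dense(10, activation='softmax'),   # softmax layer as the output layer
])

# Step 2: goodness of function -- a loss that scores how bad each candidate function is
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

# Step 3: pick the best function -- gradient descent on the loss over training data
x_train = np.random.rand(100, 784).astype('float32')                          # dummy data, for illustration only
y_train = tf.keras.utils.to_categorical(np.random.randint(10, size=100), 10)
model.fit(x_train, y_train, epochs=2, batch_size=32)
```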

Gradient Descent

  1. pick an initial value for w
    • random (good enough)
    • RBM pre-train
  2. compute ∂L/∂w
    w ← w − η·∂L/∂w, where η is called the “learning rate”
  3. repeat until ∂L/∂w is approximately small

But gradient descent never guarantees a global minimum.
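
A toy sketch of the loop above (the 1-D loss function is an assumption made up for illustration):

```python
import random

def loss(w):            # toy loss L(w) = (w - 3)^2, minimum at w = 3
    return (w - 3.0) ** 2

def grad(w):            # its gradient dL/dw = 2 * (w - 3)
    return 2.0 * (w - 3.0)

eta = 0.1                         # learning rate
w = random.uniform(-10, 10)       # 1. pick an initial value for w (random is usually good enough)
for step in range(1000):          # 3. repeat ...
    g = grad(w)                   # 2. compute dL/dw
    w = w - eta * g               #    w <- w - eta * dL/dw
    if abs(g) < 1e-6:             #    ... until dL/dw is approximately zero
        break

print(w, loss(w))  # near the minimum here, but on a non-convex loss this could be a local minimum
```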

Modularization(模块化)

Deep Modularization

  • Each basic classifier can have sufficient training examples

  • Shared by the following classifiers as a module

  • The modularization is automatically learned from data

Lecture II: Tips for Training DNN

  1. Do not always blame overfitting

    • Reason for overfitting: Training data and testing data can be different
    • Panacea for Overfitting: Have more training data or Create more training data
  2. Different approaches for different problems

  3. Choosing a proper loss (tips 3-10 are illustrated in a Keras sketch after this list)

    • square error (MSE)
      • Σᵢ (yᵢ − ŷᵢ)²
    • cross entropy (categorical_crossentropy)
      • −Σᵢ ŷᵢ ln yᵢ
      • When using a softmax output layer, choose cross entropy
  4. Mini-batch

    • Mini-batch is faster
      1. Randomly initialize network parameters
      2. Pick the 1st batch, update parameters once
      3. Pick the 2nd batch, update parameters once
      4. ... until all mini-batches have been picked (one epoch finished)
      5. Repeat the above process (steps 2-4)
  5. New activation function

    • Vanishing gradient problem: with sigmoid activations, the gradients near the input layer become very small
      • RBM pre-training was an earlier workaround
    • Rectified Linear Unit (ReLU)
      • Fast to compute
      • Biological reason
      • Equivalent to infinitely many sigmoids with different biases
      • Gives a thinner linear network, which avoids the vanishing gradient problem
      • A special case of Maxout
    • ReLU variants
  6. Adaptive Learning Rate

    • Popular & Simple Idea: Reduce the learning rate by some factor every few epochs

      • η^t = η / √(t + 1)
    • Adagrad

      • Original: w ← w − η·∂L/∂w
      • Adagrad: w ← w − η_w·∂L/∂w, where η_w = η / √( Σ_{i=0}^{t} (g^i)² )
  7. Momentum

    • Movement = Negative of ∂L/∂w + Momentum
    • Adam = RMSProp (Advanced Adagrad) + Momentum
  8. Early Stopping

  9. Weight Decay

    • Original: w ← w − η·∂L/∂w
    • Weight Decay: w ← 0.99·w − η·∂L/∂w
  10. Dropout

    • Training:
      • Each neuron has a p% chance to be dropped out
      • The structure of the network is changed
      • Use the new, thinner network for training
    • Testing:
      • No dropout; if the dropout rate at training is p%, multiply all the weights by (1 − p)%
    • Dropout is a kind of ensemble
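
A minimal Keras sketch of how these tips fit together (layer sizes, dropout rate, weight-decay strength, and the dummy data are illustrative assumptions, not values from the slides):

```python
import numpy as np
import tensorflow as tf

l2 = tf.keras.regularizers.l2(1e-4)        # weight decay (L2 regularization)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(256, activation='relu', kernel_regularizer=l2),  # new activation: ReLU
    tf.keras.layers.Dropout(0.5),          # dropout: each neuron may be dropped during training
    tf.keras.layers.Dense(256, activation='relu', kernel_regularizer=l2),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# softmax output layer -> choose cross entropy; Adam = RMSProp (advanced Adagrad) + momentum
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# dummy data, for illustration only
x = np.random.rand(1000, 784).astype('float32')
y = tf.keras.utils.to_categorical(np.random.randint(10, size=1000), 10)

# mini-batch training; early stopping halts training when the validation loss stops improving
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)
model.fit(x, y, batch_size=100, epochs=20, validation_split=0.1, callbacks=[early_stop])
```

Note that Keras handles the test-time rescaling for dropout internally, so no manual weight scaling is needed.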

Lecture III: Variants of Neural Networks

Convolutional Neural Network (CNN)

  • A convolution layer is not fully connected
  • A convolution layer shares weights
  • Learning: gradient descent
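
A minimal CNN sketch in Keras (filter counts and kernel sizes are illustrative assumptions): each convolution filter connects only to a small patch of the input and reuses the same weights at every position, and the whole network is still trained by gradient descent.

```python
import tensorflow as tf

cnn = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),                       # e.g. grayscale images (assumption)
    tf.keras.layers.Conv2D(25, (3, 3), activation='relu'),   # not fully connected; weights shared across positions
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(50, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
cnn.summary()   # the conv layers have far fewer parameters than comparable fully connected layers
```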

Recurrent Neural Network (RNN)

Long Short-term Memory (LSTM)

  • Gated Recurrent Unit (GRU): simpler than LSTM
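
A minimal recurrent-model sketch in Keras (vocabulary size and layer widths are illustrative assumptions); replacing the LSTM layer with a GRU layer gives the simpler gated unit mentioned above:

```python
import tensorflow as tf

rnn = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),  # token ids -> vectors
    tf.keras.layers.LSTM(128),    # or tf.keras.layers.GRU(128): fewer gates, fewer parameters
    tf.keras.layers.Dense(2, activation='softmax'),
])
rnn.compile(optimizer='adam', loss='categorical_crossentropy')
```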

Lecture IV: Next Wave

Supervised Learning

Ultra Deep Network

  • Worry about training first!

  • These ultra deep networks have a special structure

  • An ultra deep network is an ensemble of many networks with different depths (see the residual-block sketch after this list)

  • Ensemble: 6 layers, 4 layers or 2 layers

  • FractalNet

  • Residual Network

  • Highway Network
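
A minimal sketch of one residual block (the layer width is an illustrative assumption): the block outputs F(x) + x, so information and gradients can skip layers, which is one way to see an ultra deep network as an ensemble of sub-networks of different depths.

```python
import tensorflow as tf

def residual_block(x, units=64):
    # F(x): two dense layers, then add the shortcut x back in
    h = tf.keras.layers.Dense(units, activation='relu')(x)
    h = tf.keras.layers.Dense(units)(h)
    return tf.keras.layers.Activation('relu')(tf.keras.layers.Add()([x, h]))  # F(x) + x

inputs = tf.keras.Input(shape=(64,))
outputs = tf.keras.layers.Dense(10, activation='softmax')(residual_block(residual_block(inputs)))
resnet_like = tf.keras.Model(inputs, outputs)
```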

Attention Model

  • Attention-based Model

  • Attention-based Model v2

Reinforcement Learning

  • Agent learns to take actions to maximize expected reward.
  • Difficulties of Reinforcement Learning
    • It may be better to sacrifice immediate reward to gain more long-term reward (see the toy example after this list)
    • Agent’s actions affect the subsequent data it receives
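
A toy example (all numbers made up) of the first difficulty: with a discount factor, the "patient" action sequence earns a higher return even though its immediate reward is zero.

```python
gamma = 0.9                          # discount factor (assumed value)
greedy_rewards = [5, 0, 0, 0]        # grab a small reward now, nothing afterwards
patient_rewards = [0, 0, 0, 10]      # sacrifice immediate reward, collect more later

def discounted_return(rewards, gamma):
    return sum(r * gamma ** t for t, r in enumerate(rewards))

print(discounted_return(greedy_rewards, gamma))   # 5.0
print(discounted_return(patient_rewards, gamma))  # 10 * 0.9**3 = 7.29
```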

Unsupervised Learning

  • Image: Realizing what the World Looks Like
  • Text: Understanding the Meaning of Words
  • Audio: Learning human language without supervision