cs231n Lecture 3
Note
Unless otherwise stated, all figures in this article and in this series come from the Stanford cs231n course.
If you repost, please credit the source and include this link: http://vision.stanford.edu/teaching/cs231n/syllabus.html
Feel free to contact me or leave a comment.
Abstract
Last lecture briefly walked through the overall image classification pipeline and then introduced a linear score function.
Today's main topics: loss functions and optimization.
Loss function
Last lecture we defined a linear score function, but we saw that some images were classified correctly and others were not. How do we measure how bad the wrong results are? We need a function that quantifies our unhappiness with the scores across the training data: the loss function.
Multiclass SVM loss (hinge loss)
Given an example $(x_i, y_i)$, where $x_i$ is the image and $y_i$ is the (integer) label, and using the shorthand $s = f(x_i, W)$ for the vector of class scores, the SVM loss has the form:

$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$$

and the full training loss is the mean over all $N$ examples in the training data:

$$L = \frac{1}{N} \sum_{i=1}^{N} L_i$$
Q: what if the sum was instead over all classes (including j = y_i)?
A: Every $L_i$ increases by exactly 1 (the $j = y_i$ term contributes $\max(0, s_{y_i} - s_{y_i} + 1) = 1$), so the whole loss just shifts by a constant and the optimal W is unchanged.
Q: what if we used a mean instead of a sum here?
A: The loss is rescaled by a constant factor; this is just scaling and again does not change the optimal W.
Q: what if we used the square, $\max(0, s_j - s_{y_i} + 1)^2$?
A: That is a different, nonlinear loss (the squared hinge loss), which sometimes works better: it penalizes large margin violations quadratically instead of linearly, so it expresses a different trade-off over errors, and which version is better depends on the data.
Q: what is the min/max possible loss?
A: The minimum is zero (all margins satisfied), and the maximum is infinity.
Q: usually at initialization W are small numbers, so all s ≈ 0. What is the loss?
A: Number of classes minus 1: each of the $C - 1$ incorrect classes contributes $\max(0, 0 - 0 + 1) = 1$. This is a useful sanity check at the start of training.
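To make the formula concrete, here is a minimal numpy sketch of the per-example loss, in the spirit of the half-vectorized version in the course notes; the names `x`, `y`, `W`, `delta` and their shapes are my assumptions:

```python
import numpy as np

def svm_loss_i(x, y, W, delta=1.0):
    """Per-example multiclass SVM (hinge) loss.
    x: (D,) image as a flat vector, y: integer label, W: (C, D) weight matrix."""
    scores = W.dot(x)                                    # class scores s = f(x, W)
    margins = np.maximum(0, scores - scores[y] + delta)  # hinge on every class
    margins[y] = 0                                       # the j == y_i term is excluded from the sum
    return np.sum(margins)
```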
So now we have the multiclass SVM loss. But this formulation has an obvious bug: the W that minimizes the loss is not unique. For example, if W achieves zero loss, then so does 2W, since scaling the scores only widens the margins. So which W should we prefer?
Weight Regularization
The full loss adds a regularization term: $L = \frac{1}{N}\sum_{i=1}^{N} L_i + \lambda R(W)$, where a common choice is L2 regularization, $R(W) = \sum_k \sum_l W_{k,l}^2$. The first part tries to fit the training data; the second part tries to keep W "nice" (small). The two parts fight each other: the regularized model may fit the training set slightly worse, but it generalizes better to the test set.
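A minimal sketch of the corresponding full loss, assuming L2 regularization and reusing the hypothetical `svm_loss_i` above:

```python
import numpy as np

def full_loss(X, y, W, lam=1e-3):
    """Mean data loss over N examples plus L2 regularization lambda * R(W).
    X: (N, D) data matrix, y: (N,) integer labels, W: (C, D)."""
    data_loss = np.mean([svm_loss_i(X[i], y[i], W) for i in range(X.shape[0])])
    reg_loss = lam * np.sum(W * W)   # R(W) = sum of squared weights
    return data_loss + reg_loss
```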
Softmax loss (cross-entropy loss)
The softmax classifier interprets the scores as unnormalized log probabilities of the classes, $P(y = k \mid x) = \frac{e^{s_k}}{\sum_j e^{s_j}}$, and the loss is the negative log probability of the correct class:

$$L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$$

Q: What is the min/max possible loss $L_i$?
A: The minimum is zero (approached only in the limit as the correct-class probability goes to 1), and the maximum is infinity.
Q: usually at initialization W are small numbers, so all s ≈ 0. What is the loss?
A: $-\log(1/\text{number of classes}) = \log(\text{number of classes})$. This is another useful sanity check at the start of training.
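A minimal numpy sketch of the per-example softmax loss, under the same assumed shapes as before; shifting by the max score is the standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax_loss_i(x, y, W):
    """Per-example cross-entropy loss. x: (D,), y: integer label, W: (C, D)."""
    scores = W.dot(x)
    scores = scores - np.max(scores)                 # shift scores for numerical stability
    probs = np.exp(scores) / np.sum(np.exp(scores))  # softmax probabilities
    return -np.log(probs[y])                         # negative log prob of the correct class
```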
We have now discussed both the hinge loss and the cross-entropy loss; an interactive visualization of these losses is available here: Linear Classification Loss Visualization.
Optimization
Find the parameters W that minimize the loss function.
Strategy #1: A first very bad idea solution: Random search
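A sketch of random search in the spirit of the course notes; `loss_fun`, `X_train`, and `Y_train` are hypothetical placeholders, and the shapes assume CIFAR-10 (10 classes, 3073 = 32*32*3 + 1 bias dimension):

```python
import numpy as np

best_loss, best_W = float('inf'), None
for _ in range(1000):
    W = np.random.randn(10, 3073) * 0.0001  # random small CIFAR-10-shaped weights
    loss = loss_fun(X_train, Y_train, W)    # loss over the whole training set (placeholder)
    if loss < best_loss:                    # keep the best W seen so far
        best_loss, best_W = loss, W
```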
Strategy #2: Follow the slope
In multiple dimensions, the gradient is the vector of partial derivatives along each dimension.
Evaluating the gradient numerically (finite differences)
- approximate
- very slow to evaluate: one extra loss evaluation per dimension of W (a minimal numpy sketch follows this list)
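A minimal centered-difference implementation, assuming `f` takes a float numpy array and returns the scalar loss:

```python
import numpy as np

def eval_numerical_gradient(f, x, h=1e-5):
    """Centered-difference estimate of the gradient of f at x (x: float numpy array)."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        fxph = f(x)                         # f(x + h)
        x[ix] = old - h
        fxmh = f(x)                         # f(x - h)
        x[ix] = old                         # restore the original value
        grad[ix] = (fxph - fxmh) / (2 * h)  # centered difference along this dimension
        it.iternext()
    return grad
```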
Calculus
In summary:
- Numerical gradient: approximate, slow, easy to write
- Analytic gradient: exact, fast, error-prone
=> In practice: always use the analytic gradient, but check your implementation against the numerical gradient. This is called a gradient check.
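For example, a gradient check might compare the two like this, reusing the `eval_numerical_gradient` sketch above; `loss_fn`, `W`, and `analytic_grad` are hypothetical, and this relative-error criterion is one common choice:

```python
import numpy as np

# Hypothetical setup: loss_fn(W) returns the scalar loss at W, and
# analytic_grad is the gradient your own backprop/calculus code computed.
num_grad = eval_numerical_gradient(loss_fn, W)
rel_err = np.abs(num_grad - analytic_grad) / np.maximum(1e-8, np.abs(num_grad) + np.abs(analytic_grad))
print('max relative error:', rel_err.max())  # should be tiny, e.g. < 1e-6
```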
Gradient Descent
Repeatedly take a step in the negative gradient direction, since the gradient points in the direction of steepest ascent; a minimal loop sketch follows.
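A sketch of vanilla gradient descent, following the pseudocode in the course notes; `evaluate_gradient`, `loss_fun`, `data`, `weights`, and `step_size` are placeholders:

```python
# Vanilla gradient descent
while True:
    weights_grad = evaluate_gradient(loss_fun, data, weights)
    weights += -step_size * weights_grad  # step in the negative gradient direction
```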
Mini-batch Gradient Descent
- only use a small portion (a mini-batch) of the training set to compute the gradient at each step
- common mini-batch sizes are 32/64/128 examples; e.g., Krizhevsky's ILSVRC ConvNet used 256 examples (a loop sketch follows this list)
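The same loop with a sampled batch (a sketch; `sample_training_data` is a hypothetical helper that draws a random batch):

```python
# Mini-batch gradient descent
while True:
    data_batch = sample_training_data(data, 256)  # sample 256 examples
    weights_grad = evaluate_gradient(loss_fun, data_batch, weights)
    weights += -step_size * weights_grad
```

Because the batch gradient is only a noisy estimate of the full gradient, the loss curve becomes noisier, but each step is far cheaper, so optimization converges faster in wall-clock time.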
The step size (or "learning rate") has a large effect: too small and progress is very slow; too large and the loss can oscillate or even explode. It is one of the most important hyperparameters to tune.
Next class: Becoming a backprop ninja and Neural Networks (part 1).