Stanford Machine Learning, Week 3 (Classification, Logistic Regression, Overfitting and How to Address It)
Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 19:10
Logistic Regression
1. Classification
The classification problem is just like the regression problem, except that the values we now want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem, in which $y$ can take on only two values, 0 and 1. For instance, if we are trying to build a spam classifier for email, then $x^{(i)}$ may be some features of a piece of email, and $y^{(i)}$ may be 1 if it is a piece of spam mail and 0 otherwise. Hence, $y \in \{0, 1\}$. 0 is also called the negative class, and 1 the positive class.
In short, classification uses a set of feature values to divide the data set into distinct classes; that is, the final output $y$ takes one of a small number of discrete values.
2. Hypothesis function
The hypothesis function in logistic regression has the same nature and meaning as the one in linear regression; only its form changes.
We could approach the classification problem ignoring the fact that y is discrete-valued, and use our old linear regression algorithm to try to predict y given x. However, it is easy to construct examples below where this method performs very poorly.
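Example 1:
In the figure above, the hypothesis function can be turned into a classifier by thresholding its output at 0.5:
when $h_\theta(x) \geq 0.5$, predict $y = 1$;
when $h_\theta(x) < 0.5$, predict $y = 0$.
The problem with this representation is that if one more data point is added (as in the figure below), the expression no longer works.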
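Example 2:
In logistic regression, we want $0 \leq h_\theta(x) \leq 1$.
Logistic Regression Model:
$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$
$h_\theta(x)$ gives us the probability that our output is 1. For example, $h_\theta(x) = 0.7$ means there is a 70% probability that the output is 1.
Formally:
$$h_\theta(x) = P(y = 1 \mid x; \theta) = 1 - P(y = 0 \mid x; \theta)$$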
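As a quick illustration of the hypothesis function, here is a minimal Octave sketch. The parameter vector `theta` and the input `x` are made-up values chosen only for this example; they do not come from the course material.

```octave
% Sigmoid (logistic) function g(z), applied element-wise.
g = @(z) 1 ./ (1 + exp(-z));

% Hypothetical parameters and one input; x(1) is the bias feature x0 = 1.
theta = [-1; 0.5];
x = [1; 4];

% h_theta(x) = g(theta' * x); here theta' * x = -1 + 0.5 * 4 = 1.
h = g(theta' * x);

fprintf('g(0) = %.2f\n', g(0));        % prints g(0) = 0.50
fprintf('h_theta(x) = %.4f\n', h);     % prints h_theta(x) = 0.7311
```

Under these assumed values, the model would estimate roughly a 73% probability that $y = 1$ for this input.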
3. Decision boundary
In order to get our discrete 0 or 1 classification, we can translate the output of the hypothesis function as follows:
$h_\theta(x) \geq 0.5 \rightarrow y = 1$
$h_\theta(x) < 0.5 \rightarrow y = 0$
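That is, we predict $y = 1$ when $h_\theta(x) \geq 0.5$, and $y = 0$ otherwise.
Meanwhile, from the shape of the logistic function $g(z)$:
when $z \geq 0$, $g(z) \geq 0.5$;
when $z < 0$, $g(z) < 0.5$.
Since $h_\theta(x) = g(\theta^T x)$, it follows immediately that:
when $\theta^T x \geq 0$, $y = 1$; when $\theta^T x < 0$, $y = 0$.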
The decision boundary is a property of the hypothesis, including the parameters $\theta_0, \theta_1, \theta_2, \dots$; it is the line that separates the region where $y = 0$ from the region where $y = 1$. It is created by our hypothesis function, and the data set is only used to fit the parameters $\theta$.
Consider an example:
Given …
So the original data set is split by the decision boundary …
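To make the rule "predict $y = 1$ when $\theta^T x \geq 0$" concrete, the sketch below uses a hypothetical parameter vector $\theta = [-3; 1; 1]$. This is an assumption for illustration, not necessarily the value used in the example above. With it, the decision boundary is the line $x_1 + x_2 = 3$.

```octave
% Hypothetical parameters: the boundary is -3 + x1 + x2 = 0, i.e. x1 + x2 = 3.
theta = [-3; 1; 1];

% Made-up inputs; each row is [x0 x1 x2] with x0 = 1.
X = [1 1.0 1.0;
     1 2.0 2.0;
     1 3.0 0.0;
     1 0.5 0.5];

z = X * theta;           % theta' * x for every row
pred = double(z >= 0);   % y = 1 exactly where theta' * x >= 0

disp([X(:, 2:3) z pred]);   % columns: x1, x2, theta' * x, prediction
```

Points on the boundary itself (the third row, where $\theta^T x = 0$) fall on the $y = 1$ side, because $g(0) = 0.5$ meets the threshold.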
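4. Cost function
We cannot use the same cost function that we use for linear regression, because plugging the logistic function into the squared-error cost makes $J(\theta)$ wavy (as in the left-hand figure above), with many local optima. In other words, it will not be a convex function.
Instead, our cost function for logistic regression looks like:
$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$$
$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$
Note:
Because $y$ is always either 0 or 1, the two cases can be combined into a single expression:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$$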
A vectorized implementation is:
$$h = g(X\theta)$$
$$J(\theta) = \frac{1}{m} \left( -y^T \log(h) - (1 - y)^T \log(1 - h) \right)$$
If our correct answer $y$ is 0:
- the cost function will be 0 if our hypothesis function also outputs 0;
- the cost function will approach infinity if our hypothesis approaches 1.
If our correct answer $y$ is 1:
- the cost function will be 0 if our hypothesis function outputs 1;
- the cost function will approach infinity if our hypothesis approaches 0.
Note that writing the cost function in this way guarantees that $J(\theta)$ is convex for logistic regression.
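The vectorized cost can be sketched in a few lines of Octave. The data set and the two candidate parameter vectors below are invented for illustration, and `costJ` is a hypothetical helper name.

```octave
g = @(z) 1 ./ (1 + exp(-z));          % sigmoid

% Toy data (made up): first column is x0 = 1, second is the single feature x1.
X = [1 0.5; 1 1.5; 1 2.5; 1 3.5];
y = [0; 0; 1; 1];
m = length(y);

% Vectorized J(theta) = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h)), with h = g(X * theta).
costJ = @(theta) (1 / m) * (-y' * log(g(X * theta)) - (1 - y)' * log(1 - g(X * theta)));

J0 = costJ([0; 0]);     % every h is 0.5, so the cost is log(2)
J1 = costJ([-4; 2]);    % a theta whose boundary x1 = 2 separates the two classes

fprintf('J([0; 0])  = %.4f\n', J0);   % prints 0.6931
fprintf('J([-4; 2]) = %.4f\n', J1);   % prints 0.1809
```

The second parameter vector places the boundary between the two classes, so its cost is much lower than that of the all-zero starting point.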
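5. Gradient Descent
With the cost function in hand, the next step is to minimize $J(\theta)$ using gradient descent.
Remember that the general form of gradient descent is:
Repeat {
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
}
In logistic regression:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$$
We can work out the derivative part using calculus to get:
Repeat {
$$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \, x_j^{(i)}$$
}
Notice that this algorithm is identical in form to the one we used in linear regression. We still have to simultaneously update all values in theta.
Here, $h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$, which is the only difference from linear regression.
A vectorized implementation is:
$$\theta := \theta - \frac{\alpha}{m} X^T (g(X\theta) - y)$$
For the derivation, see the post on vectorizing the gradient descent algorithm.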
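The vectorized update can be sketched as a short Octave loop. The data set, the learning rate `alpha` and the iteration count `numIters` are assumptions picked for this example; the data are deliberately not linearly separable so that the parameters settle at finite values.

```octave
g = @(z) 1 ./ (1 + exp(-z));          % sigmoid

% Toy, non-separable data (made up); first column is x0 = 1.
X = [1 0.5; 1 1.5; 1 2.0; 1 2.5; 1 3.5];
y = [0; 0; 1; 0; 1];
m = length(y);

alpha = 0.1;                          % assumed learning rate
numIters = 1000;                      % assumed number of iterations
theta = zeros(2, 1);
Jhist = zeros(numIters, 1);           % cost before each update

for iter = 1:numIters
  h = g(X * theta);
  Jhist(iter) = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
  % Simultaneous update of all parameters: theta := theta - (alpha/m) * X' * (g(X*theta) - y)
  theta = theta - (alpha / m) * (X' * (h - y));
end

fprintf('cost: %.4f -> %.4f\n', Jhist(1), Jhist(end));
```

Recording the cost at every iteration is a cheap way to check that the learning rate is small enough: the values in `Jhist` should never increase.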
6. Advanced Optimization
“Conjugate gradient”, “BFGS”, and “L-BFGS” are more sophisticated, faster ways to optimize θ that can be used instead of gradient descent. We suggest that you should not write these more sophisticated algorithms yourself (unless you are an expert in numerical computing) but use the libraries instead, as they’re already tested and highly optimized. Octave provides them.
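One way to describe gradient descent is that it does two things: first, it computes $J(\theta)$ and the partial derivatives $\frac{\partial}{\partial \theta_j} J(\theta)$; second, it plugs them into the update rule for $\theta$.
As shown in the figure, we now use MATLAB's fminunc function to find the $\theta$ that minimizes $J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2$.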
You set a few options. Here, options is a data structure that stores the options you want. Setting 'GradObj' to 'on' sets the gradient objective parameter to on; it just means you are indeed going to provide a gradient to this algorithm. 'MaxIter' sets the maximum number of iterations to, let's say, one hundred. We then give it an initial guess for theta, which is a 2-by-1 vector.
optTheta        % stores the parameter values found at the end
functionVal     % stores the value of the cost function at optTheta
exitFlag        % indicates whether the run converged (1 means converged)
@costFunction   % a handle to the function costFunction

function [jVal, gradient] = costFunction(theta)
% This function has two return values:
%   jVal     - the value of the cost function
%   gradient - the partial derivatives with respect to the two parameters
jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;
gradient = zeros(2, 1);
gradient(1) = 2 * (theta(1) - 5);
gradient(2) = 2 * (theta(2) - 5);
end

>> options = optimset('GradObj', 'on', 'MaxIter', 100);
>> initialTheta = zeros(2, 1);
>> [optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options)
So, whether for logistic regression or linear regression, you only need to fill in the part inside the red rectangle in the figure below.
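For logistic regression, the part to fill in is the computation of the cost and its gradient. The following is a minimal Octave sketch under stated assumptions: `logisticCost` is a hypothetical function name, the data are made up (and not separable, so the optimum is finite), and the leading `1;` is the Octave idiom for defining a function inside a script file (MATLAB would need the function in its own file or at the end of the script).

```octave
1;  % Octave idiom: a leading statement makes this file a script, so a function can be defined inline

% Hypothetical helper: logistic regression cost and gradient for the given data.
function [jVal, gradient] = logisticCost(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                            % h = g(X * theta)
  jVal = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));   % vectorized J(theta)
  gradient = (1 / m) * (X' * (h - y));                       % vectorized partial derivatives
end

% Toy, non-separable data (made up); first column is x0 = 1.
X = [1 0.5; 1 1.5; 1 2.0; 1 2.5; 1 3.5];
y = [0; 0; 1; 0; 1];

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);

% The anonymous function fixes X and y, so fminunc only sees theta.
[optTheta, functionVal, exitFlag] = ...
    fminunc(@(t) logisticCost(t, X, y), initialTheta, options);

initialCost = logisticCost(initialTheta, X, y);   % log(2) at theta = 0
fprintf('cost: %.4f -> %.4f\n', initialCost, functionVal);
```

Wrapping the cost function in an anonymous function is one common way to pass extra data to fminunc, which expects a function of the parameter vector alone.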
7. Multi-class classification: One-vs-all
In short, multi-class means that $y$ can take more than two discrete values, e.g. $y \in \{0, 1, \dots, n\}$.
One-vs-all
Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$. On a new input $x$, to make a prediction, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$.
Approach:
We are basically choosing one class and then lumping all the others into a single second class. We do this repeatedly, applying binary logistic regression to each case, and then use the hypothesis that returned the highest value as our prediction.
To solve this problem, we train three classifiers, one for each of the treatments shown in Figures 1, 2 and 3.
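The prediction step of one-vs-all can be sketched as follows. The parameter matrix `Theta` is hand-picked for illustration (in practice each row would come from training one binary classifier), and the three inputs are made up.

```octave
g = @(z) 1 ./ (1 + exp(-z));   % sigmoid

% Hypothetical, hand-picked parameters: row i holds theta for classifier i.
Theta = [ 2 -1 -1;    % classifier 1: fires when x1 and x2 are both small
         -2  1 -1;    % classifier 2: fires when x1 is large and x2 is small
         -2 -1  1];   % classifier 3: fires when x2 is large and x1 is small

% Made-up inputs; each row is [x0 x1 x2] with x0 = 1.
X = [1 0 0;
     1 4 0;
     1 0 4];

P = g(X * Theta');             % P(r, i) = h^(i)(x_r), the output of classifier i for input r
[~, pred] = max(P, [], 2);     % pick, for each row, the class with the highest output

disp(pred');                   % prints 1 2 3
```

Because the sigmoid is monotonic, taking the maximum over `X * Theta'` directly would give the same predictions; the probabilities are computed here only to mirror the description above.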
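8. Overfitting
Where there is overfitting, there must also be underfitting. Simply put, overfitting means the hypothesis function is too complex: it can fit the training set perfectly, yet it cannot predict new data. This phenomenon appears not only in linear regression but in logistic regression as well. In each of the two figures below, the leftmost panel shows underfitting, the rightmost shows overfitting, and the middle one is just right. One cause of overfitting is having too little training data and too many features.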
Underfitting, or high bias, is when the form of our hypothesis function h maps poorly to the trend of the data. It is usually caused by a function that is too simple or uses too few features. At the other extreme, overfitting, or high variance, is caused by a hypothesis function that fits the available data but does not generalize well to predict new data. It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.
There are two main options to address the issue of overfitting:
1) Reduce the number of features:
- Manually select which features to keep.
- Use a model selection algorithm (studied later in the course) to select them automatically.
2) Regularization
- Keep all the features, but reduce the magnitude of the parameters $\theta_j$.
- Regularization works well when we have a lot of slightly useful features.
9. Regularization
If we have overfitting from our hypothesis function, we can reduce the weight that some of the terms in our function carry by increasing their cost.
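Say we wanted to make the following function more quadratic:
$$\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$$
We'll want to eliminate the influence of $\theta_3 x^3$ and $\theta_4 x^4$.
In short, we want to turn the fourth-degree polynomial above into something close to a quadratic, that is, remove the influence of the cubic and quartic terms without deleting them outright. How? By working through the parameters $\theta_3$ and $\theta_4$.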
Without actually getting rid of these features or changing the form of our hypothesis, we can instead modify our cost function:
Minimize
$$\frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$$
We've added two extra terms at the end to inflate the cost of $\theta_3$ and $\theta_4$. Now, in order for the cost function to get close to zero, we will have to reduce the values of $\theta_3$ and $\theta_4$ to near zero (their coefficients are so large that the cost cannot approach zero otherwise). This will in turn greatly reduce the values of $\theta_3 x^3$ and $\theta_4 x^4$ in our hypothesis function. As a result, we see that the new hypothesis (depicted by the pink curve) looks like a quadratic function but fits the data better due to the extra small terms $\theta_3 x^3$ and $\theta_4 x^4$.
In this example, because we know the goal in advance (make the hypothesis resemble a quadratic polynomial), we know that the parameters to penalize are $\theta_3$ and $\theta_4$.
We could also regularize all of our theta parameters in a single summation as:
$$\min_\theta \; \frac{1}{2m} \left[ \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$
The term $\lambda \sum_{j=1}^{n} \theta_j^2$ is the regularization term, and $\lambda$ (lambda) is the regularization parameter. It determines how much the costs of our theta parameters are inflated.
Using the above cost function with the extra summation, we can smooth the output of our hypothesis function to reduce overfitting. If lambda is chosen to be too large, it may smooth out the function too much and cause underfitting. Conversely, what would happen if $\lambda = 0$ or is too small?
In this example, because …
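To see how $\lambda$ changes the cost, here is a small Octave sketch of the regularized squared-error cost. The data, the parameter vector and the helper name `regCost` are assumptions for illustration. Note that the sum in the penalty starts at $j = 1$, so $\theta_0$ (stored as `theta(1)` in Octave's 1-based indexing) is left out.

```octave
% Toy data (made up): y = x exactly; first column is x0 = 1.
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
m = length(y);

% Regularized cost: (1/(2m)) * [ sum of squared errors + lambda * sum_{j>=1} theta_j^2 ].
% theta(1) is theta_0 and is not penalized.
regCost = @(theta, lambda) (1 / (2 * m)) * ...
    (sum((X * theta - y) .^ 2) + lambda * sum(theta(2:end) .^ 2));

theta = [0; 1];                    % fits the toy data perfectly

J_noReg = regCost(theta, 0);       % 0: no error and no penalty
J_reg   = regCost(theta, 3);       % (1/6) * (0 + 3 * 1^2) = 0.5

fprintf('lambda = 0: J = %.4f\n', J_noReg);
fprintf('lambda = 3: J = %.4f\n', J_reg);
```

With $\lambda = 0$ the penalty vanishes and the cost reduces to the unregularized one, so nothing discourages large parameters; as $\lambda$ grows, even a perfect fit pays for the size of $\theta_1$.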
10. Regularized linear regression
We can apply regularization to both linear regression and logistic regression. We will approach linear regression first.
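We want to minimize:
$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$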
Gradient Descent
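We will modify our gradient descent function to separate out $\theta_0$ from the rest of the parameters, because we do not want to penalize $\theta_0$.
Repeat {
$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \, x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha \left[ \left( \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \, x_j^{(i)} \right) + \frac{\lambda}{m} \theta_j \right], \qquad j \in \{1, 2, \dots, n\}$$
}
The term $\frac{\lambda}{m} \theta_j$ performs the regularization. With some manipulation, the update rule for $j \geq 1$ can also be written as:
$$\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \, x_j^{(i)}$$
The first term in the above equation, $1 - \alpha \frac{\lambda}{m}$, will always be less than 1 (for $\alpha > 0$ and $\lambda > 0$). Intuitively, you can see it as reducing the value of $\theta_j$ by some amount on every update. Notice that the second term is now exactly the same as it was before.
Normal Equation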
Now let’s approach regularization using the alternate method of the non-iterative normal equation.
To add in regularization, the equation is the same as our original, except that we add another term inside the parentheses:
$$\theta = (X^T X + \lambda \cdot L)^{-1} X^T y, \qquad L = \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}$$
$L$ is a matrix with 0 at the top left, 1s down the rest of the diagonal, and 0s everywhere else. It has dimension $(n+1) \times (n+1)$. Intuitively, this is the identity matrix (though we are not including $x_0$), multiplied by a single real number $\lambda$.
Recall that if $m < n$, then $X^T X$ is non-invertible. However, when we add the term $\lambda \cdot L$ with $\lambda > 0$, $X^T X + \lambda \cdot L$ becomes invertible.
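The invertibility claim can be checked numerically. The sketch below uses a made-up data set with fewer examples than features ($m = 2$, $n = 3$) and an assumed $\lambda = 1$.

```octave
% Made-up data with m = 2 examples and n = 3 features (plus the x0 = 1 column).
X = [1 2 3 4;
     1 5 6 8];
y = [1; 2];
lambda = 1;                 % assumed regularization parameter

n = size(X, 2) - 1;
L = eye(n + 1);
L(1, 1) = 0;                % do not regularize theta_0

A = X' * X;                 % 4-by-4 but built from only 2 rows, so it is singular
fprintf('rank(X''X)             = %d\n', rank(A));                % prints 2
fprintf('rank(X''X + lambda*L)  = %d\n', rank(A + lambda * L));   % prints 4

% Solve (X'X + lambda*L) * theta = X'y; the backslash operator avoids forming an explicit inverse.
theta = (A + lambda * L) \ (X' * y);
disp(theta');
```

Solving the linear system with the backslash operator is a design choice of this sketch; it gives the same $\theta$ as the formula with the explicit inverse and is generally the more numerically careful option.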
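11. Regularized logistic regression
Repeat {
$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \, x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha \left[ \left( \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \, x_j^{(i)} \right) + \frac{\lambda}{m} \theta_j \right], \qquad j \in \{1, 2, \dots, n\}$$
}
The update has the same form as for regularized linear regression, but here $h_\theta(x) = g(\theta^T x)$.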
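The effect of the regularized update can be sketched by running the same loop with and without the penalty. Everything below is an assumption for illustration: the toy data (linearly separable on purpose, so the unregularized parameters keep growing), the learning rate, the iteration count and the two values of $\lambda$.

```octave
g = @(z) 1 ./ (1 + exp(-z));          % sigmoid

% Toy, linearly separable data (made up); first column is x0 = 1.
X = [1 0.5; 1 1.5; 1 2.5; 1 3.5];
y = [0; 0; 1; 1];
m = length(y);

alpha = 0.1;                          % assumed learning rate
numIters = 5000;                      % assumed number of iterations
lambdas = [0 1];                      % no regularization vs. lambda = 1
thetas = zeros(2, numel(lambdas));    % column k holds theta for lambdas(k)

for k = 1:numel(lambdas)
  lambda = lambdas(k);
  theta = zeros(2, 1);
  for iter = 1:numIters
    h = g(X * theta);
    grad = (1 / m) * (X' * (h - y));  % unregularized gradient
    reg = (lambda / m) * theta;       % regularization term (lambda/m) * theta_j
    reg(1) = 0;                       % theta_0 is not penalized
    theta = theta - alpha * (grad + reg);
  end
  thetas(:, k) = theta;
end

disp(thetas);   % column 1: lambda = 0, column 2: lambda = 1
```

Comparing the second row of `thetas` across the two columns shows the shrinkage: with $\lambda = 1$ the weight on the feature stays much smaller than in the unregularized run.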