Machine Learning Notes 1 - Supervised Learning


1.1 Classification and logistic regression
The classification problem is just like the regression problem, except that the values $y$ we now want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem, in which $y$ can take on only two values, 0 and 1. For instance, if we are trying to build a spam classifier for email, then $x^{(i)}$ may be some features of a piece of email, and $y$ may be 1 if it is spam and 0 otherwise. 0 is also called the negative class and 1 the positive class, and they are sometimes also denoted by the symbols "-" and "+". Given $x^{(i)}$, the corresponding $y^{(i)}$ is also called the label for the training example.
1.2 Logistic regression
The logistic (sigmoid) function gives the hypothesis

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}},$$

where

$$g(z) = \frac{1}{1 + e^{-z}}.$$

[Figure: plot of the sigmoid function $g(z)$]
A useful property of the sigmoid is that its derivative can be expressed in terms of the function itself:

$$\begin{aligned}
g'(z) &= \frac{d}{dz}\,\frac{1}{1+e^{-z}} \\
&= \frac{1}{(1+e^{-z})^2}\left(e^{-z}\right) \\
&= \frac{1}{1+e^{-z}}\cdot\left(1 - \frac{1}{1+e^{-z}}\right) \\
&= g(z)\left(1-g(z)\right).
\end{aligned}$$
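A small NumPy check (variable names are my own, not from the notes) of the sigmoid and the identity $g'(z) = g(z)(1-g(z))$ against a finite-difference approximation:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, -1.0, 0.0, 0.5, 2.0])
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # finite-difference g'(z)
analytic = sigmoid(z) * (1 - sigmoid(z))                      # g(z)(1 - g(z))
print(np.max(np.abs(numeric - analytic)))                     # tiny (~1e-11)
```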

Given the logistic regression model, how do we fit $\theta$ to it?
Let us assume that

$$P(y=1\mid x;\theta) = h_\theta(x)$$

$$P(y=0\mid x;\theta) = 1 - h_\theta(x)$$

This can be written more compactly as

$$p(y\mid x;\theta) = \left(h_\theta(x)\right)^{y}\left(1 - h_\theta(x)\right)^{1-y}.$$

Assuming the $m$ training examples were generated independently, the likelihood of the parameters is

$$\begin{aligned}
L(\theta) &= p(Y\mid X;\theta) \\
&= \prod_{i=1}^{m} p\left(y^{(i)}\mid x^{(i)};\theta\right) \\
&= \prod_{i=1}^{m} \left(h_\theta(x^{(i)})\right)^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1-y^{(i)}}.
\end{aligned}$$

It will be easier to maximize the log likelihood:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} y^{(i)}\log h_\theta\left(x^{(i)}\right) + \left(1-y^{(i)}\right)\log\left(1 - h_\theta\left(x^{(i)}\right)\right).$$
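As a quick sanity check, here is a minimal NumPy sketch (function and variable names are my own, not from the notes) that evaluates this log likelihood for a given $\theta$:

```python
import numpy as np

def log_likelihood(theta, X, y):
    """ell(theta) = sum_i [ y_i * log h(x_i) + (1 - y_i) * log(1 - h(x_i)) ].

    X is the (m, n) design matrix, y is an (m,) vector of 0/1 labels.
    """
    h = 1.0 / (1.0 + np.exp(-X @ theta))  # h_theta(x^{(i)}) for every example
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```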

To maximize the likelihood we can use gradient ascent; written in vector notation, the update is

$$\theta := \theta + \alpha\,\nabla_\theta\,\ell(\theta).$$

Working with a single training example $(x, y)$ and taking derivatives:

$$\begin{aligned}
\frac{\partial}{\partial\theta_j}\ell(\theta)
&= \left( y\,\frac{1}{g(\theta^T x)} - (1-y)\,\frac{1}{1-g(\theta^T x)} \right)\frac{\partial}{\partial\theta_j}\, g(\theta^T x) \\
&= \left( y\,\frac{1}{g(\theta^T x)} - (1-y)\,\frac{1}{1-g(\theta^T x)} \right) g(\theta^T x)\left(1-g(\theta^T x)\right)\frac{\partial}{\partial\theta_j}\,\theta^T x \\
&= \left( y\left(1 - g(\theta^T x)\right) - (1-y)\,g(\theta^T x) \right) x_j \\
&= \left( y - h_\theta(x) \right) x_j,
\end{aligned}$$

where the second step uses the property $g'(z) = g(z)\left(1-g(z)\right)$ derived above.

This therefore gives us the stochastic gradient ascent rule:
$$\theta_j := \theta_j + \alpha\left( y^{(i)} - h_\theta\left(x^{(i)}\right) \right) x_j^{(i)}$$
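A minimal sketch of this stochastic gradient ascent rule (the learning rate, epoch count, and function names are my own choices, not from the notes):

```python
import numpy as np

def logistic_regression_sga(X, y, alpha=0.1, epochs=100):
    """Fit theta by stochastic gradient ascent on the log likelihood.

    X is the (m, n) design matrix (include a column of ones for the intercept);
    y is an (m,) vector of 0/1 labels.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in np.random.permutation(m):            # one example at a time
            h = 1.0 / (1.0 + np.exp(-X[i] @ theta))   # h_theta(x^{(i)})
            theta += alpha * (y[i] - h) * X[i]        # theta_j += alpha (y^{(i)} - h) x_j^{(i)}
    return theta
```

Updating on one example at a time, rather than summing the gradient over the whole training set, is what makes this the stochastic version of the rule.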

1.3 Digression: The perceptron learning algorithm
Consider modifying the logistic regression method to "force" it to output values that are either 0 or 1 exactly. To do so, it seems natural to change the definition of $g$ to be the threshold function:

$$g(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{if } z < 0 \end{cases}$$

If we then let $h_\theta(x) = g(\theta^T x)$ as before but using this modified definition of $g$, and if we use the update rule

$$\theta_j := \theta_j + \alpha\left( y^{(i)} - h_\theta\left(x^{(i)}\right) \right) x_j^{(i)},$$

then we have the perceptron learning algorithm.
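A minimal sketch of the perceptron learning algorithm as described above (function names and the default learning rate are my own choices):

```python
import numpy as np

def perceptron(X, y, alpha=1.0, epochs=10):
    """Perceptron learning: threshold hypothesis with the same-looking update rule.

    X is the (m, n) design matrix (include a column of ones for the intercept);
    y is an (m,) vector of 0/1 labels.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in range(m):
            h = 1.0 if X[i] @ theta >= 0 else 0.0  # threshold g(theta^T x), not the sigmoid
            theta += alpha * (y[i] - h) * X[i]     # only changes theta when the prediction is wrong
    return theta
```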
