逻辑回归(三)

来源:互联网 发布:避孕套推荐 知乎 编辑:程序博客网 时间:2024/05/24 22:41
多类别分类(Multiclass Classification)

在多类别分类问题中,我们的训练集中有个多个类别,这时可以利用一对多的分类思想将多类别分类问题转化为二元分类问题,该方法也称为一对多分类方法。


我们分别令y=1,y=2和y=3分别表示图中的三角形、矩形和叉。


左数第一个图y=1标记为正向类,则其他的就为负向类,这时我们就可以画出图中的直线作为边界,则记假设函数为hθ(1)(x),以此类推,第二个和第三个图的假设函数可分别记为hθ(2)(x),hθ(3)(x)。

因此,我们可以将假设函数hθ(x)记为:hθ(i)(x) = p(y=i|x,θ),其中i=1,2,3,...

最后,我们分别运行hθ(i)(x),i=1,2,3,...,选出让hθ(i)(x)的值最大的i,这时我们就挑选出了最好的假设函数,即最好的分类器。

补充笔记
Multiclass Classification: One-vs-all

Now we will approach the classification of data when we have more than two categories. Instead of y = {0,1} we will expand our definition so that y = {0,1...n}.

Since y = {0,1...n}, we divide our problem into n+1 (+1 because the index starts at 0) binary classification problems; in each one, we predict the probability that 'y' is a member of one of our classes.


We are basically choosing one class and then lumping all the others into a single second class. We do this repeatedly, applying binary logistic regression to each case, and then use the hypothesis that returned the highest value as our prediction.

The following image shows how one could classify 3 classes:


To summarize:

Train a logistic regression classifier hθ(x) for each class to predict the probability that  y = i .

To make a prediction on a new x, pick the class that maximizes hθ(x)