SVM, Perceptron, LDA, Logistic Regression etc.


A categorized summary of common machine-learning algorithms

Are Fisher’s linear discriminant and logistic regression classifier related?

Bradley Efron, “The Efficiency of Logistic Regression Compared to Normal Discriminant Analysis,” Journal of the American Statistical Association, Vol. 70, No. 352 (Dec. 1975), pp. 892–898.

What is the difference between the perceptron learning algorithm and SVM?

Difference between a SVM and a perceptron

The perceptron does not try to optimize the separation distance. As long as it finds a hyperplane that separates the two sets, it is good. The SVM, on the other hand, tries to maximize the margin, i.e., the distance between the separating hyperplane and the closest sample points of each class (the support vectors). The SVM also typically uses a kernel function to project the sample points into a high-dimensional space to make them linearly separable, while the basic perceptron assumes the sample points are already linearly separable.
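
To make the margin point concrete, here is a minimal sketch (not from the original answer; the toy data and hyper-parameters are illustrative assumptions) that fits scikit-learn's Perceptron and a near-hard-margin linear SVC on the same separable data and compares the geometric margins they end up with:

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two linearly separable Gaussian blobs (toy data, assumed for illustration).
X = np.vstack([rng.randn(50, 2) + [2, 2], rng.randn(50, 2) + [-2, -2]])
y = np.array([1] * 50 + [-1] * 50)

perc = Perceptron().fit(X, y)                 # stops at the first separating hyperplane
svm = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard-margin linear SVM

def geometric_margin(w, b, X, y):
    # distance from the closest training point to the hyperplane w.x + b = 0
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

print("perceptron margin:", geometric_margin(perc.coef_.ravel(), perc.intercept_[0], X, y))
print("SVM margin:       ", geometric_margin(svm.coef_.ravel(), svm.intercept_[0], X, y))
```

On separable data like this, both classifiers typically reach zero training error, but the SVM's margin is the largest one achievable.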

It sounds right to me. People sometimes also use the word “Perceptron” to refer to the training algorithm together with the classifier. For example, someone explained this to me in the answer to this question. Also, there is nothing to stop you from using a kernel with the perceptron, and this is often a better classifier. See here for some slides (pdf) on how to implement the kernel perceptron.

The major practical difference between a (kernel) perceptron and an SVM is that perceptrons can be trained online (i.e. their weights can be updated as new examples arrive one at a time) whereas standard SVMs cannot be. See this question for information on whether SVMs can be trained online. So, even though an SVM is usually a better classifier, perceptrons can still be useful because they are cheap and easy to re-train in a situation in which fresh training data is constantly arriving.
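
As a rough illustration of why online training is cheap, here is a from-scratch sketch of a kernel perceptron with mistake-driven updates; the RBF kernel, class name, and streaming toy data are assumptions for illustration, not code from the linked slides:

```python
import numpy as np

def rbf(x, z, gamma=0.5):
    # RBF kernel; gamma is an illustrative choice
    return np.exp(-gamma * np.sum((x - z) ** 2))

class KernelPerceptron:
    """Online kernel perceptron: mistake-driven updates, one example at a time."""
    def __init__(self, kernel=rbf):
        self.kernel = kernel
        self.support = []   # examples on which a mistake was made
        self.alphas = []    # their labels (+1 / -1), i.e. the update coefficients

    def predict(self, x):
        s = sum(a * self.kernel(z, x) for a, z in zip(self.alphas, self.support))
        return 1 if s >= 0 else -1

    def partial_fit(self, x, y):
        # update only when the current prediction is wrong (the perceptron rule)
        if self.predict(x) != y:
            self.support.append(x)
            self.alphas.append(y)

# Streaming usage: examples arrive one at a time and each update is cheap.
clf = KernelPerceptron()
stream = [(np.array([1.0, 1.0]), 1), (np.array([-1.0, -1.0]), -1),
          (np.array([1.5, 0.5]), 1)]
for x, y in stream:
    clf.partial_fit(x, y)
print(clf.predict(np.array([2.0, 2.0])))
```

Each mistake simply appends one example to the kernel expansion, so an update costs a handful of kernel evaluations rather than re-solving a quadratic program as a batch SVM would.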

How does a Support Vector Machine (SVM) work?

  • From a perceptron: the SVM uses the hinge loss and L2 regularization; the perceptron uses the perceptron loss and could use early stopping (among other techniques) for regularization, but there is really no regularization term in the perceptron. Because it has no regularization term, the perceptron is bound to be overtrained, so its generalization capabilities can be arbitrarily bad. The optimization is done using stochastic gradient descent and is therefore very fast. On the positive side, this paper shows that by doing early stopping with a slightly modified loss function the performance can be on par with an SVM. (The loss functions mentioned in this list are sketched in the code after it.)

  • From logistic regression: logistic regression uses the logistic loss and could use L1 or L2 regularization. You can think of logistic regression as the discriminative counterpart of the generative naive Bayes classifier.

  • From LDA: LDA can also be seen as a generative algorithm; it assumes that the class-conditional densities p(x|y=0) and p(x|y=1) are normally distributed with a shared covariance matrix (homoscedasticity). This is ideal when the data is in fact normally distributed. It has, however, the downside that “training” requires the inversion of a covariance matrix that can be large (when you have many features). If you drop the homoscedasticity assumption and give each class its own covariance matrix, LDA becomes QDA, which is Bayes optimal for normally distributed data, meaning that if the assumptions are satisfied you really cannot do better than this.
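
For reference, here is a small sketch of the three loss functions named in the list above, written as functions of the signed margin m = y·f(x) (the function names are illustrative):

```python
import numpy as np

# Loss as a function of the signed margin m = y * f(x), where f(x) = w.x + b.
def perceptron_loss(m):
    return np.maximum(0.0, -m)       # zero as soon as the point is classified correctly

def hinge_loss(m):
    return np.maximum(0.0, 1.0 - m)  # SVM: still penalizes correct points inside the margin

def logistic_loss(m):
    return np.log1p(np.exp(-m))      # logistic regression: smooth, never exactly zero

for m in [-1.0, 0.0, 0.5, 2.0]:
    print(f"m={m:+.1f}  perceptron={perceptron_loss(m):.3f}  "
          f"hinge={hinge_loss(m):.3f}  logistic={logistic_loss(m):.3f}")
```

The perceptron loss is zero for any correctly classified point, which is why the perceptron stops at the first separating hyperplane it finds, while the hinge loss keeps penalizing points that fall inside the margin.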

Support Vector Machines: What is the difference between Linear SVMs and Logistic Regression?
Logistic regression assumes that the predictors aren’t sufficient to determine the response variable, but determine a probability that is a logistic function of a linear combination of them. If there’s a lot of noise, logistic regression (usually fit with maximum-likelihood techniques) is a great technique.
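
Concretely, the model being described is p(y = 1 | x) = 1 / (1 + exp(-(w·x + b))); a minimal sketch (variable names are illustrative):

```python
import numpy as np

def predict_proba(w, b, x):
    # logistic function applied to a linear combination of the predictors
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))
```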

On the other hand, there are problems where you have thousands of dimensions and the predictors do nearly-certainly determine the response, but in some hard-to-explicitly-program way. An example would be image recognition. If you have a grayscale image, 100 by 100 pixels, you have 10,000 dimensions already. With various basis transforms (kernel trick) you will be able to get a linear separator of the data.

Non-regularized logistic regression techniques don’t work well (in fact, the fitted coefficients diverge) when there’s a separating hyperplane, because the maximum likelihood is achieved by any separating plane, and there’s no guarantee that you’ll get the best one. What you get is an extremely confident model with poor predictive power near the margin.
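
A quick way to see this divergence (a sketch on assumed toy data, using scikit-learn's LogisticRegression and weakening the L2 penalty by increasing C) is to watch the coefficient norm grow as the regularization is removed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
# Perfectly separable toy data (illustrative).
X = np.vstack([rng.randn(50, 2) + [3, 3], rng.randn(50, 2) + [-3, -3]])
y = np.array([1] * 50 + [0] * 50)

# As C grows (regularization weakens), the fitted coefficients keep growing:
# on separable data the likelihood is only maximized in the limit ||w|| -> infinity.
for C in [1e0, 1e2, 1e4, 1e6]:
    clf = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    print(f"C={C:g}  ||w|| = {np.linalg.norm(clf.coef_):.2f}")
```

With a finite penalty the coefficients stay bounded, which is part of why regularized logistic regression remains well behaved even when the classes are separable.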

SVMs get you the best separating hyperplane, and they’re efficient in high dimensional spaces. They’re similar to regularization in terms of trying to find the lowest-normed vector that separates the data, but with a margin condition that favors choosing a good hyperplane. A hard-margin SVM will find a hyperplane that separates all the data (if one exists) and fail if there is none; soft-margin SVMs (generally preferred) do better when there’s noise in the data.

Additionally, SVMs only consider points near the margin (support vectors). Logistic regression considers all the points in the data set. Which you prefer depends on your problem.

Logistic regression is great in a low number of dimensions and when the predictors don’t suffice to give more than a probabilistic estimate of the response. SVMs do better when there’s a higher number of dimensions, and especially on problems where the predictors do certainly (or near-certainly) determine the responses.

  • [Support vector machines - Harvard](http://isites.harvard.edu/fs/docs/icb.topic540049.files/cs181_lec13_handout.pdf)
