Paper reading notes: Supervised Machine Learning: A Review of Classification Techniques

Source: Internet | Published by: mac 内存占用高 | Editor: 程序博客网 | Time: 2024/05/20 20:48

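This paper surveys several classification methods used in supervised machine learning, including decision trees, rule-based classifiers, neural networks, statistical learning, and SVMs. For each method, the author explains its strengths and weaknesses.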

Some excerpts follow:

1. The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features.

2. The supervised learning process:

3. Two types of error

Type 1 error is the probability that the test rejects the null hypothesis incorrectly.

Type 2 error is the probability that the null hypothesis is not rejected, when there actually is a difference.
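To make the two definitions concrete, here is a minimal sketch that computes both error probabilities for a one-sided z-test. The setting (known standard deviation, a single alternative mean `mu1`, and the function name `error_rates`) is an assumption chosen for illustration and does not come from the paper.

```python
from math import sqrt
from statistics import NormalDist

def error_rates(mu0, mu1, sigma, n, alpha):
    """One-sided z-test of H0: mean == mu0 against H1: mean == mu1 (mu1 > mu0)."""
    std_err = sigma / sqrt(n)
    # Reject H0 when the sample mean exceeds this critical value.
    critical = NormalDist(mu0, std_err).inv_cdf(1 - alpha)
    # Type 1: H0 is true, yet the sample mean lands in the rejection region.
    type1 = 1 - NormalDist(mu0, std_err).cdf(critical)
    # Type 2: H1 is true, yet the sample mean stays below the critical value.
    type2 = NormalDist(mu1, std_err).cdf(critical)
    return type1, type2

# Illustrative numbers only.
type1, type2 = error_rates(mu0=0.0, mu1=0.5, sigma=1.0, n=25, alpha=0.05)
print(f"type 1 = {type1:.3f}, type 2 = {type2:.3f}")
```

With the significance level held fixed, increasing `n` leaves the Type 1 error unchanged and shrinks the Type 2 error.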

4. Summary of decision trees [logic-based algorithms]

The most useful characteristic of decision trees is their comprehensibility. Decision trees tend to perform better when dealing with discrete/categorical features.

5. Summary of rule sets [logic-based algorithms]

The most useful characteristic of rule-based classifiers is their comprehensibility.

6. Summary of single-layer perceptrons [perceptron-based algorithms]

Perceptron-like linear algorithms have superior time complexity when dealing with irrelevant features. They are a good choice when there are many features but only a few relevant ones. All perceptron-like linear algorithms are anytime online algorithms that can produce a useful answer regardless of how long they run. The longer they run, the better the result they produce.
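The online, one-sample-at-a-time nature of the perceptron can be sketched as follows. This is a minimal illustration, not the paper's pseudocode; the toy data (one relevant feature followed by two irrelevant ones) and the function names are assumptions.

```python
def train_perceptron(samples, labels, epochs=20):
    """Online perceptron: update the weights only when a sample is misclassified."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):  # y is +1 or -1
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: stop early
            break
    return weights, bias

def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1

# Toy data: only the first feature determines the class.
samples = [(2.0, 1.0, 0.0), (1.5, 0.0, 1.0), (3.0, 1.0, 1.0),
           (-2.0, 1.0, 0.0), (-1.0, 0.0, 1.0), (-3.0, 1.0, 1.0)]
labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])
```

Because the weights are usable after every update, training can be interrupted at any point and still return a classifier, which is what "anytime" refers to.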

7. Summary of multilayer perceptrons [perceptron-based algorithms]

ANNs depend upon three fundamental aspects: the input and activation functions of the unit, the network architecture, and the weight of each input connection. Their greatest problem is that they are too slow for most applications.
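The three aspects can be seen in a tiny feed-forward network. The sketch below is an illustration under stated assumptions: the weights are hand-picked to compute XOR rather than learned, a step function stands in for the activation, and the names `unit` and `xor_network` are made up for this example.

```python
def unit(inputs, weights, bias, activation):
    """One unit: input function (weighted sum) followed by an activation function."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

def step(z):
    return 1 if z > 0 else 0

def xor_network(x1, x2):
    # Architecture: 2 inputs -> 2 hidden units -> 1 output unit.
    # Weights are hand-picked for illustration, not learned by training.
    h_or = unit([x1, x2], [1.0, 1.0], -0.5, step)    # fires if either input is 1
    h_and = unit([x1, x2], [1.0, 1.0], -1.5, step)   # fires only if both inputs are 1
    return unit([h_or, h_and], [1.0, -1.0], -0.5, step)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

Changing any one of the three ingredients (the activation, the layer layout, or the weights) changes the function the network computes.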

8. Summary of radial basis function (RBF) networks [perceptron-based algorithms]

Its training procedure is usually divided into two stages. First, the centers and widths of the hidden layer are determined by clustering algorithms. Second, the weights connecting the hidden layer with the output layer are determined by Singular Value Decomposition (SVD) or Least Mean Squared (LMS) algorithms.
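The two-stage procedure can be sketched on a small 1-D problem. Everything specific here is an assumption made for illustration: plain k-means with a deterministic initialization for stage one, a single shared width set to half the mean gap between neighboring centers, Gaussian hidden units, and the LMS (Widrow-Hoff) update rule for stage two.

```python
from math import exp

def kmeans_1d(points, k, iterations=20):
    """Stage 1: place k >= 2 centers with plain k-means (deterministic initialization)."""
    ordered = sorted(points)
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers

def hidden_layer(x, centers, width):
    """Gaussian activations of the hidden units, plus a constant 1.0 for the bias."""
    return [exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers] + [1.0]

def train_output_lms(features, targets, rate=0.05, epochs=500):
    """Stage 2: fit the output weights with the LMS update rule."""
    weights = [0.0] * len(features[0])
    for _ in range(epochs):
        for phi, t in zip(features, targets):
            error = t - sum(w * f for w, f in zip(weights, phi))
            weights = [w + rate * error * f for w, f in zip(weights, phi)]
    return weights

# Toy data: class 1 on both sides of class 0, so no single threshold separates them.
xs = [0.0, 0.2, -0.2, 2.0, 2.2, 1.8, 4.0, 4.2, 3.8]
ys = [1, 1, 1, 0, 0, 0, 1, 1, 1]

centers = kmeans_1d(xs, k=3)
width = (centers[-1] - centers[0]) / (2 * (len(centers) - 1))  # heuristic, an assumption
features = [hidden_layer(x, centers, width) for x in xs]
weights = train_output_lms(features, ys)
predictions = [1 if sum(w * f for w, f in zip(weights, phi)) > 0.5 else 0
               for phi in features]
print(centers, predictions)
```

The hidden layer is fixed before the output weights are fitted, so stage two is an ordinary linear least-squares problem.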

Comparison with decision trees:

ANNs lack the ability to reason about their output in a way that can be effectively communicated.

9. Naive Bayes [statistical learning algorithms]

The major advantage of the naive Bayes classifier is its short computational time for training. In addition, since the model has the form of a product, it can be converted into a sum through the use of logarithms - with significant consequent computational advantages.
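One practical reading of the logarithm remark is numerical: a product of many small probabilities underflows in floating point, while the sum of their logarithms does not. The sketch below uses made-up likelihood values for two hypothetical classes over 400 features.

```python
from math import log

def product_score(prior, likelihoods):
    """P(c) * prod_i P(x_i | c), computed directly."""
    score = prior
    for p in likelihoods:
        score *= p
    return score

def log_score(prior, likelihoods):
    """log P(c) + sum_i log P(x_i | c): the same product rewritten as a sum."""
    return log(prior) + sum(log(p) for p in likelihoods)

# Hypothetical per-feature likelihoods for two classes.
class_a = [0.10] * 400
class_b = [0.09] * 400

# Both direct products underflow to 0.0, so the classes cannot be ranked.
print(product_score(0.5, class_a), product_score(0.5, class_b))
# The log scores stay finite and still rank class_a above class_b.
print(log_score(0.5, class_a) > log_score(0.5, class_b))
```

Since the logarithm is monotonic, picking the class with the largest log score gives the same decision as picking the largest product.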

10. Bayesian networks [statistical learning algorithms]

The task of learning a Bayesian network can be divided into two subtasks: initially, the learning of the DAG structure of the network, and then the determination of its parameters.

The most interesting feature of BNs, compared to decision trees or neural networks, is most certainly the possibility of taking into account prior information about a given problem, in terms of structural relationships among its features.

The prior information can include the following:

11. Summary of kNN [instance-based learning]

Ideally, the distance metric must minimize the distance between two similarly classified instances, while maximizing the distance between instances of different classes.

Disadvantages:

i) they have large storage requirements,

ii) they are sensitive to the choice of the similarity function that is used to compare instances,

iii) they lack a principled way to choose k, except through cross-validation or similar, computationally expensive techniques.
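Point iii) mentions cross-validation as the fallback for choosing k. A minimal sketch of that idea, using leave-one-out cross-validation over a few candidate values, might look like this. The toy data (two clusters, each with one mislabeled point in the middle), the squared Euclidean distance, and the helper names are assumptions for illustration.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out cross-validation: every point is classified by all the others."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, point, k) == label
    return hits / len(data)

def choose_k(data, candidates):
    """Return the first candidate with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))

data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"), ((0.5, 0.5), "b"),
        ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b"), ((5.5, 5.5), "a")]

for k in (1, 3, 5):
    print(k, loo_accuracy(data, k))
print("chosen k:", choose_k(data, [1, 3, 5]))
```

Each candidate k requires classifying every point against all the others, which is where the computational expense noted in iii) comes from.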

Tips for choosing k:

12. SVM

The training optimization problem of the SVM necessarily reaches a global minimum, and avoids ending in a local minimum, which may happen in other search algorithms such as neural networks. However, the SVM methods are binary, thus in the case of multi-class problem one must reduce the problem to a set of multiple binary classification problems. Discrete data presents another problem, although with suitable rescaling good results can be obtained.
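One common way to perform this reduction is one-vs-rest: train one binary scorer per class and predict the class whose scorer is most confident. The sketch below shows only the reduction. The binary learner is a deliberately simple stand-in (a linear scorer built from the difference of class means), not an SVM; any binary classifier that returns a real-valued score could be plugged in instead. All names and data are assumptions.

```python
def mean(points):
    return [sum(axis) / len(points) for axis in zip(*points)]

def train_binary(positives, negatives):
    """Stand-in binary learner (NOT an SVM): linear scorer from the class means."""
    mu_pos, mu_neg = mean(positives), mean(negatives)
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    midpoint = [(p + n) / 2 for p, n in zip(mu_pos, mu_neg)]
    # Score is positive on the side of the midpoint that faces the positive mean.
    return lambda x: sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, midpoint))

def train_one_vs_rest(data):
    """One binary problem per class: that class against all the others."""
    classes = sorted({label for _, label in data})
    scorers = {}
    for c in classes:
        positives = [x for x, label in data if label == c]
        negatives = [x for x, label in data if label != c]
        scorers[c] = train_binary(positives, negatives)
    return scorers

def predict(scorers, x):
    return max(scorers, key=lambda c: scorers[c](x))

data = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0),
        ((10, 0), 1), ((9, 0), 1), ((10, 1), 1),
        ((0, 10), 2), ((1, 10), 2), ((0, 9), 2)]

scorers = train_one_vs_rest(data)
print([predict(scorers, x) for x, _ in data])
```

With three classes this trains three binary problems. Another known reduction, one-vs-one, trains one problem per pair of classes instead.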

13. Summary

Generally, SVMs and neural networks tend to perform much better when dealing with multi-dimensions and continuous features. On the other hand, logic-based systems tend to perform better when dealing with discrete/categorical features. For neural network models and SVMs, a large sample size is required in order to achieve their maximum prediction accuracy, whereas NB may need a relatively small dataset.

Mechanisms that are used to build ensemble of classifiers include:

i) using different subsets of training data with a single learning method,

ii) using different training parameters with a single training method,

iii) using different learning methods.
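Mechanism i) can be sketched with a single base learner trained on different subsets of the data and combined by majority vote. To keep the example reproducible, the subsets are built deterministically by leaving out every third point, rather than by the random bootstrap sampling that bagging uses. The 1-D threshold learner and all names are assumptions for illustration.

```python
from collections import Counter

def train_stump(data):
    """Base learner: 1-D threshold classifier that predicts 1 when x > threshold."""
    xs = sorted({x for x, _ in data})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in data)

    return min(candidates, key=errors)

def train_ensemble(data, n_models):
    """Train one stump per subset; subset i leaves out every n_models-th point."""
    subsets = [[d for j, d in enumerate(data) if j % n_models != i]
               for i in range(n_models)]
    return [train_stump(s) for s in subsets]

def vote(thresholds, x):
    """Majority vote over the individual stump predictions."""
    return Counter(int(x > t) for t in thresholds).most_common(1)[0][0]

data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
        (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]

thresholds = train_ensemble(data, n_models=3)
print(thresholds)
print(vote(thresholds, 5.2), vote(thresholds, 4.7))
```

Each stump sees slightly different data and so learns a slightly different threshold. The vote settles points near the boundary where the individual stumps disagree.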

The key question when dealing with ML classification is not whether a learning algorithm is superior to others, but under which conditions a particular method can significantly outperform others on a given application problem.