Paper reading notes: Supervised machine learning: a review of classification techniques
This paper surveys several supervised classification methods in machine learning, including decision trees, rule-based classifiers, neural networks, statistical learning, SVMs, and others; for each method the author discusses its strengths and weaknesses.
Below are some excerpts:
1. The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features.
2. The supervised learning workflow (given as a flowchart in the paper).
3. The two types of error:
Type I error is the probability that the test rejects the null hypothesis incorrectly (a false positive).
Type II error is the probability that the null hypothesis is not rejected when there actually is a difference (a false negative).
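The two error types can be expressed as rates over a confusion table; the counts below are made-up illustration values, and `error_rates` is a hypothetical helper, not something from the paper:

```python
def error_rates(fp, tn, fn, tp):
    # Type I (alpha): H0 wrongly rejected -- false positives among true-H0 cases
    type1 = fp / (fp + tn)
    # Type II (beta): H0 wrongly retained -- false negatives among false-H0 cases
    type2 = fn / (fn + tp)
    return type1, type2

print(error_rates(5, 95, 10, 90))  # -> (0.05, 0.1)
```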
4. Summary of decision trees [logic-based algorithms]
The most useful characteristic of decision trees is their comprehensibility. Decision trees tend to perform better when dealing with discrete/categorical features.
5. Summary of rule sets [logic-based algorithms]
The most useful characteristic of rule-based classifiers is their comprehensibility.
6. Summary of single-layer perceptrons [perceptron-based algorithms]
Perceptron-like linear algorithms have superior time complexity when dealing with irrelevant features. They are a good fit when there are many features but only a few relevant ones. All perceptron-like linear algorithms are anytime online algorithms that can produce a useful answer regardless of how long they run. The longer they run, the better the result they produce.
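The online, mistake-driven behaviour described above can be sketched with a minimal perceptron update rule; the AND-style toy data and the function name are my own illustration, not from the paper:

```python
def perceptron_update(w, b, x, y, lr=1.0):
    """One online step: y in {-1, +1}; the weights change only on a mistake."""
    if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        b = b + lr * y
    return w, b

# A few passes over a tiny linearly separable (AND-like) dataset.
data = [([0.0, 0.0], -1), ([0.0, 1.0], -1), ([1.0, 0.0], -1), ([1.0, 1.0], 1)]
w, b = [0.0, 0.0], 0.0
for _ in range(10):
    for x, y in data:
        w, b = perceptron_update(w, b, x, y)
```

Because the data is separable, the perceptron convergence theorem guarantees the loop eventually stops making mistakes; stopping earlier still yields a usable (if worse) classifier, which is the "anytime" property.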
7. Summary of multilayer perceptrons [perceptron-based algorithms]
An ANN depends upon three fundamental aspects: the input and activation functions of the units, the network architecture, and the weight of each input connection. Their greatest problem is that they are too slow for most applications.
8. Summary of radial basis function networks [perceptron-based algorithms]
Its training procedure is usually divided into two stages. First, the centers and widths of the hidden layer are determined by clustering algorithms. Second, the weights connecting the hidden layer with the output layer are determined by Singular Value Decomposition (SVD) or Least Mean Squared (LMS) algorithms.
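The two-stage procedure can be sketched in miniature. Here stage one's clustering is replaced by hand-picked cluster means (a real system would run k-means), and stage two uses the online LMS (delta) rule rather than SVD; the 1-D data and all names are invented for illustration:

```python
import math

def rbf_features(x, centers, width):
    # hidden layer: one Gaussian basis function per centre
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]

# Toy 1-D inputs with targets 0 and 1 (made-up data).
samples = [(0.0, 0.0), (0.1, 0.0), (0.9, 1.0), (1.0, 1.0)]

# Stage 1 (stand-in): centres as the means of the two obvious input clusters.
centers, width = [0.05, 0.95], 0.3

# Stage 2: output weights learned with the LMS (delta) rule.
w, lr = [0.0, 0.0], 0.5
for _ in range(200):
    for x, target in samples:
        phi = rbf_features(x, centers, width)
        out = sum(wi * pi for wi, pi in zip(w, phi))
        w = [wi + lr * (target - out) * pi for wi, pi in zip(w, phi)]

preds = [sum(wi * pi for wi, pi in zip(w, rbf_features(x, centers, width)))
         for x, _ in samples]
```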
Compared with decision trees:
ANNs lack the ability to reason about their output in a way that can be effectively communicated.
9. Naive Bayes [statistical learning algorithms]
The major advantage of the naive Bayes classifier is its short computational time for training. In addition, since the model has the form of a product, it can be converted into a sum through the use of logarithms, with significant consequent computational advantages.
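The product-to-sum conversion can be shown directly; the class priors and per-feature likelihoods below are invented illustration values, not estimates from any dataset:

```python
import math

# Hypothetical model: P(class) and P(feature_i | class) for two classes.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [0.8, 0.3],   # P(word1 | spam), P(word2 | spam)
    "ham":  [0.1, 0.6],
}

def log_posterior(cls):
    # The product prior * prod(likelihoods) becomes a sum of logs:
    # cheaper, and numerically stable when many small factors multiply.
    return math.log(priors[cls]) + sum(math.log(p) for p in likelihoods[cls])

best = max(priors, key=log_posterior)
```

Since log is monotonic, the argmax over log-posteriors is the same class the raw product would pick.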
10. Bayesian networks [statistical learning algorithms]
The task of learning a Bayesian network can be divided into two subtasks: initially, the learning of the DAG structure of the network, and then the determination of its parameters.
The most interesting feature of BNs, compared to decision trees or neural networks, is most certainly the possibility of taking into account prior information about a given problem, in terms of structural relationships among its features.
The prior can include the following:
11. Summary of KNN [instance-based learning]
Ideally, the distance metric must minimize the distance between two similarly classified instances, while maximizing the distance between instances of different classes.
Drawbacks:
i) they have large storage requirements,
ii) they are sensitive to the choice of the similarity function that is used to compare instances,
iii) they lack a principled way to choose k, except through cross-validation or similar, computationally expensive techniques.
Tips for choosing k:
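Choosing k by cross-validation, as mentioned in drawback (iii), can be sketched with leave-one-out validation; the two-class toy data and function names are my own illustration:

```python
def knn_predict(train, x, k):
    # majority vote among the k nearest neighbours (squared Euclidean distance)
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2
                                              for a, b in zip(p[0], x)))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def loo_accuracy(data, k):
    # leave-one-out cross-validation accuracy for a given k
    hits = sum(knn_predict(data[:i] + data[i + 1:], x, k) == y
               for i, (x, y) in enumerate(data))
    return hits / len(data)

data = [([0, 0], 'a'), ([0, 1], 'a'), ([1, 0], 'a'),
        ([5, 5], 'b'), ([5, 6], 'b'), ([6, 5], 'b')]
best_k = max([1, 3, 5], key=lambda k: loo_accuracy(data, k))
```

Note the cost: every candidate k requires a full pass of held-out predictions, which is exactly the computational expense the drawback refers to.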
12. SVM
The training optimization problem of the SVM necessarily reaches a global minimum, and avoids ending in a local minimum, which may happen in other search algorithms such as neural networks. However, SVM methods are binary, so a multi-class problem must be reduced to a set of multiple binary classification problems. Discrete data presents another problem, although with suitable rescaling good results can be obtained.
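The reduction of a multi-class problem to binary subproblems can be sketched with a one-vs-rest scheme. The base learner here is a plain perceptron standing in for an actual SVM (to keep the sketch dependency-free), and the well-separated data is made up:

```python
def train_binary(data, epochs=300):
    # stand-in binary learner (a perceptron, NOT an actual SVM);
    # any binary margin learner could be plugged in at this point
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def one_vs_rest(data, classes):
    # the reduction itself: one binary problem per class,
    # relabelled as that class (+1) against all the others (-1)
    return {c: train_binary([(x, 1 if y == c else -1) for x, y in data])
            for c in classes}

def predict(models, x):
    # pick the class whose binary model scores highest on x
    return max(models, key=lambda c: sum(wi * xi
                                         for wi, xi in zip(models[c][0], x))
                                     + models[c][1])

data = [([0, 0], 'a'), ([1, 0], 'a'),
        ([10, 0], 'b'), ([9, 0], 'b'),
        ([0, 10], 'c'), ([0, 9], 'c')]
models = one_vs_rest(data, ['a', 'b', 'c'])
```

One-vs-one (a model per pair of classes) is the other common reduction; it trains more models on smaller problems.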
13. Summary
Generally, SVMs and neural networks tend to perform much better when dealing with multi-dimensional and continuous features. On the other hand, logic-based systems tend to perform better when dealing with discrete/categorical features. For neural network models and SVMs, a large sample size is required in order to achieve their maximum prediction accuracy, whereas NB may need a relatively small dataset.
Mechanisms that are used to build ensembles of classifiers include:
i) using different subsets of training data with a single learning method,
ii) using different training parameters with a single training method,
iii) using different learning methods.
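Mechanism (i) can be sketched with a tiny voting ensemble. Here 1-nearest-neighbour is the single learning method, and the subsets are the leave-one-out subsets, chosen for determinism; bootstrap resampling (bagging) is the more usual choice. Data and names are invented for illustration:

```python
def nn_predict(train, x):
    # base learner: 1-nearest-neighbour (squared Euclidean distance)
    return min(train, key=lambda p: sum((a - b) ** 2
                                        for a, b in zip(p[0], x)))[1]

def subsets(data):
    # mechanism (i): different subsets of the training data with a single
    # learning method -- here the n leave-one-out subsets of size n-1
    return [data[:i] + data[i + 1:] for i in range(len(data))]

def ensemble_vote(data, x):
    # train one member per subset and combine them by majority vote
    votes = {}
    for sample in subsets(data):
        label = nn_predict(sample, x)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

data = [([0, 0], 'a'), ([0, 1], 'a'), ([1, 0], 'a'),
        ([5, 5], 'b'), ([5, 6], 'b'), ([6, 5], 'b')]
```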
The key question when dealing with ML classification is not whether a learning algorithm is superior to others, but under which conditions a particular method can significantly outperform others on a given application problem.