Statistical Learning Notes (2): Overview of Supervised Learning (2)
Some supplements to the last note:
Prediction and Inference
Prediction is different from inference. In prediction, we can treat f as a black box: we use f^ as an estimate of f, and we do not care about the exact form of f^ as long as it yields accurate predictions of Y. In inference, we do care about the form of f^: the relationship between the predictors and the response, the dimension of the model, and so on.
Example:
For instance, consider a company that is interested in conducting a direct-marketing campaign. The goal is to identify individuals who will respond positively to a mailing, based on observations of demographic variables measured on each individual. In this case, the demographic variables serve as predictors, and response to the marketing campaign (either positive or negative) serves as the outcome. The company is not interested in obtaining a deep understanding of the relationships between each individual predictor and the response; instead, the company simply wants an accurate model to predict the response using the predictors. This is an example of modeling for prediction.
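The distinction can be seen in code. Below is a minimal sketch, on made-up toy data, of fitting a one-predictor least-squares line: for prediction we only ever call the fitted f^ as a black box, while for inference we inspect the estimated coefficients themselves.

```python
# Sketch (toy data): the same fitted model serves prediction or inference.
# For prediction we only need f_hat(x); for inference we look at the
# estimated coefficients b0, b1 themselves.

def fit_least_squares(xs, ys):
    """Fit y = b0 + b1*x by ordinary least squares (one predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Toy data generated from y = 2 + 3*x exactly, so the fit is recoverable.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, 8.0, 11.0, 14.0]
b0, b1 = fit_least_squares(xs, ys)

prediction = b0 + b1 * 10.0  # prediction: use f_hat as a black box
print(prediction)            # 32.0
print(b0, b1)                # inference: the form of f_hat itself
```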
Parametric and Non-parametric Methods
We can use a parametric model to estimate f, but the assumed form may be far from the true f. In that case we can use more flexible models that can fit many different possible functional forms of f, but these need more parameters and may overfit.
Non-parametric models make no assumption about the form of f. The thin-plate spline is one example, but the resulting f^ may be more variable than the true f, so when using a thin-plate spline we select a level of smoothness; this is discussed further in "Resampling Methods" and "Moving Beyond Linearity". Boosting methods, with low interpretability but high flexibility, are shown in "Tree-Based Methods". The lasso (which sets some parameters to zero) is discussed in "Linear Model Selection and Regularization", and generalized additive models (in which the relationship between each predictor and the response is modeled by a curve) are discussed in "Moving Beyond Linearity". Bagging, boosting and SVMs are shown in "Tree-Based Methods" and "Support Vector Machines".
Clustering and other unsupervised learning methods are shown in "Unsupervised Learning".
Algorithm Division
Quantitative response (e.g. least squares linear regression): regression problem
Qualitative response (e.g. logistic regression): classification problem
When evaluating these algorithms, we use the test MSE rather than the training MSE, where MSE = (1/n) Σ_{i=1}^{n} (y_i − f^(x_i))².
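A minimal sketch of the difference, using hypothetical data and a 1-nearest-neighbour "memorizer" as f^: it achieves zero training MSE by construction, yet its test MSE on held-out points is not zero.

```python
# Sketch (hypothetical data): training MSE can be driven to zero by a
# model that memorizes the training set, but the test MSE reveals that
# this says nothing about accuracy on unseen observations.

def mse(ys, preds):
    """Mean squared error: average of (y_i - f_hat(x_i))^2."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.1, 1.2, 1.9, 3.1]

def f_hat(x):
    # 1-nearest-neighbour lookup: predict the response of the closest
    # training point (memorizes the training data exactly).
    i = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[i]

train_mse = mse(train_y, [f_hat(x) for x in train_x])
print(train_mse)  # 0.0 -- every training point predicts itself

test_x, test_y = [0.4, 2.6], [0.6, 2.4]
test_mse = mse(test_y, [f_hat(x) for x in test_x])
print(test_mse)   # nonzero on held-out points
```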
Classification:
When evaluating classification methods, we use the training error rate, (1/n) Σ_{i=1}^{n} I(y_i ≠ ŷ_i), the fraction of training observations that are misclassified. Performance is better evaluated on the test set, using the analogous test error rate.
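Computing the error rate is straightforward; a small sketch with made-up labels:

```python
# Sketch (made-up labels): the training/test error rate is the fraction
# of observations whose predicted class differs from the true one.
def error_rate(ys, preds):
    """(1/n) * sum of I(y_i != y_hat_i)."""
    return sum(y != p for y, p in zip(ys, preds)) / len(ys)

truth = ["blue", "blue", "orange", "orange"]
preds = ["blue", "orange", "orange", "orange"]
print(error_rate(truth, preds))  # one mistake out of four -> 0.25
```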
Bayes classifier:
The Bayes classifier assigns each observation to its most likely class given its predictor values, i.e. to the class j that maximizes Pr(Y = j | X = x0). It produces the lowest possible test error rate, with an overall error rate of 1 − E(max_j Pr(Y = j | X)), known as the Bayes error rate.
K-nearest neighbors
Given a positive integer K and a test observation x0, the KNN classifier first identifies the K points in the training data that are closest to x0, represented by N0. It then estimates the conditional probability for class j as the fraction of points in N0 whose response values equal j:
Pr(Y = j | X = x0) = (1/K) Σ_{i ∈ N0} I(y_i = j)
This is simply the ratio of neighbors whose label is j to the total number of neighbors K. Finally, KNN mimics the Bayes rule: it classifies the test observation x0 to the class with the largest estimated probability.
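A from-scratch sketch of this procedure, on hypothetical 2-D points arranged so that, with K = 3, the neighborhood of the test point contains two blue points and one orange one:

```python
from collections import Counter

def knn(train, x0, k):
    """Estimate Pr(Y = j | X = x0) for each class j from the k nearest
    training points, then return (probabilities, predicted class)."""
    dist = lambda p: sum((a - b) ** 2 for a, b in zip(p, x0)) ** 0.5
    nearest = sorted(train, key=lambda t: dist(t[0]))[:k]
    votes = Counter(label for _, label in nearest)
    probs = {j: c / k for j, c in votes.items()}
    return probs, max(probs, key=probs.get)

# Hypothetical points: two blues and one orange lie near the test point,
# a second orange lies far away.
train = [((0.0, 0.0), "blue"), ((1.0, 0.0), "blue"),
         ((0.0, 1.0), "orange"), ((5.0, 5.0), "orange")]

probs, label = knn(train, (0.3, 0.3), 3)
print(probs, label)  # blue gets 2/3, orange 1/3 -> classified as blue
```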
Example:
Suppose that we choose K = 3. Then KNN will first identify the three observations that are closest to the cross. This neighborhood is shown as a circle. It consists of two blue points and one orange point, resulting in estimated probabilities of 2/3 for the blue class and 1/3 for the orange class.
Despite the fact that it is a very simple approach, KNN can often produce classifiers that are surprisingly close to the optimal Bayes classifier.
The choice of K has a drastic effect on the KNN classifier obtained. The following figure shows two KNN fits to the simulated data.
When K = 1, the decision boundary is overly flexible and finds patterns in the data that don't correspond to the Bayes decision boundary. This corresponds to a classifier that has low bias but very high variance. As K grows, the method becomes less flexible and produces a decision boundary that is close to linear. This corresponds to a low-variance but high-bias classifier.
As K decreases, flexibility increases: the training error rate steadily declines, while the test error rate typically first falls and then rises again, tracing the characteristic U-shape.
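This effect can be reproduced on the training side with a small sketch (hypothetical 1-D data containing one mislabelled "noise" point): K = 1 memorizes the training set and has zero training error, while a larger K already misclassifies the noisy point.

```python
from collections import Counter

def knn_predict(train, x0, k):
    """Majority vote among the k training points closest to x0."""
    dist = lambda p: sum((a - b) ** 2 for a, b in zip(p, x0)) ** 0.5
    nearest = sorted(train, key=lambda t: dist(t[0]))[:k]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

def train_error(train, k):
    """Training error rate: classify each training point with KNN."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in train)
    return wrong / len(train)

# Hypothetical 1-D data: blues on the left, oranges on the right, plus
# one mislabelled orange point at x = 1.5 inside the blue region.
train = [((0.0,), "blue"), ((1.0,), "blue"), ((1.5,), "orange"),
         ((2.0,), "blue"), ((3.0,), "orange"), ((4.0,), "orange"),
         ((5.0,), "orange")]

print(train_error(train, 1))  # 0.0 -- each point is its own neighbor
print(train_error(train, 3))  # > 0 -- the noise point is outvoted
```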
In "Resampling Methods", we return to this topic and discuss various methods for estimating test error rates and thereby choosing the optimal level of flexibility for a given statistical learning method.