Machine Learning Basics: Wikipedia Translation on Hyperparameter Selection and K-Nearest Neighbors, with a Simple sklearn Example

In the context of machine learning, hyperparameter optimization or model
selection is the problem of choosing a set of hyperparameters for a
learning algorithm, usually with the goal of optimizing a measure of
the algorithm's performance on an independent data set.
That is, hyperparameters are chosen so that the algorithm performs best on an independent data set.
Often cross-validation is used to estimate the generalization performance.
Cross-validation is generally used.
Hyperparameter optimization contrasts with actual learning problems, which
are also often cast as optimization problems, but optimize a loss function
on the training set alone. In effect, learning algorithms learn parameters
that model/reconstruct their inputs well, while hyperparameter optimization
ensures, through tuning, that the model does not overfit its data.
The learning algorithm itself aims to fit the data well; hyperparameter optimization guards against overfitting.
e.g. regularization:
Regularization here refers to a penalty term, like those underlying AIC and BIC.
Regularization prevents overfitting by constraining the parameters so that the model
does not fit the training data "too well", removing some of the excess fitting to noise.
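To make the penalty idea concrete, here is a small sketch of my own (not from the original text, and the alpha values are arbitrary) using ridge regression, where the hyperparameter alpha weights the penalty on the coefficients:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 10)
y = X[:, 0] + 0.1 * rng.randn(50)   # only the first feature carries signal

for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    # a larger penalty shrinks the coefficients, so the model fits the noise less
    print(alpha, np.linalg.norm(model.coef_), model.score(X, y))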

Algorithms for hyperparameter optimization
Grid search
The traditional way of performing hyperparameter optimization has been grid
search, or a parameter sweep,
which is simply an exhaustive search
through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric,
typically measured by cross-validation on the training set or on a held-out
validation set.
Since the parameter space of a machine learner may include real-valued or
unbounded value spaces for certain parameters, manually setting bounds and
discretization
may be necessary before applying grid search.
For example, a typical soft-margin SVM classifier (the separation constraints are relaxed with some slack rather than enforced exactly)
equipped with an RBF (Gaussian) kernel has at least two
hyperparameters that need to be tuned for good performance on unseen
data: a regularization constant C and a kernel hyperparameter γ.
Both parameters are continuous, so to perform grid search, one selects a
finite set of "reasonable" values for each, ...

Grid search then trains an SVM with each pair (C, γ) in the Cartesian product
of these two sets and evaluates their performance on a held-out
validation set (or by internal cross-validation on the training set, in which case multiple SVMs are trained per pair). Finally, the grid search algorithm outputs the settings that achieved the highest score in the validation procedure.
Grid search suffers from the curse of dimensionality (the number of grid points
grows exponentially with the number of hyperparameters), but it is often embarrassingly parallel because the
hyperparameter settings it evaluates are typically independent of each other.
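As a concrete sketch of the procedure above (my own illustration, not part of the translated article; the grid values are arbitrary), GridSearchCV trains one SVM per (C, gamma) pair from the Cartesian product, scores each by cross-validation, and can fit the independent settings in parallel:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "C": [0.1, 1, 10, 100],        # regularization constant
    "gamma": [0.01, 0.1, 1, 10],   # RBF kernel hyperparameter
}

# 5-fold cross-validation on every (C, gamma) pair; n_jobs=-1 runs the
# independent fits in parallel ("embarrassingly parallel")
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)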





In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN)
is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:
 In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small).
 That is, the object is assigned to the class that receives the most votes among its k nearest samples.
 If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

 In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
 The average of the values of its k nearest samples is used as the estimate.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm
is among the simplest of all machine learning algorithms.

Both for classification and regression, it can be useful to assign weights to the contributions of the neighbors,
so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.
This describes weighting neighbors by the inverse of their distance.
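As a sketch of this weighting (my own illustration; in scikit-learn, weights="distance" weights each neighbor by the inverse of its distance, while "uniform" gives every neighbor an equal vote):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

for weights in ["uniform", "distance"]:
    knn = KNeighborsClassifier(n_neighbors=5, weights=weights)
    # cross-validated accuracy with equal vs. inverse-distance votes
    print(weights, cross_val_score(knn, X, y, cv=5).mean())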

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.
Since the class or value of the k samples nearest to the query point is already known, these samples
can be regarded as the training set.

A shortcoming of the k-NN algorithm is that it is sensitive to the local structure
of the data.
Drawback: sensitivity to the local structure of the data.

Algorithm
The training examples are vectors in a multidimensional feature space, each with
a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.
In other words, the "model" is nothing more than the stored sample feature vectors and their class labels.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to the query point.
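The two phases can be sketched from scratch (an illustration of mine, not from the article): "training" merely stores the labelled vectors, and classification computes the distance from the query point to every stored vector and takes a majority vote among the k nearest:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # Euclidean distance from the query point to every stored training vector
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]   # indices of the k closest samples
    votes = Counter(y_train[nearest])     # count the labels among them
    return votes.most_common(1)[0][0]     # majority class

# toy data: two well-separated clusters
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=3))  # -> 0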






Feature selection (SelectKBest) uses the F statistic by default; this statistic comes from testing the overall significance of a regression equation.
GridSearchCV is in effect a meta-estimator that wraps the classifier and searches its parameters.
Pipeline is a pipelining technique that chains the steps so the parameters of several models can be estimated and searched together.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in older releases

iris = load_iris()
X, y = iris.data, iris.target

# combine PCA components with the top univariate features into one feature matrix
pca = PCA(n_components=2)
selection = SelectKBest(k=1)
combine_features = FeatureUnion([("pca", pca), ("univ_select", selection)])
X_feature = combine_features.fit(X, y).transform(X)
# print(X_feature)

# pipeline: feature combination followed by a linear SVM
svm = SVC(kernel="linear")
pipeline = Pipeline([("features", combine_features), ("svm", svm)])

# grid over the PCA dimensionality, the number of selected features, and C
param_grid = dict(features__pca__n_components=[1, 2, 3],
                  features__univ_select__k=[1, 2],
                  svm__C=[0.1, 1, 10])

grid_search = GridSearchCV(pipeline, param_grid=param_grid, verbose=10)
grid_search.fit(X, y)
print(grid_search.best_estimator_)
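A note on the example (based on scikit-learn's conventions rather than the original text): FeatureUnion simply concatenates the PCA components and the features kept by SelectKBest into one feature matrix, and double-underscore keys such as features__pca__n_components and svm__C address the parameters of named pipeline steps, so the PCA dimensionality, the number of selected features, and the SVM's C are searched over one Cartesian grid.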




