[Machine Learning] Cross-Validation and K-Fold Cross-Validation

http://www.anc.ed.ac.uk/rbf/intro/node16.html

If data is not scarce then the set of available input-output measurements can be divided into two parts - one part for training and one part for testing. In this way several different models, all trained on the training set, can be compared on the test set. This is the basic form of cross-validation.

In other words: if data is not scarce, divide the dataset into two parts, a training set and a test set. Several different models, all trained on the training set, can then be compared by their results on the test set.
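As a minimal sketch of this basic split, assuming scikit-learn (the iris dataset, logistic-regression model, and 90/10 ratio are illustrative choices, not from the quoted text):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 10% of the samples for testing; train on the remaining 90%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

# Any candidate model is fit on the training part only...
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# ...and compared against other models by its score on the held-out part.
print("test accuracy:", model.score(X_test, y_test))
```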

A better method, which is intended to avoid the possible bias introduced by relying on any one particular division into test and train components, is to partition the original set in several different ways and to compute an average score over the different partitions.

To avoid the possible bias introduced by relying on any single train/test split, a better method is to partition the original data in several different ways and to average the scores across the different partitions.
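A quick sketch of "averaging over several different partitions", again assuming scikit-learn: cross_val_score fits and scores the model once per partition and returns the per-partition scores, which can then be averaged (the dataset and model are the same illustrative choices as above):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Score the model on 5 different partitions of the data.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("per-partition scores:", scores)
print("average score:", scores.mean())
```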

This leads to k-fold cross-validation.

https://randomforests.wordpress.com/2014/02/02/basics-of-k-fold-cross-validation-and-gridsearchcv-in-scikit-learn/

K-Fold Cross Validation is used to validate your model through generating different combinations of the data you already have. For example, if you have 100 samples, you can train your model on the first 90, and test on the last 10. Then you could train on samples 1-80 & 90-100, and test on samples 80-90. Then repeat. This way, you get different combinations of train/test data, essentially giving you ‘more’ data for validation from your original data. 

K-fold cross-validation validates your model by generating different combinations of the data you already have. For example, with 100 samples you can train on the first 90 and test on the last 10; next, test on samples 81-90 and train on the rest; then repeat. This way you obtain different train/test combinations, effectively giving you 'more' validation data out of your original dataset.
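The index bookkeeping in that example can be seen directly with scikit-learn's KFold. The sketch below (100 dummy samples and k=10, mirroring the quote) prints which slice each fold holds out for testing; everything outside that slice is used for training:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(100).reshape(100, 1)  # 100 dummy samples

kf = KFold(n_splits=10)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # Each fold tests on a different 10-sample slice and trains on the rest.
    print(f"fold {fold}: test on samples {test_idx[0]}-{test_idx[-1]}, "
          f"train on the other {len(train_idx)}")
```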

K-fold cross-validation is also important for grid search.

We'll now check out GridSearchCV. This allows us to create a special model that will find its optimal parameter values.

In other words, GridSearchCV lets us build a special model that finds its own optimal parameter values.
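A minimal GridSearchCV sketch, assuming scikit-learn; the SVC estimator and the parameter grid are illustrative assumptions, not from the quoted post. For every parameter combination, GridSearchCV runs k-fold cross-validation and keeps the best-scoring combination, which is exactly why k-fold matters for grid search:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate parameter values to search over (illustrative).
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Each of the 6 combinations is evaluated with 5-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best params:", search.best_params_)
print("best cross-validated score:", search.best_score_)
```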

