Neural Networks - Training Set, Validation Set, Test Set
Source: Internet. Publisher: Cloud Software Official Download. Editor: Programmer Blog Network. Time: 2024/04/30 22:34
Reposted from: http://stackoverflow.com/questions/2976452/whats-is-the-difference-between-train-validation-and-test-set-in-neural-networ
What is the difference between the training, validation and test sets in neural networks?
(Will be removed on request in case of infringement.)
The training and validation sets are used during training.
    for each epoch
        for each training data instance
            propagate error through the network
            adjust the weights
            calculate the accuracy over training data
        for each validation data instance
            calculate the accuracy over the validation data
        if the threshold validation accuracy is met
            exit training
        else
            continue training
Once training is finished, run the network against the testing set and verify that the accuracy is sufficient.
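The loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: a single threshold unit trained with the perceptron rule stands in for the network, and the toy task, the 0.95 threshold, the learning rate and the names `train`, `predict`, `accuracy` and `make_data` are all hypothetical.

```python
import random

def predict(w, b, x):
    """Single threshold unit (a one-neuron 'network'): 1 if w.x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def accuracy(w, b, data):
    return sum(predict(w, b, x) == y for x, y in data) / len(data)

def train(train_data, val_data, threshold=0.95, max_epochs=100, lr=0.1):
    w, b = [0.0] * len(train_data[0][0]), 0.0
    for epoch in range(1, max_epochs + 1):
        for x, y in train_data:
            error = y - predict(w, b, x)                        # error signal for this instance
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]  # adjust the weights
            b += lr * error
        train_acc = accuracy(w, b, train_data)  # accuracy over the training data
        val_acc = accuracy(w, b, val_data)      # measured only, never used to update w or b
        if val_acc >= threshold:                # threshold met: exit training
            break
    return w, b, epoch, train_acc, val_acc

def make_data(n, rng):
    """Hypothetical toy task: the label is 1 when x1 + x2 > 1."""
    points = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(n)]
    return [(p, 1 if p[0] + p[1] > 1 else 0) for p in points]

rng = random.Random(0)
train_set, val_set, test_set = make_data(200, rng), make_data(50, rng), make_data(50, rng)
w, b, epochs, train_acc, val_acc = train(train_set, val_set)
test_acc = accuracy(w, b, test_set)  # run against the test set once, after training
print(epochs, train_acc, val_acc, test_acc)
```

The test set is scored only after `train` returns, so it plays no part in deciding when to stop.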
Training Set: this data set is used to adjust the weights on the neural network.
Validation Set: this data set is used to minimize overfitting. You are not adjusting the weights of the network with this data set; you are only verifying that any increase in accuracy over the training data set actually yields an increase in accuracy over data the network has not seen before, or at least has not been trained on (i.e. the validation data set). If accuracy over the training data set increases while accuracy over the validation data set stays the same or decreases, then you are overfitting your neural network and you should stop training.
Testing Set: this data set is used only for testing the final solution in order to confirm the actual predictive power of the network.
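The stopping rule in the validation-set definition can be written as a small check over per-epoch accuracy histories. This is a minimal sketch; the name `is_overfitting` is hypothetical, and it compares only the last two epochs, which is the literal reading of the rule. Real validation curves are noisy, so practical code usually looks over a longer window.

```python
def is_overfitting(train_acc, val_acc):
    """True when the latest epoch raised training accuracy while
    validation accuracy stayed the same or fell."""
    if len(train_acc) < 2 or len(val_acc) < 2:
        return False  # need at least two epochs to compare
    training_improved = train_acc[-1] > train_acc[-2]
    validation_stalled = val_acc[-1] <= val_acc[-2]
    return training_improved and validation_stalled

# Hypothetical accuracy histories, one entry per epoch.
train_history = [0.70, 0.80, 0.86, 0.90]
val_history = [0.68, 0.77, 0.80, 0.79]
print(is_overfitting(train_history[:3], val_history[:3]))  # False: both still improving
print(is_overfitting(train_history, val_history))          # True: stop training
```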
Training set: A set of examples used for learning, that is to fit the parameters [i.e., weights] of the classifier.
Validation set: A set of examples used to tune the parameters [i.e., architecture, not weights] of a classifier, for example to choose the number of hidden units in a neural network.
Test set: A set of examples used only to assess the performance [generalization] of a fully specified classifier.
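The division of labor in these three definitions can be sketched as: split the data once, fit each candidate on the training set, choose the candidate with the lowest validation error, and only then score the chosen one on the test set. As a stand-in for the number of hidden units, this sketch tunes k in a k-nearest-neighbours classifier, a swapped-in technique chosen only because it fits in a few lines of standard-library Python; the toy data, split fractions and function names are hypothetical.

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then carve out disjoint test, validation and training sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

def knn_error(k, train, data):
    """Error rate of a 1-D k-nearest-neighbours majority vote (labels are 0/1)."""
    wrong = 0
    for x, y in data:
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        pred = 1 if 2 * sum(label for _, label in nearest) > k else 0
        wrong += pred != y
    return wrong / len(data)

def select_k(candidates, train, val, test):
    # Training set: the "fit" (k-NN simply stores it).
    # Validation set: chooses the hyperparameter k.
    best_k = min(candidates, key=lambda k: knn_error(k, train, val))
    # Test set: used once, on the fully specified classifier.
    return best_k, knn_error(best_k, train, val), knn_error(best_k, train, test)

# Hypothetical noisy toy data: the label is 1 when x plus noise exceeds 0.5.
rng = random.Random(1)
xs = [rng.random() for _ in range(100)]
data = [(x, 1 if x + rng.gauss(0, 0.2) > 0.5 else 0) for x in xs]
train, val, test = train_val_test_split(data)
best_k, val_err, test_err = select_k([1, 3, 5, 7, 9], train, val, test)
print(best_k, val_err, test_err)
```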
From ftp://ftp.sas.com/pub/neural/FAQ1.txt section "What are the population, sample, training set, design set, validation"
The error surface differs between different sets of data drawn from your data set (batch learning). Therefore, a very good local minimum for your training set data may not be a very good point, and may even be a very bad point, on the surface generated by some other set of data for the same problem. You therefore need a model that not only finds a good weight configuration for the training set but also predicts new data (data not in the training set) with low error. In other words, the network should generalize from the examples: it should learn the underlying pattern in the data rather than simply memorize the training set by overfitting it.
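To make the "different error surface per data set" idea concrete (the notation is ours, not the source's): for a data set D, a network f with weights w, and a per-example loss ℓ, each data set defines its own error surface, and training picks the weights that minimize the training set's surface:

```latex
E_D(\mathbf{w}) = \frac{1}{|D|} \sum_{(\mathbf{x}, y) \in D} \ell\bigl(f(\mathbf{x}; \mathbf{w}), y\bigr),
\qquad
\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} E_{D_{\text{train}}}(\mathbf{w})
```

Any other data set D′ for the same problem gives a different surface over the same weights, so a low training error at ŵ does not by itself guarantee a low error on D′ at ŵ. Measuring that gap is what the validation and test sets are for.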
The validation data set is a set of data for the function you want to learn that you do not use directly to train the network. You train the network with a separate set of data, which you call the training data set. If you use a gradient-based algorithm to train the network, the error surface and the gradient at any point depend entirely on the training data set, so the training data set is what directly adjusts the weights. To make sure you do not overfit the network, you feed the validation data set to the network and check whether the error is within some acceptable range. Because the validation set is not used directly to adjust the weights of the network, a low error on the validation set (and also on the test set) indicates that the network not only predicts the training examples well but can also be expected to perform well when presented with new examples that were not used in the training process.
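A minimal sketch of the point about gradients, assuming a one-variable linear model y = w*x + b trained by plain gradient descent on mean squared error as a stand-in for a network. The data, learning rate, step count and the `max_val_error` range are hypothetical. The gradient is computed only from the training pairs; the validation pairs are only evaluated.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on a list of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train_linear(train, val, lr=0.5, steps=500):
    w, b = 0.0, 0.0
    n = len(train)
    for _ in range(steps):
        # The gradient comes from the training set only, so only it moves the weights.
        dw = sum(2 * (w * x + b - y) * x for x, y in train) / n
        db = sum(2 * (w * x + b - y) for x, y in train) / n
        w, b = w - lr * dw, b - lr * db
    # The validation set is evaluated, never differentiated.
    return w, b, mse(w, b, train), mse(w, b, val)

# Hypothetical noise-free data from y = 2x + 1.
train = [(x, 2 * x + 1) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
val = [(x, 2 * x + 1) for x in (0.1, 0.6, 0.9)]
w, b, train_err, val_err = train_linear(train, val)
max_val_error = 1e-6  # hypothetical acceptable range for the validation error
print(w, b, val_err <= max_val_error)
```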
Early stopping is one way to decide when to stop training. There are different variations, but the main outline is as follows: both the training and validation set errors are monitored. The training error decreases at each iteration (backprop and its variants), and at first the validation error decreases as well. Training is stopped at the moment the validation error starts to rise. The weight configuration at this point gives a model that predicts the training data well, as well as the data which has not been seen by the network. However, because the validation data is used to select the weight configuration, it indirectly affects the final weights. This is where the test set comes in. This set of data is never used in the training process. Once a model is selected based on the validation set, the test set data is applied to the network model and the error on this set is computed. This error is representative of the error we can expect on completely new data for the same problem.
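The early-stopping outline can be sketched as a generic loop. This is a minimal sketch, not any specific library's API: `train_step` and `val_error` are hypothetical callables supplied by the caller, and the rule stops at the first rise in validation error exactly as described above (common variations wait several epochs, a "patience", before stopping). The weights from the lowest-validation-error epoch are returned, and the test error is computed once afterwards.

```python
import copy

def train_with_early_stopping(weights, train_step, val_error, max_epochs=1000):
    """Stop as soon as validation error rises; return the best weights seen."""
    best_weights, best_err = copy.deepcopy(weights), val_error(weights)
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        weights = train_step(weights)  # backprop or any other weight update
        err = val_error(weights)
        if err > best_err:             # validation error starts to rise: stop
            break
        best_weights, best_err = copy.deepcopy(weights), err
    return best_weights, best_err, epoch

# Hypothetical toy: each "epoch" nudges a scalar weight up by 1, and the
# validation error is lowest at weight 5.
best_w, best_err, stopped_at = train_with_early_stopping(
    0, train_step=lambda w: w + 1, val_error=lambda w: (w - 5) ** 2)

def toy_test_error(w):
    """Hypothetical test-set error, evaluated once on the selected weights."""
    return (w - 5.5) ** 2

print(best_w, best_err, stopped_at, toy_test_error(best_w))  # 5 0 6 0.25
```

With these toy callables, training stops at epoch 6, the first epoch whose validation error is higher than the previous one, and the weights from epoch 5 are kept. The test error differs from the validation error because the test set played no part in choosing those weights.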
EDIT:
Also, if you do not have enough data for a separate validation set, you can use cross-validation to tune the parameters as well as to estimate the test error.
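A minimal k-fold cross-validation sketch, assuming the caller supplies hypothetical `fit` and `error` functions: each fold serves once as the validation set while the model is fitted on the remaining folds, and the averaged validation error estimates the error on unseen data. To tune a parameter, you would compute this average for each candidate value and keep the lowest. The folds here are contiguous, so shuffle the data first if it is ordered.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; folds are contiguous and differ in size by at most 1."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size

def cross_val_error(data, k, fit, error):
    """Average validation error over k folds; each fold is held out exactly once."""
    errors = []
    for train_idx, val_idx in k_fold_indices(len(data), k):
        model = fit([data[i] for i in train_idx])
        errors.append(error(model, [data[i] for i in val_idx]))
    return sum(errors) / len(errors)

# Hypothetical toy model: predict the mean of the training targets.
def fit_mean(train):
    return sum(train) / len(train)

def mean_squared_error(prediction, val):
    return sum((y - prediction) ** 2 for y in val) / len(val)

print(cross_val_error([1, 2, 3, 4, 5, 6], 3, fit_mean, mean_squared_error))  # 6.25
```

For this toy data the three folds give validation errors of 9.25, 0.25 and 9.25, so the cross-validation estimate is 6.25.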