Lecture 10: Combining multiple neural networks to improve generalization


This is Hinton's course on Coursera; this post contains my study notes.

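Start with the title: building on the previous lecture, this lecture is about combining multiple models to improve the generalization of neural networks.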

Section 1: Combining networks: The bias-variance trade-off

When the amount of training data is limited, we get overfitting.

Averaging the predictions of many different models is a good way to reduce overfitting.
It helps most when the models make very different predictions.

For regression, the squared error can be decomposed into a "bias" term and a "variance" term.
- The bias term is big if the model has too little capacity to fit the data.
- The variance term is big if the model has so much capacity that it is good at fitting the sampling error in each particular training set.
By averaging away the variance we can use individual models with high capacity. These models have high variance but low bias.

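When we train a network, if there is not enough training data, that is, the data cannot cover every case, the result can easily overfit. My own understanding is that both overfitting and underfitting can actually occur.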

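To address overfitting, we can train several different models and then average their predictions. If the models give noticeably different predictions, averaging them can improve overall performance.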

In regression, the squared error can be viewed as consisting of two parts: bias and variance.

Bias means the model does not have enough capacity to fit the data well, so what it learns ends up far from the true result.

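Variance means the model has so much capacity that it can fit all of the training data exactly, but because the data contain sampling error, it fits that error as well.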

That is how I understand these two cases.

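So we use several high-capacity models: their bias stays small, and averaging cancels out much of the effect of the variance. (There should be an assumption here: the errors that the different models make because of variance follow a distribution with mean 0, or in other words the result depends on how the sampling error is distributed.)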
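A minimal sketch of the averaging idea, under assumptions that are not from the lecture: the true function is a sine wave, the noise is Gaussian, and degree-6 polynomials stand in for high-capacity networks. Each model is fit on its own small noisy training set, and the error of the averaged prediction is compared with the errors of the individual models.

```python
import numpy as np

def true_fn(x):
    # Hypothetical target function, chosen only for illustration.
    return np.sin(2 * np.pi * x)

def fit_and_predict(rng, x_test, n_train=15, degree=6, noise=0.3):
    # Draw a fresh noisy training set and fit a flexible polynomial to it.
    x = rng.uniform(0.0, 1.0, n_train)
    t = true_fn(x) + rng.normal(0.0, noise, n_train)
    coeffs = np.polyfit(x, t, degree)
    return np.polyval(coeffs, x_test)

rng = np.random.default_rng(0)
x_test = np.linspace(0.05, 0.95, 50)
target = true_fn(x_test)

# One row of test predictions per independently trained model.
preds = np.stack([fit_and_predict(rng, x_test) for _ in range(20)])

# Mean squared error of each individual model on the test points.
individual_mse = np.mean((preds - target) ** 2, axis=1)
# Mean squared error of the averaged (combined) prediction.
ensemble_mse = np.mean((preds.mean(axis=0) - target) ** 2)

print("mean individual MSE:", individual_mse.mean())
print("ensemble MSE:       ", ensemble_mse)
```

The averaged predictor can never do worse than the average individual model under squared error; how much it gains depends on how much the individual models disagree.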

On any one test case, some individual predictors may be better than the combined predictor. But different individual predictors will be better on different cases.

If the individual predictors disagree a lot, the combined predictor is typically better than all of the individual predictors when we average over test cases.
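A short derivation may help explain why disagreement helps. The notation is my own assumption: on one test case with target t, predictor i outputs y_i, and the combined predictor outputs the mean of the N individual outputs.

```latex
\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i

\frac{1}{N}\sum_{i=1}^{N} (t - y_i)^2
  = \frac{1}{N}\sum_{i=1}^{N} \bigl((t - \bar{y}) - (y_i - \bar{y})\bigr)^2
  = (t - \bar{y})^2
    + \frac{1}{N}\sum_{i=1}^{N} (y_i - \bar{y})^2
    - 2\,(t - \bar{y})\,\frac{1}{N}\sum_{i=1}^{N} (y_i - \bar{y})

\text{Since } \sum_{i=1}^{N} (y_i - \bar{y}) = 0:\quad
\frac{1}{N}\sum_{i=1}^{N} (t - y_i)^2
  = (t - \bar{y})^2 + \frac{1}{N}\sum_{i=1}^{N} (y_i - \bar{y})^2
```

So the average squared error of the individual predictors exceeds the squared error of the combined predictor by exactly the variance of their outputs. This guarantees the combined predictor beats the average individual predictor, not every single one of them, and the gain is larger when the predictors disagree more.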

Section 2: Mixtures of Experts

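The experts here are different trained models, each of which predicts well on particular kinds of data. When we averaged in the previous section, we implicitly gave every model's prediction the same weight; with experts, the weights are not equal.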

The key idea is to make each expert focus on predicting the right answer for the cases where it is already doing better than the other experts.

Adjusting the weights serves this same purpose: here the weights are assigned according to what each model is good at.
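A minimal sketch of how such weights could work, assuming one common formulation that these notes do not spell out: a gating function produces softmax weights p_i over the experts, and the cost for one case is the weighted squared error, sum_i p_i * (t - y_i)^2. The function names and the example numbers are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def mixture_loss_and_grads(expert_outputs, gate_logits, target):
    # Gate probabilities: how much weight each expert gets on this case.
    p = softmax(gate_logits)
    sq_err = (target - expert_outputs) ** 2
    # Assumed cost: squared error of each expert, weighted by its gate probability.
    loss = np.sum(p * sq_err)
    # Gradient w.r.t. each expert's output is scaled by its gate probability,
    # so an expert with a small weight on this case is barely pushed.
    grad_outputs = -2.0 * p * (target - expert_outputs)
    # Gradient w.r.t. each gate logit: negative for experts whose error is
    # below the weighted average, so gradient descent raises their weight.
    grad_logits = p * (sq_err - loss)
    return loss, p, grad_outputs, grad_logits

# Hypothetical single training case with three experts and equal initial gates.
expert_outputs = np.array([0.9, 0.2, 1.5])
gate_logits = np.zeros(3)
target = 1.0

loss, p, grad_outputs, grad_logits = mixture_loss_and_grads(
    expert_outputs, gate_logits, target
)
print("gate probabilities:", p)
print("loss:", loss)
print("gradient on gate logits:", grad_logits)
```

Under this assumed cost, a gradient descent step increases the gate weight of experts that already do better than the weighted average on this case and decreases it for the rest, which is one way to realize the key idea quoted above.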

0 0