5.4 OTHER ESTIMATES


       (1) Leave-One-Out Cross-Validation

       Each instance in turn is left out, and the learning scheme is trained on all the remaining instances. It is judged by its correctness on the remaining instance: one or zero for success or failure, respectively. The results of all n judgments, one for each member of the dataset, are averaged, and that average represents the final error estimate.

       If the dataset contains n instances, LOOCV is simply n-fold cross-validation: each instance in turn serves on its own as the test set while the remaining n-1 instances form the training set, so a complete LOOCV run builds n models.
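A minimal sketch of this loop, assuming the learning scheme is supplied as a pair of placeholder callables train and predict (both hypothetical, standing in for any learner):

def loocv_error(instances, labels, train, predict):
    # train/predict are placeholder callables for any learning scheme.
    # Leave each instance out in turn, train on the remaining n-1,
    # score the single held-out prediction, and average the n scores.
    n = len(instances)
    errors = 0
    for i in range(n):
        train_X = instances[:i] + instances[i + 1:]   # everything except instance i
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_X, train_y)               # n models are built in total
        errors += int(predict(model, instances[i]) != labels[i])
    return errors / n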

        Advantage:

      First, the greatest possible amount of data is used for training in each case, which presumably increases the chance that the classifier is an accurate one.

In each round almost all of the instances are used to train the model, so the training data come as close as possible to the distribution of the underlying population, and the estimated generalization error is correspondingly more reliable.

Second, the procedure is deterministic: no random sampling is involved. There is no point in repeating it 10 times, or repeating it at all; the same result will be obtained each time.

No random factor can influence the results, so the experiment is guaranteed to be reproducible.

Nevertheless, leave-one-out seems to offer a chance of squeezing the maximum out of a small dataset and getting as accurate an estimate as possible.

       Disadvantage:

       First, high computational cost: the entire learning procedure must be executed n times, which is usually infeasible for large datasets.

(The drawback is the computational cost, because the number of models to build equals the total number of instances; when the dataset is large, LOOCV becomes impractical.)

       Second, by its very nature, it cannot be stratified; worse than that, it guarantees a nonstratified sample. Stratification involves getting the correct proportion of examples in each class into the test set, and this is impossible when the test set contains only a single example. A dramatic, although highly artificial, illustration of the problems this might cause is to imagine a completely random dataset that contains exactly the same number of instances of each of two classes. The best that an inducer can do with random data is to predict the majority class, giving a true error rate of 50%. But in each fold of leave-one-out, the opposite class to the test instance is in the majority, and therefore the predictions will always be incorrect, leading to an estimated error rate of 100%.

(This method cannot be stratified. Stratification requires each class to appear in the test set in the same proportion as in the overall dataset, and since the test set here contains only one instance, that condition can never be met. For example, suppose a completely random dataset contains two classes with exactly the same number of instances. Because the data are completely random, the best an inducer can do is predict the majority class, giving an error rate of 50%. In leave-one-out, however, one instance is held out for testing in each fold, so the class opposite to that instance is always in the majority; predicting by majority therefore misclassifies the held-out instance every time.)
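This effect is easy to reproduce. A small sketch of the same scenario: a perfectly balanced, purely random two-class dataset and a learner that simply predicts the majority class of its training data (the dataset size, seed, and class names are arbitrary choices for illustration):

import random

random.seed(0)                            # size and seed are illustrative only
labels = ["a"] * 25 + ["b"] * 25          # 50 instances, two classes of equal size
random.shuffle(labels)                    # completely random: the labels carry no signal

def majority(train_labels):
    # The best an inducer can do on random data: predict the majority class.
    return max(set(train_labels), key=train_labels.count)

errors = 0
for i in range(len(labels)):
    training = labels[:i] + labels[i + 1:]           # leave instance i out
    # Removing one instance makes the opposite class the majority,
    # so the prediction is always wrong.
    errors += int(majority(training) != labels[i])

print(errors / len(labels))               # 1.0: an estimated error rate of 100%

The true error rate of majority-class prediction on such data is 50%, yet leave-one-out reports 100%.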

        (2) The 0.632 Bootstrap

        A dataset of n instances is sampled n times, with replacement, to give another dataset of n instances. Because some elements in this second dataset will (almost certainly) be repeated, there must be some instances in the original dataset that have not been picked; we will use these as test instances.

(The original dataset contains n instances. Sampling from it with replacement n times yields another dataset of n instances, which serves as the training set. Because some instances in this second dataset are repeated, some instances of the original dataset are never drawn; those instances are used as the test set.)
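A minimal sketch of forming one such split, working with instance indices rather than the instances themselves (the function name is illustrative):

import random

def bootstrap_split(n, rng=random):
    # Illustrative helper: one bootstrap sample over indices 0..n-1.
    # Draw n indices with replacement: these form the training set.
    training = [rng.randrange(n) for _ in range(n)]
    # Indices that were never drawn form the test set.
    picked = set(training)
    test = [i for i in range(n) if i not in picked]
    return training, test

train_idx, test_idx = bootstrap_split(1000)
print(len(test_idx) / 1000)    # roughly 0.368 of the instances land in the test set

Each instance has a probability of (1 - 1/n)^n ≈ e^(-1) ≈ 0.368 of never being picked, which is where the 0.368 and 0.632 weights in the method's name come from.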

        Advantage:

        The bootstrap procedure may be the best way of estimating the error rate for very small datasets.

        Disadvantage:

        The error estimate combines the error measured on the test instances with the resubstitution error measured on the training instances:

e = 0.632 × e_test-instances + 0.368 × e_training-instances

where e_test-instances is the error rate on the instances that never entered the bootstrap sample and e_training-instances is the error rate on the training instances themselves. The weakness is that this weighted combination can be fooled by a learner that simply memorizes its training data.

         In fact, the very dataset we considered above will do: a completely random dataset with two classes of equal size. The true error rate is 50% for any prediction rule. But a scheme that memorized the training set would give a perfect resubstitution score of 100%, so that e_training-instances = 0, and the 0.632 bootstrap will mix this in with a weight of 0.368 to give an overall error rate of only 31.6% (0.632 × 50% + 0.368 × 0%), which is misleadingly optimistic.

Suppose again a dataset whose instances are completely random and divided evenly between two classes, so the true error rate of any prediction rule is 50%. Suppose the classifier obtained from the bootstrap is "perfect" on the training set: its accuracy there is 100%, i.e. its error rate is 0%. Its error rate on the test instances can be no better than 50%, so the formula above gives 0.632 × 50% + 0.368 × 0% = 31.6% < 50%. The resulting estimate is therefore overly optimistic.
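To make the optimism concrete, here is a small sketch under the same assumptions: random, perfectly balanced labels, one bootstrap sample, and a classifier that memorizes the training instances and falls back to the majority class on anything unseen (dataset size, seed, and all names are illustrative):

import random
from collections import Counter

random.seed(1)                                    # size and seed are illustrative only
n = 1000
labels = ["a"] * (n // 2) + ["b"] * (n // 2)      # two classes of equal size
random.shuffle(labels)                            # completely random labels

# One bootstrap sample (indices drawn with replacement) and its test set.
training = [random.randrange(n) for _ in range(n)]
test = [i for i in range(n) if i not in set(training)]

# A classifier that memorizes the training instances and otherwise
# predicts the majority class of its training data.
memory = {i: labels[i] for i in training}
fallback = Counter(labels[i] for i in training).most_common(1)[0][0]

def predict(i):
    return memory.get(i, fallback)

e_train = sum(predict(i) != labels[i] for i in training) / len(training)   # 0.0
e_test = sum(predict(i) != labels[i] for i in test) / len(test)            # about 0.5
print(0.632 * e_test + 0.368 * e_train)    # about 0.316, far below the true 50%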