self-training and co-training

Widely used semi-supervised learning methods include:

1. EM with generative mixture models

2. self-training

3. co-training

4. transductive support vector machines

5. graph-based methods

self-training:

A classifier is first trained with the small amount of labeled data. The classifier is then used to classify the unlabeled data. Typically the most confident unlabeled data points, together with their predicted labels, are added to the training set. The classifier is then re-trained and the procedure is repeated.

When the existing supervised classifier is complicated and hard to modify, self-training is a practical wrapper method. It has been applied to several natural language processing tasks, including word sense disambiguation, parsing and machine translation, as well as to object detection in images.
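The self-training loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the `CentroidClassifier` base learner, its `predict_with_confidence` method, the `self_train` function and the `per_round` / `max_rounds` parameters are all hypothetical names chosen for this example, and the confidence score (a softmax over negative distances to class centroids) is an assumption. Any supervised classifier that reports a confidence could be wrapped the same way.

```python
import math


class CentroidClassifier:
    """Toy base learner (an assumption): one centroid per class."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_with_confidence(self, x):
        # Confidence: softmax over negative distances to the class centroids.
        scores = {c: math.exp(-math.dist(x, m)) for c, m in self.centroids.items()}
        label = max(scores, key=scores.get)
        return label, scores[label] / sum(scores.values())


def self_train(make_clf, X_lab, y_lab, X_unlab, per_round=2, max_rounds=10):
    """Wrap any classifier factory `make_clf` in a self-training loop."""
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    clf = make_clf().fit(X_train, y_train)
    for _ in range(max_rounds):
        if not pool:
            break
        # Classify the unlabeled pool and rank it by confidence.
        scored = [(clf.predict_with_confidence(x), x) for x in pool]
        scored.sort(key=lambda item: item[0][1], reverse=True)
        # Move the most confident points, with predicted labels, to the training set.
        for (label, _conf), x in scored[:per_round]:
            X_train.append(x)
            y_train.append(label)
        pool = [x for _, x in scored[per_round:]]
        # Re-train on the enlarged training set.
        clf = make_clf().fit(X_train, y_train)
    return clf


# Usage: two labeled points, six unlabeled points in two clusters.
X_lab = [(0.0, 0.0), (4.0, 4.0)]
y_lab = ["a", "b"]
X_unlab = [(0.5, 0.2), (3.8, 4.1), (1.0, 1.2), (3.0, 3.1), (0.2, 0.9), (4.2, 3.7)]
clf = self_train(CentroidClassifier, X_lab, y_lab, X_unlab)
print(clf.predict_with_confidence((0.3, 0.3)))
```

Because the wrapper only calls `fit` and `predict_with_confidence`, the base classifier itself is never modified, which is what makes self-training convenient when the classifier is complicated. Note that a wrongly labeled confident point is never revisited in this sketch, so early mistakes can reinforce themselves.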

co-training:

Co-training assumes that the features can be split into two sets, that each sub-feature set is sufficient to train a good classifier, and that the two sets are conditionally independent given the class. Initially, two separate classifiers are trained on the labeled data, one on each sub-feature set. Each classifier then classifies the unlabeled data and 'teaches' the other classifier with the few unlabeled examples (and their predicted labels) about which it is most confident. Each classifier is retrained with the additional training examples given by the other classifier, and the process repeats.

When the features naturally split into two sets, co-training may be appropriate.
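The co-training procedure described above can be sketched as follows. This is a hedged illustration under several assumptions: each example is represented as a pair `(view1, view2)` of feature tuples, the `CentroidClassifier` base learner and its `predict_with_confidence` method are toy stand-ins, and `co_train`, `per_round` and `max_rounds` are hypothetical names introduced for this example. Following the description in the text, each classifier's most confident predictions are added to the other classifier's training set.

```python
import math


class CentroidClassifier:
    """Toy base learner (an assumption): one centroid per class."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_with_confidence(self, x):
        # Confidence: softmax over negative distances to the class centroids.
        scores = {c: math.exp(-math.dist(x, m)) for c, m in self.centroids.items()}
        label = max(scores, key=scores.get)
        return label, scores[label] / sum(scores.values())


def co_train(make_clf, labeled, unlabeled, per_round=1, max_rounds=10):
    """labeled: list of ((view1, view2), label); unlabeled: list of (view1, view2)."""
    # train[v] holds (features of view v, label) pairs for classifier v.
    train = [[(views[v], y) for views, y in labeled] for v in (0, 1)]
    pool = list(unlabeled)

    def fit(v):
        X, y = zip(*train[v])
        return make_clf().fit(X, y)

    clfs = [fit(0), fit(1)]
    for _ in range(max_rounds):
        if not pool:
            break
        taken = set()
        for v in (0, 1):
            other = 1 - v
            # Classifier v scores the pool using only its own view.
            scored = [(clfs[v].predict_with_confidence(ex[v]), i)
                      for i, ex in enumerate(pool)]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            # It "teaches" the other classifier its most confident examples.
            for (label, _conf), i in scored[:per_round]:
                train[other].append((pool[i][other], label))
                taken.add(i)
        pool = [ex for i, ex in enumerate(pool) if i not in taken]
        # Retrain both classifiers on their enlarged training sets.
        clfs = [fit(0), fit(1)]
    return clfs


# Usage: view 1 is a 2-D point, view 2 is a single number.
labeled = [(((0.0, 0.0), (0.0,)), "a"), (((4.0, 4.0), (10.0,)), "b")]
unlabeled = [((0.5, 0.2), (1.0,)), ((3.8, 4.1), (9.0,)),
             ((1.0, 1.2), (2.0,)), ((3.0, 3.1), (8.5,))]
clf_view1, clf_view2 = co_train(CentroidClassifier, labeled, unlabeled)
print(clf_view1.predict_with_confidence((0.3, 0.3)))
print(clf_view2.predict_with_confidence((9.5,)))
```

The sketch keeps a separate training set per classifier so that the direction of 'teaching' stays explicit. Other formulations add the confident examples to a single shared labeled set instead, and that variant would only change the two lines that append to `train[other]`.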

Reference:

Xiaojin Zhu. Semi-Supervised Learning with Graphs.