A Basic Introduction to Multi-label Learning

     Traditional supervised learning is one of the most widely studied machine learning paradigms, where each object is represented by a single feature vector and associated with a single label. The fundamental assumption adopted by traditional supervised learning is that each object belongs to exactly one concept, i.e. it has a unique semantic meaning. This assumption simplifies many classification tasks, but in the complex real world it does not fit many learning problems well. For example, one document might cover several topics such as sports, the London Olympics, and ticket sales, and one blog post can be labeled with algorithm, code, and machine learning.
     To better represent information that carries multiple semantic meanings, the direct approach is to assign multiple labels to each object; this is exactly how the multi-label learning problem is defined. In this setting, each training instance is associated with a set of labels, and the task is to predict a reasonable label set for unseen instances.
     Early research on multi-label learning mostly focused on medical diagnosis and multi-label text classification. Over the past decades, multi-label learning has gradually attracted significant attention from machine learning and related fields, and has been widely applied to automatic annotation of multimedia data, web mining, bioinformatics, information retrieval, and label recommendation.
1. The indicators of multi-label learning
label cardinality: the average number of labels per example; it measures the degree of multi-labeledness
label density: label cardinality normalized by the number of possible labels in the label space
label diversity: the number of distinct label sets that appear in the data set
label div-density: label diversity normalized by the number of examples, indicating the proportion of distinct label sets
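A minimal NumPy sketch of these four indicators, assuming Y is a binary 0/1 label matrix of shape (num_examples, num_labels); the function and variable names are illustrative, not taken from any library:

import numpy as np

def label_indicators(Y):
    """Compute label cardinality, density, diversity and div-density for a 0/1 label matrix."""
    num_examples, num_labels = Y.shape
    # label cardinality: average number of relevant labels per example
    cardinality = Y.sum(axis=1).mean()
    # label density: cardinality normalized by the size of the label space
    density = cardinality / num_labels
    # label diversity: number of distinct label sets appearing in the data set
    diversity = len({tuple(row) for row in Y})
    # div-density: proportion of distinct label sets among all examples
    div_density = diversity / float(num_examples)
    return cardinality, density, diversity, div_density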
2. Threshold Calibration
Most multi-label learners produce a real-valued output for each label. Threshold calibration dichotomizes these outputs into relevant and irrelevant labels when predicting an unseen instance, and the calibrated outputs also serve as the confidence of each predicted instance-label pair.
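A minimal sketch of the simplest calibration strategy, a single constant threshold t; the function name and the default value of t are illustrative assumptions, not a method prescribed by the text:

import numpy as np

def threshold_predict(outputs, t=0.0):
    """Dichotomize real-valued outputs into relevant (+1) / irrelevant (-1) labels."""
    # outputs[i, j] is the classifier's real-valued score of instance i for label j;
    # in practice t can also be tuned on a validation set instead of being fixed in advance
    return np.where(outputs > t, 1, -1)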
3. Multi-label Evaluation Metrics
In traditional supervised learning, the generalization performance of the learning system is evaluated with conventional metrics such as accuracy, F-measure, and area under the ROC curve (AUC). However, performance evaluation in multi-label learning is much more complicated than in the traditional single-label setting, as each example can be associated with multiple labels simultaneously. There are two main families of evaluation metrics: example-based metrics and label-based metrics.

3-1 Example-based Metrics
(1) Subset Accuracy: The subset accuracy evaluates the fraction of correctly classified examples, i.e. examples whose predicted label set is identical to the ground-truth label set
(2) Hamming Loss: The hamming loss evaluates the fraction of misclassified instance-label pairs, i.e. a relevant label is missed or an irrelevant label is predicted
(3) One-error: The one-error evaluates the fraction of examples whose top-ranked label is not in the relevant label set
(4) Coverage: The coverage evaluates how many steps are needed, on average, to move down the ranked label list so as to cover all the relevant labels of the example
(5) Ranking Loss: The ranking loss evaluates the fraction of reversely ordered label pairs, i.e. an irrelevant label is ranked higher than a relevant label
(6) Average Precision: The average precision evaluates the average fraction of relevant labels ranked higher than a particular relevant label
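Metrics (2)-(6) are implemented in the Python script at the end of this post; subset accuracy is not, so here is a minimal sketch of it, assuming the same +1/-1 label encoding used in that script (the function name is illustrative):

import numpy as np

def compute_subset_accuracy(pre_labels, test_target):
    """Fraction of examples whose predicted label set exactly matches the ground truth."""
    # pre_labels[i, j] = 1 if instance i is predicted to have label j, otherwise -1;
    # test_target uses the same +1/-1 encoding for the ground truth
    num_instance = pre_labels.shape[0]
    exact_matches = np.sum(np.all(pre_labels == test_target, axis=1))
    return exact_matches / float(num_instance)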
3-2 Label-based Metrics

Label-based metrics evaluate performance on each class label and then aggregate across labels, either by macro-averaging (compute the metric per label and average over labels) or by micro-averaging (pool the per-label counts first and compute a single value); common examples include the macro/micro F-measure and the macro-averaged AUC (AUCmacro).
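A minimal sketch of macro- and micro-averaged F1, assuming indicator matrices in which the value 1 marks a relevant label (works for both 0/1 and the +1/-1 encoding of the script below); the function name is illustrative:

import numpy as np

def macro_micro_f1(pre_labels, test_target):
    """Macro/micro-averaged F1 for (num_instance, num_class) indicator matrices."""
    y_pred = (pre_labels == 1)
    y_true = (test_target == 1)
    tp = np.logical_and(y_pred, y_true).sum(axis=0).astype(float)
    fp = np.logical_and(y_pred, ~y_true).sum(axis=0).astype(float)
    fn = np.logical_and(~y_pred, y_true).sum(axis=0).astype(float)
    # macro-averaging: compute F1 per label, then average over the labels
    denom = 2 * tp + fp + fn
    with np.errstate(divide='ignore', invalid='ignore'):
        f1_per_label = np.where(denom > 0, 2 * tp / denom, 0.0)
    macro_f1 = f1_per_label.mean()
    # micro-averaging: pool the counts over all labels, then compute a single F1
    micro_f1 = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    return macro_f1, micro_f1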

# -*- coding: utf-8 -*-
"""
Created on 2017/4/3 14:21

@author: Randolph.Lee
"""
from __future__ import division
import numpy as np


def compute_average_precision(outputs, test_target):
    """
    Compute the average precision.
    :param outputs: the predicted outputs of the classifier;
        the output of the ith instance for the jth class is stored in outputs[i, j]
    :param test_target: the actual labels of the test instances;
        if the ith instance belongs to the jth class, test_target[i, j] = 1, otherwise -1
    :return: the average precision
    """
    # filter out the instances with full labels or no labels
    num_instance, num_class = outputs.shape
    temp_output = []
    temp_test_target = []
    for i in range(num_instance):
        if abs(sum(test_target[i, :])) != num_class:
            temp_output.append(outputs[i, :])
            temp_test_target.append(test_target[i, :])
    # the outputs are the predicted real values rather than the labels
    outputs = np.array(temp_output)
    test_target = np.array(temp_test_target)
    num_instance, num_class = outputs.shape
    average_precision = 0.0
    for i in range(num_instance):
        labels = [t for t in range(num_class) if test_target[i, t] == 1]
        label_size = len(labels)
        index = np.argsort(outputs[i, :])
        indicator = np.zeros(num_class)
        for j in range(label_size):
            loc = np.where(index == labels[j])[0][0]
            indicator[loc] = 1
        summary = 0.0
        for j in range(label_size):
            loc = np.where(index == labels[j])[0][0]
            summary += sum(indicator[loc:num_class]) / (num_class - loc)
        average_precision += summary / label_size
    average_precision /= num_instance
    return average_precision


def compute_coverage(outputs, test_target):
    """
    Compute the coverage.
    :param outputs: the predicted outputs of the classifier;
        the output of the ith instance for the jth class is stored in outputs[i, j]
    :param test_target: the actual labels of the test instances;
        if the ith instance belongs to the jth class, test_target[i, j] = 1, otherwise -1
    :return: the coverage
    """
    # move down the ranked label list so as to cover all the relevant labels of the example
    num_instance, num_class = outputs.shape
    coverage = 0
    for i in range(num_instance):
        labels = [t for t in range(num_class) if test_target[i, t] == 1]
        index = np.argsort(outputs[i, :])
        temp_min = num_class
        for label in labels:
            loc = np.where(index == label)[0][0]
            if loc < temp_min:
                temp_min = loc
        coverage += num_class - temp_min
    coverage = coverage / num_instance - 1
    return coverage


def compute_one_error(outputs, test_target):
    """
    Compute the one-error.
    :param outputs: the predicted outputs of the classifier;
        the output of the ith instance for the jth class is stored in outputs[i, j]
    :param test_target: the actual labels of the test instances;
        if the ith instance belongs to the jth class, test_target[i, j] = 1, otherwise -1
    :return: the one-error
    """
    # filter out the instances with full labels or no labels
    num_instance, num_class = outputs.shape
    temp_output = []
    temp_test_target = []
    for i in range(num_instance):
        if abs(sum(test_target[i, :])) != num_class:
            temp_output.append(outputs[i, :])
            temp_test_target.append(test_target[i, :])
    # count how often the top-ranked label is not in the relevant label set
    outputs = np.array(temp_output)
    test_target = np.array(temp_test_target)
    num_instance, num_class = outputs.shape
    one_error = 0
    for i in range(num_instance):
        labels = [t for t in range(num_class) if test_target[i, t] == 1]
        max_index = np.argmax(outputs[i, :])
        if max_index not in labels:
            one_error += 1
    one_error /= num_instance
    return one_error


def compute_ranking_loss(outputs, test_target):
    """
    Compute the ranking loss.
    :param outputs: the predicted outputs of the classifier;
        the output of the ith instance for the jth class is stored in outputs[i, j]
    :param test_target: the actual labels of the test instances;
        if the ith instance belongs to the jth class, test_target[i, j] = 1, otherwise -1
    :return: the ranking loss
    """
    # filter out the instances with full labels or no labels
    num_instance, num_class = outputs.shape
    temp_output = []
    temp_test_target = []
    for i in range(num_instance):
        if abs(sum(test_target[i, :])) != num_class:
            temp_output.append(outputs[i, :])
            temp_test_target.append(test_target[i, :])
    # count the pairs where an irrelevant label is ranked higher than a relevant label
    outputs = np.array(temp_output)
    test_target = np.array(temp_test_target)
    num_instance, num_class = outputs.shape
    rank_loss = 0
    for i in range(num_instance):
        temp = 0
        labels = [t for t in range(num_class) if test_target[i, t] == 1]
        non_labels = [t for t in range(num_class) if test_target[i, t] == -1]
        for m in range(len(labels)):
            for n in range(len(non_labels)):
                if outputs[i, labels[m]] <= outputs[i, non_labels[n]]:
                    temp += 1
        rank_loss += temp / (len(labels) * len(non_labels))
    rank_loss /= num_instance
    return rank_loss


def compute_hamming_loss(pre_labels, test_target):
    """
    Compute the hamming loss.
    :param pre_labels: the predicted labels of the classifier;
        if the ith instance belongs to the jth class, pre_labels[i, j] = 1, otherwise -1
    :param test_target: the actual labels of the test instances;
        if the ith instance belongs to the jth class, test_target[i, j] = 1, otherwise -1
    :return: the hamming loss
    """
    pre_labels = np.int64(pre_labels)
    num_instance, num_class = pre_labels.shape
    miss_pairs = np.sum(pre_labels != test_target)
    hamming_loss = miss_pairs / (num_class * num_instance)
    return hamming_loss
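A short usage example that can be appended to the end of the script above, with a toy score matrix and toy ground truth; the numbers are purely illustrative, and the 0.5 threshold used to produce pre_labels is an arbitrary choice:

if __name__ == "__main__":
    # toy real-valued scores for 3 instances and 4 labels
    outputs = np.array([[0.9, 0.2, 0.6, 0.1],
                        [0.3, 0.8, 0.4, 0.7],
                        [0.1, 0.2, 0.9, 0.4]])
    # ground truth in the +1/-1 encoding expected by the functions above
    test_target = np.array([[1, -1, 1, -1],
                            [-1, 1, -1, 1],
                            [-1, -1, 1, -1]])
    # simple constant-threshold calibration to obtain hard labels
    pre_labels = np.where(outputs > 0.5, 1, -1)
    print("average precision:", compute_average_precision(outputs, test_target))
    print("coverage:", compute_coverage(outputs, test_target))
    print("one-error:", compute_one_error(outputs, test_target))
    print("ranking loss:", compute_ranking_loss(outputs, test_target))
    print("hamming loss:", compute_hamming_loss(pre_labels, test_target))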
