sklearn study notes: make_multilabel_classification, a generator for multilabel datasets
Source: Internet. Editor: 程序博客网. Date: 2024/06/16 19:29
Generate a random multilabel classification problem.
For each sample, the generative process is:
- pick the number of labels: n ~ Poisson(n_labels)
- n times, choose a class c: c ~ Multinomial(theta)
- pick the document length: k ~ Poisson(length)
- k times, choose a word: w ~ Multinomial(theta_c)
In the above process, rejection sampling is used to make sure that n is never more than n_classes (and never zero when allow_unlabeled is False), and that the document length is never zero. Likewise, classes that have already been chosen are rejected.
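The steps above can be sketched for a single sample with plain NumPy. This is only an illustration of the described process, not scikit-learn's actual code: `sample_example`, `p_c` and `p_w_c` are hypothetical names, and the way words are drawn when a sample has several labels (a normalized sum of the class word distributions) or none (uniform) is an assumption.

```python
import numpy as np

def sample_example(rng, p_c, p_w_c, n_labels=2, length=50, allow_unlabeled=True):
    """Hypothetical helper: draw (word_counts, labels) for a single sample."""
    n_features, n_classes = p_w_c.shape

    # Pick the number of labels n ~ Poisson(n_labels); redraw (rejection
    # sampling) while n exceeds n_classes or is zero when that is not allowed.
    n = rng.poisson(n_labels)
    while n > n_classes or (n == 0 and not allow_unlabeled):
        n = rng.poisson(n_labels)

    # Choose classes c ~ Multinomial(p_c) until n distinct classes are held;
    # adding to a set discards a class that was already chosen.
    labels = set()
    while len(labels) < n:
        labels.add(int(rng.choice(n_classes, p=p_c)))

    # Pick the document length k ~ Poisson(length); redraw while it is zero.
    k = rng.poisson(length)
    while k == 0:
        k = rng.poisson(length)

    # Assumption: words come from the normalized sum of the chosen classes'
    # word distributions, or uniformly when the sample has no label.
    if labels:
        p_w = p_w_c[:, sorted(labels)].sum(axis=1)
        p_w = p_w / p_w.sum()
    else:
        p_w = np.full(n_features, 1.0 / n_features)
    word_counts = rng.multinomial(k, p_w)
    return word_counts, sorted(labels)

rng = np.random.default_rng(0)
p_c = np.full(3, 1.0 / 3.0)                   # class prior (assumed uniform here)
p_w_c = rng.dirichlet(np.ones(4), size=3).T   # shape (n_features, n_classes)
counts, labels = sample_example(rng, p_c, p_w_c, allow_unlabeled=False)
print(counts, labels)
```

The returned `word_counts` vector plays the role of one row of X, and `labels` the role of one label set in Y.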
Parameters:
n_samples : int, optional (default=100)
The number of samples to generate.
n_features : int, optional (default=20)
The total number of features per sample.
n_classes : int, optional (default=5)
The number of classes (labels) of the classification problem.
n_labels : int, optional (default=2)
The average number of labels per instance. More precisely, the number of labels per sample is drawn from a Poisson distribution with n_labels as its expected value, but samples are bounded (using rejection sampling) by n_classes, and must be nonzero if allow_unlabeled is False.
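A quick way to see the effect of n_labels is to count the labels on each generated sample. The sketch below uses arbitrary sizes and seed; because of the rejection sampling, the observed mean may differ somewhat from n_labels.

```python
from sklearn.datasets import make_multilabel_classification

mean_labels = {}
for n_labels in (1, 3):
    _, Y = make_multilabel_classification(
        n_samples=500, n_classes=5, n_labels=n_labels, random_state=0
    )
    per_sample = Y.sum(axis=1)      # number of labels on each sample
    assert per_sample.max() <= 5    # bounded by n_classes
    mean_labels[n_labels] = per_sample.mean()
    print(f"n_labels={n_labels}: mean labels per sample = {mean_labels[n_labels]:.2f}")
```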
length : int, optional (default=50)
The sum of the features (the number of words, if the samples are documents) is drawn from a Poisson distribution with this expected value.
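As a rough check of this description, the row sums of X can be compared with length. The sample count and seed below are arbitrary choices for illustration.

```python
from sklearn.datasets import make_multilabel_classification

X, _ = make_multilabel_classification(
    n_samples=500, n_features=20, length=50, random_state=0
)
row_sums = X.sum(axis=1)   # "document length" of each sample
print("mean row sum:", row_sums.mean())
print("smallest row sum:", row_sums.min())
```

The mean row sum should land close to 50, and no row should sum to zero.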
allow_unlabeled : bool, optional (default=True)
If True, some instances might not belong to any class.
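The difference is easiest to see with a small n_labels, where a Poisson draw of zero is common. A minimal sketch with arbitrary sizes and seed:

```python
from sklearn.datasets import make_multilabel_classification

common = dict(n_samples=500, n_classes=5, n_labels=1, random_state=0)
_, Y_allowed = make_multilabel_classification(allow_unlabeled=True, **common)
_, Y_forced = make_multilabel_classification(allow_unlabeled=False, **common)

unlabeled_allowed = int((Y_allowed.sum(axis=1) == 0).sum())
unlabeled_forced = int((Y_forced.sum(axis=1) == 0).sum())
print("unlabeled samples with allow_unlabeled=True: ", unlabeled_allowed)
print("unlabeled samples with allow_unlabeled=False:", unlabeled_forced)
```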
sparse : bool, optional (default=False)
If True, return a sparse feature matrix. New in version 0.17: parameter to allow sparse output.
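A short sketch comparing the two output types. It assumes that the same seed yields the same values in both modes, which is checked at the end rather than taken for granted.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.datasets import make_multilabel_classification

X_dense, _ = make_multilabel_classification(
    n_samples=10, n_features=20, random_state=0
)
X_sparse, _ = make_multilabel_classification(
    n_samples=10, n_features=20, sparse=True, random_state=0
)

print(type(X_dense), type(X_sparse))
print("same values:", np.allclose(X_sparse.toarray(), X_dense))
```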
return_indicator : 'dense' (default) | 'sparse' | False
If 'dense', return Y in the dense binary indicator format. If 'sparse', return Y in the sparse binary indicator format. False returns a list of lists of labels.
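The three label formats can be compared on the same generated data. This is a sketch with small, arbitrary sizes; the final loop checks that the list-of-lists form names the same classes as the nonzero columns of the dense indicator.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.datasets import make_multilabel_classification

common = dict(n_samples=6, n_classes=4, random_state=0)
_, Y_dense = make_multilabel_classification(return_indicator="dense", **common)
_, Y_sparse = make_multilabel_classification(return_indicator="sparse", **common)
_, Y_lists = make_multilabel_classification(return_indicator=False, **common)

print(Y_dense)     # (6, 4) array of 0/1
print(Y_lists)     # one list of class indices per sample

for row, labels in zip(Y_dense, Y_lists):
    # Each list holds the column indices that are set in the indicator row.
    assert set(int(c) for c in labels) == set(np.flatnonzero(row).tolist())
```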
return_distributions : bool, optional (default=False)
If True, return the prior class probability and the conditional probabilities of features given classes, from which the data was drawn.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
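The practical consequence is reproducibility: an int seed gives the same dataset on every call, while a shared RandomState instance advances between calls. A minimal sketch (the seed 42 is arbitrary):

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification

# Same int seed: identical datasets.
X1, Y1 = make_multilabel_classification(random_state=42)
X2, Y2 = make_multilabel_classification(random_state=42)

# Shared RandomState instance: the second call continues the same stream.
rng = np.random.RandomState(42)
X3, Y3 = make_multilabel_classification(random_state=rng)
X4, Y4 = make_multilabel_classification(random_state=rng)

print("int seed repeatable:", np.array_equal(X1, X2))
print("second call on shared generator differs:", not np.array_equal(X3, X4))
```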
Returns:
X : array of shape [n_samples, n_features]
The generated samples, one row per sample and one column per feature.
Y : array or sparse CSR matrix of shape [n_samples, n_classes]
The label sets.
p_c : array, shape [n_classes]
The probability of each class being drawn. Only returned if return_distributions=True.
p_w_c : array, shape [n_features, n_classes]
The probability of each feature being drawn given each class. Only returned if return_distributions=True.
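With return_distributions=True the call returns four values. The sketch below unpacks them and checks the shapes listed above; the sizes are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification

X, Y, p_c, p_w_c = make_multilabel_classification(
    n_samples=50, n_features=8, n_classes=3,
    return_distributions=True, random_state=0,
)

print(X.shape, Y.shape)         # (50, 8) (50, 3)
print(p_c.shape, p_w_c.shape)   # (3,) (8, 3)
print("p_c sums to:", p_c.sum())
print("each column of p_w_c sums to:", p_w_c.sum(axis=0))
```

Both p_c and every column of p_w_c are probability distributions, so their sums should be 1 up to rounding.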
Official tutorial:
"""
==============================================
Plot randomly generated multilabel dataset
==============================================
This illustrates the `datasets.make_multilabel_classification` dataset generator. Each sample consists of counts of two features (up to 50 in total), which are differently distributed in each of two classes.

Points are labeled as follows, where Y means the class is present:
=====  =====  =====  ======
  1      2      3    Color
=====  =====  =====  ======
  Y      N      N    Red
  N      Y      N    Blue
  N      N      Y    Yellow
  Y      Y      N    Purple
  Y      N      Y    Orange
  N      Y      Y    Green
  Y      Y      Y    Brown
=====  =====  =====  ======
A star marks the expected sample for each class; its size reflects the probability of selecting that class label.
The left and right examples highlight the ``n_labels`` parameter: more of the samples in the right plot have 2 or 3 labels.

Note that this two-dimensional example is very degenerate: generally the number of features would be much greater than the "document length", while here we have much larger documents than vocabulary. Similarly, with ``n_classes > n_features`` (here 3 > 2), it is much less likely that a feature distinguishes a particular class.
"""
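The color coding described in the tutorial can be reproduced without plotting by encoding each label combination as an index, 1 for class 1, 2 for class 2 and 4 for class 3. This is a sketch only: the color names stand in for the plot colors, `COLOR_NAMES` is a hypothetical name, and the parameter values are chosen to match the description (two features, three classes, length 50).

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification

# Index = 1*class1 + 2*class2 + 4*class3; index 0 would mean "no label".
COLOR_NAMES = np.array(
    ["(unlabeled)", "Red", "Blue", "Purple", "Yellow", "Orange", "Green", "Brown"]
)

X, Y = make_multilabel_classification(
    n_samples=150, n_features=2, n_classes=3, n_labels=1,
    length=50, allow_unlabeled=False, random_state=1,
)

color_index = (Y * [1, 2, 4]).sum(axis=1)   # one index per sample
colors = COLOR_NAMES.take(color_index)

for name in COLOR_NAMES[1:]:
    print(f"{name}: {int((colors == name).sum())} samples")
```

Passing a larger n_labels (for example 3) should shift more samples toward the two-label and three-label colors, which is the contrast the left and right plots illustrate.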