sklearn.preprocessing.MultiLabelBinarizer

Source: Internet  Editor: 程序博客网  Date: 2024/05/21 15:03

Multi-label binarization: sklearn.preprocessing.MultiLabelBinarizer(classes=None, sparse_output=False)

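classes_ attribute: if the classes parameter is set, classes_ equals the value passed in (in the same order); otherwise it holds the sorted unique labels collected from the training data.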

① With the default classes (None), classes_ is collected from the labels in the training data

In [1]: from sklearn.preprocessing import MultiLabelBinarizer
   ...: mlb = MultiLabelBinarizer()
   ...: mlb.fit_transform([(1, 2), (3, 4), (5,)])
   ...:
Out[1]:
array([[1, 1, 0, 0, 0],
       [0, 0, 1, 1, 0],
       [0, 0, 0, 0, 1]])

In [2]: mlb.classes_
Out[2]: array([1, 2, 3, 4, 5])

In [5]: from sklearn.preprocessing import MultiLabelBinarizer
   ...: mlb = MultiLabelBinarizer(sparse_output=True)
   ...: mlb.fit_transform([set(['sci-fi', 'thriller']), set(['comedy'])]).toarray()
   ...:
Out[5]:
array([[0, 1, 1],
       [1, 0, 0]])

② When the classes parameter is set, classes_ equals the value of the classes parameter

In [3]: from sklearn.preprocessing import MultiLabelBinarizer
   ...: mlb = MultiLabelBinarizer(classes=[2, 3, 4, 5, 6, 1])
   ...: mlb.fit_transform([(1, 2), (3, 4), (5,)])
   ...:
Out[3]:
array([[1, 0, 0, 0, 0, 1],
       [0, 1, 1, 0, 0, 0],
       [0, 0, 0, 1, 0, 0]])

In [4]: mlb.classes_
Out[4]: array([2, 3, 4, 5, 6, 1])
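
The two cases above can be summarized with a small plain-Python sketch. It is not scikit-learn's implementation, only an illustration of the behaviour shown in the sessions: the column order is either the given classes list or the sorted set of labels seen in the data. The helper names resolve_classes and binarize are hypothetical and exist only for this example; the sketch does not reproduce scikit-learn's input validation or sparse output.

```python
def resolve_classes(samples, classes=None):
    """Return the column labels, following the rule described for classes_."""
    if classes is not None:
        # Explicit classes: keep the caller's order unchanged.
        return list(classes)
    # Default: collect every distinct label in the data and sort them.
    return sorted({label for sample in samples for label in sample})


def binarize(samples, classes=None):
    """Encode each sample as a 0/1 row with one column per class."""
    columns = resolve_classes(samples, classes)
    rows = [[1 if label in sample else 0 for label in columns]
            for sample in samples]
    return columns, rows


data = [(1, 2), (3, 4), (5,)]

# Case 1: classes left at its default, so columns come from the data.
columns, rows = binarize(data)
print(columns)  # [1, 2, 3, 4, 5]
print(rows)     # [[1, 1, 0, 0, 0], [0, 0, 1, 1, 0], [0, 0, 0, 0, 1]]

# Case 2: explicit classes, so columns follow the given order.
columns, rows = binarize(data, classes=[2, 3, 4, 5, 6, 1])
print(columns)  # [2, 3, 4, 5, 6, 1]
print(rows)     # [[1, 0, 0, 0, 0, 1], [0, 1, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
```

The printed rows match Out[1] and Out[3] above, which makes it easy to see that a class with no occurrences in the data (6 in the second case) still gets its own all-zero column.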