sklearn.preprocessing.Binarizer


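The Binarizer class and the binarize function both binarize features against a given threshold: values less than or equal to the threshold are set to 0, and values greater than the threshold are set to 1. In both cases the threshold parameter defaults to 0.0.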
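To make the boundary case concrete, here is a minimal sketch on a hand-made row (the array `X` is a made-up example, not data from this post). It compares the library result with the rule written out by hand, and shows that a value exactly equal to the threshold maps to 0.

```python
import numpy as np
from sklearn.preprocessing import binarize

# Hypothetical sample: one row with values below, at, and above 0.
X = np.array([[-1.5, 0.0, 0.3, 2.0]])

# Default threshold is 0.0: values <= 0 map to 0, values > 0 map to 1.
by_sklearn = binarize(X)

# The same rule written by hand with a strict "greater than" comparison.
by_hand = (X > 0.0).astype(X.dtype)

print(by_sklearn)                           # [[0. 0. 1. 1.]]
print(np.array_equal(by_sklearn, by_hand))  # True
```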

① The binarize function: sklearn.preprocessing.binarize(X, threshold=0.0, copy=True)

a. For dense (non-sparse) input, threshold can be any float.

In [1]: from sklearn import preprocessing
   ...: from sklearn import datasets
   ...: import numpy as np
   ...: data = datasets.load_boston()  # load_boston requires scikit-learn < 1.2
   ...: new_target = preprocessing.binarize(data.target[:,np.newaxis] , threshold = data.target.mean()).astype(int)  # values <= the mean become 0, all others become 1
   ...: print(type(preprocessing.binarize(data.target[:,np.newaxis] , threshold = data.target.mean())))
   ...: new_target[:5]
   ...:
<class 'numpy.ndarray'>
Out[1]:
array([[1],
       [0],
       [1],
       [1],
       [1]])

In [2]: preprocessing.binarize(data.target[:,np.newaxis] , threshold = -1).astype(int)[:5]
Out[2]:
array([[1],
       [1],
       [1],
       [1],
       [1]])

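The same pattern can be tried without the Boston dataset. The sketch below uses a short made-up vector named `target` as a stand-in (an assumption for illustration only), reshapes it into a column because binarize expects 2-D input, and applies both a mean threshold and a negative threshold.

```python
import numpy as np
from sklearn.preprocessing import binarize

# Hypothetical stand-in for a regression target (not the Boston data).
target = np.array([24.0, 21.6, 34.7, 33.4, 18.9])

# binarize expects 2-D input, so turn the 1-D vector into a column.
column = target[:, np.newaxis]

# Threshold at the mean (26.52 for this vector): values above it become 1.
above_mean = binarize(column, threshold=target.mean()).astype(int)

# A negative threshold is allowed for dense input; every value here exceeds -1.
all_ones = binarize(column, threshold=-1.0).astype(int)

print(above_mean.ravel())  # [0 0 1 1 0]
print(all_ones.ravel())    # [1 1 1 1 1]
```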
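b. For sparse input, threshold must be a float greater than or equal to 0. Only the stored entries are compared with the threshold (cond = X.data > threshold in the traceback below), so the implicit zeros could never be turned into 1 by a negative threshold; binarize raises a ValueError instead.
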
In [3]: from scipy.sparse import coo
   ...: from sklearn import preprocessing
   ...: spar = coo.coo_matrix(np.random.binomial(1,0.25,100))
   ...: preprocessing.binarize(spar,threshold=-1)
   ...:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-ff778f656a6b> in <module>()
      2 from sklearn import preprocessing
      3 spar = coo.coo_matrix(np.random.binomial(1,0.25,100))
----> 4 preprocessing.binarize(spar,threshold=-1)

d:\softwore\python\lib\site-packages\sklearn\preprocessing\data.py in binarize(X, threshold, copy)
   1470     if sparse.issparse(X):
   1471         if threshold < 0:
-> 1472             raise ValueError('Cannot binarize a sparse matrix with threshold '
   1473                              '< 0')
   1474         cond = X.data > threshold

ValueError: Cannot binarize a sparse matrix with threshold < 0

In [4]: preprocessing.binarize(spar,threshold=0)
Out[4]:
<1x100 sparse matrix of type '<class 'numpy.int32'>'
        with 24 stored elements in Compressed Sparse Row format>

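For a version of this check that runs as a script, the sketch below catches the error instead of letting it propagate. The matrix `spar` is a small hand-built example (an assumption, chosen so the result is easy to verify), and a non-negative threshold is then applied to show that the output stays sparse.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import binarize

# Hypothetical sparse row with three stored (non-zero) entries.
spar = sparse.csr_matrix(np.array([[0.0, 0.5, 0.0, 2.0, 0.0, 1.0]]))

# A negative threshold is rejected for sparse input.
try:
    binarize(spar, threshold=-1)
    negative_ok = True
except ValueError as err:
    negative_ok = False
    message = str(err)

# A non-negative threshold works and returns a sparse result.
result = binarize(spar, threshold=1.0)

print(negative_ok)               # False
print(sparse.issparse(result))   # True
print(result.toarray())          # [[0. 0. 0. 1. 0. 0.]]
```

Note that the stored value 1.0 maps to 0 here, because only values strictly greater than the threshold become 1.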
② The Binarizer class: sklearn.preprocessing.Binarizer(threshold=0.0, copy=True)

a. For dense (non-sparse) input, threshold can be any float.

In [5]: from sklearn import preprocessing
   ...: from sklearn import datasets
   ...: import numpy as np
   ...: data = datasets.load_boston()
   ...: bz = preprocessing.Binarizer(threshold=data.target.mean())
   ...: new_target = bz.fit_transform(data.target[:,np.newaxis]).astype(int)
   ...: print(bz)
   ...: new_target[:5]
   ...:
Binarizer(copy=True, threshold=22.532806324110677)
Out[5]:
array([[1],
       [0],
       [1],
       [1],
       [1]])

In [6]: preprocessing.Binarizer(threshold=-1).fit_transform(data.target[:,np.newaxis]).astype(int)[:5]
Out[6]:
array([[1],
       [1],
       [1],
       [1],
       [1]])

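As the traceback further down shows, Binarizer.transform simply calls binarize with the stored threshold, so the two should give the same result. The sketch below checks that on a small made-up matrix `X` (an assumption for illustration); the threshold is passed by keyword, which also works on recent scikit-learn releases.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, binarize

# Hypothetical 2x3 feature matrix.
X = np.array([[1.0, -2.0, 3.5],
              [0.0, 4.0, -0.5]])

bz = Binarizer(threshold=1.0)

# fit() learns nothing from the data; it is there so that Binarizer
# follows the usual fit/transform estimator interface.
from_class = bz.fit(X).transform(X)

# The plain function with the same threshold.
from_function = binarize(X, threshold=1.0)

print(from_class)
# [[0. 0. 1.]
#  [0. 1. 0.]]
print(np.array_equal(from_class, from_function))  # True
```

The class form is mainly useful when the step has to be placed inside a larger chain of transformers; for a one-off conversion the function is enough.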
b. For sparse input, threshold must likewise be a float greater than or equal to 0.

In [7]: from scipy.sparse import coo
   ...: spar = coo.coo_matrix(np.random.binomial(1,0.25,100))
   ...: preprocessing.Binarizer(threshold= -1).fit_transform(spar)
   ...:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-fc5a78d3b8c5> in <module>()
      1 from scipy.sparse import coo
      2 spar = coo.coo_matrix(np.random.binomial(1,0.25,100))
----> 3 preprocessing.Binarizer(threshold= -1).fit_transform(spar)

d:\softwore\python\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
    492         if y is None:
    493             # fit method of arity 1 (unsupervised transformation)
--> 494             return self.fit(X, **fit_params).transform(X)
    495         else:
    496             # fit method of arity 2 (supervised transformation)

d:\softwore\python\lib\site-packages\sklearn\preprocessing\data.py in transform(self, X, y, copy)
   1549         """
   1550         copy = copy if copy is not None else self.copy
-> 1551         return binarize(X, threshold=self.threshold, copy=copy)
   1552
   1553

d:\softwore\python\lib\site-packages\sklearn\preprocessing\data.py in binarize(X, threshold, copy)
   1470     if sparse.issparse(X):
   1471         if threshold < 0:
-> 1472             raise ValueError('Cannot binarize a sparse matrix with threshold '
   1473                              '< 0')
   1474         cond = X.data > threshold

ValueError: Cannot binarize a sparse matrix with threshold < 0
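If a negative threshold really is needed on data that happens to be stored sparsely, one possible workaround (a suggestion of this rewrite, not something the original post covers) is to convert to a dense array first. The sketch below assumes the matrix is small enough to fit in memory once densified; `spar` is again a made-up example.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import Binarizer

# Hypothetical sparse row of 0/1 values.
spar = sparse.csr_matrix(np.array([[0.0, 1.0, 0.0, 1.0]]))

bz = Binarizer(threshold=-1)

# Densify before transforming; this gives up the memory savings of the
# sparse format, so it only makes sense for matrices of modest size.
dense_result = bz.fit_transform(spar.toarray())

print(dense_result)  # [[1. 1. 1. 1.]]
```

Every entry, including the former implicit zeros, is greater than -1 and therefore becomes 1, which is exactly the fully dense outcome that the sparse code path refuses to produce.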





