sklearn error fix: ValueError: Class label 0 not present

Source: Internet | Editor: 程序博客网 | Time: 2024/06/07 02:31

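Background

I was recently using sklearn's SVM for text classification. While tuning its parameters with sklearn's built-in grid search (GridSearchCV), I hit the error ValueError: Class label 0 not present. I am recording what I found while debugging it here.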

Code that raised the error

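from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# X: feature matrix; Y: class labels as read from the data file
svr = SVC(kernel='rbf', probability=True, class_weight={0: 1.0, 1: 1.5})
gammals = [i * 0.1 for i in range(30)]  # candidate gamma values 0.0, 0.1, ..., 2.9
clf = GridSearchCV(svr, {'gamma': gammals}, verbose=1)
clf.fit(X, Y)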

The sklearn traceback

Traceback (most recent call last):
  File "svm_grid_search.py", line 115, in <module>
    grid_search()
  File "svm_grid_search.py", line 88, in grid_search
    clf.fit(X, Y)
  File "/usr/lib64/python2.7/site-packages/sklearn/model_selection/_search.py", line 945, in fit
    return self._fit(X, y, groups, ParameterGrid(self.param_grid))
  File "/usr/lib64/python2.7/site-packages/sklearn/model_selection/_search.py", line 564, in _fit
    for parameters in parameter_iterable
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 758, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 608, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 571, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 109, in apply_async
    result = ImmediateResult(func)
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 326, in __init__
    self.results = batch()
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/lib64/python2.7/site-packages/sklearn/model_selection/_validation.py", line 238, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/lib64/python2.7/site-packages/sklearn/svm/base.py", line 152, in fit
    y = self._validate_targets(y)
  File "/usr/lib64/python2.7/site-packages/sklearn/svm/base.py", line 522, in _validate_targets
    self.class_weight_ = compute_class_weight(self.class_weight, cls, y_)
  File "/usr/lib64/python2.7/site-packages/sklearn/utils/class_weight.py", line 79, in compute_class_weight
    raise ValueError("Class label %d not present." % c)
ValueError: Class label 0 not present.

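What the error means

A class label used as a key in the weight dictionary cannot be found in the list of classes.
The list of classes is extracted automatically by sklearn from the labels in the dataset you pass to fit, whereas the weight dictionary is supplied by you as a parameter of the estimator. In my code it is the class_weight argument below: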

svr = SVC(kernel='rbf', probability=True, class_weight={0: 1.0, 1: 1.5})
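To make the check concrete, here is a minimal pure-Python sketch of the kind of validation that produces this message, loosely modeled on the raise ValueError("Class label %d not present." % c) line in compute_class_weight from the traceback above. The function name check_class_weight_keys is hypothetical and this is a simplified stand-in, not sklearn's actual code; it only assumes that every key of class_weight must occur among the labels found in the training data.

```python
def check_class_weight_keys(y, class_weight):
    """Hypothetical, simplified stand-in for sklearn's check: every key in
    class_weight must be one of the labels that actually occur in y."""
    classes = sorted(set(y))  # labels found in the training data
    for label in class_weight:
        if label not in classes:
            raise ValueError("Class label %s not present." % label)
    return classes


# Labels read from a text file arrive as strings...
y = ['0', '1', '1', '0']
# ...but the weight dictionary uses integer keys.
weights = {0: 1.0, 1: 1.5}

try:
    check_class_weight_keys(y, weights)
except ValueError as exc:
    print(exc)  # Class label 0 not present.
```

Because the string '0' and the integer 0 are different values, the very first key fails the membership test, which is exactly the situation described in the next section.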

The cause in my case

My list of classes was:

['0','1']

whereas my weight dictionary used integer keys, e.g.:

{0:0.2, 1:0.8}

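The labels in the class list are strings, while the class labels used as keys in the weight dictionary are integers. Since the string '0' and the integer 0 are not equal, sklearn cannot find label 0 among the classes, hence the error.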

Solution

When loading the data, convert the class labels read from the file from strings to integers:

#labels.append(line_sp[0])       # before: the label stays a string such as '0'
labels.append(int(line_sp[0]))    # after: the label becomes an int such as 0
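The one-line fix assumes a loader that splits each line and takes the first field as the label. Below is a minimal self-contained sketch of such a loader; the input format (one "<label> <text>" record per line) and the function name load_dataset are assumptions for illustration, not the author's actual code.

```python
def load_dataset(lines):
    """Hypothetical loader: assumes each line is '<label> <text...>',
    with the class label as the first whitespace-separated field."""
    labels, texts = [], []
    for line in lines:
        line_sp = line.strip().split(None, 1)
        if not line_sp:
            continue  # skip blank lines
        labels.append(int(line_sp[0]))  # '0' -> 0, matching the int keys in class_weight
        texts.append(line_sp[1] if len(line_sp) > 1 else '')
    return labels, texts


labels, texts = load_dataset(["0 cheap pills online", "1 meeting moved to noon", "\n"])
print(labels)  # [0, 1]
```

Keeping the labels as strings and writing the weights as {'0': 1.0, '1': 1.5} should work just as well; what matters is that the class_weight keys and the labels have the same type.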

If you run into the same error, check whether the types of your labels and your class_weight keys match.
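To run that check programmatically, a small helper like the one below can be called on your labels and weight dictionary before fitting. The name diagnose_class_weight is hypothetical and it is plain Python, not part of sklearn: it lists every class_weight key that does not occur among the labels, together with the key's type and the label types actually present, so a str/int mismatch is obvious at a glance.

```python
def diagnose_class_weight(y, class_weight):
    """Return (key, key type, label types) for every class_weight key that
    does not occur among the labels in y. An empty list means every
    weight key matches a real label."""
    present = set(y)
    label_types = sorted({type(label).__name__ for label in present})
    return [
        (key, type(key).__name__, label_types)
        for key in class_weight
        if key not in present
    ]


print(diagnose_class_weight(['0', '1'], {0: 1.0, 1: 1.5}))
# [(0, 'int', ['str']), (1, 'int', ['str'])]
print(diagnose_class_weight([0, 1], {0: 1.0, 1: 1.5}))
# []
```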
