Gradient Boosting Classifier sparse matrix issue using pandas and scikit
I have been using the following code for multiclass classification with GradientBoostingClassifier from scikit-learn. I am running into a known issue with converting a sparse matrix to a dense matrix.
I tried the solution from a related Stack Overflow answer, but it doesn't work in my case. That solution was written for RandomForestClassifier, but AFAIK it should also work for GradientBoostingClassifier!
Also, this code works perfectly if I replace GradientBoostingClassifier with RandomForestClassifier.
The data in this case is numeric, with 93 features and 8 target classes. It can be fetched from Kaggle.
    # load data
    train = pd.read_csv('data/train.csv')
    test = pd.read_csv('data/test.csv')
    sample = pd.read_csv('submissions/sampleSubmission.csv')
    labels = train.target.values
    ids = train.id.values
    train = train.drop('id', axis=1)
    train = train.drop('target', axis=1)
    train_orig = train
    test = test.drop('id', axis=1)

    # transform counts to TFIDF features
    tfidf = feature_extraction.text.TfidfTransformer()
    train = tfidf.fit_transform(train)
    test = tfidf.transform(test).toarray()  # Update line

    # encode labels
    lbl_enc = preprocessing.LabelEncoder()
    labels = lbl_enc.fit_transform(labels)

    # train a random forest classifier
    print('starting training ... ')
    clf = ensemble.GradientBoostingClassifier(n_estimators=config.estimators)
    clf.fit(train, labels)

    # predict on test set
    print('starting prediction ... ')
    preds = clf.predict_proba(test)  # Error on this line even when test is dense
    train_pred = clf.predict(tfidf.transform(train_orig))
Traceback:
    python boosted_trees.py
    starting training ...
    Traceback (most recent call last):
      File "boosted_trees.py", line 57, in <module>
        clf.fit(train, labels)
      File "/usr/local/lib/python2.7/site-packages/sklearn/ensemble/gradient_boosting.py", line 941, in fit
        X, y = check_X_y(X, y, dtype=DTYPE)
      File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 439, in check_X_y
        ensure_min_features)
      File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 331, in check_array
        copy, force_all_finite)
      File "/usr/local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 239, in _ensure_sparse_format
        raise TypeError('A sparse matrix was passed, but dense '
    TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
Solution: In case anyone needs it, the issue is in these lines:
    train = tfidf.fit_transform(train)
    test = tfidf.transform(test).toarray()  # Update line
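Both lines need a toarray() call to fix this. TfidfTransformer returns a sparse matrix, and the fit call shown in the traceback requires dense input, so the training data has to be converted as well as the test data.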
    train = tfidf.fit_transform(train).toarray()
    test = tfidf.transform(test).toarray()  # Update line
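For anyone who wants to try the fix without downloading the Kaggle data, here is a minimal self-contained sketch of the same pipeline. The random count matrix, the three-class labels, and n_estimators=10 are assumptions made purely for illustration. They are not the original data or the asker's config.estimators value.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfTransformer

# Synthetic stand-in for the count features (hypothetical data, not the Kaggle set).
rng = np.random.default_rng(0)
train_counts = rng.integers(0, 5, size=(60, 8))
test_counts = rng.integers(0, 5, size=(20, 8))
labels = np.arange(60) % 3  # three classes, each guaranteed to appear

tfidf = TfidfTransformer()
# TfidfTransformer returns a SciPy sparse matrix; .toarray() makes it dense.
train = tfidf.fit_transform(train_counts).toarray()
test = tfidf.transform(test_counts).toarray()

# Small model so the sketch runs quickly (n_estimators is an arbitrary choice).
clf = GradientBoostingClassifier(n_estimators=10, random_state=0)
clf.fit(train, labels)

# One row per test sample, one column per class.
preds = clf.predict_proba(test)

# Reuse the fitted transformer on the raw training counts, converting to dense again.
train_pred = clf.predict(tfidf.transform(train_counts).toarray())
```

Note that the last line of the code in the question, clf.predict(tfidf.transform(train_orig)), also passes a sparse matrix to the classifier. In a scikit-learn version that rejects sparse input, it would presumably need the same toarray() call, as sketched above.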
Reposted from: http://stackoverflow.com/questions/29498106/gradient-boosting-classifier-sparse-matrix-issue-using-pandas-and-scikit