Bagging – building an ensemble of classifiers from bootstrap samples
1. Create a more complex classification problem using the Wine dataset:
import pandas as pd

df_wine = pd.read_csv('./datasets/wine/wine.data', header=None)
# https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data
df_wine.columns = ['Class label', 'Alcohol', 'Malic acid', 'Ash',
                   'Alcalinity of ash', 'Magnesium', 'Total phenols',
                   'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins',
                   'Color intensity', 'Hue',
                   'OD280/OD315 of diluted wines', 'Proline']
# drop class 1, leaving a binary problem (classes 2 and 3)
df_wine = df_wine[df_wine['Class label'] != 1]
y = df_wine['Class label'].values
X = df_wine[['Alcohol', 'Hue']].values
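A quick sanity check, not part of the original walkthrough: the UCI Wine data contains 59, 71, and 48 samples for classes 1, 2, and 3, so after dropping class 1 we should be left with 119 rows and labels 2 and 3 only:

import numpy as np

print(df_wine.shape)                      # expected: (119, 14)
print(np.unique(y, return_counts=True))   # expected: labels [2 3], counts [71 48]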
2. Next, encode the class labels into binary format and split the dataset into a 60 percent training set and a 40 percent test set:
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

le = LabelEncoder()
y = le.fit_transform(y)
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.40, random_state=1)
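LabelEncoder maps the remaining labels 2 and 3 onto 0 and 1; a one-line check (added here for illustration) makes the mapping explicit:

print(le.classes_)   # [2 3]: original label 2 is encoded as 0, label 3 as 1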
3. Use an unpruned decision tree as the base classifier and create an ensemble of 500 decision trees fitted on different bootstrap samples of the training dataset:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

tree = DecisionTreeClassifier(criterion='entropy',
                              max_depth=None,   # unpruned
                              random_state=1)
bag = BaggingClassifier(base_estimator=tree,
                        n_estimators=500,
                        max_samples=1.0,
                        max_features=1.0,
                        bootstrap=True,
                        bootstrap_features=False,
                        n_jobs=1,
                        random_state=1)

4. Calculate the accuracy score of the predictions on the training and test datasets to compare the performance of the bagging classifier to that of a single unpruned decision tree:
from sklearn.metrics import accuracy_score

tree = tree.fit(X_train, y_train)
y_train_pred = tree.predict(X_train)
y_test_pred = tree.predict(X_test)
tree_train = accuracy_score(y_train, y_train_pred)
tree_test = accuracy_score(y_test, y_test_pred)
print('Decision tree train/test accuracies %.3f/%.3f'
      % (tree_train, tree_test))
Decision tree train/test accuracies 1.000/0.833
5. Fit the bagging classifier and evaluate it in the same way:

bag = bag.fit(X_train, y_train)
y_train_pred = bag.predict(X_train)
y_test_pred = bag.predict(X_test)
bag_train = accuracy_score(y_train, y_train_pred)
bag_test = accuracy_score(y_test, y_test_pred)
print('Bagging train/test accuracies %.3f/%.3f'
      % (bag_train, bag_test))
Bagging train/test accuracies 1.000/0.896
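An aside not covered in this walkthrough: because each tree is fitted on a bootstrap sample, roughly 37 percent of the training examples are left out of any given sample, and scikit-learn can score the ensemble on these out-of-bag examples as a built-in generalization estimate. A minimal sketch, reusing the tree defined in step 3:

bag_oob = BaggingClassifier(base_estimator=tree,
                            n_estimators=500,
                            bootstrap=True,
                            oob_score=True,   # score each sample with trees that never saw it
                            random_state=1)
bag_oob = bag_oob.fit(X_train, y_train)
print('OOB accuracy estimate: %.3f' % bag_oob.oob_score_)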
6. Compare the decision regions of the decision tree and the bagging classifier:
import numpy as np
import matplotlib.pyplot as plt

x_min = X_train[:, 0].min() - 1
x_max = X_train[:, 0].max() + 1
y_min = X_train[:, 1].min() - 1
y_max = X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))
f, axarr = plt.subplots(nrows=1, ncols=2,
                        sharex='col', sharey='row',
                        figsize=(8, 3))
for idx, clf, tt in zip([0, 1], [tree, bag], ['Decision Tree', 'Bagging']):
    clf.fit(X_train, y_train)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    axarr[idx].contourf(xx, yy, Z, alpha=0.3)
    axarr[idx].scatter(X_train[y_train == 0, 0],
                       X_train[y_train == 0, 1],
                       c='blue', marker='^')
    axarr[idx].scatter(X_train[y_train == 1, 0],
                       X_train[y_train == 1, 1],
                       c='red', marker='o')
    axarr[idx].set_title(tt)
# feature 0 (Alcohol) is on the x-axis, feature 1 (Hue) on the y-axis
axarr[0].set_ylabel('Hue', fontsize=12)
plt.text(10.2, -1.2, s='Alcohol', ha='center', va='center', fontsize=12)
plt.show()
7. Results:
Both models achieve perfect accuracy on the training set, but the bagging classifier generalizes better to the test set (0.896 vs. 0.833). Comparing the decision regions, the bagging ensemble's decision boundary also looks noticeably smoother than the piece-wise linear boundary of the single unpruned decision tree.
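As a closing side note, bagging unpruned decision trees is closely related to a random forest, which additionally samples a random subset of features at each split. A minimal sketch on the same two-feature data, added here for comparison (not part of the original post; the exact accuracies will differ from the bagging results above):

from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=500,
                                criterion='entropy',
                                random_state=1)
forest = forest.fit(X_train, y_train)
forest_train = accuracy_score(y_train, forest.predict(X_train))
forest_test = accuracy_score(y_test, forest.predict(X_test))
print('Random forest train/test accuracies %.3f/%.3f'
      % (forest_train, forest_test))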