Ensemble Algorithms: XGBoost
Source: Internet | Editor: 程序博客网 | Date: 2024/06/10 03:49
XGBoost is an ensemble algorithm that combines weak learners into a stronger one.
Its core idea is that each newly added learner should improve the ensemble's predictive power.
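The additive idea can be sketched in a few lines. This is not XGBoost itself, just a minimal illustration of boosting on synthetic data (the data and hyperparameters below are illustrative assumptions): each new weak learner is fit to the residuals of the ensemble so far, so adding it refines the current prediction.

```python
# Minimal sketch of additive boosting (the idea behind XGBoost, not XGBoost itself):
# each round fits a small tree to the current residuals and adds it to the ensemble.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

prediction = np.zeros(200)   # the ensemble starts at zero
learning_rate = 0.3          # shrinkage applied to each new learner
errors = []
for t in range(20):
    residual = y - prediction                              # what the ensemble still gets wrong
    stump = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * stump.predict(X)         # add the new weak learner
    errors.append(np.mean((y - prediction) ** 2))

print("MSE after 1 round: %.4f, after 20 rounds: %.4f" % (errors[0], errors[-1]))
```

Each added tree lowers the training error, which is exactly the "adding a new classifier improves prediction" property; XGBoost adds regularization (below) so this refinement does not overfit.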
The penalty term is
Ω(f_t) = γT + (1/2) λ Σ_{j=1}^{T} w_j²
where γ is the penalty strength on model complexity, T is the number of leaves of the newly added tree, and w_j are the leaf weights.
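For context (this is the standard formulation from the XGBoost paper, not spelled out in the original post), the penalty enters the per-iteration objective:

```latex
\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\big(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t),
\qquad
\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2
```

With g_i and h_i the first and second derivatives of the loss l, a second-order approximation gives the optimal weight of leaf j as w_j* = −(Σ_{i∈I_j} g_i) / (Σ_{i∈I_j} h_i + λ). So a larger λ shrinks leaf weights, and a larger γ makes adding a leaf costlier.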
An XGBoost example in Python:
Dataset: the Pima Indians Diabetes dataset (eight numeric features in columns 0–7, a binary label in column 8).
# First XGBoost model for Pima Indians dataset
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
# fit model on training data
model = XGBClassifier()
model.fit(X_train, y_train)
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
The example needs the xgboost module:
from xgboost import XGBClassifier
and scikit-learn's accuracy metric:
from sklearn.metrics import accuracy_score
XGBoost can also plot which features the trained model relies on most:

from numpy import loadtxt
from xgboost import XGBClassifier
from xgboost import plot_importance
from matplotlib import pyplot
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
y = dataset[:,8]
# fit model on training data
model = XGBClassifier()
model.fit(X, y)
# plot feature importance
plot_importance(model)
pyplot.show()
# Tune learning_rate
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
# grid search
model = XGBClassifier()
learning_rate = [0.0001, 0.001, 0.01, 0.1, 0.2, 0.3]
param_grid = dict(learning_rate=learning_rate)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
params = grid_result.cv_results_['params']
for mean, param in zip(means, params):
    print("%f with: %r" % (mean, param))
XGBoost parameters are usually tuned in this order:
1. learning_rate
2. tree parameters: max_depth, min_child_weight, subsample, colsample_bytree, gamma
3. regularization parameters: lambda, alpha
A commonly used starting configuration:

xgb1 = XGBClassifier(
    learning_rate=0.1,
    n_estimators=1000,
    max_depth=5,
    min_child_weight=1,
    gamma=0,
    subsample=0.8,
    colsample_bytree=0.8,
    objective='binary:logistic',
    nthread=4,
    scale_pos_weight=1,
    seed=27)