Ensemble Algorithms: XGBoost


XGBoost is essentially an algorithm that combines weak classifiers (boosted trees) into a strong one.
The core idea is additive training: each newly added classifier should improve the ensemble's predictive power.
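To make the additive idea concrete, below is a minimal sketch of boosting with squared loss and shallow scikit-learn trees. This is not XGBoost's exact algorithm (it has no second-order terms and no regularization, which come next); n_rounds and the helper names are illustrative.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_additive_ensemble(X, y, n_rounds=50, learning_rate=0.1):
    # fit weak learners one at a time; each one fits the current residual
    trees, prediction = [], np.zeros(len(y))
    for _ in range(n_rounds):
        residual = y - prediction                      # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
        prediction += learning_rate * tree.predict(X)  # the new tree improves the fit
        trees.append(tree)
    return trees

def predict_additive(trees, X, learning_rate=0.1):
    return sum(learning_rate * t.predict(X) for t in trees)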

[Figure: the XGBoost objective, i.e. training loss plus a regularization penalty on each new tree]

Penalty term: $\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$

where $\gamma$ is the penalty strength, $T$ is the number of leaves in the tree, and $w_j$ are the leaf weights.

[Figures: step-by-step simplification of the objective]
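The derivation those figures walk through is the standard XGBoost one; reconstructed here as a sketch (standard published math, not copied from the screenshots). Expanding the loss to second order around the current prediction, with per-sample gradient $g_i$ and Hessian $h_i$:

$$
\mathrm{Obj}^{(t)} \approx \sum_i \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 .
$$

Grouping samples by the leaf $j$ they fall into, with $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, the optimal leaf weight and objective value are

$$
w_j^\ast = -\frac{G_j}{H_j + \lambda}, \qquad \mathrm{Obj}^\ast = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T .
$$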

XGBoost Python example:
Dataset preview (the Pima Indians diabetes dataset: 8 numeric feature columns plus a 0/1 label column):
[Figure: first rows of pima-indians-diabetes.csv]

# First XGBoost model for Pima Indians dataset
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")

# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]

# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

# fit model on training data
model = XGBClassifier()
model.fit(X_train, y_train)

# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]

# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

[Figure: console output of the script, the printed test-set accuracy]

The example needs the xgboost module:
from xgboost import XGBClassifier
It also uses scikit-learn's accuracy metric:
from sklearn.metrics import accuracy_score
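A side note on the prediction step above: predict() already returns class labels, while predict_proba() returns per-class probabilities. A minimal sketch (same fitted model and X_test as above), useful if you want a decision threshold other than 0.5:

# probability of the positive class, then a custom threshold
probs = model.predict_proba(X_test)[:, 1]
custom_preds = (probs > 0.5).astype(int)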

# plot feature importance with XGBoost's built-in helper
from numpy import loadtxt
from xgboost import XGBClassifier
from xgboost import plot_importance
from matplotlib import pyplot

# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")

# split data into X and y
X = dataset[:,0:8]
y = dataset[:,8]

# fit model on the full data
model = XGBClassifier()
model.fit(X, y)

# plot feature importance
plot_importance(model)
pyplot.show()

[Figure: feature importance bar chart produced by plot_importance]
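By default plot_importance ranks features by how often they are used to split ('weight'). A hedged variant, assuming a reasonably recent xgboost, ranks them by average split gain instead, which often better reflects each feature's contribution to the loss:

# rank features by average loss reduction ('gain') of their splits
plot_importance(model, importance_type='gain')
pyplot.show()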

# Tune learning_rate
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold

# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")

# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]

# grid search over candidate learning rates with 10-fold stratified CV
model = XGBClassifier()
learning_rate = [0.0001, 0.001, 0.01, 0.1, 0.2, 0.3]
param_grid = dict(learning_rate=learning_rate)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(X, Y)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
params = grid_result.cv_results_['params']
for mean, param in zip(means, params):
    print("%f with: %r" % (mean, param))

[Figure: grid search output, the best score and the mean log loss for each learning_rate]
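The same pattern extends to several hyperparameters at once. A sketch under the assumption that X, Y, and kfold are defined as above; learning_rate and n_estimators interact, so they are often searched jointly:

# joint grid over learning_rate and the number of trees
param_grid = dict(learning_rate=[0.01, 0.1, 0.3], n_estimators=[100, 200, 500])
grid_search = GridSearchCV(XGBClassifier(), param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(X, Y)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))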

A common order for tuning XGBoost hyperparameters:
1. learning_rate
2. tree parameters: max_depth, min_child_weight, subsample, colsample_bytree, gamma
3. regularization parameters: lambda, alpha

A typical starting configuration looks like this (n_estimators is set high on purpose; see the early-stopping sketch below):
xgb1 = XGBClassifier(
    learning_rate=0.1,
    n_estimators=1000,
    max_depth=5,
    min_child_weight=1,
    gamma=0,
    subsample=0.8,
    colsample_bytree=0.8,
    objective='binary:logistic',
    nthread=4,
    scale_pos_weight=1,
    seed=27)
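With n_estimators set high, early stopping is the usual way to let validation data pick the actual number of trees. A hedged sketch, assuming the pre-2.0 xgboost scikit-learn API (where early stopping is passed to fit(); newer releases take early_stopping_rounds in the constructor) and a held-out split X_train/X_test as in the first example:

# stop adding trees once validation logloss fails to improve for 50 rounds
xgb1.fit(X_train, y_train,
         eval_set=[(X_test, y_test)],
         eval_metric="logloss",
         early_stopping_rounds=50,
         verbose=False)
print(xgb1.best_iteration)  # index of the best boosting round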