Loan Risk Prediction with xgboost

Source: Internet  Editor: 程序博客网  Time: 2024/05/01 05:41

    
    Now let's run the legendary xgboost on this dataset.
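Before any training happens, the script first label-encodes every column with one `LabelEncoder` per column. As a rough sketch of what that step does, here is a plain-Python version. The `label_encode_columns` helper, the toy table and its column names are made up for illustration and are not part of the original script.

```python
from collections import defaultdict


def label_encode_columns(table):
    """Map each column's distinct values to integers 0..k-1 in sorted order.

    `table` is a hypothetical dict of column name -> list of values; it stands
    in for the DataFrame that the real script encodes with LabelEncoder.
    """
    encoders = defaultdict(dict)  # column name -> {original value: code}
    encoded = {}
    for name, values in table.items():
        for code, value in enumerate(sorted(set(values))):
            encoders[name][value] = code
        encoded[name] = [encoders[name][v] for v in values]
    return encoded, dict(encoders)


# Toy data with invented column names and values.
toy = {
    "grade": ["B", "A", "C", "A"],
    "home_ownership": ["RENT", "OWN", "RENT", "MORTGAGE"],
}
encoded, encoders = label_encode_columns(toy)
print(encoded["grade"])            # [1, 0, 2, 0]
print(encoders["home_ownership"])  # {'MORTGAGE': 0, 'OWN': 1, 'RENT': 2}
```

Keeping the per-column mapping around, as the `defaultdict` does, makes it possible to turn the integer codes back into the original values later.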

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Aug 19 13:19:26 2017

@author: luogan
"""
from collections import defaultdict

import matplotlib.pylab as plt
import pandas as pd
import xgboost as xgb
from matplotlib.pylab import rcParams
from sklearn import metrics
from sklearn.preprocessing import LabelEncoder
from xgboost.sklearn import XGBClassifier

# %matplotlib inline
rcParams['figure.figsize'] = 12, 4

# `df` is assumed to be the loan DataFrame from the earlier steps;
# this snippet does not define it.
# Label-encode every column, keeping one fitted LabelEncoder per column in `d`.
d = defaultdict(LabelEncoder)
dff = df.apply(lambda col: d[col.name].fit_transform(col))
# Note: writing .xls needs the xlwt engine, which newer pandas releases dropped;
# switch to .xlsx there.
dff.to_excel('dff.xls')

train = pd.read_excel('dff.xls')
target = 'safe_loans'
IDcol = 'id'


def modelfit(alg, dtrain, predictors, useTrainCV=True, cv_folds=5, early_stopping_rounds=50):
    if useTrainCV:
        # Pick the number of trees by cross-validation with early stopping.
        xgb_param = alg.get_xgb_params()
        xgtrain = xgb.DMatrix(dtrain[predictors].values, label=dtrain[target].values)
        cvresult = xgb.cv(xgb_param, xgtrain,
                          num_boost_round=alg.get_params()['n_estimators'],
                          nfold=cv_folds, metrics='auc',
                          early_stopping_rounds=early_stopping_rounds)
        alg.set_params(n_estimators=cvresult.shape[0])

    # Fit the algorithm on the data.
    # eval_metric='auc' is no longer passed to fit(): without an eval_set it had
    # no effect, and newer xgboost releases do not accept it there.
    alg.fit(dtrain[predictors], dtrain[target])

    # Predict training set:
    dtrain_predictions = alg.predict(dtrain[predictors])
    dtrain_predprob = alg.predict_proba(dtrain[predictors])[:, 1]

    # Optional: dump the predictions for inspection.
    # pd.DataFrame(dtrain_predictions).to_excel('dtrain_predictions.xls')
    # pd.DataFrame(dtrain_predprob).to_excel('dtrain_predprob.xls')

    # Print model report:
    print("\nModel Report")
    print("Accuracy : %.4g" % metrics.accuracy_score(dtrain[target].values, dtrain_predictions))
    print("AUC Score (Train): %f" % metrics.roc_auc_score(dtrain[target], dtrain_predprob))

    # booster() was renamed get_booster() in later xgboost releases.
    feat_imp = pd.Series(alg.get_booster().get_fscore()).sort_values(ascending=False)
    feat_imp.plot(kind='bar', title='Feature Importances')
    plt.ylabel('Feature Importance Score')


# Choose all predictors except target & IDcols
predictors = [x for x in train.columns if x not in [target, IDcol]]
xgb1 = XGBClassifier(
    learning_rate=0.1,
    n_estimators=1000,
    max_depth=18,
    min_child_weight=1,
    gamma=0,
    subsample=0.8,
    colsample_bytree=0.8,
    objective='binary:logistic',
    n_jobs=4,           # written as nthread=4 in the original
    scale_pos_weight=1,
    random_state=27)    # written as seed=27 in the original
modelfit(xgb1, train, predictors)
Model Report
Accuracy : 0.9533
AUC Score (Train): 0.990971
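To make the two numbers in the report less of a black box, here is a minimal from-scratch sketch of accuracy and ROC AUC on a toy example. The labels and probabilities are invented, the 0.5 cut-off is an assumption about how probabilities become class predictions, and the real script relies on `sklearn.metrics` rather than these helpers.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win, which matches the usual definition of AUC.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
proba = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
y_pred = [1 if p >= 0.5 else 0 for p in proba]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, proba)
print("Accuracy : %.4g" % acc)
print("AUC Score: %f" % auc)
```

Accuracy depends on the threshold, while AUC only depends on how the probabilities rank positives against negatives, which is why the two scores can differ as much as they do in the report.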

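    Accuracy reaches 95% on the training set, streets ahead of the decision tree and the BP network! Keep in mind that this score is measured on the data the model was trained on, so a held-out split is still needed to confirm the gap.
The legendary xgboost clearly lives up to its reputation; no wonder it is so widely used in industry.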
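One detail of the script worth spelling out is how it settles on the number of trees: `xgb.cv` is run with `early_stopping_rounds`, and the number of rows in its result becomes `n_estimators`. The sketch below mimics that rule in plain Python. The `best_num_rounds` helper and the AUC values are invented for illustration; this is not how xgboost implements it internally.

```python
def best_num_rounds(cv_auc, early_stopping_rounds):
    """Return how many boosting rounds to keep, given one CV AUC per round.

    Training stops once the score has not improved for
    `early_stopping_rounds` consecutive rounds; the rounds up to and
    including the best one are kept.
    """
    best_round, best_score = 0, float("-inf")
    for i, score in enumerate(cv_auc):
        if score > best_score:
            best_round, best_score = i, score
        elif i - best_round >= early_stopping_rounds:
            break
    return best_round + 1


# Invented cross-validated AUC per boosting round.
cv_auc = [0.70, 0.74, 0.77, 0.78, 0.779, 0.778, 0.777, 0.776]
n_estimators = best_num_rounds(cv_auc, early_stopping_rounds=3)
print(n_estimators)  # 4
```

With the script's settings (up to 1000 rounds, patience of 50), this is what keeps the model from growing trees long after the cross-validated AUC has stopped improving.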
The figure below shows the importance of each feature.

[Figure: bar chart of feature importance scores]
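The scores in that chart come from `get_fscore()`, which counts how many times each feature is used to split a node across all trees. A toy sketch of that counting follows; the three "trees" and the feature names are hypothetical and only stand in for a real booster.

```python
from collections import Counter

# Hypothetical ensemble: each tree is listed as the features it splits on.
trees = [
    ["grade", "dti", "grade"],
    ["dti", "emp_length"],
    ["grade", "dti", "dti"],
]

# Count splits per feature, then rank from most to least used.
fscore = Counter(feature for tree in trees for feature in tree)
ranked = sorted(fscore.items(), key=lambda kv: kv[1], reverse=True)

# Print a crude text version of the bar chart.
for feature, count in ranked:
    print(f"{feature:12s} {'#' * count} ({count})")
```

A split count says how often a feature is used, not how much each split improves the model, so it is best read as a rough ranking.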

Code file download

Code on GitHub
