machine learning in coding (python): building a prediction model with xgboost


Continued from the previous post: http://blog.csdn.net/mmc2015/article/details/47304591



import numpy as np
import xgboost as xgb


def xgboost_pred(train, labels, test):
    params = {}
    params["objective"] = "reg:linear"
    params["eta"] = 0.005
    params["min_child_weight"] = 6
    params["subsample"] = 0.7
    params["colsample_bytree"] = 0.7
    params["scale_pos_weight"] = 1
    params["silent"] = 1
    params["max_depth"] = 9

    plst = list(params.items())

    # hold out the first `offset` rows for early stopping
    offset = 4000
    num_rounds = 10000

    xgtest = xgb.DMatrix(test)

    # create train and validation DMatrices
    xgtrain = xgb.DMatrix(train[offset:, :], label=labels[offset:])
    xgval = xgb.DMatrix(train[:offset, :], label=labels[:offset])

    # train using early stopping and predict
    watchlist = [(xgtrain, 'train'), (xgval, 'val')]
    model = xgb.train(plst, xgtrain, num_rounds, watchlist,
                      early_stopping_rounds=120)
    preds1 = model.predict(xgtest, ntree_limit=model.best_iteration)

    # reverse train and labels and use a different 4k slice for early
    # stopping. This adds very little to the score, but it is an option
    # if you are concerned about using all the data.
    # Note: the second model is fit on log-transformed labels.
    train = train[::-1, :]
    labels = np.log(labels[::-1])

    xgtrain = xgb.DMatrix(train[offset:, :], label=labels[offset:])
    xgval = xgb.DMatrix(train[:offset, :], label=labels[:offset])

    watchlist = [(xgtrain, 'train'), (xgval, 'val')]
    model = xgb.train(plst, xgtrain, num_rounds, watchlist,
                      early_stopping_rounds=120)
    preds2 = model.predict(xgtest, ntree_limit=model.best_iteration)

    # combine predictions: since the metric only cares about relative
    # rank, we don't need to normalize the weights into an average
    preds = preds1 * 1.4 + preds2 * 8.6
    return preds
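The hold-out split and the row reversal above are plain NumPy slicing. A minimal sketch with toy data (the array shapes and the small `offset` are illustrative, not from the original script) shows what the two passes see:

```python
import numpy as np

# Toy data standing in for the real train matrix (illustrative only).
train = np.arange(20).reshape(10, 2)
labels = np.arange(10)

offset = 4  # the first `offset` rows become the validation set
fit_X, fit_y = train[offset:, :], labels[offset:]
val_X, val_y = train[:offset, :], labels[:offset]

# Reversing the rows gives the second model a different hold-out slice,
# so between the two runs every row is used for fitting at least once.
train_rev = train[::-1, :]
labels_rev = labels[::-1]
```

With `offset = 4` on 10 rows, the first pass fits on rows 4..9 and validates on rows 0..3; after reversal the roles of the two ends of the data are swapped.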

(code from kaggle)
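The un-normalized blend weights (1.4 and 8.6 rather than 0.14 and 0.86) work because the competition metric is rank-based: multiplying all predictions by a positive constant never changes their ordering. A small check with made-up prediction vectors (the values are illustrative only):

```python
import numpy as np

# Hypothetical outputs from the two models (values are illustrative).
preds1 = np.array([2.0, 5.0, 3.0, 8.0])
preds2 = np.array([1.5, 4.0, 3.5, 7.0])

# Un-normalized weighted sum, as in the script above.
blend = preds1 * 1.4 + preds2 * 8.6
# Normalized weighted average (weights sum to 1).
avg = preds1 * 0.14 + preds2 * 0.86

# blend is exactly 10 * avg, and a positive scaling never changes
# relative rank, so a rank-based metric scores both identically.
assert np.allclose(blend, 10 * avg)
assert (np.argsort(blend) == np.argsort(avg)).all()
```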



A detailed walkthrough of the code will come when I have time; comments and criticism are welcome.


