XGBoost Quick Start

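xgboost is an implementation of the gradient boosted decision tree (GBDT) algorithm. It handles regression, classification, and ranking, can be called from a range of languages, and runs both on a single machine and in distributed mode, which makes it a good fit for large datasets.

Project homepage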

https://github.com/dmlc/xgboost

Installation

https://github.com/dmlc/xgboost/blob/master/doc/python/python_intro.md

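I chose to call xgboost from Python.

1 Download the source code from the project homepage and unpack it.

2 Run make in the unpacked directory to build it.

3 In the python-package subdirectory, run python setup.py install.

Your machine may of course be missing some dependencies. Step 2, for example, needs g++, and step 3 needs some of Python's numerical libraries.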
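Before running step 3, it can help to see which Python packages are missing. The following is only a minimal sketch and not part of the xgboost install procedure: the helper name missing_packages is made up for illustration, and treating numpy and scipy as the "numerical libraries" is my assumption.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # numpy and scipy are an assumed list; adjust it to whatever your build asks for.
    missing = missing_packages(["numpy", "scipy"])
    if missing:
        print("please install: " + ", ".join(missing))
    else:
        print("all assumed dependencies are present")
```

The check only looks packages up without importing them, so it is safe to run even when a package is half installed.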

Classification in practice

https://github.com/dmlc/xgboost/tree/master/demo/guide-python

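This page has many demos, all worth studying.

Below is a concrete walkthrough of a binary classification problem.

First, the input data can still be in libsvm format, which is also a format I rather like.

Each line has the form

label index1:value1 index2:value2 ...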
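To make the line format concrete, here is a small sketch that parses and writes a single libsvm line in plain Python. The function names parse_libsvm_line and format_libsvm_line are my own for illustration; xgboost reads the file itself and does not need these helpers.

```python
def parse_libsvm_line(line):
    """Split one libsvm line into (label, {feature_index: value})."""
    fields = line.split()
    label = float(fields[0])
    features = {}
    for item in fields[1:]:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def format_libsvm_line(label, features):
    """Inverse of parse_libsvm_line; indices are written in ascending order."""
    parts = ["%g" % label]
    parts += ["%d:%g" % (i, features[i]) for i in sorted(features)]
    return " ".join(parts)
```

For example, parse_libsvm_line("1 3:0.5 7:2") gives the label 1.0 and the features {3: 0.5, 7: 2.0}. Features that are not listed are simply absent from the line, which is what keeps the format compact for sparse data.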

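xgboost does have one requirement for labels, though: they must start from 0.

For 2-class classification, the labels can only be 0 and 1.

For 3-class classification, the labels can only be 0, 1, and 2.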
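If your data uses other labels (say 1/2/3, or -1/+1), they have to be remapped before training. A minimal sketch of one way to do that follows; remap_labels is a hypothetical helper of mine, not an xgboost function, and it assumes the labels can be sorted.

```python
def remap_labels(labels):
    """Map arbitrary class labels to 0..K-1; return (new_labels, mapping)."""
    # Sort the distinct labels so the mapping is stable across runs.
    mapping = {old: new for new, old in enumerate(sorted(set(labels)))}
    return [mapping[y] for y in labels], mapping
```

For example, remap_labels([-1, 1, -1]) returns ([0, 1, 0], {-1: 0, 1: 1}). Keep the mapping around so that predictions can be translated back to the original labels.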

#!/usr/bin/python
import xgboost as xgb

# load the training and test sets (text files in libsvm format)
dtrain = xgb.DMatrix('train.txt')
dtest = xgb.DMatrix('test.txt')

# specify parameters via map; definitions are the same as in the C++ version
# (newer xgboost releases replace 'silent' with 'verbosity')
param = {'max_depth': 22, 'eta': 0.1, 'silent': 0, 'objective': 'binary:logistic',
         'min_child_weight': 3, 'gamma': 14}

# specify the validation sets to watch performance on
watchlist = [(dtest, 'eval'), (dtrain, 'train')]
num_round = 33
bst = xgb.train(param, dtrain, num_round, watchlist)

# predict on the test set; binary:logistic returns probabilities
preds = bst.predict(dtest)
labels = dtest.get_label()

# threshold the probabilities at 0.5 and compare them with the true labels
n = float(len(preds))
print('error=%f' % (sum(1 for i in range(len(preds)) if int(preds[i] > 0.5) != labels[i]) / n))
print('correct=%f' % (sum(1 for i in range(len(preds)) if int(preds[i] > 0.5) == labels[i]) / n))
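With binary:logistic the predictions are probabilities, so the script turns them into class decisions by thresholding at 0.5 before comparing with the labels. The same computation as a standalone function is sketched below, so it can be checked without a trained model. The name error_rate and the threshold argument are my additions for illustration.

```python
def error_rate(preds, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction differs from the label."""
    wrong = sum(1 for p, y in zip(preds, labels) if int(p > threshold) != int(y))
    return wrong / float(len(preds))
```

The "correct" figure printed by the script is simply 1 - error_rate(preds, labels).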

Author: linger

Original link: http://blog.csdn.net/lingerlanlan/article/details/49804551

1 0