Compiling and installing xgboost 0.7

Source: Internet  Editor: 程序博客网  Date: 2024/06/06 02:45

Background

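    Colleagues at our research institute recently needed xgboost. My first idea was to install it for Python, since the Spark environment we currently offer the institute is mostly used through pyspark.
    Installing xgboost on a test server with pip install xgboost failed with:
     #pragma message: Will need g++-4.6 or higher to compile all the features in dmlc-core, compile without c++0x, some features may be disabled
    The server runs CentOS 6.7. Checking its gcc with gcc -v shows version 4.4.7:
     gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC)
    Installing xgboost this way would mean upgrading gcc on every server, which is a lot of work. So I took another route: build the jar and use xgboost from Scala.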
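
The compiler requirement in the warning can be checked before attempting an install. This is a minimal sketch, assuming a POSIX shell with awk. version_ge is a hypothetical helper name, and 4.6 is the minimum taken from the warning message above:

```shell
# version_ge A B: succeed when dotted version A is >= dotted version B.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    len = (n > m) ? n : m
    for (i = 1; i <= len; i++) {
      xi = (i <= n) ? x[i] + 0 : 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Only query gcc when it is actually installed.
if command -v gcc >/dev/null 2>&1; then
  gcc_version=$(gcc -dumpversion)
  if version_ge "$gcc_version" 4.6; then
    echo "gcc $gcc_version is new enough"
  else
    echo "gcc $gcc_version is older than 4.6"
  fi
fi
```

The comparison is numeric per component, so 4.10 counts as newer than 4.6, which a plain string comparison would get wrong.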

Building the jvm-package

Official git repository: https://github.com/dmlc/xgboost
Official documentation: http://xgboost.readthedocs.io/en/latest/

The official build instructions are at https://xgboost.readthedocs.io/en/latest/build.html
They say:

It consists of two steps:

  1. First build the shared library from the C++ codes (libxgboost.so for linux/osx and libxgboost.dll for windows).
    • Exception: for R-package installation please directly refer to the R package section.
  2. Then install the language packages (e.g. Python Package)
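Important: the newest version of xgboost uses submodule to maintain packages. So when you clone the repo, remember to use the recursive option as follows.
In short, installation takes two steps: first build the shared library, then build and install the language package.
Note that the latest xgboost keeps its dependencies in git submodules, so the repository has to be cloned recursively: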
git clone --recursive https://github.com/dmlc/xgboost
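
If the repository was cloned without --recursive, the submodule directories stay empty and the build fails. This is a small sketch for spotting that case, assuming the checkout lives in a directory named xgboost. uninitialized_submodules is a hypothetical helper name. It relies on git submodule status marking uninitialized submodules with a leading "-":

```shell
# uninitialized_submodules: read `git submodule status` output on stdin and
# print the path of every submodule that has not been initialized
# (git prefixes those lines with "-").
uninitialized_submodules() {
  awk '/^-/ { print $2 }'
}

# Run the check only when an xgboost checkout is present (assumed location).
if [ -d xgboost/.git ]; then
  missing=$(cd xgboost && git submodule status | uninitialized_submodules)
  if [ -n "$missing" ]; then
    echo "uninitialized submodules: $missing"
    # They can be fetched with:
    #   (cd xgboost && git submodule update --init --recursive)
  fi
fi
```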

Building the shared library

I built on a CentOS 7 server, running the following command from the official instructions:
cd xgboost; cp make/minimum.mk ./config.mk; make -j4
For other systems, follow the official instructions:

Building on Ubuntu/Debian

On Ubuntu, one builds xgboost by

git clone --recursive https://github.com/dmlc/xgboost
cd xgboost; make -j4

Building on OSX

On OSX, one builds xgboost by

git clone --recursive https://github.com/dmlc/xgboost
cd xgboost; cp make/minimum.mk ./config.mk; make -j4
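Once this step succeeds, move on to the second step, the build. If make reports errors, install the missing dependencies named in the error messages and run make again.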

Building the jvm-package

Following the official steps at https://xgboost.readthedocs.io/en/latest/jvm/index.html:

Installation

Currently, XGBoost4J only support installation from source. Building XGBoost4J using Maven requires Maven 3 or newer and Java 7+.

Before you install XGBoost4J, you need to define environment variable JAVA_HOME as your JDK directory to ensure that your compiler can find jni.h correctly, since XGBoost4J relies on JNI to implement the interaction between the JVM and native libraries.

After your JAVA_HOME is defined correctly, it is as simple as run mvn package under jvm-packages directory to install XGBoost4J. You can also skip the tests by running mvn -DskipTests=true package, if you are sure about the correctness of your local setup.

To publish the artifacts to your local maven repository, run

mvn install

Or, if you would like to skip tests, run

mvn -DskipTests install

This command will publish the xgboost binaries, the compiled java classes as well as the java sources to your local repository. Then you can use XGBoost4J in your Java projects by including the following dependency in pom.xml:

<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j</artifactId>
  <version>0.7</version>
</dependency>

After integrating with Dataframe/Dataset APIs of Spark 2.0, XGBoost4J-Spark only supports compile with Spark 2.x. You can build XGBoost4J-Spark as a component of XGBoost4J by running mvn package, and you can specify the version of spark with mvn -Dspark.version=2.0.0 package. (To continue working with Spark 1.x, the users are supposed to update pom.xml by modifying the properties like spark.version, scala.version, and scala.binary.version. Users also need to change the implementation by replacing SparkSession with SQLContext and the type of API parameters from Dataset[_] to Dataframe.)
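
The JAVA_HOME requirement quoted above can be verified before running mvn package. This is a minimal sketch, assuming the JDK keeps jni.h under its include directory, as common Linux JDK layouts do. has_jni_header is a hypothetical helper name:

```shell
# has_jni_header DIR: succeed when DIR is non-empty and contains include/jni.h.
has_jni_header() {
  [ -n "$1" ] && [ -f "$1/include/jni.h" ]
}

if has_jni_header "${JAVA_HOME:-}"; then
  echo "jni.h found under $JAVA_HOME/include"
else
  echo "JAVA_HOME is unset or does not point at a JDK with include/jni.h"
fi
```

A JAVA_HOME that points at a JRE rather than a JDK fails this check, since a JRE does not ship the include directory.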


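The default Spark version is 2.1.0 and the default Scala version is 2.11.8.
mvn package
Wait patiently for the build to finish.
To build against different Spark and Scala versions, change spark.version, scala.version, and scala.binary.version in pom.xml, then run mvn install.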

Usage

After the build, everything ran fine on the CentOS 7 server. When I uploaded xgboost4j-spark-0.7-jar-with-dependencies.jar to the CentOS 6 servers in production and ran it there, it failed with:

/lib64/libc.so.6: version `GLIBC_2.14' not found

A search showed the cause: the system glibc is too old, because the software was compiled against a newer glibc than the one installed.
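
A mismatch like this can be spotted before deploying by comparing the glibc on the target server with the highest GLIBC_x.y symbol version the native library references. This is a sketch under assumptions: max_glibc_required is a hypothetical helper name, the library path in the comment is a placeholder, and getconf GNU_LIBC_VERSION only works on glibc-based systems:

```shell
# max_glibc_required: read text containing GLIBC_x.y tags on stdin (for
# example the output of `objdump -T` on a shared library) and print the
# highest version mentioned. Prints nothing when no tag is found.
max_glibc_required() {
  awk '
    function newer(a, b,   x, y, n, m, i, xi, yi) {
      n = split(a, x, "."); m = split(b, y, ".")
      for (i = 1; i <= n || i <= m; i++) {
        xi = (i <= n) ? x[i] + 0 : 0
        yi = (i <= m) ? y[i] + 0 : 0
        if (xi != yi) return xi > yi
      }
      return 0
    }
    {
      line = $0
      while (match(line, /GLIBC_[0-9]+(\.[0-9]+)*/)) {
        v = substr(line, RSTART + 6, RLENGTH - 6)
        if (best == "" || newer(v, best)) best = v
        line = substr(line, RSTART + RLENGTH)
      }
    }
    END { if (best != "") print best }
  '
}

# glibc version of the current machine, when getconf can report it.
if sys=$(getconf GNU_LIBC_VERSION 2>/dev/null); then
  echo "system: $sys"
fi

# Example use on a build machine (placeholder path):
#   objdump -T path/to/native-library.so | max_glibc_required
```

If the printed requirement is higher than the glibc on the target server, the library has to be rebuilt on a system with an older glibc, which is what the next step does.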
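So I tried building xgboost on a CentOS 6 server, and hit the gcc-too-old error again. I upgraded gcc as follows, then repeated the build steps above, and this time it succeeded: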
wget http://people.centos.org/tru/devtools-2/devtools-2.repo
mv devtools-2.repo /etc/yum.repos.d
yum install devtoolset-2-gcc devtoolset-2-binutils devtoolset-2-gcc-c++
mv /usr/bin/gcc /usr/bin/gcc-4.4.7
mv /usr/bin/g++ /usr/bin/g++-4.4.7
mv /usr/bin/c++ /usr/bin/c++-4.4.7
ln -s /opt/rh/devtoolset-2/root/usr/bin/gcc /usr/bin/gcc
ln -s /opt/rh/devtoolset-2/root/usr/bin/c++ /usr/bin/c++
ln -s /opt/rh/devtoolset-2/root/usr/bin/g++ /usr/bin/g++
gcc --version
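
Renaming the system compilers affects every user of the machine. A less invasive alternative, which I have not tested for this build, is to put the devtoolset-2 bin directory at the front of PATH for the current shell only. This is a sketch: prepend_path is a hypothetical helper name, and the directory is the one used in the ln -s commands above:

```shell
# prepend_path DIR: add DIR to the front of PATH when DIR exists and is not
# already listed in PATH.
prepend_path() {
  if [ -d "$1" ]; then
    case ":$PATH:" in
      *":$1:"*) ;;
      *) PATH="$1:$PATH"; export PATH ;;
    esac
  fi
}

# No effect on machines where devtoolset-2 is not installed.
prepend_path /opt/rh/devtoolset-2/root/usr/bin
```

After this, gcc --version in the same shell should report the devtoolset compiler, while /usr/bin/gcc stays untouched.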




