Distributed parallel Caffe on Ubuntu 16.04

Source: Internet. Editor: 程序博客网. Time: 2024/06/06 04:32


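Most of this follows the reference tutorial: http://www.cnblogs.com/beihaidao/p/6866342.html


The parallel version of Caffe is hosted at: https://github.com/yjxiong/caffe.git


To download it: git clone https://github.com/yjxiong/caffe.git


Before installing Caffe, install CUDA, cuDNN, OpenCV, OpenMPI, HDF5 and so on (these are the dependencies I ran into during my own installation).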
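
As a quick sanity check before building, a short script can report which prerequisite commands are visible on PATH. This is only a sketch: the command names probed here (nvcc for CUDA, mpicxx for OpenMPI, plus git and cmake) are my own assumption of reasonable probes, and cuDNN, OpenCV and HDF5 are libraries with no single command to look for, so they are not covered.

```python
import os

# Hypothetical list of commands to probe; adjust it to your setup.
REQUIRED_TOOLS = ["git", "cmake", "nvcc", "mpicxx"]


def which(name):
    """Return the full path of an executable found on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def find_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return dict((name, which(name)) for name in names)


if __name__ == "__main__":
    for name, path in sorted(find_tools(REQUIRED_TOOLS).items()):
        print("%-8s %s" % (name, path or "NOT FOUND"))
```

If OpenMPI was installed under a home directory, mpicxx may show as NOT FOUND here even though it exists; in that case the full path is passed to cmake explicitly later on.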


CUDA and cuDNN

(Installation steps omitted.)

Versions: CUDA 8.0, cuDNN 5.0


Installing OpenCV

Version: 2.4.13

Build configuration command:

cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs/libcuda.so -D CUDA_ARCH_BIN=5.2 -D CUDA_ARCH_PTX="" -D WITH_CUDA=ON -D WITH_TBB=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D WITH_V4L=ON -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=ON -D WITH_QT=ON -D WITH_OPENGL=ON -D ENABLE_FAST_MATH=1 -D CUDA_FAST_MATH=1 -D WITH_CUBLAS=1 -D WITH_NVCUVID:BOOL="1" .


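Installing OpenMPI

Version: 2.1.0

Follow an online tutorial (link to be added).


Installing Caffe (the key part)


Go to the caffe directory, i.e. the folder created by git clone.

Copy Makefile.config.example to a new file named Makefile.config.

Edit Makefile.config; the final result looks like this: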

## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
        -gencode arch=compute_20,code=sm_21 \
        -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := atlas
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
# BLAS_INCLUDE := /path/to/your/blas
# BLAS_LIB := /path/to/your/blas

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
MATLAB_DIR := /usr/local/MATLAB/R2014a
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
PYTHON_INCLUDE := /usr/include/python2.7 \
        /usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# ANACONDA_HOME := $(HOME)/anaconda
# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
        # $(ANACONDA_HOME)/include/python2.7 \
        # $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \

# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/lib
# PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1

BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
# DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @

Then run the following commands in the caffe directory.

Create a build folder and enter it:

mkdir build

cd build


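Now configure the Caffe build (this is the crucial step!):

Configure command:

cmake -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF -DUSE_MPI=ON -DMPI_CXX_COMPILER=/home/your/path/openmpi/bin/mpicxx ..

Note: replace /home/your/path/openmpi with your own OpenMPI install location. The flag -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF is not in the reference tutorial. If building Caffe (i.e. running make all) fails with "cannot find -lopencv_dep_cudart", add this flag, re-run cmake, and then run make all again.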
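
Because the mpicxx path is easy to mistype, the configure command can also be assembled from the OpenMPI install prefix. The helper below is a hypothetical convenience of mine, not part of Caffe or the reference tutorial; it only rebuilds the exact argument list shown above and assumes mpicxx sits under bin/ in the prefix.

```python
import os


def cmake_args(mpi_prefix, static_cudart=False):
    """Return the cmake argument list used above for a given OpenMPI prefix."""
    return [
        "cmake",
        "-DCUDA_USE_STATIC_CUDA_RUNTIME=%s" % ("ON" if static_cudart else "OFF"),
        "-DUSE_MPI=ON",
        # Assumption: the C++ wrapper compiler lives at <prefix>/bin/mpicxx.
        "-DMPI_CXX_COMPILER=%s" % os.path.join(mpi_prefix, "bin", "mpicxx"),
        "..",
    ]


if __name__ == "__main__":
    # Placeholder prefix; substitute your own OpenMPI location.
    print(" ".join(cmake_args("/home/your/path/openmpi")))
```

The printed line can be pasted into the shell from inside the build directory.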


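Then build and install:

make all -j8 (-j8 runs 8 parallel jobs to speed up the build; it can be omitted)

sudo make install (note that this needs sudo privileges)

Finally, run the tests:

make runtest (the reference tutorial mentions that two tests fail; the same happened in my installation)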


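Last comes the Python interface (I did not use the MATLAB interface):

Both were already installed before I installed Caffe.

Building the Python interface:

Add an environment variable:

vi ~/.bashrc

Add this line: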

export PYTHONPATH=/your/path/caffe/python:$PYTHONPATH

Save, exit, and run source so that the change takes effect:
source ~/.bashrc
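
To confirm the variable took effect in a new shell, the PYTHONPATH entries can be compared against the caffe/python directory. This is a minimal sketch; the function name is my own, and /your/path/caffe/python is the same placeholder as in the export line above.

```python
import os


def on_pythonpath(directory, pythonpath):
    """Return True if `directory` is one of the entries in a PYTHONPATH string."""
    entries = [os.path.normpath(p) for p in pythonpath.split(os.pathsep) if p]
    return os.path.normpath(directory) in entries


if __name__ == "__main__":
    # Placeholder path; substitute the real location of caffe/python.
    caffe_python = "/your/path/caffe/python"
    print(on_pythonpath(caffe_python, os.environ.get("PYTHONPATH", "")))
```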

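Then, in the caffe directory (this step uses the top-level Makefile together with Makefile.config, not the CMake build directory):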

sudo make pycaffe


Problems encountered during make pycaffe

The reference tutorial links to a fix for one error, but it did not match the problems I ran into.


Problem 1:

In file included from src/caffe/solver.cpp:10:0:
./include/caffe/util/io.hpp:8:18: fatal error: hdf5.h: No such file or directory
compilation terminated.
Makefile:516: recipe for target '.build_release/src/caffe/solver.o' failed
make: *** [.build_release/src/caffe/solver.o] Error 1

(Reference: http://blog.csdn.net/jessir/article/details/71195115)

Fix for "fatal error: hdf5.h: No such file or directory":
On line 85 of Makefile.config, append /usr/include/hdf5/serial/ to INCLUDE_DIRS, i.e. change the first line below into the second:
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
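
Before editing INCLUDE_DIRS it can help to confirm where hdf5.h actually lives on the machine. The sketch below simply checks a few candidate directories; the list is an assumption of mine (only /usr/include/hdf5/serial comes from the fix above), so extend it as needed.

```python
import os

# Candidate include directories; the second one is the Ubuntu 16.04 location used above.
CANDIDATE_DIRS = ["/usr/include", "/usr/include/hdf5/serial", "/usr/local/include"]


def find_header(header, dirs):
    """Return the directories from `dirs` that directly contain `header`."""
    return [d for d in dirs if os.path.isfile(os.path.join(d, header))]


if __name__ == "__main__":
    hits = find_header("hdf5.h", CANDIDATE_DIRS)
    print(hits or "hdf5.h not found in the candidate directories")
```

Any directory it reports, other than those the compiler already searches by default, is the one to add to INCLUDE_DIRS.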



Problem 2:

LD -o .build_release/lib/libcaffe.so
/usr/bin/ld: cannot find -lhdf5_hl
/usr/bin/ld: cannot find -lhdf5
collect2: error: ld returned 1 exit status
Makefile:508: recipe for target '.build_release/lib/libcaffe.so' failed
make: *** [.build_release/lib/libcaffe.so] Error 1

Fix:

(Reference: http://blog.csdn.net/md_learning/article/details/53185992)

In Makefile.config, change the lines below "# Whatever else you find you need goes here." from:
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
to:
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
This is needed because Ubuntu 16.04 changed where these files are installed, in particular the HDF5 headers and libraries, so the paths have to be updated.


cd /usr/lib/x86_64-linux-gnu

Then, depending on which library versions are present on your system, run the following two commands:
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so
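
The version suffixes in those two commands depend on the installed HDF5 package, so it is safer to look them up than to copy them. The following sketch lists the matching files in a directory and prints the corresponding ln commands without running them; the helper name and the "pick the longest file name" rule are my own assumptions.

```python
import glob
import os


def suggest_links(lib_dir):
    """Return (target, link_name) pairs for the HDF5 serial libraries in lib_dir."""
    wanted = {
        "libhdf5.so": "libhdf5_serial.so.*",
        "libhdf5_hl.so": "libhdf5_serial_hl.so.*",
    }
    pairs = []
    for link_name, pattern in sorted(wanted.items()):
        matches = sorted(glob.glob(os.path.join(lib_dir, pattern)))
        if matches:
            # Assumption: the longest name is the fully versioned file, e.g. .so.10.1.0
            target = max(matches, key=len)
            pairs.append((os.path.basename(target), link_name))
    return pairs


if __name__ == "__main__":
    for target, link in suggest_links("/usr/lib/x86_64-linux-gnu"):
        print("sudo ln -s %s %s" % (target, link))
```

Run it, check that the printed targets look right, and then execute the printed commands from inside /usr/lib/x86_64-linux-gnu.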


Finally, start Python and import the module:

python

import caffe 

If no error is reported, the installation succeeded.
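
The same check can be scripted so that it also shows which copy of the module was picked up, which helps when PYTHONPATH points at the wrong directory. This is a small sketch with a helper name of my own; it runs under both Python 2.7 and Python 3.

```python
def try_import(name):
    """Import `name`; return (True, file path) on success or (False, error text)."""
    try:
        module = __import__(name)
    except ImportError as exc:
        return False, str(exc)
    return True, getattr(module, "__file__", None)


if __name__ == "__main__":
    ok, detail = try_import("caffe")
    print("caffe import %s: %s" % ("OK" if ok else "FAILED", detail))
```

On success the printed path should lie inside the caffe/python directory that was added to PYTHONPATH.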







