Installing Caffe on Ubuntu 16.04

Source: Internet | Published by: 帝国的终结知乎 | Editor: 程序博客网 | Time: 2024/06/07 03:22


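Lately I have been reading papers, and much of the model code released with them targets Caffe, so I had no choice but to install it.

In one sentence: installing Caffe was a bumpy ride.

Caffe dependency overview

Caffe needs quite a few dependencies; the hardest ones to get right are OpenCV and CUDA.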

Caffe has several dependencies:

CUDA is required for GPU mode.
- library version 7+ and the latest driver version are recommended, but 6.* is fine too
- 5.5, and 5.0 are compatible but considered legacy
BLAS via ATLAS, MKL, or OpenBLAS.
Boost >= 1.55
protobuf, glog, gflags, hdf5

Optional dependencies:

OpenCV >= 2.4 including 3.0
IO libraries: lmdb, leveldb (note: leveldb requires snappy)
cuDNN for GPU acceleration (v6)
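
Before building anything, it can help to see which of these libraries the system can already find. Below is a minimal sketch, assuming Python 3; the list of short library names is my own pick based on the dependency list above (it is not something Caffe defines), and `find_caffe_libs` is a hypothetical helper name.

```python
from ctypes.util import find_library

# Short library names as the dynamic linker knows them.
# This list is an assumption based on the dependency list above; adjust as needed.
CAFFE_LIBS = ["glog", "gflags", "protobuf", "hdf5_serial",
              "lmdb", "leveldb", "snappy", "cudnn"]


def find_caffe_libs(names=CAFFE_LIBS):
    """Map each library name to what find_library reports (None if not found)."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, path in find_caffe_libs().items():
        status = "OK   " + path if path else "MISSING"
        print(f"{name:12s} {status}")
```

A MISSING entry only means the shared library was not found on the default search path; it says nothing about whether the matching -dev headers are installed.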

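System environment

I install with the help of a script. Before getting to that, here is my environment:

OS: Ubuntu 16.04
CUDA 8.0 + cuDNN v6 (I had already set up the GPU build of TensorFlow, so everything GPU-related was in place)
Hardware: 7700K + 1080 (without a GPU you can ignore CUDA and cuDNN; for installing CUDA, see my blog post on installing TensorFlow)
Anaconda3 (also covered in that post)

Installation plan

My plan:

First set up OpenCV (with an install script, which already handles most of the dependency packages)
Then install Caffe

So let's start with OpenCV.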

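Installing OpenCV

Before Caffe came along, I was debugging an SSD program that needed OpenCV to read video. I tried installing OpenCV back then, struggled for half a day, and failed. This time there was no way around it; it had to work.

Installing FFmpeg

FFmpeg should be installed before OpenCV; skipping it causes problems later on.

I followed the tutorial on redstarofsleep's blog.

Install the FFmpeg dependencies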

sudo apt-get install libfaac-dev libmp3lame-dev libtheora-dev
sudo apt-get install libvorbis-dev libxvidcore-dev libxext-dev libxfixes-dev

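Download the .tar.bz2 archive from the FFmpeg website (or clone the source with git)

If the download is slow, you can also get it from my CSDN page

Extract the FFmpeg archive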

# go to the directory that contains the archive
tar -xjvf ffmpeg-3.4.1.tar.bz2  # extract
cd ffmpeg-3.4.1                 # enter the extracted directory

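Configure and install FFmpeg

# configure the build
./configure --enable-nonfree --enable-pic --enable-shared --disable-asm
make
make install   # installs under /usr/local by default, so this may need sudo

At this point FFmpeg is installed, and we can move on to OpenCV proper.

Installing OpenCV itself

Last time I cloned the OpenCV source myself and configured it bit by bit, and in the end I could not even tell where things had gone wrong. This time I found an install script on GitHub, which made things much easier.

First download the install script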

git clone https://github.com/jayrambhia/install-opencv

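Run the install script

cd install-opencv/Ubuntu   # the Ubuntu directory inside the cloned repository
chmod +x *
./opencv_latest.sh

Then just wait for it to finish.

If an error shows up while OpenCV is compiling, fix it and then go straight into the build directory the script created:

# the opencv directory the script just created (adjust the path to wherever you cloned the repository)
cd ~/Install-OpenCV/Ubuntu/OpenCV/opencv-3.3.1/build
# rebuild from this directory
make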

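Errors during the OpenCV installation

Error 1

Error description:

/usr/bin/ld: /usr/local/lib/libavcodec.a(hevc_cabac.o): relocation R_X86_64_PC32 against symbol `ff_h264_cabac_tables' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status

Cause:
FFmpeg was not installed properly: the static libavcodec.a was built without position-independent code, so it cannot be linked into OpenCV's shared libraries. For details, see the "FFMPEG error" reference.

Solution:

Install FFmpeg (as described above)

PS: pay particular attention to passing --enable-nonfree --enable-pic --enable-shared --disable-asm when configuring FFmpeg.

# configure the build
./configure --enable-nonfree --enable-pic --enable-shared --disable-asm
make
make install

With that, OpenCV is installed. Next up is Caffe itself.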

Installing the Caffe dependencies

Install the various dependencies:

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev

Then install:

sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
# Ubuntu-specific packages
sudo apt-get install libatlas-dev
sudo apt-get install liblapack-dev
sudo apt-get install libatlas-base-dev

Install ATLAS:
ATLAS (Automatically Tuned Linear Algebra Software) is an optimized implementation of the BLAS linear algebra library

sudo apt-get install libatlas-base-dev

That takes care of Caffe's dependencies.

Installing Caffe

First download Caffe
Clone the source directly from GitHub:

git clone https://github.com/BVLC/caffe.git

cd caffe

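Run the build commands:

cp Makefile.config.example Makefile.config  # copy the example build configuration
make all -j8    # build Caffe; -jN runs N parallel jobs to speed up the build (e.g. -j4; I used -j8)
# make all takes a while; if it fails, fix those errors before moving on to the next step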
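
The number after -j is simply how many jobs make runs in parallel, and the CPU thread count is a reasonable default. Here is a small sketch, assuming Python 3, that builds the command line from the local CPU count; `make_command` is a hypothetical helper, not part of Caffe.

```python
import os


def make_command(target="all", jobs=None):
    """Build the argv for `make <target> -jN`; N defaults to the CPU thread count."""
    if jobs is None:
        jobs = os.cpu_count() or 1  # cpu_count() can return None
    return ["make", target, f"-j{jobs}"]


# On a 4-core/8-thread CPU such as mine this prints "make all -j8".
print(" ".join(make_command()))
```

The list form can be passed straight to subprocess.run(..., cwd=<caffe directory>) if you want to script the build.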

If everything went well, go on to the next step; otherwise fix the problem first.

Build the tests

make test -j4

If everything went well, go on to the next step.

Run the tests

make runtest -j4

If everything went well, the output shows how many tests were run: a CPU-only build of Caffe runs 1000+ tests in total, a GPU build 2000+.
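
If you save the make runtest output to a file, the totals can be pulled out of the googletest summary lines. This is a minimal sketch, assuming Python 3 and the usual googletest summary format; the numbers in `sample` are made up for illustration and are not from my run.

```python
import re

# googletest prints "[==========] N tests from M test cases ran." at the end
# (newer googletest versions say "test suites" instead of "test cases").
SUMMARY_RE = re.compile(r"\[=+\]\s+(\d+) tests? from (\d+) test (?:cases?|suites?) ran")
PASSED_RE = re.compile(r"\[\s+PASSED\s+\]\s+(\d+) tests?")


def summarize_runtest(output):
    """Return (total_tests, passed_tests); either is None if not found."""
    total = passed = None
    for line in output.splitlines():
        m = SUMMARY_RE.search(line)
        if m:
            total = int(m.group(1))
        m = PASSED_RE.search(line)
        if m:
            passed = int(m.group(1))
    return total, passed


# Illustrative sample only; the counts are hypothetical.
sample = """\
[==========] 2101 tests from 277 test cases ran. (302417 ms total)
[  PASSED  ] 2101 tests.
"""
print(summarize_runtest(sample))
```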

Build pycaffe

make pycaffe

And with that, the installation is done.
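
A quick way to see whether Python can locate the freshly built pycaffe module is sketched below. It assumes Python 3 and that Caffe was cloned to ~/caffe (an assumption; pass your own path). `pycaffe_available` is a hypothetical helper, and it only checks that the module can be found, not that its compiled extension actually loads.

```python
import importlib.util
import os
import sys


def pycaffe_available(caffe_root="~/caffe"):
    """Return True if a `caffe` module is findable once <caffe_root>/python is on sys.path."""
    python_dir = os.path.join(os.path.expanduser(caffe_root), "python")
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)
    return importlib.util.find_spec("caffe") is not None


print("pycaffe found:", pycaffe_available())
```

If this prints False, the usual thing to check is that the python directory inside the Caffe checkout is on PYTHONPATH.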

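Errors during the Caffe installation

Error 1

Error description:

/usr/bin/ld: 找不到 -lhdf5_hl
/usr/bin/ld: 找不到 -lhdf5
collect2: error: ld returned 1 exit status
# or, with an English locale:
/usr/bin/ld: cannot find -lhdf5_hl
/usr/bin/ld: cannot find -lhdf5
collect2: error: ld returned 1 exit status

Cause:

The linker probably cannot find the hdf5_hl and hdf5 libraries. On Ubuntu 16.04 the libhdf5-serial-dev package installs them under the names hdf5_serial_hl and hdf5_serial.

Solution:

Edit the Makefile in the caffe directory: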

gedit Makefile

Search for the following line and change it:

#LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial

That is, comment out the first line and use the second one instead.
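
The same edit can be scripted instead of done by hand in gedit. A minimal sketch, assuming Python 3 and that the LIBRARIES line in your Makefile matches the one quoted above exactly; `patch_makefile_text` is a hypothetical helper that works on the file's text.

```python
OLD = "LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5"
NEW = "LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial"


def patch_makefile_text(text):
    """Comment out the original LIBRARIES line and add the hdf5_serial variant after it."""
    out = []
    for line in text.splitlines():
        if line.strip() == OLD:
            out.append("#" + line)   # keep the original, commented out
            out.append(NEW)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Tiny stand-in for the real Makefile, just to show the effect.
example = "CXX ?= g++\n" + OLD + "\nWARNINGS := -Wall\n"
print(patch_makefile_text(example))
```

To apply it for real, read the Makefile, pass its contents through the function, and write the result back. Running it twice is harmless, because the commented-out line no longer matches.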

Error 2

Error description:

LD -o .build_release/lib/libcaffe.so.1.0.0

Solution:

Install OpenCV, as described above

Error 3

Error description:

This happened after make all had succeeded: make test kept failing.

.build_release/src/caffe/test/test_lrn_layer.o: In function ‘caffe::CuDNNLRNLayerTest_TestGradientWithinChannel_Test<float>::TestBody()’:
test_lrn_layer.cpp:(.text._ZN5caffe48CuDNNLRNLayerTest_TestGradientWithinChannel_TestIfE8TestBodyEv[_ZN5caffe48CuDNNLRNLayerTest_TestGradientWithinChannel_TestIfE8TestBodyEv]+0x154): undefined reference to ‘caffe::CuDNNLCNLayer<float>::~CuDNNLCNLayer()’

Cause:

At first I thought CUDA was misconfigured, and there are plenty of fixes for that online:

The common fix:

locate libcudnn*   # find where the cuDNN libraries are

The result looks like this:

/root/cuda/lib64/libcudnn.so
/root/cuda/lib64/libcudnn.so.6
/root/cuda/lib64/libcudnn.so.6.0.21
/root/cuda/lib64/libcudnn_static.a
/usr/local/cuda-8.0/lib64/libcudnn.so
/usr/local/cuda-8.0/lib64/libcudnn.so.6
/usr/local/cuda-8.0/lib64/libcudnn.so.6.0.21
/usr/local/cuda-8.0/lib64/libcudnn.so.7
/usr/local/cuda-8.0/lib64/libcudnn.so.7.0.1
/usr/local/cuda-8.0/lib64/libcudnn_static.a

Your result may differ from mine; that is fine.

Go into the cuDNN directory and copy the cuDNN files into the CUDA and system directories

cd <cuDNN path>
# note: cudnn.h is in the include directory
sudo cp cudnn.h /usr/local/cuda/include
# the libcudnn files are in the lib64 directory
sudo cp libcudnn* /usr/local/cuda/lib64

Set up cuDNN and CUDA

# Add /usr/local/cuda/lib64 to LD_LIBRARY_PATH in ~/.bashrc
# open ~/.bashrc
gedit ~/.bashrc
# append this line at the end, then save and exit
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
# copy the cuDNN files to the runtime directories
sudo cp libcudnn* /usr/lib/libcudnn/
sudo cp libcudnn* /usr/local/lib
sudo ldconfig   # make the new configuration take effect

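By now your problem may well be solved!

I was still getting the error, though. After fiddling for ages without success, I sat down and thought about the cause properly.

Further analysis:

All the steps above are for when CUDA and cuDNN are not set up correctly, but that should not have been my problem, since I had already configured them when installing TensorFlow.

Finally I remembered: an earlier make all had failed, and I had carelessly run make test anyway (it even seemed to succeed, which was odd). Stale files from that failed build were probably conflicting with the new ones, so I tried a different fix:

The new fix:

First clear out the old build output

make clean

Then run make all again

make all      # succeeded first time
make test     # also succeeded
make runtest  # also succeeded

And that completes the installation!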
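
To keep the errors above in one place, here is a small sketch, assuming Python 3, that scans a build log for the three linker messages I ran into and suggests the fix that worked for me. The patterns and the `diagnose` helper are my own; they are not an official or complete list of Caffe build errors.

```python
import re

# (pattern, hint) pairs; the hints are the fixes described in this post.
KNOWN_ERRORS = [
    (re.compile(r"recompile with -fPIC"),
     "rebuild FFmpeg with --enable-pic --enable-shared"),
    (re.compile(r"-lhdf5(_hl)?\b"),
     "link against hdf5_serial_hl / hdf5_serial in the Makefile"),
    (re.compile(r"undefined reference to .*CuDNN|CuDNN.*未定义的引用"),
     "check the cuDNN setup, then run make clean and rebuild"),
]


def diagnose(log_text):
    """Return the hints whose pattern appears somewhere in the log text."""
    return [hint for pattern, hint in KNOWN_ERRORS if pattern.search(log_text)]


print(diagnose("/usr/bin/ld: cannot find -lhdf5_hl"))
```

Usage would be something like `make all 2>&1 | tee build.log` and then calling diagnose on the contents of build.log.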

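Running the MNIST example with Caffe

Caffe ships with a set of examples for users to learn from; see Caffe/examples.

The MNIST LeNet run here follows the official tutorial.

In these examples, Caffe keeps the raw data under ./data, and the processed data, model files and so on under ./examples. The MNIST dataset therefore lives in ./data/mnist, and the matching model and configuration files in ./examples/mnist.

Preparing the dataset

First go to the Caffe root directory ($CAFFE_ROOT):

cd ~/caffe

Download the MNIST dataset:

# run the get_mnist.sh script
./data/mnist/get_mnist.sh

Let's look at what the script does (gedit get_mnist.sh):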

#!/usr/bin/env sh
# This scripts downloads the mnist data and unzips it.

DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd "$DIR"

echo "Downloading..."

for fname in train-images-idx3-ubyte train-labels-idx1-ubyte t10k-images-idx3-ubyte t10k-labels-idx1-ubyte
do
    if [ ! -e $fname ]; then
        wget --no-check-certificate http://yann.lecun.com/exdb/mnist/${fname}.gz
        gunzip ${fname}.gz
    fi
done

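The script downloads four files in turn from http://yann.lecun.com/exdb/mnist/${fname}.gz: train-images-idx3-ubyte, train-labels-idx1-ubyte, t10k-images-idx3-ubyte and t10k-labels-idx1-ubyte.

Each file is unzipped as soon as it has downloaded, so just wait a while for the script to finish.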
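
To sanity-check what was downloaded, you can read the header of an idx file. This is a minimal sketch, assuming Python 3; it relies on the idx layout (big-endian 32-bit integers, with the low byte of the magic number giving the number of dimensions), and `read_idx_header` is a hypothetical helper. The demo uses a synthetic header rather than a real file.

```python
import struct


def read_idx_header(data):
    """Parse an idx header from raw bytes.

    Returns (magic, dims). Magic 2051 means images (count, rows, cols);
    magic 2049 means labels (count,). All integers are big-endian.
    """
    magic, = struct.unpack(">I", data[:4])
    ndim = magic & 0xFF  # low byte of the magic number = number of dimensions
    dims = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    return magic, dims


# Synthetic header with the same layout as train-images-idx3-ubyte.
fake_images = struct.pack(">IIII", 2051, 60000, 28, 28)
print(read_idx_header(fake_images))
```

For a real file, pass in the first 16 bytes, e.g. `open("data/mnist/train-images-idx3-ubyte", "rb").read(16)`.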

Caffe does not read this format directly; the data has to be converted to lmdb.

Use the create_mnist.sh script to convert the data:

./examples/mnist/create_mnist.sh

We can also look at what this script does:

#!/usr/bin/env sh
# This script converts the mnist data into lmdb/leveldb format,
# depending on the value assigned to $BACKEND.
set -e

EXAMPLE=examples/mnist
DATA=data/mnist
BUILD=build/examples/mnist

BACKEND="lmdb"

echo "Creating ${BACKEND}..."

rm -rf $EXAMPLE/mnist_train_${BACKEND}
rm -rf $EXAMPLE/mnist_test_${BACKEND}

$BUILD/convert_mnist_data.bin $DATA/train-images-idx3-ubyte \
  $DATA/train-labels-idx1-ubyte $EXAMPLE/mnist_train_${BACKEND} --backend=${BACKEND}
$BUILD/convert_mnist_data.bin $DATA/t10k-images-idx3-ubyte \
  $DATA/t10k-labels-idx1-ubyte $EXAMPLE/mnist_test_${BACKEND} --backend=${BACKEND}

echo "Done."

The conversion is done by the ./build/examples/mnist/convert_mnist_data.bin tool; I won't dig into it here.

The dataset is now ready, stored under ./examples/mnist/ as mnist_train_lmdb and mnist_test_lmdb.

The LeNet model

Caffe model files end in .prototxt. The LeNet definition that ships with Caffe is ./examples/mnist/lenet_train_test.prototxt; let's open it and take a look.

Data input layers:

name: "LeNet"
layer {
  name: "mnist"     # this layer is called mnist
  type: "Data"      # layer type
  top: "data"       # a top is an output blob; this layer outputs two
  top: "label"
  include {
    phase: TRAIN    # only active in the training phase
  }
  transform_param {
    scale: 0.00390625   # scales pixels into [0, 1) (0.00390625 = 1/256)
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"  # data source path
    batch_size: 64  # batch size
    backend: LMDB   # database type
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST     # loaded at test time
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}

The data layers are straightforward: in both TRAIN and TEST they read data and output data and label.
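
A note on the scale value: 0.00390625 is exactly 1/256 (not 1/255), so 8-bit pixel values end up in the range [0, 1). A tiny check, assuming Python 3:

```python
scale = 0.00390625

# 1/256 is a power of two, so the comparison is exact in floating point.
print(scale == 1 / 256)

# The darkest, a middle and the brightest 8-bit pixel value after scaling.
pixels = [0, 128, 255]
scaled = [p * scale for p in pixels]
print(scaled)  # [0.0, 0.5, 0.99609375]
```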

Next come the model's convolution and pooling layers:

layer {
  name: "conv1"
  type: "Convolution"   # convolution layer
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1  # learning-rate multiplier for the weights
  }
  param {
    lr_mult: 2  # learning-rate multiplier for the bias; 2 tends to converge more easily
  }
  convolution_param {
    num_output: 20      # number of output feature maps, i.e. number of kernels
    kernel_size: 5      # kernel size
    stride: 1           # stride
    weight_filler {
      type: "xavier"    # weight initialization
    }
    bias_filler {
      type: "constant"  # bias initialization; constant fills with 0 by default
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"   # pooling layer
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX       # max pooling
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
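
To see what these layers do to a 28x28 MNIST image, the output sizes can be worked out with the standard formula out = (in + 2*pad - kernel) / stride + 1. A minimal sketch, assuming Python 3; the helper names are my own, and the layer parameters are the ones from the prototxt above.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output width/height of a convolution (rounded down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride, pad=0):
    """Output width/height of a pooling layer (Caffe rounds up here)."""
    return -(-(size + 2 * pad - kernel) // stride) + 1


def lenet_shapes(input_size=28):
    """(channels, height, width) after each conv/pool layer of LeNet."""
    shapes = {}
    s = conv_out(input_size, kernel=5, stride=1)
    shapes["conv1"] = (20, s, s)
    s = pool_out(s, kernel=2, stride=2)
    shapes["pool1"] = (20, s, s)
    s = conv_out(s, kernel=5, stride=1)
    shapes["conv2"] = (50, s, s)
    s = pool_out(s, kernel=2, stride=2)
    shapes["pool2"] = (50, s, s)
    return shapes


for name, shape in lenet_shapes().items():
    print(name, shape)
```

So pool2 hands 50 * 4 * 4 = 800 values per image to the first fully connected layer.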

Having seen the convolution layers that extract features, let's look at the FC layers that do the classification:

layer {
  name: "ip1"
  type: "InnerProduct"      # fully connected (FC) layer
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"      # activation function
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
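
With all the learnable layers defined, the parameter count can be worked out by hand. A small sketch, assuming Python 3; the helpers are my own, and the 800 inputs to ip1 come from the 50 x 4 x 4 output of pool2.

```python
def conv_params(out_channels, in_channels, kernel):
    """Weights plus one bias per output channel."""
    return out_channels * in_channels * kernel * kernel + out_channels


def fc_params(out_features, in_features):
    """Weights plus one bias per output unit."""
    return out_features * in_features + out_features


params = {
    "conv1": conv_params(20, 1, 5),       # 1 input channel (grayscale)
    "conv2": conv_params(50, 20, 5),
    "ip1": fc_params(500, 50 * 4 * 4),    # pool2 output flattened to 800 values
    "ip2": fc_params(10, 500),            # 10 digit classes
}
total = sum(params.values())

for name, count in params.items():
    print(f"{name:6s} {count:7d}")
print(f"total  {total:7d}")
```

Almost all of the parameters sit in ip1, which is typical for this kind of small conv net.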

The FC layers output the classification scores; what follows computes accuracy and loss:

layer {
  name: "accuracy"
  type: "Accuracy"      # reports accuracy
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"   # softmax and the multinomial logistic loss
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
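
For intuition, here is what these two layers compute on the ip2 scores, written in plain Python. This is a sketch of the math only (softmax followed by the negative log-likelihood of the true class, and top-1 accuracy), assuming Python 3; it is not Caffe's implementation, and the function names are my own.

```python
import math


def softmax_loss(logits, label):
    """-log(softmax(logits)[label]), computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def accuracy(batch_logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    hits = 0
    for logits, label in zip(batch_logits, labels):
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        hits += predicted == label
    return hits / len(labels)


# With 10 equal scores the network is guessing, so the loss is ln(10).
print(softmax_loss([0.0] * 10, label=3), math.log(10))
print(accuracy([[0.1, 0.9], [0.8, 0.2]], [1, 1]))
```

A loss near ln(10), about 2.3, at the start of training is therefore what you would expect from an untrained 10-class network.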

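Caffe comes with a drawing tool, ./python/draw_net.py, which can render a diagram of the model. (Using it requires running make pycaffe in the caffe directory first.)

Draw the model diagram with it:

# run from the Caffe root (~/caffe)
python/draw_net.py examples/mnist/lenet_train_test.prototxt examples/mnist/lenet_train_test.png

Additional note: layer rules

When defining a layer you can specify rules for when it is included in the network. The template is:

layer {
    # ... layer definition ...
    include: {
        phase: TRAIN
    }
}

This is the layer rule template; it controls whether and when the layer is part of the network. See ./src/caffe/proto/caffe.proto for more information.

In the example above most layers have no rules, and by default a layer is always included in the network. Note that the accuracy layer is used only in the TEST phase. Testing runs every 500 training iterations (test_interval), with 100 test iterations each time (test_iter), as set in lenet_solver.prototxt.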

The solver

The model structure is defined above; next come the parameters for training it.

See ./examples/mnist/lenet_solver.prototxt:

# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
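
The "inv" policy decays the learning rate as base_lr * (1 + gamma * iter)^(-power). Here is a short sketch, assuming Python 3, that plugs in the values from this solver file; `inv_lr` is my own helper name.

```python
def inv_lr(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under the "inv" policy: base_lr * (1 + gamma * iter) ** (-power)."""
    return base_lr * (1 + gamma * iteration) ** (-power)


for it in (0, 500, 5000, 10000):
    print(f"iter {it:5d}  lr = {inv_lr(it):.8f}")

# test_iter * test batch size = number of test images seen per test pass
test_images = 100 * 100
print("images per test pass:", test_images)
```

So the learning rate starts at 0.01 and has dropped to a little under 0.006 by iteration 10000, and each test pass covers all 10,000 test images.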

Training the model

Caffe provides a training script, ./examples/mnist/train_lenet.sh. Let's see what it contains:

#!/usr/bin/env sh
set -e

./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt $@

As you can see, it calls ./build/tools/caffe train and passes the solver file via --solver=examples/mnist/lenet_solver.prototxt.

Running it prints the training log:

I1213 17:37:21.999351 30925 layer_factory.hpp:77] Creating layer mnist
I1213 17:37:21.999413 30925 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_train_lmdb
I1213 17:37:21.999428 30925 net.cpp:84] Creating Layer mnist
I1213 17:37:21.999433 30925 net.cpp:380] mnist -> data
I1213 17:37:21.999445 30925 net.cpp:380] mnist -> label
I1213 17:37:22.000012 30925 data_layer.cpp:45] output data size: 64,1,28,28
I1213 17:37:22.000969 30925 net.cpp:122] Setting up mnist
I1213 17:37:22.000979 30925 net.cpp:129] Top shape: 64 1 28 28 (50176)
I1213 17:37:22.000982 30925 net.cpp:129] Top shape: 64 (64)
...
I1213 17:37:29.454346 30925 solver.cpp:447] Snapshotting to binary proto file examples/mnist/lenet_iter_5000.caffemodel
I1213 17:37:29.459178 30925 sgd_solver.cpp:273] Snapshotting solver state to binary proto file examples/mnist/lenet_iter_5000.solverstate
I1213 17:37:29.460712 30925 solver.cpp:330] Iteration 5000, Testing net (#0)
I1213 17:37:29.512395 30934 data_layer.cpp:73] Restarting data prefetching from start.
I1213 17:37:29.513818 30925 solver.cpp:397]     Test net output #0: accuracy = 0.9882
...
I1213 17:37:36.706809 30925 solver.cpp:447] Snapshotting to binary proto file examples/mnist/lenet_iter_10000.caffemodel
I1213 17:37:36.710286 30925 sgd_solver.cpp:273] Snapshotting solver state to binary proto file examples/mnist/lenet_iter_10000.solverstate
I1213 17:37:36.712179 30925 solver.cpp:310] Iteration 10000, loss = 0.00240246
I1213 17:37:36.712193 30925 solver.cpp:330] Iteration 10000, Testing net (#0)
I1213 17:37:36.765053 30934 data_layer.cpp:73] Restarting data prefetching from start.
I1213 17:37:36.766742 30925 solver.cpp:397]     Test net output #0: accuracy = 0.9913
I1213 17:37:36.766758 30925 solver.cpp:397]     Test net output #1: loss = 0.0275297 (* 1 = 0.0275297 loss)
I1213 17:37:36.766762 30925 solver.cpp:315] Optimization Done.
I1213 17:37:36.766764 30925 caffe.cpp:259] Optimization Done.

At every display or test interval the log reports the training status, including the learning rate, loss and accuracy. And since the solver takes a snapshot every 5000 iterations, the model and solver state are saved at iterations 5000 and 10000.
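
If you want the test accuracy as numbers rather than scrolling through the log, a couple of regular expressions are enough. This is a minimal sketch, assuming Python 3 and the log format shown above; `parse_test_accuracy` is my own helper, and the sample lines are copied from my log.

```python
import re

TEST_ITER_RE = re.compile(r"Iteration (\d+), Testing net")
ACCURACY_RE = re.compile(r"Test net output #\d+: accuracy = ([0-9.]+)")


def parse_test_accuracy(log_text):
    """Return [(iteration, accuracy), ...] in the order they appear in the log."""
    results = []
    current_iter = None
    for line in log_text.splitlines():
        m = TEST_ITER_RE.search(line)
        if m:
            current_iter = int(m.group(1))
            continue
        m = ACCURACY_RE.search(line)
        if m and current_iter is not None:
            results.append((current_iter, float(m.group(1))))
    return results


sample_log = """\
I1213 17:37:29.460712 30925 solver.cpp:330] Iteration 5000, Testing net (#0)
I1213 17:37:29.513818 30925 solver.cpp:397]     Test net output #0: accuracy = 0.9882
I1213 17:37:36.712193 30925 solver.cpp:330] Iteration 10000, Testing net (#0)
I1213 17:37:36.766742 30925 solver.cpp:397]     Test net output #0: accuracy = 0.9913
"""
print(parse_test_accuracy(sample_log))
```

Caffe writes its log to stderr, so something like `./examples/mnist/train_lenet.sh 2>&1 | tee train.log` captures it for parsing.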

And that is a first taste of Caffe.
