Errors and Solutions When Building Caffe and Its Extensions (faster-rcnn, pvanet) on Ubuntu


  • Caffe
    • Error: building with make fails
  • Faster-RCNN
    • Question: how to build a CPU-only version of Faster-RCNN
    • Question: runtime error ImportError: No module named cv2
    • Question: after the CPU-only build succeeds, faster-rcnn fails with ImportError: No module named gpu_nms
    • Question: running ./tools/demo.py of the vgg16 faster-rcnn fails
    • Question: how to build a CPU-only version of pvanet
    • Question: how to train caffe / py-faster-rcnn / pvanet with CPU only
    • Question: error when running pvanet
    • Problems encountered when installing the pycaffe dependencies
  • Other
    • Question: how wget can avoid interference from the firewall

Caffe

Error: building with make fails with the following error

In file included from /usr/include/boost/python/detail/prefix.hpp:13:0,
                 from /usr/include/boost/python/args.hpp:8,
                 from /usr/include/boost/python.hpp:11,
                 from tools/caffe.cpp:2:
/usr/include/boost/python/detail/wrap_python.hpp:50:23: fatal error: pyconfig.h: No such file or directory
compilation terminated.
Makefile:575: recipe for target '.build_release/tools/caffe.o' failed
make: *** [.build_release/tools/caffe.o] Error 1

Solution: edit Makefile.config and change

PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
#                 $(ANACONDA_HOME)/include/python2.7 \
#                 $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \

by uncommenting the last two lines:

PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
                  $(ANACONDA_HOME)/include/python2.7 \
                  $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \
Note: $(ANACONDA_HOME) is the root directory of the Anaconda2 environment.
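If you are unsure which header directories PYTHON_INCLUDE should point to, you can print them from the interpreter you intend to build against. This is a convenience check I am adding (not part of the original post); run it with the Anaconda2 python:

# Print the header directories referenced by PYTHON_INCLUDE.
# Run this with the same interpreter Caffe will be linked against (e.g. the Anaconda2 one).
from distutils import sysconfig
import numpy as np

print(sysconfig.get_python_inc())  # directory containing pyconfig.h / Python.h
print(np.get_include())            # numpy headers, e.g. .../site-packages/numpy/core/include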

Faster-RCNN

Question: how to build a CPU-only version of Faster-RCNN?

Solution:
Comment out the following parts in ./lib/setup.py:

...
#CUDA = locate_cuda()
...
#self.set_executable('compiler_so', CUDA['nvcc'])
...
#Extension('nms.gpu_nms',
#    ['nms/nms_kernel.cu', 'nms/gpu_nms.pyx'],
#    library_dirs=[CUDA['lib64']],
#    libraries=['cudart'],
#    language='c++',
#    runtime_library_dirs=[CUDA['lib64']],
#    # this syntax is specific to this build system
#    # we're only going to use certain compiler args with nvcc and not with
#    # gcc the implementation of this trick is in customize_compiler() below
#    extra_compile_args={'gcc': ["-Wno-unused-function"],
#                        'nvcc': ['-arch=sm_35',
#                                 '--ptxas-options=-v',
#                                 '-c',
#                                 '--compiler-options',
#                                 "'-fPIC'"]},
#    include_dirs = [numpy_include, CUDA['include']]
#)
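For reference, after commenting out the CUDA-related parts only the CPU extensions remain in the ext_modules list of ./lib/setup.py. In the stock py-faster-rcnn it looks roughly like the sketch below (check against your own setup.py; file names can differ slightly between forks):

# CPU-only extension list left in ./lib/setup.py once the nms.gpu_nms entry is removed
ext_modules = [
    Extension(
        "utils.cython_bbox",
        ["utils/bbox.pyx"],
        extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]},
        include_dirs=[numpy_include],
    ),
    Extension(
        "nms.cpu_nms",
        ["nms/cpu_nms.pyx"],
        extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]},
        include_dirs=[numpy_include],
    ),
]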

Question: at runtime, the following error occurs: ImportError: No module named cv2

File "./tools/test_net.py", line 13, in <module>    from fast_rcnn.test import test_net  File "/home/rtc5/JpHu/pva-faster-rcnn-master/tools/../lib/fast_rcnn/test.py", line 15, in <module>    import cv2ImportError: No module named cv2

Solution:
(1) Check whether cv2 exists: under ${HOME}, run

$ find -name cv2

to search for it.
(2) If cv2 does not exist, install python-opencv:

sudo apt-get install python-opencv

(3) If cv2 exists, add the directory that contains the cv2 folder to the last line of .bashrc (in my case cv2 is installed under /home/rtc5/anaconda2/envs/tensorflow/lib/python2.7/site-packages/cv2):

export PYTHONPATH=$PYTHONPATH:/home/rtc5/anaconda2/envs/tensorflow/lib/python2.7/site-packages

Then run

source ~/.bashrc  # apply the change

to reload .bashrc.
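Once .bashrc has been reloaded, you can confirm that Python finds cv2 and see exactly where it is loaded from; this is a quick check of my own, not part of the original post:

# Verify that cv2 is importable and which file provides it
import cv2
print(cv2.__version__)  # e.g. 2.4.x for the apt python-opencv package
print(cv2.__file__)     # should point into the directory added to PYTHONPATH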

Question: after the CPU-only build succeeds, running faster-rcnn fails with ImportError: No module named gpu_nms

File "./demo.py", line 18, infrom fast_rcnn.test import im_detectFile ".../py-faster-rcnn-master/tools/../lib/fast_rcnn/test.py", line 17, infrom fast_rcnn.nms_wrapper import nmsFile ".../py-faster-rcnn-master/tools/../lib/fast_rcnn/nms_wrapper.py", line 11, infrom nms.gpu_nms import gpu_nmsImportError: No module named gpu_nms

Solution:
Comment out the GPU-related code in ${FCNN}/py-faster-rcnn/lib/fast_rcnn/nms_wrapper.py:

from fast_rcnn.config import cfg
#from nms.gpu_nms import gpu_nms
from nms.cpu_nms import cpu_nms

def nms(dets, thresh, force_cpu=False):
    """Dispatch to either CPU or GPU NMS implementations."""
    if dets.shape[0] == 0:
        return []
    #if cfg.USE_GPU_NMS and not force_cpu:
    #    return gpu_nms(dets, thresh, device_id=cfg.GPU_ID)
    #else:
    #    return cpu_nms(dets, thresh)
    return cpu_nms(dets, thresh)
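With the wrapper reduced to the CPU path, a minimal smoke test looks like the sketch below (my example, assuming the Cython modules under ./lib have already been built with make; the box coordinates are made up):

import numpy as np
from fast_rcnn.nms_wrapper import nms

# Each row is [x1, y1, x2, y2, score]; cpu_nms expects float32
dets = np.array([
    [10,  10,  50,  50,  0.9],
    [12,  12,  52,  48,  0.8],   # overlaps the first box heavily
    [100, 100, 150, 150, 0.7],
], dtype=np.float32)

keep = nms(dets, 0.3, force_cpu=True)
print(keep)  # the heavily overlapping second box should be suppressed, e.g. [0, 2]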

Question: running ./tools/demo.py of the vgg16 faster-rcnn fails as follows

WARNING: Logging before InitGoogleLogging() is written to STDERR
F1207 00:08:31.251930 20944 common.cpp:66] Cannot use GPU in CPU-only Caffe: check mode.
*** Check failure stack trace: ***
Aborted (core dumped)

Solution:
Run the demo with the --cpu flag:

$ ./tools/demo.py --cpu

Note: if a similar problem occurs when running the pvanet examples, comment out the set_gpu related code in the test script (*.py).
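For context, the mode switch in py-faster-rcnn's ./tools/demo.py looks roughly like the snippet below; the --cpu flag routes execution to caffe.set_mode_cpu(), and in scripts that lack such a flag you comment out the GPU branch in the same spirit:

# Simplified from tools/demo.py: select CPU or GPU mode in pycaffe
if args.cpu_mode:
    caffe.set_mode_cpu()
else:
    caffe.set_mode_gpu()
    caffe.set_device(args.gpu_id)
    cfg.GPU_ID = args.gpu_id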

Question: how to build a CPU-only version of pvanet?

When compiling its caffe, the following error occurs:

src/caffe/layers/proposal_layer.cpp:321:10: error: redefinition of 'void caffe::ProposalLayer<Dtype>::Backward_gpu(const std::vector<caffe::Blob<Dtype>*>&, const std::vector<bool>&, const std::vector<caffe::Blob<Dtype>*>&)'
 STUB_GPU(ProposalLayer);
          ^
./include/caffe/util/device_alternate.hpp:17:6: note: in definition of macro 'STUB_GPU'
 void classname<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top, \
      ^
In file included from src/caffe/layers/proposal_layer.cpp:1:0:
./include/caffe/fast_rcnn_layers.hpp:122:16: note: 'virtual void caffe::ProposalLayer<Dtype>::Backward_gpu(const std::vector<caffe::Blob<Dtype>*>&, const std::vector<bool>&, const std::vector<caffe::Blob<Dtype>*>&)' previously declared here
   virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
                ^
Makefile:575: recipe for target '.build_release/src/caffe/layers/proposal_layer.o' failed
make: *** [.build_release/src/caffe/layers/proposal_layer.o] Error 1
make: *** Waiting for unfinished jobs....

Solution:
caffe::ProposalLayer<Dtype>::Backward_gpu is defined twice, once in ./include/caffe/fast_rcnn_layers.hpp and once via the STUB_GPU macro in ./include/caffe/util/device_alternate.hpp (the latter as a template), so the compiler reports a redefinition.
Fix it as follows:
In ./include/caffe/fast_rcnn_layers.hpp, change the Backward_gpu declaration

 virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
     const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom){}

to

 virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
     const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

Because Backward_cpu is defined only once, in ./include/caffe/fast_rcnn_layers.hpp, make sure not to apply the same change to it.

Question: how to train caffe, py-faster-rcnn, and pvanet with CPU only?

Error:

smooth_L1_loss_layer Not Implemented Yet

Solution:
Add the function bodies SmoothL1LossLayer::Forward_cpu and SmoothL1LossLayer::Backward_cpu to ./src/caffe/layers/smooth_L1_loss_layer.cpp:

// ------------------------------------------------------------------
// Fast R-CNN
// Copyright (c) 2015 Microsoft
// Licensed under The MIT License [see fast-rcnn/LICENSE for details]
// Written by Ross Girshick
// ------------------------------------------------------------------

#include "caffe/fast_rcnn_layers.hpp"

namespace caffe {

template <typename Dtype>
void SmoothL1LossLayer<Dtype>::LayerSetUp(
  const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  SmoothL1LossParameter loss_param = this->layer_param_.smooth_l1_loss_param();
  sigma2_ = loss_param.sigma() * loss_param.sigma();
  has_weights_ = (bottom.size() >= 3);
  if (has_weights_) {
    CHECK_EQ(bottom.size(), 4) << "If weights are used, must specify both "
      "inside and outside weights";
  }
}

template <typename Dtype>
void SmoothL1LossLayer<Dtype>::Reshape(
  const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  LossLayer<Dtype>::Reshape(bottom, top);
  CHECK_EQ(bottom[0]->channels(), bottom[1]->channels());
  CHECK_EQ(bottom[0]->height(), bottom[1]->height());
  CHECK_EQ(bottom[0]->width(), bottom[1]->width());
  if (has_weights_) {
    CHECK_EQ(bottom[0]->channels(), bottom[2]->channels());
    CHECK_EQ(bottom[0]->height(), bottom[2]->height());
    CHECK_EQ(bottom[0]->width(), bottom[2]->width());
    CHECK_EQ(bottom[0]->channels(), bottom[3]->channels());
    CHECK_EQ(bottom[0]->height(), bottom[3]->height());
    CHECK_EQ(bottom[0]->width(), bottom[3]->width());
  }
  diff_.Reshape(bottom[0]->num(), bottom[0]->channels(),
      bottom[0]->height(), bottom[0]->width());
  errors_.Reshape(bottom[0]->num(), bottom[0]->channels(),
      bottom[0]->height(), bottom[0]->width());
  // vector of ones used to sum
  ones_.Reshape(bottom[0]->num(), bottom[0]->channels(),
      bottom[0]->height(), bottom[0]->width());
  for (int i = 0; i < bottom[0]->count(); ++i) {
    ones_.mutable_cpu_data()[i] = Dtype(1);
  }
}

template <typename Dtype>
void SmoothL1LossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // NOT_IMPLEMENTED;
  int count = bottom[0]->count();
  const Dtype* in = diff_.cpu_data();
  Dtype* out = errors_.mutable_cpu_data();
  caffe_set(errors_.count(), Dtype(0), out);

  caffe_sub(
      count,
      bottom[0]->cpu_data(),
      bottom[1]->cpu_data(),
      diff_.mutable_cpu_data());    // d := b0 - b1
  if (has_weights_) {
    // apply "inside" weights
    caffe_mul(
        count,
        bottom[2]->cpu_data(),
        diff_.cpu_data(),
        diff_.mutable_cpu_data());  // d := w_in * (b0 - b1)
  }
  // element-wise SmoothL1
  for (int index = 0; index < count; ++index) {
    Dtype val = in[index];
    Dtype abs_val = fabs(val);
    if (abs_val < 1.0 / sigma2_) {
      out[index] = 0.5 * val * val * sigma2_;
    } else {
      out[index] = abs_val - 0.5 / sigma2_;
    }
  }
  if (has_weights_) {
    // apply "outside" weights
    caffe_mul(
        count,
        bottom[3]->cpu_data(),
        errors_.cpu_data(),
        errors_.mutable_cpu_data());  // d := w_out * SmoothL1(w_in * (b0 - b1))
  }
  Dtype loss = caffe_cpu_dot(count, ones_.cpu_data(), errors_.cpu_data());
  top[0]->mutable_cpu_data()[0] = loss / bottom[0]->num();
}

template <typename Dtype>
void SmoothL1LossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  // NOT_IMPLEMENTED;
  int count = diff_.count();
  // Compute the SmoothL1 derivative in place on diff_ (as the GPU kernel does),
  // so that the axpby below propagates it to the bottom blobs.
  Dtype* diff_data = diff_.mutable_cpu_data();
  for (int index = 0; index < count; ++index) {
    Dtype val = diff_data[index];
    Dtype abs_val = fabs(val);
    if (abs_val < 1.0 / sigma2_) {
      diff_data[index] = sigma2_ * val;
    } else {
      diff_data[index] = (Dtype(0) < val) - (val < Dtype(0));
    }
  }
  for (int i = 0; i < 2; ++i) {
    if (propagate_down[i]) {
      const Dtype sign = (i == 0) ? 1 : -1;
      const Dtype alpha = sign * top[0]->cpu_diff()[0] / bottom[i]->num();
      caffe_cpu_axpby(
          count,                           // count
          alpha,                           // alpha
          diff_.cpu_data(),                // x
          Dtype(0),                        // beta
          bottom[i]->mutable_cpu_diff());  // y
      if (has_weights_) {
        // Scale by "inside" weight
        caffe_mul(
            count,
            bottom[2]->cpu_data(),
            bottom[i]->cpu_diff(),
            bottom[i]->mutable_cpu_diff());
        // Scale by "outside" weight
        caffe_mul(
            count,
            bottom[3]->cpu_data(),
            bottom[i]->cpu_diff(),
            bottom[i]->mutable_cpu_diff());
      }
    }
  }
}

#ifdef CPU_ONLY
STUB_GPU(SmoothL1LossLayer);
#endif

INSTANTIATE_CLASS(SmoothL1LossLayer);
REGISTER_LAYER_CLASS(SmoothL1Loss);

}  // namespace caffe

Adapted from zhouphd's answer; verified to work: caffe compiles successfully and training runs.
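As a sanity check on the C++ above, the per-element function and its derivative can be written in a few lines of numpy. This reference is my addition (sigma = 3.0 is just an example value) and mirrors the element-wise loops in Forward_cpu/Backward_cpu:

import numpy as np

def smooth_l1(x, sigma=3.0):
    """Element-wise SmoothL1: 0.5*sigma^2*x^2 if |x| < 1/sigma^2, else |x| - 0.5/sigma^2."""
    sigma2 = sigma * sigma
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0 / sigma2, 0.5 * sigma2 * x * x, abs_x - 0.5 / sigma2)

def smooth_l1_grad(x, sigma=3.0):
    """Its derivative: sigma^2*x in the quadratic region, else sign(x)."""
    sigma2 = sigma * sigma
    return np.where(np.abs(x) < 1.0 / sigma2, sigma2 * x, np.sign(x))

x = np.array([-2.0, -0.05, 0.0, 0.05, 2.0])
print(smooth_l1(x))       # large |x| falls on the linear branch
print(smooth_l1_grad(x))  # gradient saturates at +/-1 outside the quadratic region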

Question: running pvanet reports the following error

Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.

Reason: tensorflow was previously installed through anaconda, which creates its own virtual environment (with its own dependency libraries), but anaconda also adds its bin directory to PATH in ~/.bashrc. As a result, when running caffe outside that virtual environment, its dependencies are also resolved against the libraries anaconda installed.
Solution: disable the PATH entry set by anaconda by commenting it out in ~/.bashrc:

#export PATH="/home/cvrsg/anaconda2/bin:$PATH"
$ source ~/.bashrc  # reload .bashrc

Note: open a new terminal; in the terminal you are currently using, the source command does not take effect.
How to verify?

If you run echo $PATH in the current terminal, you will see that anaconda2/bin is still in PATH, i.e. source did not take effect; after opening a new terminal, anaconda2/bin is gone.
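An equivalent check can be done from Python itself (my addition): print which interpreter is actually being run and whether any anaconda entry is still on PATH:

# Which python is being picked up, and is anaconda still on PATH?
import os
import sys

print(sys.executable)
print([p for p in os.environ["PATH"].split(os.pathsep) if "anaconda" in p])  # expect [] in a fresh terminal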

Likewise, whenever anaconda2 is needed again, uncomment

#export PATH="/home/cvrsg/anaconda2/bin:$PATH"

and run source ~/.bashrc to apply it.
When anaconda2 is not needed, simply comment the line out again.
With the export line commented out, running source activate tensorflow produces the following error:

bash: activate: No such file or directory

Don't worry: just uncomment the line and run source ~/.bashrc again.

Note (important):
Also, when installing packages with conda, always switch to the corresponding virtual environment first; otherwise the installed packages can easily run into version conflicts with system packages and break programs.

Problems encountered when installing the pycaffe dependencies

Installing the pycaffe dependencies with the command for req in $(cat requirements.txt); do pip install $req; done fails with: ImportError: No module named packaging.version
Description: this happens because the pip installed via sudo apt-get install python-pip is broken. Reinstall pip as follows:

sudo apt-get remove python-pip            # remove the broken pip
wget https://bootstrap.pypa.io/get-pip.py # download the official pip installer
sudo python get-pip.py                    # install an up-to-date pip
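After reinstalling pip this way, a quick way to confirm that the new pip imports cleanly (a simple verification of mine, not from the original post):

# Run with the interpreter that will install the pycaffe requirements
import pip
print(pip.__version__)  # should print the freshly installed pip version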

Other

Question: how can wget avoid interference from the firewall?

Solution:
After the command

wget xxx  # e.g. wget https://www.dropbox.com/s/87zu4y6cvgeu8vs/test.model?dl=0 -O models/pvanet/full/test.model

append

--no-check-certificate