Debugging the Caffe Source Code



Author: Tyan
Blog: noahsnail.com  |  CSDN  |  简书

This article explains how to debug the Caffe source code with gdb on Linux. The point of source-level debugging here is to read the Caffe code and understand it better.

1. Preparation

  1. Build Caffe in debug mode: enable `DEBUG := 1` in `Makefile.config` (uncomment the line) and recompile.
  2. Download the MNIST dataset (debugging will be done on MNIST): `bash data/mnist/get_mnist.sh`
  3. Convert MNIST to LMDB: `bash examples/mnist/create_mnist.sh`
  4. In `examples/mnist/lenet_solver.prototxt`, change `solver_mode` from `GPU` to `CPU`.
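Step 4 can be done with a one-line `sed` edit. The snippet below is a self-contained sketch that demonstrates the substitution on a throwaway file; the file name and its contents here are illustrative, not the real solver file (in a real Caffe tree, point `sed` at `examples/mnist/lenet_solver.prototxt`):

```shell
# Create a throwaway solver file with the GPU setting (illustrative contents).
cat > lenet_solver_demo.prototxt <<'EOF'
net: "examples/mnist/lenet_train_test.prototxt"
max_iter: 10000
solver_mode: GPU
EOF

# Flip solver_mode from GPU to CPU in place (GNU sed syntax).
sed -i 's/^solver_mode: GPU$/solver_mode: CPU/' lenet_solver_demo.prototxt

# Verify the edit.
grep '^solver_mode' lenet_solver_demo.prototxt   # prints: solver_mode: CPU
```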

2. Debugging

1. Launch GDB

Launch the debugger with `gdb --args build/tools/caffe train --solver examples/mnist/lenet_solver.prototxt`. The `--args` option tells gdb that the first word after it is the program to debug (`build/tools/caffe`) and everything that follows (`train --solver examples/mnist/lenet_solver.prototxt`) is passed to that program as its arguments.

Output:

```
$ gdb --args build/tools/caffe train --solver examples/mnist/lenet_solver.prototxt
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/irteam/line-brain/deploy/caffe/.build_debug/tools/caffe.bin...done.
```

2. Set a Breakpoint

Run `b src/caffe/layers/base_conv_layer.cpp:117`. Here `b` is short for breakpoint, and the breakpoint is placed at line 117 of `base_conv_layer.cpp`. The general form of the command is:

```
b path/to/file.cpp:LINE
```

The relevant code (lines 117–119):

```
117 channels_ = bottom[0]->shape(channel_axis_);
118 num_output_ = this->layer_param_.convolution_param().num_output();
119 CHECK_GT(num_output_, 0);
```

Output:

```
(gdb) b src/caffe/layers/base_conv_layer.cpp:117
No source file named src/caffe/layers/base_conv_layer.cpp.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (src/caffe/layers/base_conv_layer.cpp:117) pending.
```

The breakpoint is reported as pending because Caffe's layer code lives in a shared library that has not been loaded yet; gdb will arm it once the library loads.
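Besides `file:line`, gdb accepts a few other breakpoint forms that are handy when browsing Caffe. These are illustrative examples (the function and condition below are taken from this session, not required):

```
(gdb) b base_conv_layer.cpp:117                            # file:line (basename also works)
(gdb) b 'caffe::BaseConvolutionLayer<float>::LayerSetUp'   # break on a function by name
(gdb) b base_conv_layer.cpp:117 if channel_axis_ != 1      # conditional breakpoint
(gdb) info breakpoints                                     # list all breakpoints
(gdb) delete 1                                             # remove breakpoint 1
```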

3. Run the Program

The command to run the program is `r`.

Output:

```
Starting program: /*/caffe/build/tools/caffe train --solver examples/mnist/lenet_solver.prototxt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
I0718 15:19:19.671941 29986 caffe.cpp:211] Use CPU.
[New Thread 0x7fffd81c7700 (LWP 29991)]
[New Thread 0x7fffd79c6700 (LWP 29992)]
I0718 15:19:20.437239 29986 solver.cpp:44] Initializing solver from parameters:
test_iter: 100
test_interval: 500
base_lr: 0.01
display: 100
max_iter: 10000
lr_policy: "inv"
gamma: 0.0001
power: 0.75
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
solver_mode: CPU
net: "examples/mnist/lenet_train_test.prototxt"
train_state {
  level: 0
  stage: ""
}
I0718 15:19:20.437687 29986 solver.cpp:87] Creating training net from net file: examples/mnist/lenet_train_test.prototxt
I0718 15:19:20.438357 29986 net.cpp:294] The NetState phase (0) differed from the phase (1) specified by a rule in layer mnist
I0718 15:19:20.438398 29986 net.cpp:294] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0718 15:19:20.438499 29986 net.cpp:51] Initializing net from parameters:
name: "LeNet"
state {
  phase: TRAIN
  level: 0
  stage: ""
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
I0718 15:19:20.439380 29986 layer_factory.hpp:77] Creating layer mnist
I0718 15:19:20.439625 29986 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_train_lmdb
I0718 15:19:20.439702 29986 net.cpp:84] Creating Layer mnist
I0718 15:19:20.439735 29986 net.cpp:380] mnist -> data
I0718 15:19:20.439853 29986 net.cpp:380] mnist -> label
I0718 15:19:20.444980 29986 data_layer.cpp:45] output data size: 64,1,28,28
I0718 15:19:20.445436 29986 base_data_layer.cpp:72] Initializing prefetch
[New Thread 0x7fffd603d700 (LWP 29993)]
I0718 15:19:20.448151 29986 base_data_layer.cpp:75] Prefetch initialized.
I0718 15:19:20.448186 29986 net.cpp:122] Setting up mnist
I0718 15:19:20.448216 29986 net.cpp:129] Top shape: 64 1 28 28 (50176)
I0718 15:19:20.448235 29986 net.cpp:129] Top shape: 64 (64)
I0718 15:19:20.448245 29986 net.cpp:137] Memory required for data: 200960
I0718 15:19:20.448264 29986 layer_factory.hpp:77] Creating layer conv1
I0718 15:19:20.448324 29986 net.cpp:84] Creating Layer conv1
I0718 15:19:20.448345 29986 net.cpp:406] conv1 <- data
I0718 15:19:20.448393 29986 net.cpp:380] conv1 -> conv1

Breakpoint 1, caffe::BaseConvolutionLayer<float>::LayerSetUp (this=0x91edd70,
    bottom=std::vector of length 1, capacity 1 = {...}, top=std::vector of length 1, capacity 1 = {...})
    at src/caffe/layers/base_conv_layer.cpp:117
117       channels_ = bottom[0]->shape(channel_axis_);
Missing separate debuginfos, use: debuginfo-install OpenEXR-libs-1.7.1-7.el7.x86_64 atk-2.14.0-1.el7.x86_64 atlas-3.10.1-10.el7.x86_64 boost-filesystem-1.53.0-26.el7.x86_64 boost-python-1.53.0-26.el7.x86_64 boost-system-1.53.0-26.el7.x86_64 boost-thread-1.53.0-26.el7.x86_64 cairo-1.14.2-1.el7.x86_64 expat-2.1.0-10.el7_3.x86_64 fontconfig-2.10.95-10.el7.x86_64 freetype-2.4.11-12.el7.x86_64 gdk-pixbuf2-2.31.6-3.el7.x86_64 gflags-2.1.1-6.el7.x86_64 glib2-2.46.2-4.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64 glog-0.3.3-8.el7.x86_64 graphite2-1.3.6-1.el7_2.x86_64 gstreamer-0.10.36-7.el7.x86_64 gstreamer-plugins-base-0.10.36-10.el7.x86_64 gtk2-2.24.28-8.el7.x86_64 harfbuzz-0.9.36-1.el7.x86_64 hdf5-1.8.12-8.el7.x86_64 ilmbase-1.0.3-7.el7.x86_64 jasper-libs-1.900.1-29.el7.x86_64 jbigkit-libs-2.0-11.el7.x86_64 leveldb-1.12.0-11.el7.x86_64 libX11-1.6.3-3.el7.x86_64 libXau-1.0.8-2.1.el7.x86_64 libXcomposite-0.4.4-4.1.el7.x86_64 libXcursor-1.1.14-2.1.el7.x86_64 libXdamage-1.1.4-4.1.el7.x86_64 libXext-1.3.3-3.el7.x86_64 libXfixes-5.0.1-2.1.el7.x86_64 libXi-1.7.4-2.el7.x86_64 libXinerama-1.1.3-2.1.el7.x86_64 libXrandr-1.4.2-2.el7.x86_64 libXrender-0.9.8-2.1.el7.x86_64 libffi-3.0.13-18.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libgfortran-4.8.5-11.el7.x86_64 libjpeg-turbo-1.2.90-5.el7.x86_64 libpng-1.5.13-7.el7_2.x86_64 libquadmath-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64 libtiff-4.0.3-27.el7_3.x86_64 libv4l-0.9.5-4.el7.x86_64 libxcb-1.11-4.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 lmdb-libs-0.9.18-1.el7.x86_64 opencv-2.4.5-3.el7.x86_64 opencv-core-2.4.5-3.el7.x86_64 orc-0.4.22-5.el7.x86_64 pango-1.36.8-2.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64 pixman-0.34.0-1.el7.x86_64 protobuf-2.5.0-8.el7.x86_64 python-libs-2.7.5-48.el7.x86_64 snappy-1.1.0-3.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
```

Everything before `Breakpoint 1` is the program's normal log output; the program pauses at the breakpoint.

The command to inspect a variable is `p var`. Commands and output:

```
(gdb) p channels_
$1 = 0
(gdb) p channel_axis_
$2 = 1
```

At this point `channels_` is still 0: gdb stops *before* executing the line at the breakpoint, so line 117 has not run yet. The command to execute the next line is `n`; output:

```
(gdb) n
118       num_output_ = this->layer_param_.convolution_param().num_output();
```

Inspecting `channels_` again now gives 1. Since MNIST images are grayscale, `channels_` = 1 is as expected:

```
(gdb) p channels_
$3 = 1
```

The command `c` (continue) resumes execution until the next breakpoint is hit.
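The whole session above can also be replayed without retyping by putting the commands in a gdb command file. This is a sketch; the file name is arbitrary, and the comments restate what each step showed in the interactive session:

```
# caffe_debug.gdb -- run with:
#   gdb -x caffe_debug.gdb --args build/tools/caffe train --solver examples/mnist/lenet_solver.prototxt
set breakpoint pending on      # auto-answer "y" to the pending-breakpoint prompt
b src/caffe/layers/base_conv_layer.cpp:117
run                            # start caffe; stops at the breakpoint
print channels_                # still 0: line 117 has not executed yet
next                           # execute line 117
print channels_                # now 1 (MNIST is grayscale)
continue
```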

To debug GPU code, use cuda-gdb instead; its documentation is at http://docs.nvidia.com/cuda/cuda-gdb/index.html#axzz4nAAR7ujZ.

References

  1. http://zhaok.xyz/blog/post/debug-caffe/