Caffe

Source: Internet. Editor: 程序博客网. Time: 2024/05/23 20:12

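Training and Deploying Deep Learning Networks with Caffe Optimized for Intel Architecture

Original article.

Contents:
- Abstract
- Installation
- Data layers
- Dataset preparation
- Training
- Multinode distributed training
- Fine-tuning
- Testing
- Feature extraction and visualization
- Using the Python API
- Debugging
- Examples
- Uses of Caffe
- Further reading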

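1 Abstract

Caffe is a deep learning framework developed by BVLC (Berkeley Vision and Learning Center). It is written in C++ and CUDA C++ and provides Python and MATLAB interfaces. The framework is useful for convolutional neural networks (CNNs), recurrent neural networks (RNNs), and multilayer perceptrons. Branches exist for detection, classification, segmentation, and Spark compatibility.
Caffe optimized for Intel architecture (Caffe-Intel) integrates the Intel Math Kernel Library (Intel MKL) 2017 and is optimized for Advanced Vector Extensions 2 (AVX2) and AVX-512, supporting Intel Xeon and Intel Xeon Phi processors. It therefore keeps all the advantages of BVLC Caffe while running efficiently on Intel architecture and supporting distributed training across many nodes. This document describes how to build Caffe-Intel, how to train network models on one or more compute nodes, and how to deploy networks. It also covers several Caffe features in detail, including fine-tuning, feature extraction and visualization for different models, and the Python API.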

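Terminology:
- weights - also called kernels, filters, templates, or feature extractors;
- blob - also called a tensor: an N-dimensional data structure that holds data, gradients, or weights (including biases);
- units - also called neurons; they apply a nonlinear transformation to a blob;
- feature maps - also called channels;
- testing - also called inference, classification, scoring, or deployment;
- model - also called a topology or network architecture.

Getting familiar with Caffe quickly:
- Install Caffe
- Train and test LeNet on MNIST
- Test a trained model, such as bvlc_googlenet.caffemodel, on a few images, such as cat and fish-bike
- Fine-tune an existing model on the Cats vs Dogs Challenge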

2 Installation

Only installation on Ubuntu 14.04 is described here. For other Linux distributions and OS X, BVLC provides official installation instructions.

sudo apt-get update &&
sudo apt-get -y install build-essential git cmake &&
sudo apt-get -y install libprotobuf-dev libleveldb-dev libsnappy-dev &&
sudo apt-get -y install libopencv-dev libhdf5-serial-dev protobuf-compiler &&
sudo apt-get -y install --no-install-recommends libboost-all-dev &&
sudo apt-get -y install libgflags-dev libgoogle-glog-dev liblmdb-dev &&
sudo apt-get -y install libatlas-base-dev

On Ubuntu 16.04, the HDF5 headers and libraries must additionally be linked as follows:

find . -type f -exec sed -i -e 's^"hdf5.h"^"hdf5/serial/hdf5.h"^g' -e 's^"hdf5_hl.h"^"hdf5/serial/hdf5_hl.h"^g' '{}' \;
cd /usr/lib/x86_64-linux-gnu
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so

On CentOS 7, install the following dependencies:

sudo yum -y update &&
sudo yum -y groupinstall "Development Tools" &&
sudo yum -y install wget cmake git &&
sudo yum -y install protobuf-devel protobuf-compiler boost-devel &&
sudo yum -y install snappy-devel opencv-devel atlas-devel &&
sudo yum -y install gflags-devel glog-devel lmdb-devel leveldb-devel hdf5-devel
# The following steps are only required if some packages failed to install
# add EPEL repository then install missing packages
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo rpm -ivh epel-release-latest-7.noarch.rpm
sudo yum -y install gflags-devel glog-devel lmdb-devel leveldb-devel hdf5-devel &&
sudo yum -y install protobuf-devel protobuf-compiler boost-devel
# if packages are still not found--download and install/build the packages, e.g.,
# snappy:
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/snappy-devel-1.1.0-3.el7.x86_64.rpm
sudo yum -y install http://mirror.centos.org/centos/7/os/x86_64/Packages/snappy-devel-1.1.0-3.el7.x86_64.rpm
# atlas:
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/atlas-devel-3.10.1-10.el7.x86_64.rpm
sudo yum -y install http://mirror.centos.org/centos/7/os/x86_64/Packages/atlas-devel-3.10.1-10.el7.x86_64.rpm
# opencv:
wget https://github.com/Itseez/opencv/archive/2.4.13.zip
unzip 2.4.13.zip
cd opencv-2.4.13/
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr/local ..
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make all -j $NUM_THREADS
sudo make install -j $NUM_THREADS
# optional (not required for Caffe)
# other useful repositories for CentOS are RepoForge and IUS:
wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el7.rf.x86_64.rpm
sudo rpm -Uvh rpmforge-release-0.5.3-1.el7.rf.x86_64.rpm
wget https://rhel7.iuscommunity.org/ius-release.rpm
sudo rpm -Uvh ius-release*.rpm

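Dependencies (source):
- boost - a C++ library providing math functions and shared pointers;
- glog, gflags - logging and command-line utilities, essential for debugging;
- leveldb, lmdb - database I/O, used for data preparation;
- protobuf - used to define data structures efficiently;
- BLAS (Basic Linear Algebra Subprograms) - operations such as matrix multiplication and matrix addition, provided by Intel MKL; alternatives include ATLAS and OpenBLAS.

The Caffe installation guide notes that installing MKL gives better performance on CPUs.

For best performance, use Intel MKL 2017; a beta version is available free of charge from Intel® Parallel Studio XE 2017 Beta.
After installation, set up the environment as follows (adjust the paths to match your installation):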

echo 'source /opt/intel/bin/compilervars.sh intel64' >> ~/.bashrc
# alternatively edit <mkl_path>/mkl/bin/mklvars.sh replacing INSTALLDIR in
# CPRO_PATH=<INSTALLDIR> with the actual mkl path: CPRO_PATH=<full mkl path>
# echo 'source <mkl path>/mkl/bin/mklvars.sh intel64' >> ~/.bashrc

Clone and prepare Caffe optimized for Intel architecture:

cd ~
# For BVLC caffe use:
# git clone https://github.com/BVLC/caffe.git
# For intel caffe use:
git clone https://github.com/intel/caffe.git
cd caffe
echo "export CAFFE_ROOT=`pwd`" >> ~/.bashrc
source ~/.bashrc
cp Makefile.config.example Makefile.config
# Open Makefile.config and modify it (see comments in the Makefile)
vi Makefile.config

Edit Makefile.config:

# To run on CPU only and to avoid installing CUDA installers, uncomment
CPU_ONLY := 1
# To use MKL, replace atlas with mkl as follows
# (make sure that the BLAS_DIR and BLAS_LIB paths are correct)
BLAS := mkl
BLAS_DIR := $(MKLROOT)/include
BLAS_LIB := $(MKLROOT)/lib/intel64
# To use MKL2017 DNN primitives as the default engine, uncomment
# (however leave it commented if using multinode training)
# USE_MKL2017_AS_DEFAULT_ENGINE := 1
# To customize the compiler choice, uncomment and set the following
# CUSTOM_CXX := g++
# To train on multinode uncomment and verify path
# USE_MPI := 1
# CXX := /usr/bin/mpicxx

On Ubuntu 16.04, edit the Makefile:

INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/

and create the symbolic links:

cd /usr/lib/x86_64-linux-gnu
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so

On CentOS 7 with the ATLAS library (rather than the recommended MKL), edit the Makefile:

# Change this line
LIBRARIES += cblas atlas
# to
LIBRARIES += satlas

Build Caffe-Intel:

NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make -j $NUM_THREADS
# To save the output stream to file makestdout.log use this instead
# make -j $NUM_THREADS 2>&1 | tee makestdout.log

Alternatively, build with cmake:

mkdir build
cd build
cmake -DCPU_ONLY=on -DBLAS=mkl -DUSE_MKL2017_AS_DEFAULT_ENGINE=on /path/to/caffe
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make -j $NUM_THREADS

Install the Python dependencies:

# These steps are OPTIONAL but highly recommended to use the Python interface
sudo apt-get -y install gfortran python-dev python-pip
cd ~/caffe/python
for req in $(cat requirements.txt); do sudo pip install $req; done
sudo pip install scikit-image #depends on other packages
sudo ln -s /usr/include/python2.7/ /usr/local/include/python2.7
sudo ln -s /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ \
  /usr/local/include/python2.7/numpy
cd ~/caffe
make pycaffe -j $NUM_THREADS
echo "export PYTHONPATH=$CAFFE_ROOT/python" >> ~/.bashrc
source ~/.bashrc

Other installation options:

# These steps are OPTIONAL to test caffe
make test -j $NUM_THREADS
make runtest #"YOU HAVE <some number> DISABLED TESTS" output is OK
# This step is OPTIONAL to disable cam hardware OpenCV driver
# alternatively, the user can skip this and ignore the harmless
# libdc1394 error that may occasionally appear
sudo ln /dev/null /dev/raw1394

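3 Data layers

This section is optional. It describes the data layer types and is not required for learning Caffe. It is based mainly on the official Caffe material and on src/caffe/proto/caffe.proto.
Data enters Caffe through data layers, which sit at the bottom of the network and are defined in a prototxt file. The Training section gives more information on prototxt files. Data can come from efficient databases (LevelDB or LMDB), directly from memory, or from files on disk in HDF5 or common image formats.
Common input preprocessing (mean subtraction, scaling, random cropping, mirroring, and so on) is specified with transform_param. Not every data layer type supports it; HDF5, for example, does not. If the data has already been transformed, these options are unnecessary. A typical definition:

transform_param {
  # randomly horizontally mirror the image
  mirror: 1
  # crop a `crop_size` x `crop_size` patch:
  # - at random during training
  # - from the center during testing
  crop_size: 227
  # subtract mean value: these mean_values can equivalently be replaced with a mean.binaryproto file as
  # mean_file: name_of_mean_file.binaryproto
  mean_value: 104
  mean_value: 117
  mean_value: 123
}

Here the image is cropped, mirrored, and mean-subtracted. For the other transformations, see TransformationParameter in src/caffe/proto/caffe.proto.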
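As a rough illustration of what these three operations do to a single image, the NumPy sketch below crops, subtracts per-channel means, and mirrors. It is not Caffe's implementation: the function name `transform` and its arguments are made up for this example, and it assumes a float image laid out as channels x height x width.

```python
import numpy as np

def transform(image, crop_size, mean_values, mirror, train, rng):
    """Crop, mean-subtract and optionally mirror one (C, H, W) image."""
    channels, height, width = image.shape
    if train:
        # random crop offset during training
        top = rng.randint(0, height - crop_size + 1)
        left = rng.randint(0, width - crop_size + 1)
    else:
        # center crop during testing
        top = (height - crop_size) // 2
        left = (width - crop_size) // 2
    out = image[:, top:top + crop_size, left:left + crop_size].astype(np.float32)
    # subtract one mean value per channel
    out -= np.asarray(mean_values, dtype=np.float32).reshape(channels, 1, 1)
    # flip left-right with probability 0.5 when mirroring is enabled
    if mirror and rng.randint(0, 2) == 1:
        out = out[:, :, ::-1]
    return out

rng = np.random.RandomState(0)
image = np.full((3, 256, 256), 128.0)   # synthetic stand-in for a real image
patch = transform(image, 227, (104, 117, 123), mirror=True, train=True, rng=rng)
print(patch.shape)
```

Passing the random generator in explicitly keeps the sketch reproducible, which is convenient when comparing runs.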

3.1 Data

LMDB (Lightning Memory-Mapped Database) and LevelDB are efficient input formats. They are suitable only for 1-of-K classification tasks, and because Caffe reads them efficiently, they are the recommended formats for such tasks.
data_param
Required
- source - path to the database
- batch_size - number of inputs processed at a time

Optional
- backend [default LEVELDB] - selects LEVELDB or LMDB
- rand_skip - number of inputs to skip at the start; useful for asynchronous SGD

For details, see DataParameter in src/caffe/proto/caffe.proto.

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: 1
    crop_size: 227
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_lmdb"
    batch_size: 32
    backend: LMDB
  }
}

Alternatively, mean subtraction can use a mean image ("data/ilsvrc12/imagenet_mean.binaryproto") instead of the mean_value entries. The binaryproto for an LMDB dataset is computed with:

cd ~/caffe
build/tools/compute_image_mean examples/imagenet/ilsvrc12_train_lmdb data/ilsvrc12/imagenet_mean.binaryproto

Replace examples/imagenet/ilsvrc12_train_lmdb and data/ilsvrc12/imagenet_mean.binaryproto with the appropriate LMDB folder and binaryproto file as needed.

3.2 ImageData

Reads images and labels directly from image files.
image_data_param
Required
- source - name of a text file that lists the input images and their labels

Optional
- batch_size [default 1] - number of inputs processed at a time
- new_height [default 0] - resize images to this height; ignored if 0
- new_width [default 0] - resize images to this width; ignored if 0
- shuffle [default 0] - shuffle the data; ignored if 0
- rand_skip [default 0] - number of inputs to skip at the start; useful for asynchronous SGD

For details, see ImageDataParameter in src/caffe/proto/caffe.proto.

layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  image_data_param {
    source: "/path/to/file/train.txt"
    batch_size: 32
    shuffle: 1
  }
}

Here the images are shuffled, cropped, mirrored, and mean-subtracted.
Note that each line of the text file must hold an image path and its label, for example in "train.txt":

/path/to/images/img3423.jpg 2
/path/to/images/img3424.jpg 13
/path/to/images/img3425.jpg 8
...
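To check such a list file before training, a small parser can help. The sketch below is only an illustration of the "path label" line format shown above; `parse_image_list` is a hypothetical helper, not part of Caffe.

```python
def parse_image_list(lines):
    """Split each 'path label' line into a (path, int label) pair."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # split on the last space so the label is always the final field
        path, label = line.rsplit(' ', 1)
        entries.append((path, int(label)))
    return entries

sample = [
    '/path/to/images/img3423.jpg 2',
    '/path/to/images/img3424.jpg 13',
    '/path/to/images/img3425.jpg 8',
]
entries = parse_image_list(sample)
print(len(entries))
```

A line whose last field is not an integer raises ValueError, which makes malformed entries easy to spot.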

3.3 Input

Takes a zero-valued blob of the specified dimensions as input.
input_param
Required
- shape - either a single shape or one shape per top blob

layer {
  name: "input"
  type: "Input"
  top: "data"
  input_param {
    shape {
      dim: 32
      dim: 3
      dim: 227
      dim: 227
    }
  }
}

Equivalent form:

input: "data"
input_dim: 32
input_dim: 3
input_dim: 227
input_dim: 227

3.4 DummyData

Similar to Input, except that the fill values are specified. It is often used for debugging; see the example for details.
dummy_data_param
Required
- shape - either a single shape or one shape per top blob

Optional
- data_filler [default ConstantFiller with value 0] - the values of the top blob

layer {
  name: "data"
  type: "DummyData"
  top: "data"
  include {
    phase: TRAIN
  }
  dummy_data_param {
    data_filler {
      type: "constant"
      value: 0.01
    }
    shape {
      dim: 32
      dim: 3
      dim: 227
      dim: 227
    }
  }
}
layer {
  name: "data"
  type: "DummyData"
  top: "label"
  include {
    phase: TRAIN
  }
  dummy_data_param {
    data_filler {
      type: "constant"
    }
    shape {
      dim: 32
    }
  }
}

3.5 MemoryData

Reads data directly from memory. Call MemoryDataLayer::Reset (from C++) or Net.set_input_arrays (from Python) to supply a contiguous block of data, usually a 4-D array, which is then read one batch_size at a time.
Because the data must first be copied into memory, this can be slow, but once the data is in memory the method is efficient.
memory_data_param
Required
- batch_size, channels, height, width - the dimensions of the data

layers {
  name: "data"
  type: MEMORY_DATA
  top: "data"
  top: "label"
  transform_param {
    crop_size: 227
    mirror: true
    mean_file: "mean.binaryproto"
  }
  memory_data_param {
    batch_size: 32
    channels: 3
    height: 227
    width: 227
  }
}

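3.6 HDF5Data

Reads data from HDF5 files. It works for many tasks but generally supports only FP32 and FP64 data, not uint8, so image data becomes very large. transform_param is not allowed. Use this layer only when necessary.
hdf5_data_param
Required
- source - name of a text file that lists the paths of the files holding the input data and labels
- batch_size

Optional
- shuffle [default false] - shuffle the order of the HDF5 files

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "examples/hdf5_classification/data/train.txt"
    batch_size: 32
  }
}

3.7 HDF5DataOutput

The HDF5 output layer works in the opposite direction from the other data layers: it writes its input blobs to disk.
hdf5_output_param
Required
- file_name

layer {
  name: "data_output"
  type: "HDF5Output"
  bottom: "data"
  bottom: "label"
  include {
    phase: TRAIN
  }
  hdf5_output_param {
    file_name: "output_file.h5"
  }
}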

3.8 WindowData

Used for detection. Reads windows from image files together with their class labels.
window_data_param
Required
- source - the data source
- mean_file
- batch_size

Optional
- mirror
- crop_size - randomly crop the image
- crop_mode [default "warp"] - how the detection window is cropped: "warp" warps the window to a fixed size, "square" crops a tight square around the window
- fg_threshold [default 0.5] - foreground (object) overlap threshold
- bg_threshold [default 0.5] - background (non-object) overlap threshold
- fg_fraction [default 0.25] - fraction of the batch that should be foreground objects
- context_pad [default 0] - amount of contextual padding around a window

For details, see WindowDataParameter in src/caffe/proto/caffe.proto.

layer {
  name: "data"
  type: "WindowData"
  top: "data"
  top: "label"
  window_data_param {
    source: "/path/to/file/window_train.txt"
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
    batch_size: 128
    mirror: true
    crop_size: 227
    fg_threshold: 0.5
    bg_threshold: 0.5
    fg_fraction: 0.25
    context_pad: 16
  }
}

4 Dataset preparation

LMDB is the recommended data format for 1-of-K classification tasks. Generating an LMDB dataset with the Caffe tools requires:
- the directory that contains the data
- an output directory, e.g. mydataset_train_lmdb, which must not already exist
- a text file listing the image names and their labels, e.g. "train.txt", formatted as:

img3423.jpg 2
img3424.jpg 13
img3425.jpg 8
...

If the data is spread across several folders, "train.txt" must contain absolute paths.

create_label_file.py generates the training and validation split for Kaggle's Dogs vs. Cats competition and can be adapted to other tasks.
create_label_file.py

#!/usr/bin/env python
import sys
import os
import os.path
def main():
  TRAIN_TEXT_FILE = 'train.txt'
  VAL_TEXT_FILE = 'val.txt'
  IMAGE_FOLDER = 'train'
  # Selects 10% of the images (the ones that end in '2') for validation
  fr = open(TRAIN_TEXT_FILE, 'w')
  fv = open(VAL_TEXT_FILE, 'w')
  filenames = os.listdir(IMAGE_FOLDER)
  for filename in filenames:
    if filename[0:3] == 'cat':
      if filename[-5] == '2':# or filename[-5] == '8':
        fv.write(filename + ' 0\n')
      else:
        fr.write(filename + ' 0\n')
    if filename[0:3] == 'dog':
      if filename[-5] == '2':# or filename[-5] == '8':
        fv.write(filename + ' 1\n')
      else:
        fr.write(filename + ' 1\n')
  fr.close()
  fv.close()
# Standard boilerplate to call the main() function to begin the program.
if __name__ == '__main__':
  main()

At test time, labels are assumed to be unavailable. If labels are available, the test LMDB dataset can be generated the same way.

4.1 Preparing three-channel data (images)

The following example generates the training LMDB; the working directory is $CAFFE_ROOT.

#!/usr/bin/env sh
# folder containing the training and validation images
TRAIN_DATA_ROOT=/path/to/training/images
# folder containing the file with the name of training images
DATA=/path/to/file
# folder for the lmdb datasets
OUTPUT=/path/to/output/directory
TOOLS=/path/to/caffe/build/tools
# Set to resize the images to 256x256
RESIZE_HEIGHT=256
RESIZE_WIDTH=256
echo "Creating train lmdb..."
# Delete the shuffle line if shuffle is not desired
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT/ \
    $DATA/train.txt \
    $OUTPUT/mydataset_train_lmdb
echo "Done."

Compute the mean image of the LMDB dataset:

#!/usr/bin/env sh
# Compute the mean image in lmdb dataset
OUTPUT=/path/to/output/directory # folder for the lmdb datasets and output for mean image
TOOLS=/path/to/caffe/build/tools
$TOOLS/compute_image_mean $OUTPUT/mydataset_train_lmdb \
  $OUTPUT/train_mean.binaryproto
$TOOLS/compute_image_mean $OUTPUT/mydataset_val_lmdb \
  $OUTPUT/val_mean.binaryproto

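4.2 Preparing data with other channel counts

Grayscale images (1 channel), RADAR images (2 channels), videos (4 channels), image plus depth (4 channels), vibrometry (1 channel), and spectrograms (1 channel) need a conversion step to generate the LMDB dataset (see the reference material).

4.3 Resizing images

There are two ways to resize an image:
- warp the image to the target size
- scale the image proportionally until its shorter side matches the target size, then center-crop the longer side to the target size

Tools for resizing:
- OpenCV - build/tools/convert_imageset --resize_height=256 --resize_width=256 warps images to the given size; convert_imageset calls ReadImageToDatum, which in turn calls ReadImageToCVMat in src/caffe/util/io.cpp;
- ImageMagick - convert -resize 256x256! warps images to the given size;
- OpenCV - the script tools/extra/resize_and_crop_images.py resizes images in parallel, scaling them proportionally and then center-cropping

sudo pip install git+https://github.com/Yangqing/mincepie.git
sudo apt-get install -y python-opencv
vi tools/extra/launch_resize_and_crop_images.sh # set number of clients (use num_of_cores*2); file.txt, input, and output folders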
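The arithmetic behind the second strategy (proportional scaling followed by a center crop) can be sketched in a few lines of plain Python. The helper name `scale_then_center_crop_box` is hypothetical and the rounding choice is an assumption; the Caffe scripts may round differently.

```python
def scale_then_center_crop_box(width, height, target):
    """Return (new_width, new_height, left, top) for scale-then-crop.

    The shorter side is scaled to `target`; the longer side is then
    center-cropped to `target` starting at offset (left, top).
    """
    scale = float(target) / min(width, height)
    # max() guards against rounding the shorter side below the target
    new_width = max(target, int(round(width * scale)))
    new_height = max(target, int(round(height * scale)))
    left = (new_width - target) // 2
    top = (new_height - target) // 2
    return new_width, new_height, left, top

# a 500x375 landscape image resized for a 256x256 target
print(scale_then_center_crop_box(500, 375, 256))
```

Compared with warping, this keeps the aspect ratio of the content at the cost of discarding the margins of the longer side.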

Images can also be cropped or resized inside the network through data layer parameters:

layer {
  name: "data"
  transform_param {
    crop_size: 227
...
}

layer {
  name: "data"
  image_data_param {
    new_height: 227
    new_width: 227
...

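5 Training

Training a network requires:
- train_val.prototxt - defines the network architecture, the initialization parameters, and the local learning rates
- solver.prototxt - defines how the parameters are optimized; this is the file used to train the network
- deploy.prototxt - used only for testing; essentially identical to train_val.prototxt except that it has no data layers and no loss layer

Parameter initialization is very important. The main methods are:
- gaussian - samples the weights from a Gaussian distribution N(0, std)
- xavier - samples the weights from a uniform distribution U(-a, a), where a = sqrt(3/fan_in) and fan_in is the number of incoming inputs
- MSRAFiller - samples the weights from a normal distribution N(0, a) with standard deviation a = sqrt(2/fan_in)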
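The three initialization rules can be written directly in NumPy. This is a sketch of the formulas listed above, not Caffe's filler code; the function names are invented, and fan_in is assumed to be the product of all weight dimensions except the first (the output dimension).

```python
import numpy as np

def gaussian_filler(shape, std, rng):
    # N(0, std)
    return rng.normal(0.0, std, size=shape)

def xavier_filler(shape, rng):
    # U(-a, a) with a = sqrt(3 / fan_in)
    fan_in = int(np.prod(shape[1:]))
    a = np.sqrt(3.0 / fan_in)
    return rng.uniform(-a, a, size=shape)

def msra_filler(shape, rng):
    # N(0, a) with standard deviation a = sqrt(2 / fan_in)
    fan_in = int(np.prod(shape[1:]))
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)

rng = np.random.RandomState(0)
shape = (96, 3, 11, 11)   # (num_output, channels, kernel_h, kernel_w)
w_gauss = gaussian_filler(shape, 0.01, rng)
w_xavier = xavier_filler(shape, rng)
w_msra = msra_filler(shape, rng)
print(w_xavier.std(), w_msra.std())
```

For this shape fan_in is 3 * 11 * 11 = 363, so the xavier bound a is about 0.091 and the msra standard deviation is about 0.074.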

Learning-rate parameters:
- base_lr - the initial learning rate (default 0.01); lower it if NaNs appear during training
- lr_mult - the lr_mult of a bias is usually set to 2x the lr_mult of the non-bias weights

Taking LeNet as an example, the files are lenet_train_test.prototxt, deploy.prototxt, and solver.prototxt:
solver.prototxt

# Network definition
net: "examples/mnist/lenet_train_test.prototxt"
# Run a validation test every 500 training iterations
test_interval: 500
# Number of validation test iterations; the recommended value is num_val_imgs / batch_size
test_iter: 100
# Base learning rate, momentum, and weight decay of the network
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# Learning rate policies
#  fixed: always return base_lr.
#  step: return base_lr * gamma ^ (floor(iter / step))
#  exp: return base_lr * gamma ^ iter
#  inv: return base_lr * (1 + gamma * iter) ^ (- power)
#  multistep: similar to step but it allows non uniform steps defined by stepvalue
#  poly: the effective learning rate follows a polynomial decay, to be zero by the max_iter: return base_lr * (1 - iter/max_iter) ^ (power)
#  sigmoid: the effective learning rate follows a sigmoid decay: return base_lr * ( 1/(1 + exp(-gamma * (iter - stepsize))))
lr_policy: "step"
gamma: 0.1
stepsize: 10000 # Drop the learning rate in steps by a factor of gamma every stepsize iterations
# Display results every 100 iterations
display: 100
# Maximum number of iterations
max_iter: 10000
# Write a snapshot (training state and model parameters) every 5000 iterations
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_multistep"
# solver mode: CPU or GPU
solver_mode: CPU
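To see how the learning rate evolves under each policy, the formulas in the solver comments can be evaluated in plain Python. The function below is a sketch written from those comments, not Caffe code; its name and signature are assumptions, and the multistep policy is left out because it needs a list of stepvalue entries.

```python
import math

def learning_rate(policy, it, base_lr, gamma=0.1, stepsize=1, power=1.0, max_iter=1):
    """Evaluate a learning rate policy at iteration `it`."""
    if policy == 'fixed':
        return base_lr
    if policy == 'step':
        return base_lr * gamma ** (it // stepsize)
    if policy == 'exp':
        return base_lr * gamma ** it
    if policy == 'inv':
        return base_lr * (1.0 + gamma * it) ** (-power)
    if policy == 'poly':
        return base_lr * (1.0 - float(it) / max_iter) ** power
    if policy == 'sigmoid':
        return base_lr / (1.0 + math.exp(-gamma * (it - stepsize)))
    raise ValueError('unknown policy: %s' % policy)

# values from the solver above: base_lr 0.01, gamma 0.1, stepsize 10000
for it in (0, 9999, 10000):
    print('%d %g' % (it, learning_rate('step', it, 0.01, gamma=0.1, stepsize=10000)))
```

With the solver settings above, the rate stays at base_lr for the first 10000 iterations, so the drop by gamma only takes effect at the very end of a 10000-iteration run.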

Train the network:

$CAFFE_ROOT/build/tools/caffe train -solver solver.prototxt

Training writes two kinds of files, for example:
- lenet_multistep_10000.caffemodel - the network weights, i.e. the model parameters used for testing
- lenet_multistep_10000.solverstate - the solver state, used to resume training if it is interrupted

Train the network and plot accuracy or loss on the validation set against iterations:

#CHART_TYPE=[0-7]
#  0: Test accuracy  vs. Iters
#  1: Test accuracy  vs. Seconds
#  2: Test loss  vs. Iters
#  3: Test loss  vs. Seconds
#  4: Train learning rate  vs. Iters
#  5: Train learning rate  vs. Seconds
#  6: Train loss  vs. Iters
#  7: Train loss  vs. Seconds
CHART_TYPE=0
$CAFFE_ROOT/build/tools/caffe train -solver solver.prototxt 2>&1 | tee logfile.log
python $CAFFE_ROOT/tools/extra/plot_training_log.py.example $CHART_TYPE name_of_plot.png logfile.log

Dropout is applied to fully connected layers. During the forward pass it activates only a random subset of units, which prevents co-adaptation between weights and reduces overfitting. It is ignored during testing.

layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
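The effect of a dropout layer on its input can be sketched with NumPy. This is an illustrative version of the common "inverted dropout" scheme, in which the kept units are scaled during training so that nothing needs to change at test time; the function name and arguments are made up for this example.

```python
import numpy as np

def dropout_forward(x, dropout_ratio, train, rng):
    """Randomly zero a fraction `dropout_ratio` of the units during training."""
    if not train:
        return x  # dropout is an identity at test time
    keep_prob = 1.0 - dropout_ratio
    mask = rng.binomial(1, keep_prob, size=x.shape)
    # scale the kept units so the expected activation is unchanged
    return x * mask / keep_prob

rng = np.random.RandomState(0)
x = np.ones(10000)
out = dropout_forward(x, 0.5, True, rng)
print(out.mean())
```

With dropout_ratio 0.5, roughly half of the outputs are zero and the rest are doubled, so the mean stays close to that of the input.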

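Estimate the forward and backward propagation times without updating the weights:

# Time NUMITER=50 forward and backward passes; reports the total and average times
# Training samples and mean.binaryproto may be required
NUMITER=50
/path/to/caffe/build/tools/caffe time --model=train_val.prototxt -iterations $NUMITER

The Linux numactl utility can control the memory allocation policy:

numactl -i all /path/to/caffe/build/tools/caffe time --model=train_val.prototxt -iterations $NUMITER

Model Zoo

The Caffe Model Zoo provides network models and trained parameters for a variety of tasks, ready for fine-tuning or testing.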

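6 Multinode distributed training

This section is based on Intel's Caffe GitHub wiki. There are two main approaches to multinode distributed training:
- model parallelism
- data parallelism

In model parallelism, the model is split across the nodes and every node processes all of the data.
In data parallelism, the data batch is split across the nodes and every node holds all of the model parameters.
Data parallelism works well when the model has few weights and the data batch is large.
The two can be combined in a hybrid scheme: layers with few weights, such as convolutional layers, are trained with data parallelism, and layers with many weights, such as fully connected layers, are trained with model parallelism.
An Intel paper gives a theoretical analysis of the optimal balance between data parallelism and model parallelism in the hybrid approach.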
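The idea behind data parallelism can be checked numerically: when every node holds the same weights and an equal share of the mini-batch, the average of the per-node gradients equals the gradient over the whole mini-batch. The NumPy sketch below uses a linear least-squares model as a stand-in for a network; it simulates the nodes in a single process and does not use MPI or Caffe.

```python
import numpy as np

def gradient(w, x, y):
    """Gradient of 0.5 * mean((x.dot(w) - y) ** 2) with respect to w."""
    return x.T.dot(x.dot(w) - y) / len(y)

rng = np.random.RandomState(0)
x = rng.randn(64, 5)   # one mini-batch of 64 samples
y = rng.randn(64)
w = rng.randn(5)       # model parameters, replicated on every node

# single node: gradient over the whole mini-batch
full_grad = gradient(w, x, y)

# data parallelism: 4 simulated nodes, each with 1/4 of the mini-batch
shards = zip(np.split(x, 4), np.split(y, 4))
node_grads = [gradient(w, xs, ys) for xs, ys in shards]
avg_grad = np.mean(node_grads, axis=0)

print(np.allclose(full_grad, avg_grad))
```

In a real multinode run the averaging step is the communication between nodes, which is why the approach suits models with few weights: fewer weights mean smaller gradients to exchange.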

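Given the popularity of deep networks with few weights, such as GoogLeNet and ResNet, and the success of distributed training with data parallelism, Caffe-Intel supports data parallelism. Multinode distributed training is also under active development.

To train on multiple nodes, modify Makefile.config:

USE_MPI := 1
# update with the path to binary MPI library
CXX := /usr/bin/mpicxx

Launching multinode training is straightforward:

mpirun --hostfile path/to/hostfile -n <num_processes> /path/to/caffe/build/tools/caffe train --solver=/path/to/solver.prototxt --param_server=mpi

where
- <num_processes> - the number of nodes to use
- hostfile - a file that lists the IP address of each node, one per line
solver.prototxt points to the train.prototxt of each node, and each train.prototxt must point to a different part of the dataset. See the related material for more details.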

7 Fine-tuning

Reuse the network architecture defined in the prototxt and make two changes:
- 1 Change the data layer to read the new data

layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00390625 # 1/255
  }
  data_param {
    source: "newdata_lmdb" # point to the new dataset
    batch_size: 64
    backend: LMDB
  }
}

- 2 Change the output layer, here the ip2 layer (note: make the same change in deploy.prototxt)

layer {
  name: "ip2-ft" # rename the layer
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2-ft" # rename the layer output
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 2 # number of classes in the new dataset, here 2
    bias_filler {
      type: "constant"
    }
  }
}

Fine-tune in Caffe:

#From the command line on $CAFFE_ROOT
./build/tools/caffe train -solver /path/to/solver.prototxt -weights  /path/to/trained_model.caffemodel

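Fine-tuning tips:
- Start by learning only the final output layer and leave the other layers unchanged
- Reduce the base learning rate, typically by a factor of 10 or 100
- Per-layer local learning rates can be set with lr_mult
- For fast optimization, freeze every layer except the last one or two by setting their local learning rate lr_mult to 0
- Increase the local learning rate of the final output layer by 10x and that of the second-to-last layer by 5x
- If the result is good enough, stop; otherwise fine-tune additional layers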
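The interaction between the solver's base learning rate and the per-layer lr_mult is a simple product: a layer's effective learning rate is base_lr times its lr_mult. The sketch below makes that arithmetic explicit for one possible fine-tuning setup; the layer names and multipliers are hypothetical values chosen to mirror the tips above.

```python
def effective_learning_rates(base_lr, lr_mults):
    """Map each layer name to base_lr * lr_mult."""
    return dict((name, base_lr * mult) for name, mult in lr_mults.items())

# hypothetical setup: frozen early layer, boosted last two layers
lr_mults = {'conv1': 0.0, 'ip1': 5.0, 'ip2-ft': 10.0}
rates = effective_learning_rates(0.001, lr_mults)
for name in sorted(rates):
    print('%s %g' % (name, rates[name]))
```

A layer with lr_mult 0 keeps its copied weights unchanged, while the renamed output layer, which starts from fresh weights, learns fastest.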

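What fine-tuning does:
- it creates a new network architecture
- it copies the weights of the original network for initialization
- it then proceeds like ordinary training; see the example.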

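8 Testing

Testing is also called inference, classification, or scoring. It can be done with Caffe's Python interface or with the C++ tools. The C++ tools are less flexible, so Python is recommended.
Classifying an image, a signal, or a set of images requires:
- the image(s)
- the network architecture
- the network weights

8.1 Testing a set of images

The model prototxt should contain a TEST-phase data layer that points to the test dataset, so that the model's performance can be measured:

/path/to/caffe/build/tools/caffe test -model /path/to/train_val.prototxt -weights /path/to/trained_model.caffemodel -iterations <num_iter>

This example is based on the reference material.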

8.2 Testing a single image

First, download the trained model before using it to classify images:

./scripts/download_model_binary.py models/bvlc_reference_caffenet

Next, download the dataset labels, which map the network's predictions to image classes. ILSVRC2012 is used here:

./data/ilsvrc12/get_ilsvrc_aux.sh

Finally, classify the image:

./build/examples/cpp_classification/classification.bin \
  models/bvlc_reference_caffenet/deploy.prototxt \
  models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
  data/ilsvrc12/imagenet_mean.binaryproto \
  data/ilsvrc12/synset_words.txt \
  examples/images/cat.jpg

Sample output:

---------- Prediction for examples/images/cat.jpg ----------
0.3134 - "n02123045 tabby, tabby cat"
0.2380 - "n02123159 tiger cat"
0.1235 - "n02124075 Egyptian cat"
0.1003 - "n02119022 red fox, Vulpes vulpes"
0.0715 - "n02127052 lynx, catamount"
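A listing like this is a top-k view of the network's probability vector: the classes are sorted by probability and the first few are printed with their labels. The NumPy sketch below shows that step in isolation; `top_k` is a hypothetical helper, and the probabilities and labels are made-up values rather than real network output.

```python
import numpy as np

def top_k(probabilities, labels, k=5):
    """Return the k most probable (probability, label) pairs, best first."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(float(probabilities[i]), labels[i]) for i in order]

# made-up probabilities for four classes
probs = np.array([0.05, 0.6, 0.1, 0.25])
labels = ['red fox', 'tabby', 'lynx', 'tiger cat']
for p, label in top_k(probs, labels, k=3):
    print('%.4f - "%s"' % (p, label))
```

In practice the labels would be read from a file such as synset_words.txt, one class per line, in the same order as the network outputs.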

9 Feature extraction and visualization

The weights of a convolutional layer are stored in the format output_feature_maps x input_feature_maps x height x width, where feature maps are also called channels. Caffe offers two ways to extract features: the Python API and the C++ API.

# Download the model parameters
scripts/download_model_binary.py models/bvlc_reference_caffenet
# Generate a list of the files to process
# Use the images that ship with caffe
find `pwd`/examples/images -type f -exec echo {} \; > examples/images/test.txt
# Add a 0 to the end of each line
# input data structures expect labels after each image file name
sed -i "s/$/ 0/" examples/images/test.txt
# Get the mean of the training set to subtract it from images
./data/ilsvrc12/get_ilsvrc_aux.sh
# Copy and modify the data layer to load and resize the images:
cp examples/feature_extraction/imagenet_val.prototxt examples/images
vi examples/images/imagenet_val.prototxt
# Extract features
./build/tools/extract_features.bin models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
  examples/images/imagenet_val.prototxt fc7 examples/images/features 10 lmdb

This extracts the features of the fc7 layer, which are the highest-level features of the model. The features of other layers, such as conv5 or pool3, can be extracted the same way. The final arguments, 10 and lmdb, are the number of mini-batches and the database type; the extracted features are stored in an LMDB folder at examples/images/features.

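10 Python API

Caffe provides a Python API for testing, classification, feature extraction, network definition, and network training.

10.1 Setting up the Caffe Python API

After building Caffe, also run make pycaffe. Once that succeeds, the module can be imported:

import sys
CAFFE_ROOT = '/path/to/caffe/' # set this path correctly
sys.path.insert(0, CAFFE_ROOT + 'python')
import caffe
caffe.set_mode_cpu() # CPU mode

10.2 Loading the network architecture

The network architecture is defined in train_val.prototxt or deploy.prototxt:

net = caffe.Net('train_val.prototxt', caffe.TRAIN)

To load trained weights as well:

net = caffe.Net('deploy.prototxt', 'trained_model.caffemodel', caffe.TRAIN)

The net holds data blobs (net.blobs) and parameter blobs (net.params). Taking the conv1 layer as an example:
- net.blobs['conv1'] - the output data of conv1, also called feature maps
- net.params['conv1'][0] - the weights of conv1
- net.params['conv1'][1] - the biases of conv1
- net.blobs.items() - the data blobs of all layers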

10.3 Visualizing the network

This requires the pydot and graphviz packages:

sudo apt-get install -y graphviz
sudo pip install pydot

Use Caffe's draw_net.py script to draw the network:

python python/draw_net.py examples/net_surgery/deploy.prototxt train_val_net.png
open train_val_net.png

10.4 Data input

Option 1: reshape the data layer to match the image size

import numpy as np
from PIL import Image
# get input image and arrange it as a 4-D tensor
im = np.array(Image.open('/path/to/caffe/examples/images/cat_gray.jpg'))
im = im[np.newaxis, np.newaxis, :, :]
# resize the blob to be the size of the input image
net.blobs['data'].reshape(*im.shape) # if the image input is different
# compute the blobs given the input data
net.blobs['data'].data[...] = im

Option 2: resize the input to match the image size of the data layer

im = caffe.io.load_image('/path/to/caffe/examples/images/cat_gray.jpg')
shape = net.blobs['data'].data.shape
# resize the img to be the size of the data blob
im = caffe.io.resize_image(im, (shape[2], shape[3]))
# move the channels first (H x W x C -> C x H x W) and copy into the data blob
net.blobs['data'].data[...] = im.transpose(2, 0, 1)

Data layers usually also transform the input data:

net = caffe.Net('deploy.prototxt', 'trained_model.caffemodel', caffe.TRAIN)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
ilsvrc_mean = 'python/caffe/imagenet/ilsvrc_2012_mean.npy'
transformer.set_mean('data', np.load(ilsvrc_mean).mean(1).mean(1))
# puts the channel as the first dimension
transformer.set_transpose('data', (2,0,1))
# (2,1,0) maps RGB to BGR for example
transformer.set_channel_swap('data', (2,1,0))
transformer.set_raw_scale('data', 255.0)
# the batch size can be changed on-the-fly
net.blobs['data'].reshape(1,3,227,227)
# load the image in the data layer
im = caffe.io.load_image('/path/to/caffe/examples/images/cat_gray.jpg')
# transform the image and store it in the net.blob
net.blobs['data'].data[...] = transformer.preprocess('data', im)

Display the image:

import matplotlib.pyplot as plt
plt.imshow(im)

10.5 Inference

Network prediction for the input image:

# assumes that images are loaded
prediction = net.forward()
print 'predicted class:', prediction['prob'].argmax()

Forward propagation can also be timed (excluding the data preprocessing time):

timeit net.forward()

Caffe also provides caffe.Classifier, a Python class that transforms and classifies several inputs at once. It can replace caffe.Net and caffe.io.Transformer.

im1 = caffe.io.load_image('/path/to/caffe/examples/images/cat.jpg')
im2 = caffe.io.load_image('/path/to/caffe/examples/images/fish-bike.jpg')
imgs = [im1, im2]
ilsvrc_mean = '/path/to/caffe/python/caffe/imagenet/ilsvrc_2012_mean.npy'
net = caffe.Classifier('deploy.prototxt', 'trained_model.caffemodel',
                       mean=np.load(ilsvrc_mean).mean(1).mean(1),
                       channel_swap=(2,1,0),
                       raw_scale=255,
                       image_dims=(256, 256))
prediction = net.predict(imgs) # predict takes any number of images
print 'predicted classes:', prediction[0].argmax(), prediction[1].argmax()

For a folder with many images, only the imgs part changes:

IMAGES_FOLDER = '/path/to/folder/w/images/'
import os
images = os.listdir(IMAGES_FOLDER)
imgs = [ caffe.io.load_image(IMAGES_FOLDER + im) for im in images ]

plt.plot(prediction[0])  # plot the probabilities of all classes
timeit net.predict([im1])  # timing
timeit net.predict([im1], oversample=0)

10.6 Feature extraction and visualization

Taking the fc7 layer as an example:

# Retrieve details of the network's layers
[(k, v.data.shape) for k, v in net.blobs.items()]
# Retrieve weights of the network's layers
[(k, v[0].data.shape) for k, v in net.params.items()]
# Retrieve the features in the last fully connected layer
# prior to outputting class probabilities
feat = net.blobs['fc7'].data[4]
# Retrieve size/dimensions of the array
feat.shape
# Assumes that the "net = caffe.Classifier" module has been called
# and data has been formatted as in the example above
# Take an array of shape (n, height, width) or (n, height, width, channels)
# and visualize each (height, width) section in a grid
# of size approx. sqrt(n) by sqrt(n)
def vis_square(data, padsize=1, padval=0):
    # values between 0 and 1
    data -= data.min()
    data /= data.max()
    # force the number of filters to be square
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = ((0, n ** 2 - data.shape[0]), (0, padsize), (0, padsize)) + ((0, 0),) * (data.ndim - 3)
    data = np.pad(data, padding, mode='constant', constant_values=(padval, padval))
    # tile the filters into an image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
    plt.imshow(data)
plt.rcParams['figure.figsize'] = (25.0, 20.0)
# visualize the weights after the 1st conv layer
net.params['conv1'][0].data.shape
filters = net.params['conv1'][0].data
vis_square(filters.transpose(0, 2, 3, 1))
# visualize the feature maps after 1st conv layer
net.blobs['conv1'].data.shape
feat = net.blobs['conv1'].data[0,:96]
vis_square(feat, padval=1)
# visualize the feature maps after the 2nd conv layer
net.blobs['conv2'].data.shape
feat = net.blobs['conv2'].data[0,:96]
vis_square(feat, padval=1)
# visualize the feature maps after the 2nd pool layer
net.blobs['pool2'].data.shape
feat = net.blobs['pool2'].data[0,:256] # change 256 to number of pool outputs
vis_square(feat, padval=1)
# Visualize the neuron activations for the 2nd fully-connected layer
net.blobs['ip2'].data.shape
feat = net.blobs['ip2'].data[0]
plt.plot(feat.flat)
plt.legend()
plt.show()

10.7 Defining a network

from caffe import layers as L
from caffe import params as P
def lenet(lmdb, batch_size):
    # auto generated LeNet
    n = caffe.NetSpec()
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb, transform_param=dict(scale=1./255), ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.ip1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.ip1, in_place=True)
    n.ip2 = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.ip2, n.label)
    return n.to_proto()
with open('examples/mnist/lenet_auto_train.prototxt', 'w') as f:
    f.write(str(lenet('examples/mnist/mnist_train_lmdb', 64)))
with open('examples/mnist/lenet_auto_test.prototxt', 'w') as f:
    f.write(str(lenet('examples/mnist/mnist_test_lmdb', 100)))

The generated prototxt file looks like this:

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00392156862745
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 20
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 50
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

10.8 Training a network

solver = caffe.get_solver('models/bvlc_reference_caffenet/solver.prototxt')
net = solver.net  # the training net owned by the solver
solver.net.forward()  # train net
solver.test_nets[0].forward()  # test net (there can be more than one)
solver.net.backward() # compute the gradients
# data gradients
net.blobs['conv1'].diff
# weight gradients
net.params['conv1'][0].diff
# biases gradients
net.params['conv1'][1].diff
solver.step(1) # run one iteration: one forward propagation and one backward propagation
solver.solve() # run the max_iter iterations defined in solver.prototxt

11 Debugging

This section is optional and is aimed only at Caffe developers.
Useful debugging tips:
- Remove randomness
- Compare caffemodels
- Use Caffe's debug info

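Removing randomness makes runs reproducible and their outputs comparable. Randomness appears at several stages, for example:
- Random weight initialization, usually sampled from a probability distribution such as a Gaussian
- Random horizontal flipping and random cropping of the input images, and random shuffling of the image order
- Dropout layers, which randomly drop part of the units in each training iteration, so only a subset of the weights is trained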

One solution is to use a seed, i.e. add the following to solver.prototxt:

# pick some value for random_seed that is greater or equal to 1, for example:
random_seed: 42

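This guarantees the same 'random' values on every run. On different machines, however, the same seed may produce different values.
Across multiple machines, a more robust approach is:
- Prepare the data with the images shuffled once into a fixed order, i.e. do not reshuffle for each experiment
- In the ImageData layer of train.prototxt, define transform_param so that images are neither cropped nor mirrored: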

layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    # mirror: true
    # crop_size: 227
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  image_data_param {
    source: "/path/to/file/train.txt"
    batch_size: 32
    new_height: 224
    new_width: 224
  }
}

- In the dropout layers of train.prototxt, set dropout_ratio: 0
- In solver.prototxt, set lr_policy: "fixed"
- In solver.prototxt, add debug_info: 1
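The first item in the list, preparing the data in one fixed shuffled order, can be sketched in a few lines of Python. The helper names fixed_shuffle and write_image_list and the seed value are assumptions made for this illustration. Each input line is expected to be in the "path label" form that an ImageData source file uses. Because the shuffle output can differ between Python versions, the safer route is to write the file once and copy that same file to every machine.

```python
import random

def fixed_shuffle(lines, seed=42):
    """Return a shuffled copy of `lines` whose order depends only on `seed`."""
    shuffled = list(lines)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # private generator, global state unaffected
    return shuffled

def write_image_list(lines, path, seed=42):
    """Write the shuffled "path label" lines to `path`, one per line."""
    with open(path, "w") as f:
        for line in fixed_shuffle(lines, seed):
            f.write(line + "\n")

if __name__ == "__main__":
    entries = ["img_%03d.jpg %d" % (i, i % 2) for i in range(6)]
    for line in fixed_shuffle(entries):
        print(line)
```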

To compare two caffemodels, the following script computes the sum of the absolute differences over all weights and biases of the two models:

# Intel Corporation
# Author: Ravi Panchumarthy
import argparse
import numpy as np

def get_args():
    parser = argparse.ArgumentParser('Compare weights of two caffe models')
    parser.add_argument('-m1', dest='modelFile1', type=str, required=True,
                        help='Caffe model weights file to compare')
    parser.add_argument('-m2', dest='modelFile2', type=str, required=True,
                        help='Caffe model weights file to compare against')
    parser.add_argument('-n', dest='netFile', type=str, required=True,
                        help='Network prototxt file associated with model')
    return parser.parse_args()

if __name__ == "__main__":
    import caffe

    args = get_args()
    # load the same network definition twice, once per weights file
    net = caffe.Net(args.netFile, args.modelFile1, caffe.TRAIN)
    net2compare = caffe.Net(args.netFile, args.modelFile2, caffe.TRAIN)

    wt_sumOfAbsDiffByName = dict()
    bias_sumOfAbsDiffByName = dict()
    # items() works on both Python 2 and Python 3
    for name, blobs in net.params.items():
        # blob 0 holds the weights of the layer
        wt_diffTensor = np.subtract(net.params[name][0].data, net2compare.params[name][0].data)
        wt_absDiffTensor = np.absolute(wt_diffTensor)
        wt_sumOfAbsDiff = wt_absDiffTensor.sum()
        wt_sumOfAbsDiffByName.update({name : wt_sumOfAbsDiff})
        # if args.layerDebug == 1:
        #     print("%s : %s" % (name,wt_sumOfAbsDiff))
        # blob 1 holds the biases; skip layers defined without a bias term
        if len(blobs) > 1:
            bias_diffTensor = np.subtract(net.params[name][1].data, net2compare.params[name][1].data)
            bias_absDiffTensor = np.absolute(bias_diffTensor)
            bias_sumOfAbsDiff = bias_absDiffTensor.sum()
            bias_sumOfAbsDiffByName.update({name : bias_sumOfAbsDiff})

    print("\nThe sum of absolute difference of all layer's weight is : %s" % sum(wt_sumOfAbsDiffByName.values()))
    print("The sum of absolute difference of all layer's bias is : %s" % sum(bias_sumOfAbsDiffByName.values()))
    finalDiffVal = sum(wt_sumOfAbsDiffByName.values()) + sum(bias_sumOfAbsDiffByName.values())
    print("The sum of absolute difference of all layers weight's and bias's is : %s" % finalDiffVal)
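The metric itself is easy to try without Caffe installed. The sketch below applies the same sum of absolute differences to toy parameters held in plain Python lists and keeps the result per layer and per blob, which helps locate the first layer where two training runs diverge. The dictionary layout and the names sum_abs_diff and compare_params are assumptions made for this illustration.

```python
def sum_abs_diff(a, b):
    """Sum of |a_i - b_i| over two equally long flat sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("blobs have different sizes")
    return sum(abs(x - y) for x, y in zip(a, b))

def compare_params(params1, params2):
    """Map each layer name to a list with one difference per blob.

    Both arguments map layer name -> list of blobs, where each blob is a
    flat list of numbers (blob 0: weights, blob 1: biases if present).
    """
    report = {}
    for name, blobs in params1.items():
        report[name] = [sum_abs_diff(b1, b2)
                        for b1, b2 in zip(blobs, params2[name])]
    return report

if __name__ == "__main__":
    model_a = {"conv1": [[1.0, 2.0, 3.0], [0.5]], "ip1": [[0.25, 0.75], [0.0]]}
    model_b = {"conv1": [[1.0, 2.5, 2.0], [0.5]], "ip1": [[0.25, 0.75], [1.0]]}
    for layer, diffs in sorted(compare_params(model_a, model_b).items()):
        print("%s: %s" % (layer, diffs))
```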

For deeper debugging, uncomment DEBUG := 1 in Makefile.config, rebuild Caffe, and start gdb:

gdb /path/to/caffe/build/caffe

Once gdb has started, run:

run train -solver /path/to/solver.prototxt

12 Examples

12.1 LeNet on MNIST handwritten digits

# Prepare the dataset
cd $CAFFE_ROOT
./data/mnist/get_mnist.sh # downloads MNIST dataset
./examples/mnist/create_mnist.sh # creates dataset in LMDB format

# Train the model
# Reduce the number of iterations from 10K to 1K to quickly run through this example
sed -i 's/max_iter: 10000/max_iter: 1000/g' examples/mnist/lenet_solver.prototxt
./build/tools/caffe train -solver examples/mnist/lenet_solver.prototxt

# Time the forward and backward propagation
./build/tools/caffe time --model=examples/mnist/lenet_train_test.prototxt -iterations 50 # runs on CPU

# Test the model
# the file with the model should have a 'phase: TEST'
./build/tools/caffe test -model examples/mnist/lenet_train_test.prototxt \
    -weights examples/mnist/lenet_iter_1000.caffemodel -iterations 50
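It helps to know how much data these iteration counts actually cover. The sketch below works that out, assuming MNIST's 60,000 training and 10,000 test images and the batch sizes used earlier in this document for LeNet (64 for training, 100 for testing). The function names are made up for this illustration.

```python
import math

def epochs_covered(iterations, batch_size, dataset_size):
    """Number of passes over the dataset made by `iterations` batches."""
    return iterations * batch_size / float(dataset_size)

def iterations_for_full_pass(dataset_size, batch_size):
    """Smallest number of batches that visits every sample at least once."""
    return int(math.ceil(dataset_size / float(batch_size)))

if __name__ == "__main__":
    # training: max_iter 1000 with batches of 64 images
    print("training epochs: %.3f" % epochs_covered(1000, 64, 60000))
    # testing: -iterations 50 with batches of 100 images
    print("fraction of test set scored: %.2f" % epochs_covered(50, 100, 10000))
    print("iterations for the full test set: %d" % iterations_for_full_pass(10000, 100))
```

Under these assumptions, 1,000 training iterations amount to roughly one pass over the training set, and -iterations 50 scores half of the test set; 100 iterations would be needed to cover all of it.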

12.2 Dogs vs Cats

Download the Dogs vs Cats dataset from Kaggle. Unzip dogvscat.zip and run dogvscat.sh.

#!/usr/bin/env sh
CAFFE_ROOT=/path/to/caffe
mkdir dogvscat
DOG_VS_CAT_FOLDER=/path/to/dogvscat
cd $DOG_VS_CAT_FOLDER

## Download datasets (requires first a login)
#https://www.kaggle.com/c/dogs-vs-cats/download/train.zip
#https://www.kaggle.com/c/dogs-vs-cats/download/test1.zip

# Unzip train and test data
sudo apt-get -y install unzip
unzip train.zip -d .
unzip test1.zip -d .

# Format data
python create_label_file.py # creates 2 text files with labels for training and validation
./build_datasets.sh # build lmdbs

# Download ImageNet pretrained weights (takes ~20 min)
$CAFFE_ROOT/scripts/download_model_binary.py $CAFFE_ROOT/models/bvlc_reference_caffenet

# Fine-tune weights in the AlexNet architecture (takes ~100 min)
$CAFFE_ROOT/build/tools/caffe train -solver $DOG_VS_CAT_FOLDER/dogvscat_solver.prototxt \
    -weights $CAFFE_ROOT/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

# Classify test dataset
cd $DOG_VS_CAT_FOLDER
python convert_binaryproto2npy.py
python dogvscat_classify.py # Returns prediction.txt (takes ~30 min)

# A better approach is to train five AlexNets w/init parameters from the same distribution,
# fine-tune those five, and compute the average of the five networks
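The contents of create_label_file.py are not shown in this document. As a rough idea of what such a step might do, the sketch below derives a label from each file name and splits the images into a training and a validation list. It assumes the Kaggle naming scheme cat.N.jpg and dog.N.jpg; the label ids (cat = 0, dog = 1), the rule of holding out every tenth image, and the function names are choices made only for this illustration.

```python
import os

LABELS = {"cat": 0, "dog": 1}  # assumed label ids

def label_line(filename):
    """Turn 'train/dog.12.jpg' into the line 'train/dog.12.jpg 1'."""
    prefix = os.path.basename(filename).split(".")[0]
    return "%s %d" % (filename, LABELS[prefix])

def split_train_val(filenames, val_every=10):
    """Send every `val_every`-th file (in sorted order) to validation."""
    train, val = [], []
    for i, name in enumerate(sorted(filenames)):
        target = val if i % val_every == 0 else train
        target.append(label_line(name))
    return train, val

if __name__ == "__main__":
    names = ["train/cat.%d.jpg" % i for i in range(5)]
    names += ["train/dog.%d.jpg" % i for i in range(5)]
    train_lines, val_lines = split_train_val(names, val_every=5)
    print("training lines: %d, validation lines: %d" % (len(train_lines), len(val_lines)))
    print(val_lines)
```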

12.3 PASCAL VOC Classification

Unzip voc2012.zip and run voc2012.sh to train AlexNet.

#!/usr/bin/env sh
# Copy and unzip voc2012.zip (it contains this file) then run this file. But first
#  change paths in: voc2012.sh; build_datasets.sh; solvers/*; nets/*; classify.py
# As you run various files, you can ignore the following error if it shows up:
#  libdc1394 error: Failed to initialize libdc1394

# set Caffe root directory
CAFFE_ROOT=$CAFFE_ROOT
VOC=/path/to/voc2012
chmod 700 *.sh

# Download datasets
# Details: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit
if [ ! -f VOCtrainval_11-May-2012.tar ]; then
  wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
fi
# VOCtrainval_11-May-2012.tar contains the VOC folder with:
#  JPEGImages: all jpg images
#  Annotations: objects and corresponding bounding box/pose/truncated/occluded per jpg
#  ImageSets: breaks the images by the type of task they are used for
#  SegmentationClass and SegmentationObject: segmented images (duplicate directories)
tar -xvf VOCtrainval_11-May-2012.tar

# Run Python scripts to create labeled text files
python create_labeled_txt_file.py

# Execute shell script to create training and validation lmdbs
# Note that lmdbs directories w/the same name cannot exist prior to creating them
./build_datasets.sh

# Execute following command to download caffenet pre-trained weights (takes ~20 min)
#  if weights exist already then the command is ignored
$CAFFE_ROOT/scripts/download_model_binary.py $CAFFE_ROOT/models/bvlc_reference_caffenet

# Fine-tune weights in the AlexNet architecture (takes ~60 min)
# you can also choose one of six solvers: pascal_solver[1-6].prototxt
$CAFFE_ROOT/build/tools/caffe train -solver $VOC/solvers/voc2012_solver.prototxt \
    -weights $CAFFE_ROOT/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

# The lines below are not really needed; they served as examples on how to do some tasks
# Test against voc2012_val_lmbd dataset (name of lmdb is the model under PHASE: test)
$CAFFE_ROOT/build/tools/caffe test -model $VOC/nets/voc2012_train_val_ft678.prototxt \
    -weights $VOC/weights_iter_5000.caffemodel -iterations 116

# Classify validation dataset: returns a file w/the labels of the val dataset
#  but it doesn't report accuracy (that would require some adjusting of the code)
python convert_binaryproto2npy.py
mkdir results
python cls_confidence.py
python average_precision.py
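The script ends by running average_precision.py, whose source is not included in this document. For readers unfamiliar with the metric, the sketch below computes average precision for one class as the area under the precision-recall curve after making precision monotonically non-increasing, in the style of the later PASCAL VOC development kits. Whether the author's script uses exactly this variant is an assumption, and the function name and arguments are chosen for this illustration.

```python
def average_precision(scores, labels):
    """Average precision for one class.

    scores: classifier confidence per image.
    labels: 1 if the class is present in the image, otherwise 0.
    """
    positives = sum(labels)
    if positives == 0:
        return 0.0
    # rank the images from most to least confident
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    recall, precision = [0.0], [0.0]
    true_positives = 0
    for rank, i in enumerate(order, start=1):
        true_positives += labels[i]
        recall.append(true_positives / float(positives))
        precision.append(true_positives / float(rank))
    recall.append(1.0)
    precision.append(0.0)
    # replace each precision value by the maximum precision to its right
    for k in range(len(precision) - 2, -1, -1):
        precision[k] = max(precision[k], precision[k + 1])
    # add up the rectangles between consecutive recall values
    area = 0.0
    for k in range(1, len(recall)):
        area += (recall[k] - recall[k - 1]) * precision[k]
    return area

if __name__ == "__main__":
    print("AP = %.4f" % average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Averaging this value over the 20 VOC classes gives the mean average precision usually reported for the classification task.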

About the VOC dataset:
- PASCAL VOC datasets
- 20 classes
- Training: 5,717 images, 13,609 objects
- Validation: 5,823 images, 13,841 objects
- Testing: 10,991 images

13 Further reading

Caffe Model-Zoo
Caffe home page
Soumith Chintala, “Intel are CPU magicians.” Oct. 2015
Dipankar Das, et al., “Distributed Deep Learning Using Synchronous Stochastic Gradient Descent.” Feb. 2016
Jeff Donahue, “Sequences in Caffe.” CVPR Tutorial, June 2015
Andrej Karpathy, “Caffe Tutorial.” Stanford CS 231n, 2015
Xinlei Chen, “Caffe Tutorial.” Carnegie Mellon University 16824, 2015
MIT Scene Recognition demo: Pick an image of a scene from an URL or give it your own