Training Caffe on Your Own Dataset



Recall the ImageNet walkthrough: http://caffe.berkeleyvision.org/gathered/examples/imagenet.html

Brewing ImageNet

This guide is meant to get you ready to train your own model on your own data. If you just want an ImageNet-trained network, then note that since training takes a lot of energy and we hate global warming, we provide the CaffeNet model trained as described below in the model zoo.

Data Preparation

The guide specifies all paths and assumes all commands are executed from the root caffe directory.

By “ImageNet” we here mean the ILSVRC12 challenge, but you can easily train on the whole of ImageNet as well, just with more disk space, and a little longer training time.

We assume that you already have downloaded the ImageNet training data and validation data, and they are stored on your disk like:

/path/to/imagenet/train/n01440764/n01440764_10026.JPEG
/path/to/imagenet/val/ILSVRC2012_val_00000001.JPEG

You will first need to prepare some auxiliary data for training. This data can be downloaded by:

./data/ilsvrc12/get_ilsvrc_aux.sh

The training and validation input are described in train.txt and val.txt as text listing all the files and their labels. Note that we use a different indexing for labels than the ILSVRC devkit: we sort the synset names in their ASCII order, and then label them from 0 to 999. See synset_words.txt for the synset/name mapping.
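As a minimal illustration of that indexing (not part of the original guide), here is a Python sketch that assigns labels by ASCII-sorting the synset directory names, assuming the directory layout above:

import os

# Hypothetical path; adjust to your layout.
train_root = '/path/to/imagenet/train'
# Sort the synset directory names in ASCII order and label them 0..999.
synsets = sorted(os.listdir(train_root))
labels = dict((name, i) for i, name in enumerate(synsets))

with open('train.txt', 'w') as f:
    for name in synsets:
        for img in sorted(os.listdir(os.path.join(train_root, name))):
            f.write('%s/%s %d\n' % (name, img, labels[name]))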

You may want to resize the images to 256x256 in advance. By default, we do not explicitly do this because in a cluster environment, one may benefit from resizing images in a parallel fashion, using mapreduce. For example, Yangqing used his lightweight mincepie package. If you prefer things to be simpler, you can also use shell commands, something like:

for name in /path/to/imagenet/val/*.JPEG; do
    convert -resize 256x256\! $name $name
done

Take a look at examples/imagenet/create_imagenet.sh. Set the paths to the train and val dirs as needed, and set “RESIZE=true” to resize all images to 256x256 if you haven’t resized the images in advance. Now simply create the leveldbs with examples/imagenet/create_imagenet.sh. Note that examples/imagenet/ilsvrc12_train_leveldb and examples/imagenet/ilsvrc12_val_leveldb should not exist before this execution; they will be created by the script. GLOG_logtostderr=1 simply dumps more information for you to inspect, and you can safely ignore it.

Compute Image Mean

The model requires us to subtract the image mean from each image, so we have to compute the mean. tools/compute_image_mean.cpp implements that - it is also a good example to familiarize yourself with how to manipulate the multiple components, such as protocol buffers, leveldbs, and logging, if you are not familiar with them. Anyway, the mean computation can be carried out as:

./examples/imagenet/make_imagenet_mean.sh

which will make data/ilsvrc12/imagenet_mean.binaryproto.
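If you later need this mean in Python (for example to pass to a Transformer), the binaryproto can be parsed into a numpy array; a minimal sketch using caffe's own helpers:

import caffe
from caffe.proto import caffe_pb2

# Parse the binaryproto written by make_imagenet_mean.sh
blob = caffe_pb2.BlobProto()
with open('data/ilsvrc12/imagenet_mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
# Convert to a (1, K, H, W) array and drop the leading axis
mean = caffe.io.blobproto_to_array(blob)[0]
print mean.shape  # (3, 256, 256)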

Model Definition

We are going to describe a reference implementation for the approach first proposed by Krizhevsky, Sutskever, and Hinton in their NIPS 2012 paper.

The network definition (models/bvlc_reference_caffenet/train_val.prototxt) follows the one in Krizhevsky et al. Note that if you deviated from file paths suggested in this guide, you’ll need to adjust the relevant paths in the .prototxt files.

If you look carefully at models/bvlc_reference_caffenet/train_val.prototxt, you will notice several include sections specifying either phase: TRAIN or phase: TEST. These sections allow us to define two closely related networks in one file: the network used for training and the network used for testing. These two networks are almost identical, sharing all layers except for those marked with include { phase: TRAIN } or include { phase: TEST }. In this case, only the input layers and one output layer are different.

Input layer differences: The training network’s data input layer draws its data from examples/imagenet/ilsvrc12_train_leveldb and randomly mirrors the input image. The testing network’s data layer takes data from examples/imagenet/ilsvrc12_val_leveldb and does not perform random mirroring.

Output layer differences: Both networks output the softmax_loss layer, which in training is used to compute the loss function and to initialize the backpropagation, while in validation this loss is simply reported. The testing network also has a second output layer, accuracy, which is used to report the accuracy on the test set. In the process of training, the test network will occasionally be instantiated and tested on the test set, producing lines like Test score #0: xxx and Test score #1: xxx. In this case score 0 is the accuracy (which will start around 1/1000 = 0.001 for an untrained network) and score 1 is the loss (which will start around 7 for an untrained network).

We will also lay out a protocol buffer for running the solver. Let’s make a few plans:

  • We will run in batches of 256, and run a total of 450,000 iterations (about 90 epochs).
  • For every 1,000 iterations, we test the learned net on the validation data.
  • We set the initial learning rate to 0.01, and decrease it every 100,000 iterations (about 20 epochs).
  • Information will be displayed every 20 iterations.
  • The network will be trained with momentum 0.9 and a weight decay of 0.0005.
  • For every 10,000 iterations, we will take a snapshot of the current status.

Sound good? This is implemented in models/bvlc_reference_caffenet/solver.prototxt.
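For reference, the same plan can also be written out programmatically with caffe's protobuf bindings; a sketch (field names come from caffe.proto, and the snapshot_prefix is illustrative):

from caffe.proto import caffe_pb2

s = caffe_pb2.SolverParameter()
s.net = 'models/bvlc_reference_caffenet/train_val.prototxt'
s.test_iter.append(1000)   # forward passes per test phase
s.test_interval = 1000     # test every 1,000 iterations
s.base_lr = 0.01           # initial learning rate
s.lr_policy = 'step'       # drop the rate in steps...
s.gamma = 0.1              # ...by a factor of 10...
s.stepsize = 100000        # ...every 100,000 iterations
s.display = 20             # log every 20 iterations
s.max_iter = 450000        # about 90 epochs at batch size 256
s.momentum = 0.9
s.weight_decay = 0.0005
s.snapshot = 10000         # snapshot every 10,000 iterations
s.snapshot_prefix = 'models/bvlc_reference_caffenet/caffenet_train'

with open('solver.prototxt', 'w') as f:
    f.write(str(s))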

Training ImageNet

Ready? Let’s train.

./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt

Sit back and enjoy!


Dataset preparation:

ImageNet consists of variable-resolution images, while our system requires a constant input dimensionality. Therefore, we down-sampled the images to a fixed resolution of 256 × 256. Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and then cropped out the central 256×256 patch from the resulting image. We did not pre-process the images in any other way, except for subtracting the mean activity over the training set from each pixel.
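A minimal Python sketch of that preprocessing (shorter side to 256, then the central 256×256 crop), assuming Pillow is available; mean subtraction is left out:

from PIL import Image

def rescale_and_center_crop(path, side=256):
    # Rescale so the shorter side equals `side`
    im = Image.open(path)
    w, h = im.size
    scale = float(side) / min(w, h)
    im = im.resize((int(round(w * scale)), int(round(h * scale))), Image.BILINEAR)
    # Crop out the central side x side patch
    w, h = im.size
    left, top = (w - side) // 2, (h - side) // 2
    return im.crop((left, top, left + side, top + side))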

See http://blog.csdn.net/u010417185/article/details/52651761

Crop in data augmentation:

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 600
    mean_file: "examples/images/imagenet_mean.binaryproto"
  }
  data_param {
    source: "examples/images/train_lmdb"
    batch_size: 256
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 600
    mean_file: "examples/images/imagenet_mean.binaryproto"
  }
  data_param {
    source: "examples/images/val_lmdb"
    batch_size: 50
    backend: LMDB
  }
}

From this data-layer definition you can see that mirroring and crop_size are used, and a mean_file is defined.

Cropping with crop_size lets the network sample both the central region and the corner features, while mirror produces mirrored copies, which helps make up for a small dataset.

A key point is how crop_size behaves differently in the training and test layers:

First, note that mean_file and crop_size have little to do with each other. mean_file is built from the training images, and crop_size crops the training images; both operate on the original training set. If the original training images are 800*800 and crop_size is 600, the images used both for the mean_file and for cropping are still the 800*800 image set.

The paper crops 224x224 regions out of 256x256 images; if the input size exceeds 256, the crop size should grow accordingly. Multi-scale training does advocate applying the same crop size to inputs of different sizes, but even there the largest input is 512, so the gap stays acceptable.

In Caffe, if crop_size is defined, images larger than crop_size are randomly cropped at training time, while at test time only the central patch is taken (see /caffe/src/caffe/data_transformer.cpp):

// We only do random crop when we do training.
if (phase_ == TRAIN) {
  h_off = Rand(datum_height - crop_size + 1);
  w_off = Rand(datum_width - crop_size + 1);
} else {
  h_off = (datum_height - crop_size) / 2;
  w_off = (datum_width - crop_size) / 2;
}

  • As the code shows, an input image larger than crop_size gets cropped: in TRAIN phase the crop position is chosen at random, while in TEST phase only the central region of the image is taken (see the sketch below).
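A Python sketch mirroring that logic (just the offset computation, not Caffe's actual implementation):

import random

def crop_offsets(datum_height, datum_width, crop_size, phase):
    # Random crop only during training, center crop otherwise,
    # following the branch in data_transformer.cpp above.
    if phase == 'TRAIN':
        h_off = random.randint(0, datum_height - crop_size)
        w_off = random.randint(0, datum_width - crop_size)
    else:
        h_off = (datum_height - crop_size) // 2
        w_off = (datum_width - crop_size) // 2
    return h_off, w_off

# e.g. an 800x800 image with crop_size 600:
# TRAIN gives offsets anywhere in [0, 200]; TEST always gives (100, 100).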

Below is an image-cropping program I found online.

Read it alongside the original post: http://blog.csdn.net/u011762313/article/details/48343799

We can crop the images ourselves before feeding them into pycaffe, which can raise the recognition rate (in the pycaffe classify-with-a-caffemodel workflow, replace the classification step with the following):

import numpy as np

# net, transformer, img and CLASS_NUM come from the surrounding
# classification script (see the setup steps below)

# Accumulated class-probability distribution
predicts = np.zeros((1, CLASS_NUM))

# Image dimensions (height, width)
img_shape = np.array(img.shape)
# Crop size (height, width)
crop_dims = (32, 96)
crop_dims = np.array(crop_dims)
# All images here have a fixed height of 32 and a variable width of at least 96.
# Crops start at 0 and end at w_range.
w_range = img_shape[1] - crop_dims[1]
# Sweep left to right, then right to left, with stride 96/4 = 24
for k in range(0, w_range + 1, crop_dims[1] / 4) + range(w_range, 1, -crop_dims[1] / 4):
    # Crop the image
    crop_img = img[:, k:k + crop_dims[1], :]
    # Feed and preprocess the data
    net.blobs['data'].data[...] = transformer.preprocess('data', crop_img)
    # Forward pass, i.e. classification
    out = net.forward()
    # Accumulate the probability distribution from each crop
    predicts += out['prob']

# The class with the highest accumulated probability is the final result
predict = predicts.argmax()


  • Caffe also provides an oversampling helper (oversample), see /caffe/python/caffe/io.py, which crops the image center, the four corners, and their mirrors, ten images in total; a usage sketch follows.
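A usage sketch for that helper; `classify` stands for whatever per-crop forward pass your script uses and is hypothetical:

import numpy as np
import caffe

img = caffe.io.load_image('temp.jpg')        # H x W x K, RGB, values in [0, 1]
# Ten 227x227 views: center plus four corners, and their mirrors
crops = caffe.io.oversample([img], (227, 227))
print crops.shape                            # (10, 227, 227, 3)
# Average the per-crop predictions, e.g.:
# probs = np.mean([classify(c) for c in crops], axis=0)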

After defining the network with pycaffe and training and testing it to obtain a caffemodel file, use the caffemodel for classification:

  • Import the required library
import caffe

  • Configuration
# the caffemodel file
MODEL_FILE = 'model/_iter_10000.caffemodel'
# the deploy file; see /caffe/models/bvlc_alexnet/deploy.prototxt
DEPLOY_FILE = 'deploy.prototxt'
# folder holding the test images
TEST_ROOT = 'datas/'

  • Test in GPU mode
caffe.set_mode_gpu()
net = caffe.Net(DEPLOY_FILE, MODEL_FILE, caffe.TEST)

  • Input preprocessing
# 'data' corresponds to the deploy file:
# input: "data"
# input_dim: 1
# input_dim: 3
# input_dim: 32
# input_dim: 96
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
# Python loads images as H x W x K; convert to K x H x W
transformer.set_transpose('data', (2, 0, 1))
# Python stores pixel values in [0, 1] while caffe uses [0, 255],
# so a rescale is needed
transformer.set_raw_scale('data', 255)
# caffe expects BGR channel order while the loaded image is RGB, so swap
transformer.set_channel_swap('data', (2, 1, 0))
# Reshape the input blob to the shape expected by the deploy file
net.blobs['data'].reshape(1, 3, 32, 96)

  • Load the image
# See /caffe/python/caffe/io.py
img = caffe.io.load_image('temp.jpg')
# The loaded image is H x W x K and still needs the conversions above

  • Classify
# Feed and preprocess the data
net.blobs['data'].data[...] = transformer.preprocess('data', img)
# Forward pass, i.e. classification
out = net.forward()
# The output is a probability distribution over the classes
predicts = out['prob']
# The name 'prob' comes from the deploy file:
# layer {
#   name: "prob"
#   type: "Softmax"
#   bottom: "ip2"
#   top: "prob"
# }

  • Most likely class
predict = predicts.argmax()
Note: if the images are large, reduce batch_size accordingly, otherwise the GPU may run out of memory and error out.


In AlexNet training, the training set's batch_size is 256 and the test set's batch_size is 50, which is not proportional to the sizes of the two sets.

On the input image size question: http://caffecn.cn/?/question/74

It is worth reading the caffe.proto file, which contains detailed parameter definitions for every layer type; ConvolutionParameter has what you are looking for.

Look at the convert_imageset invocation in examples/imagenet/create_imagenet.sh:

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/ilsvrc12_train_lmdb

When building a network, first make sure the data can flow through it: every layer's output shape must be an integer, never a fraction. Whether the resulting network performs well is a separate question. As long as the data flows, the shape of your input images does not matter in itself.
 
If your image is a square with side 256, a convolution layer's output side is [(256 - kernel_size) / stride] + 1, and this value must be an integer, otherwise it has no physical meaning: a feature map with a side of 7.7, say, is meaningless. The same reasoning applies to pooling layers. An FC layer's output shape is always an integer; its only requirement is that its input have a fixed length throughout training.
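The check is easy to script; a small sketch of the output-size formula, with padding included:

def conv_output_dim(input_dim, kernel_size, stride=1, pad=0):
    # [(input + 2*pad - kernel) / stride] + 1 must be an integer
    out = (input_dim + 2 * pad - kernel_size) / float(stride) + 1
    if out != int(out):
        raise ValueError('non-integer output size: %s' % out)
    return int(out)

# With a 256x256 input, an 11x11 kernel and stride 4 (no padding):
# (256 - 11) / 4 + 1 = 62.25, which fails; a 227x227 input gives 55.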
 
If your images are not square, you can scale them to a uniform (non-square) size when building the leveldb/lmdb database, and then use a non-square kernel_size so that the convolution outputs are still integers.

Other questions: http://blog.csdn.net/u010417185/article/details/52649178

1. Does computing the mean require images of uniform size?

Yes. When computing the image mean, first make all images the same size, otherwise an error is raised.

Quoting part of the official material:

Mean subtraction is a common preprocessing step. Following the PCA chapter of the UFLDL tutorial, there are two variants for images. The first, commonly used one could be called the per-dimension mean (my own name for it): a mean is removed within each input dimension, i.e. each pixel position. The second is the per-image mean. The UFLDL tutorial notes that when training on natural images, estimating a separate mean and variance for each pixel (each dimension) makes little sense, because images are statistically stationary: the statistics of one part of an image are the same as those of any other part. The authors' advice: if you train on non-natural images (such as MNIST, or single objects on a white background), other kinds of normalization are worth considering; but when training on natural images, the per-image mean is a reasonable default.


The point is that as the training images differ, the mean-subtraction method we use can change as well.
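A small numpy sketch of the two variants, on a toy stand-in for a training set:

import numpy as np

# Toy training set: 100 images, 3 channels, 32x32
X = np.random.rand(100, 3, 32, 32)

# Per-dimension (per-pixel) mean: one mean per pixel position,
# which is what imagenet_mean.binaryproto stores
pixel_mean = X.mean(axis=0)                              # shape (3, 32, 32)

# Per-image mean: subtract each image's own mean intensity
X_centered = X - X.mean(axis=(1, 2, 3), keepdims=True)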


With that settled, let's look at the error raised when image sizes are not uniform:


The screenshot (omitted here) clearly reports a failed “size_in_datum == data_size” check.

Here is the cause of the problem:

While converting images to LevelDB you hit “Check failed: data.size() == data_size”. In the end it comes down to not reading the source carefully. The failure points at line 84 of convert_imageset.cpp (the log reads F0714 20:31:14.899121 26565 convert_imageset.cpp:84]): CHECK_EQ(data.size(), data_size) << "Incorrect data field size " << data.size(); that is, the two sizes disagree. Looking at the code:

int data_size;
bool data_size_initialized = false;
for (int line_id = 0; line_id < lines.size(); ++line_id) {
  if (!ReadImageToDatum(root_folder + lines[line_id].first,
                        lines[line_id].second, datum)) {
    continue;
  }
  if (!data_size_initialized) {
    data_size = datum.channels() * datum.height() * datum.width();
    data_size_initialized = true;
  } else {
    const string& data = datum.data();
    CHECK_EQ(data.size(), data_size) << "Incorrect data field size "
        << data.size();
  }
}

As the code shows, on the first iteration data_size_initialized is false, so the if (!data_size_initialized) branch sets data_size to datum.channels() * datum.height() * datum.width() and flips data_size_initialized to true. Every later iteration runs the else branch, so adding an image of a different size fails the check. The available fixes are to resize all images to the same size before converting to LevelDB, or to change the ReadImageToDatum call to the resizing overload, ReadImageToDatum(root_folder + lines[line_id].first, lines[line_id].second, width, height, datum).


Reference post: http://blog.csdn.net/alan317/article/details/37772457


2. In practice image sizes vary, and both enlarging and shrinking can distort them. How should the data be handled?


If the images vary in size and heavy up- or down-scaling would badly distort them and lose information, do not normalize the sizes directly.

Remedy:

Use an in-between size: for example, fix the width of every image at 600 and scale the height proportionally. The resized images can then be sliced into fixed-size crops, which normalizes the input size.
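A sketch of that resize, assuming Pillow; the fixed width of 600 follows the example above:

from PIL import Image

def resize_to_width(path, target_w=600):
    # Fix the width and scale the height proportionally
    im = Image.open(path)
    w, h = im.size
    new_h = int(round(h * float(target_w) / w))
    return im.resize((target_w, new_h), Image.BILINEAR)

# Fixed-size crops can then be sliced from the result,
# e.g. with the sliding-window loop shown earlier.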


3. What does crop_size do?


It crops the image. If the original is 800*800 but detection only needs 600*600, crop_size extracts the patch: in TRAIN phase the crop is taken at random, while in every other phase only the central region of the image is taken.
For details see http://blog.csdn.net/u010417185/article/details/52651761

4. Determining the test_iter value in the solver configuration

# reduce the learning rate after 8 epochs (4000 iters) by a factor of 10

# The train/test net protocol buffer definition
net: "examples/cifar10/cifar10_quick_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.001
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 4000
# snapshot intermediate results
snapshot: 4000
snapshot_format: HDF5
snapshot_prefix: "examples/cifar10/cifar10_quick"
# solver mode: CPU or GPU
solver_mode: CPU

When writing this configuration I was initially unsure how test_iter is determined: from the batch size together with the whole image collection (training set plus test set), or from one particular set. After reading the comments and the example carefully, the value comes from the test batch size and the test set alone. If the batch size is 100, the training set holds 6000 images and the test set holds 1000 images, then test_iter is 1000/100 = 10, independent of the training-set size.
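In code form, with the numbers from this example:

# test_iter = number of test images / test batch size;
# the 6000 training images play no role in it
num_test_images = 1000
test_batch_size = 100
test_iter = num_test_images // test_batch_size   # = 10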

Overall procedure, following http://blog.csdn.net/alexqiweek/article/details/51281240

1. Data preparation

Under caffe/data create a directory myself, and inside it two more directories, train and val.

 

Note: the images must be in .jpeg format.

train holds the training data; it contains two directories, bird (70 images) and cat (70 images).

 

 

 

val holds the test data: 20 bird images and 20 cat images.

 

In a terminal, switch to caffe/data/myself and generate train.txt, val.txt, and test.txt from these images.

test.txt has the same content as val.txt, just without the trailing numeric labels.

 

Command to generate val.txt: find -name '*.jpeg' | grep -v train | cut -d/ -f3 > val.txt

 

Command to generate train.txt: find -name '*.jpeg' | grep train | cut -d/ -f3-4 > train.txt. Since the bird and cat images must be distinguished by different numeric labels appended at the end, also run: sed -i '1,70s/.*/&  0/' train.txt and sed -i '71,140s/.*/&  1/' train.txt (70 bird images followed by 70 cat images, 140 lines in total).
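The same lists can be produced in Python instead of find/sed; a sketch assuming the bird/cat layout above:

import os

root = 'data/myself/train'
with open('train.txt', 'w') as f:
    for label, cls in enumerate(['bird', 'cat']):   # bird -> 0, cat -> 1
        d = os.path.join(root, cls)
        for name in sorted(os.listdir(d)):
            if name.endswith('.jpeg'):
                f.write('%s/%s %d\n' % (cls, name, label))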






2. Create the database

Under caffe/examples create a directory myself, and copy create_imagenet.sh from caffe/examples/imagenet into it.

 

The contents of create_imagenet.sh (the line numbers below refer to the script; screenshot omitted):


Line 5: EXAMPLE sets where the generated database files are written.

Line 6: DATA sets where the source list files used to build the database are read from.

Line 9: TRAIN_DATA_ROOT is the absolute path to the training images.

Line 10: VAL_DATA_ROOT is the absolute path to the validation images. If TRAIN_DATA_ROOT or VAL_DATA_ROOT is wrong, you get a pile of image-not-found errors.

Lines 12 to 21 resize the images to a uniform 256x256.

 

Lines 45 and 55 name the generated database directories.

 

From the caffe root, run ./examples/myself/create_imagenet.sh; two database directories are then generated under the EXAMPLE path set in create_imagenet.sh (here examples/myself).

 


3. Train the network [training with CaffeNet may take longer than with LeNet; this experiment uses CaffeNet]

① Copy train_val.prototxt from models/bvlc_alexnet into examples/myself.

This file defines the structure of the network to be trained.

 

② Copy solver.prototxt from models/bvlc_alexnet into examples/myself.

This file holds the configuration and settings needed while training the network.


Line 1 gives the relative path of the network-definition file.

 

③ Copy make_imagenet_mean.sh from examples/imagenet into examples/myself. It computes the image mean; the underlying source is tools/compute_image_mean.cpp.

 

 

④ Copy train_caffenet.sh from examples/imagenet into examples/myself.

This script contains the command that trains the network.

 

From the caffe root, run ./examples/myself/train_caffenet.sh to start training.

 

 

 

4. Test the network with the test data

Test the network with: ./build/tools/caffe.bin test --model=examples/myself/train_val.prototxt --weights=examples/myself/caffenet_model/caffenet_train_iter_16000.caffemodel. Here train_val.prototxt is the network definition, and caffenet_train_iter_16000.caffemodel is the model produced during training.

[Problem encountered]


[Solution]


Copy the three files shown in the (omitted) screenshot into /caffe/examples/myself, so that they sit in the same directory as the CaffeNet network-definition .prototxt file, the .caffemodel produced by training, and the .solverstate file.

 


5. Test the network on a single image and display the extracted features.

Write the script for “Classification: Instant Recognition with Caffe”.

In /caffe/examples/myself/, run python ./xxxxx.py to execute the “Classification: Instant Recognition with Caffe” script.

[Error 1]


[Solution]



Edit the .prototxt file defining the CaffeNet training network so that relative paths become absolute paths.

 

 

[Error 2]


[Solution 2]


This problem cannot really be fixed here, because the network used for testing is the same one used for training. Consider checking whether the trained model is accurate with a different network, by modifying the xxxxxxx.py file mentioned above.





