A Simple Workflow for Training and Classifying Images with Caffe

Source: the Internet. Editor: 程序博客网. Time: 2024/05/21 11:10

step 1. First, make sure Caffe is installed correctly and that make runtest passes, at least for the most part.

step 2. Prepare the training set:

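Before training you need a training set. To check how well training worked, it is best to also prepare a reasonable number of test images. Caffe reads its input data in leveldb format, so the data has to be converted to that format before training.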

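The caffe_root/examples folder contains several examples. The cifar10 and imagenet examples convert image databases to leveldb, and the mnist example converts its own dataset format to leveldb. A dataset of your own therefore has to be converted to leveldb as well, which may mean writing some code yourself. The following explains the process for JPEG images.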

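Here we assume the database is given as image files. We also assume that either labels are provided or the images of each class are already grouped together. In that case we can convert our own database with create_imagenet.sh from the imagenet example. The images must be in JPEG format. The steps are as follows: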

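A. Skip this step if the dataset already comes with labels. If the images are stored in one folder per class, write a script (mklabel.sh below) that processes the images and assigns the labels. See the comments at the top of mklabel.sh for how to use it.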

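mklabel.sh

    #!/bin/sh
    #----------------------------------------------------
    # Expected layout:
    #   dir/subdir1/files...
    #   dir/subdir2/files...
    #   dir/subdir3/files...
    #   dir/subdirX/files...
    # Usage:
    # 1. $ sh mklabel.sh dir [testnum [startlabel]]
    #    dir: target folder; testnum: images per class kept as test data
    #    (default 0); startlabel: first label (default 1)
    # 2. Or run $ chmod a+x mklabel.sh once, then call the script by name.
    # 3. Labels are printed to the terminal; redirect them with '>', e.g.
    #    $ sh ./mklabel.sh data/faces94/male > label.txt
    # 4. Class folders must contain nothing but images (add checks otherwise).
    # Note: images are moved out of their class folders into dir.
    #-----------------------------------------------------
    DIR="$0"     # path of this script, used to re-run it with default arguments
    label=1      # default starting label (a number; change as needed)
    testnum=0    # default number of test images kept per class

    if test $# -eq 0; then
        # no arguments: current folder, testnum=0, label=1
        sh "$DIR" . 0 $label
    elif test $# -eq 1; then
        # folder only: testnum=0, label=1
        sh "$DIR" "$1" 0 $label
    elif test $# -eq 2; then
        # folder and testnum: label=1
        sh "$DIR" "$1" "$2" $label
    else
        testnum=$2           # test images kept per class
        label=$3             # custom starting label
        cd "$1" || exit 1    # go to the target folder
        if test "$testnum" -ne 0; then
            mkdir "testdata"   # folder for the held-out test images
        fi
        for i in *; do
            if test -d "$i" && test "$i" != "testdata"; then
                cd "$i"              # enter the class folder
                num=1                # image counter
                for j in *; do
                    # the first $testnum images stay behind as test data
                    if test $num -gt "$testnum"; then
                        echo "$j" $label
                        mv "$j" ../
                    fi
                    num=`expr $num + 1`
                done
                cd ..                # back to the parent folder
                if test "$testnum" -eq 0; then
                    rmdir "$i"       # the class folder is now empty
                else
                    mv "$i" ./testdata
                fi
                label=`expr $label + 1`   # next class, next label
            fi
        done
    fi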
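The same kind of label list can also be produced without moving any files. The Python sketch below is a hypothetical alternative to mklabel.sh and is not part of Caffe. It makes these assumptions:

- The dataset uses the dir/subdirX/ layout described above.
- Each class folder gets a label in sorted folder order, starting at start_label. This defaults to 0 here rather than 1.
- The first testnum images of each class are held out for testing.

The output has one "path label" line per image, with paths relative to the dataset folder.

```python
import os


def make_label_list(root, testnum=0, start_label=0):
    """Return (train, test) lists of (relative_path, label) pairs.

    Assumes root/<class_dir>/<image files>. Each class folder gets one label,
    assigned in sorted folder order starting at start_label; the first
    `testnum` files (in sorted order) of every class go to the test list.
    """
    train, test = [], []
    class_dirs = sorted(d for d in os.listdir(root)
                        if os.path.isdir(os.path.join(root, d)))
    for offset, cls in enumerate(class_dirs):
        label = start_label + offset
        cls_path = os.path.join(root, cls)
        files = sorted(f for f in os.listdir(cls_path)
                       if os.path.isfile(os.path.join(cls_path, f)))
        for n, name in enumerate(files):
            pair = (cls + "/" + name, label)
            if n < testnum:
                test.append(pair)
            else:
                train.append(pair)
    return train, test


def write_label_file(pairs, path):
    """Write one 'relative_path label' line per image."""
    with open(path, "w") as f:
        for rel_path, label in pairs:
            f.write("%s %d\n" % (rel_path, label))
```

For example, call make_label_list("data/faces94/male", testnum=2) and then pass the two returned lists to write_label_file to get train.txt and test.txt.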

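B. Edit create_imagenet.sh so that the image source folder and the label text file point to your own dataset. The script calls the convert_imageset tool (convert_imageset.cpp), which takes these arguments in order:

- the image folder
- the label .txt file
- the name of the output database
- whether to shuffle the images and labels, i.e., randomize the order in which the images are read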
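mklabel.sh prints the images class by class, so the resulting list file is sorted by label. Without shuffling, consecutive training batches would tend to contain a single class, which is why the shuffle option matters.

The sketch below uses hypothetical helper names and only the standard library. It parses a list file of whitespace-separated "path label" lines and returns a reproducibly shuffled copy. It illustrates the idea only; it is not how convert_imageset itself is implemented.

```python
import random


def parse_list_file(text):
    """Parse whitespace-separated 'path label' lines into (path, int) pairs."""
    pairs = []
    for lineno, line in enumerate(text.splitlines(), 1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2 or not fields[1].isdigit():
            raise ValueError("line %d: expected 'path label', got %r" % (lineno, line))
        pairs.append((fields[0], int(fields[1])))
    return pairs


def shuffled_pairs(pairs, seed=0):
    """Return a shuffled copy of the pairs; the same seed gives the same order."""
    out = list(pairs)
    random.Random(seed).shuffle(out)
    return out
```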

If your dataset is not given as images, it is usually easiest to write your own conversion program based on the leveldb storage format.

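C. Multi-channel images need a mean file. Adapt the make_imagenet_mean.sh script in the examples/imagenet folder to your data, and it will produce a .binaryproto mean file with little effort. This file may be needed during training.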
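As a rough illustration, the mean file contains the per-pixel average over all training images, in channels × height × width layout. The NumPy sketch below assumes the images are already decoded into equally sized (C, H, W) arrays. It computes the average in memory only. It does not read leveldb or write the .binaryproto format; make_imagenet_mean.sh takes care of both.

```python
import numpy as np


def compute_mean_image(images):
    """Per-pixel mean of equally sized (C, H, W) images, accumulated one image at a time."""
    total = None
    count = 0
    for img in images:
        arr = np.asarray(img, dtype=np.float64)
        if total is None:
            total = np.zeros_like(arr)
        elif arr.shape != total.shape:
            raise ValueError("all images must have the same shape")
        total += arr
        count += 1
    if count == 0:
        raise ValueError("no images given")
    return total / count
```

The images are accumulated one at a time, so the whole dataset never has to be held in memory at once.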

step 3. Train with your own data:

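We use the simplest network, the mnist one, as an example; which parameters need to be changed depends on the dataset. Before training, it is best to create a new folder under examples, named after your dataset, and copy these 5 files from the mnist folder into it:

lenet.prototxt

lenet_solver.prototxt

lenet_train.prototxt

lenet_test.prototxt

train_lenet.sh

All 5 files are required. Edit them in order. Leaving the network architecture itself aside, the following needs to change: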

a. lenet.prototxt:

    input_dim: 64
    input_dim: 1
    input_dim: 28
    input_dim: 28

These are, in order, the number of images loaded per batch, the number of channels, the height, and the width.
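The four input_dim values describe a 4-D blob in N × C × H × W order. Images loaded with caffe.io.load_image or most image libraries come as H × W × C arrays, or H × W for grayscale. They must be transposed to match this layout. A minimal NumPy sketch follows; to_nchw is a hypothetical helper name.

```python
import numpy as np


def to_nchw(images):
    """Stack H x W x C (or H x W grayscale) images into an N x C x H x W array."""
    arrays = []
    for img in images:
        a = np.asarray(img)
        if a.ndim == 2:  # grayscale: add a channel axis -> H x W x 1
            a = a[:, :, np.newaxis]
        arrays.append(a)
    batch = np.stack(arrays)            # N x H x W x C
    return batch.transpose(0, 3, 1, 2)  # N x C x H x W
```

For 64 grayscale 28 × 28 images the result has shape (64, 1, 28, 28), matching the input_dim values above.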

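In the second-to-last layer, i.e., the layer whose output is fed to the softmax layer, num_output must be set to the actual number of labels, that is, the number of image classes. Otherwise training will fail with an error.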
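Caffe's softmax loss expects integer labels from 0 to num_output − 1. By default mklabel.sh starts at label 1, so with N classes the largest label is N. In that case num_output must be at least N + 1, unless you start the labels at 0 instead.

The hypothetical helper below computes the smallest num_output that covers a list of labels. It also reports label values in that range that never occur.

```python
def required_num_output(labels):
    """Return (num_output, unused_labels) for integer class labels.

    num_output is max(label) + 1, the smallest value for which every label
    lies in 0 .. num_output - 1; unused_labels lists values in that range
    that never occur (e.g. 0 when labels start at 1).
    """
    labels = list(labels)
    if not labels:
        raise ValueError("no labels given")
    if min(labels) < 0:
        raise ValueError("labels must be non-negative")
    num_output = max(labels) + 1
    unused = sorted(set(range(num_output)) - set(labels))
    return num_output, unused
```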

b. lenet_solver.prototxt:

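The solver file names the training and test network definitions in its train_net and test_net entries. If you kept the original file names, these two entries need no change; otherwise point them to the corresponding files. Adjust the other parameters as needed.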
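Two of those other parameters are easy to derive from the dataset size:

- test_iter: during testing Caffe runs test_iter batches of the test net, so test_iter times the test batch size should cover the test set.
- max_iter: this counts training batches, not epochs.

The sketch below shows the arithmetic. The function name and the epoch-based target are assumptions for illustration.

```python
import math


def solver_iterations(num_train, train_batch, num_test, test_batch, epochs):
    """Return (test_iter, max_iter).

    test_iter: test batches needed so one test pass covers the whole test set.
    max_iter: training batches for roughly `epochs` passes over the training set.
    """
    test_iter = int(math.ceil(num_test / float(test_batch)))
    iters_per_epoch = int(math.ceil(num_train / float(train_batch)))
    return test_iter, epochs * iters_per_epoch
```

Take the mnist numbers: 60000 training images with batch size 64, and 10000 test images with test batch size 100. For 10 epochs this gives test_iter = 100 and max_iter = 9380.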

c. lenet_train.prototxt:

Replace the data source of the data layer with your own data.
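If you keep several copies of the network files, this edit can also be scripted. The sketch below is a hypothetical text-level helper. It uses a regular expression to rewrite every field: "..." entry (for example source) in a prototxt string, without parsing the protobuf. It therefore assumes the field name appears only where you want it replaced.

```python
import re


def set_string_field(prototxt, field, value):
    """Replace the quoted value of every `field: "..."` entry in prototxt text."""
    pattern = re.compile(r'(\b%s\s*:\s*)"[^"]*"' % re.escape(field))
    new_text, count = pattern.subn(lambda m: '%s"%s"' % (m.group(1), value), prototxt)
    if count == 0:
        raise ValueError("field %r not found" % field)
    return new_text
```

For example, set_string_field(text, "source", "faces-train-leveldb") would point the data layer at a leveldb named faces-train-leveldb; that name is only an illustration.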

When training on multi-channel images, this layer should also specify a mean file (the mean_file parameter), as in the cifar10 example.
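To our understanding, the data layer applies the mean file and the scale parameter per pixel as (pixel − mean) × scale. For example, the mnist example uses scale 0.00390625 (1/256) and no mean file. The NumPy sketch below shows that transformation. It assumes the image and the mean are already (C, H, W) arrays.

```python
import numpy as np


def transform(image_chw, mean_chw=None, scale=1.0):
    """Subtract the mean image (if any), then multiply by scale: (pixel - mean) * scale."""
    x = np.asarray(image_chw, dtype=np.float32)
    if mean_chw is not None:
        x = x - np.asarray(mean_chw, dtype=np.float32)
    return x * scale
```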

Change num_output the same way as in lenet.prototxt.

d. lenet_test.prototxt:

Make the corresponding changes, following lenet_train.prototxt.

e. train_lenet.sh:

No change is needed as long as the lenet_solver.prototxt file name is unchanged.

step 4. Classify with your own model:

Reference: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/classification.ipynb

Assuming all previous steps succeeded and you have trained a model with reasonable accuracy, you can now use it to classify images.

a. First make sure the Python wrapper is installed correctly, along with the packages listed in requirements.txt in the caffe_root/python/ folder.

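b. You also need the mean file in .npy format, so the binaryproto mean file of your data has to be converted to an .npy file. caffe_root/python/caffe/io.py already provides the corresponding API.

For details see: https://github.com/BVLC/caffe/issues/420

blobproto_to_array has to be modified into a blobproto_to_array2 function, which simply drops the blob.num dimension.

The conversion is done by calling this function. The Python script is: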

convert_bproto_to_npy.py

    #!/usr/bin/python
    import numpy as np
    from caffe.io import blobproto_to_array2  # the modified function described above
    from caffe.proto import caffe_pb2

    blob = caffe_pb2.BlobProto()
    filename = './imagenet_mean.binaryproto'
    with open(filename, 'rb') as f:
        blob.ParseFromString(f.read())
    nparray = blobproto_to_array2(blob)  # shape (channels, height, width)
    np.save('mean.npy', nparray)
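The modification can be shown with NumPy alone. Assume the original blobproto_to_array reshapes the flat blob.data into (num, channels, height, width). For a single mean image, dropping num leaves (channels, height, width), which is the same as taking element [0] of the 4-D result. So blobproto_to_array(blob)[0] should give the same array without modifying io.py. The hypothetical helper below mirrors that reshape on a plain list.

```python
import numpy as np


def flat_to_array(data, num, channels, height, width, drop_num=False):
    """Reshape a flat list (like BlobProto.data) into (num, channels, height, width).

    With drop_num=True and num == 1 the leading axis is removed, giving
    (channels, height, width) as blobproto_to_array2 is described to do.
    """
    arr = np.array(data, dtype=np.float32).reshape(num, channels, height, width)
    if drop_num:
        if num != 1:
            raise ValueError("drop_num requires num == 1")
        return arr[0]
    return arr
```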

c. Classify by following the demo steps:

The main parts of the code are based on these notebooks:

classify.py

http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/imagenet_classification.ipynb

classifymap.py

http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb

Explanation of the output

1) classify.py

The output is prediction[0], a one-dimensional array. For imagenet it has 1000 entries, each the probability of one class.

The processing code:

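    preDict = {}
    for i in xrange(len(prediction[0])):
        preDict[prediction[0][i]] = i   # probability -> class index
    for val in sorted(preDict.keys())[::-1][:5]:   # five largest probabilities
        # linecache line numbers start at 1, class indices at 0
        print("%d %f %s" % (preDict[val], val * 100,
                            linecache.getline(SYNSET, preDict[val] + 1)))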

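The array values and their indices are stored in a dictionary, with the value as the key. Sorting the values and using the five largest as keys gives back the indices, i.e., the corresponding classes. Note that if two classes have exactly the same probability, one overwrites the other in the dictionary.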
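A dictionary keyed by probability loses classes whose probabilities are equal, so sorting the indices instead is more robust. Below is a NumPy sketch using argsort; top_k is a hypothetical name.

```python
import numpy as np


def top_k(probs, k=5):
    """Return [(class_index, probability), ...] for the k largest probabilities."""
    probs = np.asarray(probs).ravel()
    order = np.argsort(probs)[::-1][:k]  # indices sorted by descending probability
    return [(int(i), float(probs[i])) for i in order]
```

top_k(prediction[0]) would then give the five (index, probability) pairs to print.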

To show a readable class name instead of a number, the corresponding line of synset_words.txt is printed. This uses linecache.getline(), which requires:

    import linecache

For convenience, the path of synset_words.txt is stored in a variable:

    SYNSET = '../../data/ilsvrc12/synset_words.txt'

(Output columns: class index, probability in percent, class name.)
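linecache.getline numbers lines from 1, while class indices start at 0, so class i is on line i + 1 of synset_words.txt. Alternatively, you can read the whole file once into a list that is indexed directly by class. Both variants are sketched below with hypothetical function names.

```python
import linecache


def class_name(synset_path, index):
    """Name of class `index` (0-based); linecache counts lines from 1."""
    return linecache.getline(synset_path, index + 1).rstrip("\n")


def load_class_names(synset_path):
    """Read the whole file once; entry i of the returned list names class i."""
    with open(synset_path) as f:
        return [line.rstrip("\n") for line in f]
```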

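2) classifymap.py

The output is

    outMat = out['prob'][0].argmax(axis=0)

This is a two-dimensional matrix, 8×8 in this example, in which each entry is a class index. The more often a class appears, the more likely it is.

The processing code: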

    h1, w1 = np.shape(outMat)
    outList = list(np.reshape(outMat, h1 * w1))
    #print(outList)
    outSet = set(outList)               # distinct classes in the map
    outdict = {}
    for x in outSet:
        outdict[x] = outList.count(x)   # class -> number of locations it wins
    # classes sorted by count, most frequent first; print at most five
    appear = sorted(outdict, key=outdict.get, reverse=True)[:5]
    for x in appear:
        # linecache line numbers start at 1, class indices at 0
        print('%d %d: %s' % (x, outdict[x], linecache.getline(SYNSET, x + 1)))

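The idea is the same as in the previous script. The difference is that the matrix is first flattened into a one-dimensional list and turned into a set to remove duplicates. For each distinct class, count() gives the number of times it appears in the matrix, and the classes are then sorted by that count.

(Output columns: class index, number of occurrences, class name.)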
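collections.Counter does this counting in a single pass instead of calling count() once per class. Its most_common() method returns the classes already sorted by frequency. A sketch follows; class_votes is a hypothetical name.

```python
from collections import Counter

import numpy as np


def class_votes(class_map, k=5):
    """Return up to k (class_index, count) pairs for a 2-D argmax map, most frequent first."""
    counts = Counter(np.asarray(class_map).ravel().tolist())
    return counts.most_common(k)
```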

3) Source code:

classify.py

    import linecache

    import caffe
    import matplotlib.pyplot as plt
    import numpy as np
    import pylab

    caffe_root = '../'
    MODEL_FILE = '../imagenet/imagenet_deploy.prototxt'
    PRETRAINED = '../imagenet/caffe_reference_imagenet_model'
    IMAGE_FILE = '../images/cat.jpg'
    SYNSET = '../../data/ilsvrc12/synset_words.txt'

    # Alternative: the ImageNet mean file shipped with Caffe
    #net = caffe.Classifier(MODEL_FILE, PRETRAINED, mean_file = caffe_root + '../python/caffe/imagenet/ilsvrc_2012_mean.npy', channel_swap = (2,1,0), input_scale = 255)
    net = caffe.Classifier(MODEL_FILE, PRETRAINED,
                           mean_file=caffe_root + 'mean2.npy',
                           channel_swap=(2, 1, 0), input_scale=255)
    net.set_phase_test()
    net.set_mode_gpu()

    input_image = caffe.io.load_image(IMAGE_FILE)
    pylab.ion()
    plt.imshow(input_image)
    #pylab.show()
    prediction = net.predict([input_image])
    print('prediction shape: %s' % (prediction[0].shape,))
    #print(prediction[0])
    plt.plot(prediction[0])
    #pylab.show()

    # Map probability -> class index, then print the five most likely classes.
    preDict = {}
    for i in xrange(len(prediction[0])):
        preDict[prediction[0][i]] = i
    for val in sorted(preDict.keys())[::-1][:5]:
        # linecache line numbers start at 1, class indices at 0
        print('%d %f %s' % (preDict[val], val * 100,
                            linecache.getline(SYNSET, preDict[val] + 1)))

classifymap.py

    import caffe
    import matplotlib.pyplot as plt
    import pylab
    import numpy as np
    import linecache

    IMAGE_FILE = '../images/dog.jpg'
    SYNSET = '../../data/ilsvrc12/synset_words.txt'

    # Load the original network and extract the fully-connected layers' parameters.
    net = caffe.Net('../imagenet/imagenet_deploy.prototxt', '../imagenet/caffe_reference_imagenet_model')
    params = ['fc6', 'fc7', 'fc8']
    # fc_params = {name: (weights, biases)}
    fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}
    #for fc in params:
    #    print '{} weights are {} dimensional and biases are {} dimensional'.format(fc, fc_params[fc][0].shape, fc_params[fc][1].shape)

    # Load the fully-convolutional network to transplant the parameters.
    net_full_conv = caffe.Net('../imagenet/imagenet_full_conv.prototxt', '../imagenet/caffe_reference_imagenet_model')
    params_full_conv = ['fc6-conv', 'fc7-conv', 'fc8-conv']
    # conv_params = {name: (weights, biases)}
    conv_params = {pr: (net_full_conv.params[pr][0].data, net_full_conv.params[pr][1].data) for pr in params_full_conv}
    #for conv in params_full_conv:
    #    print '{} weights are {} dimensional and biases are {} dimensional'.format(conv, conv_params[conv][0].shape, conv_params[conv][1].shape)

    # Copy the biases, then reshape the fc weight matrices into conv filters.
    for pr, pr_conv in zip(params, params_full_conv):
        conv_params[pr_conv][1][...] = fc_params[pr][1]
    for pr, pr_conv in zip(params, params_full_conv):
        out, in_, h, w = conv_params[pr_conv][0].shape
        W = fc_params[pr][0].reshape((out, in_, h, w))
        conv_params[pr_conv][0][...] = W
    # net_full_conv.save('../imagenet/caffe_imagenet_full_conv')

    # Load input and configure preprocessing.
    im = caffe.io.load_image(IMAGE_FILE)
    plt.imshow(im)
    #pylab.show()
    net_full_conv.set_mean('data', '../../python/caffe/imagenet/ilsvrc_2012_mean.npy')
    net_full_conv.set_channel_swap('data', (2, 1, 0))
    net_full_conv.set_input_scale('data', 255.0)

    # Make the classification map by a forward pass; argmax gives the top class per location.
    out = net_full_conv.forward_all(data=np.asarray([net_full_conv.preprocess('data', im)]))
    outMat = out['prob'][0].argmax(axis=0)

    # Count how many locations each class wins and print up to five of them.
    h1, w1 = np.shape(outMat)
    outList = list(np.reshape(outMat, h1 * w1))
    #print(outList)
    outSet = set(outList)
    outdict = {}
    for x in outSet:
        outdict[x] = outList.count(x)   # class -> number of locations
    appear = sorted(outdict, key=outdict.get, reverse=True)[:5]
    for x in appear:
        # linecache line numbers start at 1, class indices at 0
        print('%d %d: %s' % (x, outdict[x], linecache.getline(SYNSET, x + 1)))
