Image classification with a Caffe pre-trained model

Source: Internet | Published by: mysql 博客 | Editor: 程序博客网 | Time: 2024/05/15 07:08

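This post walks through how to classify an image with a Caffe pre-trained model.
Caffe's examples directory ships a complete program for this task; reading that program is enough to understand the whole process.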

Setup

Set up the Python environment, import numpy, and configure the plot display settings.

# set up Python environment: numpy for numerical routines, and matplotlib for plotting
import numpy as np
import matplotlib.pyplot as plt
# display plots in this notebook
%matplotlib inline
# set display defaults
plt.rcParams['figure.figsize'] = (10, 10)        # large images
plt.rcParams['image.interpolation'] = 'nearest'  # don't interpolate: show square pixels
plt.rcParams['image.cmap'] = 'gray'  # use grayscale output rather than a (potentially misleading) color heatmap

Import caffe (strictly speaking, pycaffe)

# The caffe module needs to be on the Python path;
#  we'll add it here explicitly.
import sys
caffe_root = '../'  # this file should be run from {caffe_root}/examples (otherwise change this line)
sys.path.insert(0, caffe_root + 'python')
import caffe
# If you get "No module named _caffe", either you have not built pycaffe or you have the wrong path.

Download the model

Next, check whether the caffemodel exists under the models directory in caffe_root. If it does not, download it with the download_model_binary.py script from the scripts folder under caffe_root.
For example, CaffeNet's caffemodel is named bvlc_reference_caffenet.caffemodel and lives in the bvlc_reference_caffenet folder under models in caffe_root (models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel).

caffe_root/
    models/
        bvlc_reference_caffenet/
            bvlc_reference_caffenet.caffemodel
    scripts/
        download_model_binary.py
    examples/
        (the program currently being run)
    python/
        caffe/
            imagenet/…

'../' refers to the parent directory of the program being run; in the layout above, that is the caffe_root folder.
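The check described above can be sketched as follows. This is a minimal sketch, assuming the directory layout shown above. `model_download_command` is a hypothetical helper name (not part of Caffe), and the returned command assumes the download script takes the model directory as its only argument, which is worth verifying against your own Caffe checkout.

```python
import os
import sys


def model_download_command(caffe_root, model_name='bvlc_reference_caffenet'):
    """Return None if the .caffemodel already exists, else a download command.

    Hypothetical helper for illustration; it only builds the command and
    does not run anything.
    """
    model_dir = os.path.join(caffe_root, 'models', model_name)
    weights = os.path.join(model_dir, model_name + '.caffemodel')
    if os.path.isfile(weights):
        return None  # weights already on disk, nothing to download
    script = os.path.join(caffe_root, 'scripts', 'download_model_binary.py')
    # Assumption: the script is invoked as "<script> <model directory>".
    return [sys.executable, script, model_dir]
```

If the function returns a command, it could be passed to `subprocess.check_call`; returning the command rather than running it keeps the check itself free of side effects.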

Load the model and set up preprocessing

Read the net from disk

# set the caffe mode; CPU mode is used here
caffe.set_mode_cpu()
# the prototxt file that defines CaffeNet's network structure
model_def = caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt'
# CaffeNet's pre-trained model, i.e. all of the trained parameters
model_weights = caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'
# read CaffeNet from disk
net = caffe.Net(model_def,      # defines the structure of the model
                model_weights,  # contains the trained weights
                caffe.TEST)     # use test mode (e.g., don't perform dropout)

Set up the preprocessing transformer

# load the mean ImageNet image (as distributed with Caffe) for subtraction
mu = np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy')
mu = mu.mean(1).mean(1)  # average over pixels to obtain the mean (BGR) pixel values
print 'mean-subtracted values:', zip('BGR', mu)
# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))  # move image channels to outermost dimension
transformer.set_mean('data', mu)            # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255)      # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0))  # swap channels from RGB to BGR

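CaffeNet applied some preprocessing to its training images. To classify a new image with the pre-trained model, the new image must go through the same preprocessing; this program uses caffe.io.Transformer to do that.
The code is shown above, and a brief explanation follows (I do not fully understand every detail yet, and will improve this over time).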

Load the mean of all images in the ImageNet dataset

mu starts out as the mean image over the whole ImageNet dataset.

The ilsvrc_2012_mean.npy file is a NumPy data file; np.load reads it into an ndarray.
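To make the `mu.mean(1).mean(1)` step in the transformer setup concrete, here is a small sketch on a synthetic array. The array below is a stand-in, not the real mean file: its 4x4 size and the channel values 100, 110, 120 are made up for illustration.

```python
import numpy as np

# Synthetic stand-in for a mean image laid out as (channel, height, width).
# Values and size are illustrative assumptions, not the real ImageNet mean.
mean_image = np.stack(
    [np.full((4, 4), v, dtype=np.float32) for v in (100.0, 110.0, 120.0)]
)

# Averaging over axis 1 twice collapses height, then width,
# leaving one mean value per channel.
mu_chained = mean_image.mean(1).mean(1)

# The same reduction written as a single call over both spatial axes.
mu_axes = mean_image.mean(axis=(1, 2))
```

Both forms yield a length-3 vector with one value per channel, which is the shape that `set_mean` receives in the code above.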

Create the transformer

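The transformer does four main things:
(1) It permutes the dimensions of the array that holds the loaded image.
To classify an image we first load it in Python, where the array layout is: image height, image width, image channels.
To match Caffe's data layout, this must be converted to: image channels, image height, image width.
(2) It subtracts from every pixel in each channel of the input image the corresponding per-channel mean over all ImageNet images, i.e. mu.
(3) It rescales the test image. Images loaded with caffe.io.load_image have pixel values in [0, 1]; to use the Caffe model they must be scaled back to [0, 255].
(4) It reorders the three channels of the input image. Ordinary images are stored as R-G-B, but this Caffe model expects B-G-R.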
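The four steps above can be sketched in plain NumPy. This is a minimal sketch, not pycaffe's implementation: `preprocess_sketch` is a hypothetical name, resizing to the network input size is omitted, and the order used here (transpose, channel swap, rescale, mean subtraction) reflects my reading of how `Transformer.preprocess` applies its settings, independent of the order in which the setters are called.

```python
import numpy as np


def preprocess_sketch(image_hwc_rgb, mean_bgr):
    """Mimic the transformer's steps on an already correctly sized image.

    image_hwc_rgb: float array of shape (H, W, 3), RGB order, values in [0, 1].
    mean_bgr: three per-channel means in BGR order, on the [0, 255] scale.
    """
    x = np.asarray(image_hwc_rgb, dtype=np.float32)
    x = x.transpose(2, 0, 1)        # (H, W, C) -> (C, H, W)
    x = x[[2, 1, 0], :, :]          # RGB -> BGR
    x = x * 255.0                   # [0, 1] -> [0, 255]
    mean = np.asarray(mean_bgr, dtype=np.float32).reshape(3, 1, 1)
    return x - mean                 # subtract the per-channel mean
```

The mean is subtracted after rescaling because the mean values are on the [0, 255] scale, and after the channel swap because they are stored in BGR order.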

Load an image and classify it

Set the net's input shape

# set the size of the input (we can skip this if we're happy
#  with the default; we can also change it later, e.g., for different batch sizes)
net.blobs['data'].reshape(50,        # batch size
                          3,         # 3-channel (BGR) images
                          227, 227)  # image size is 227x227

Load the image and preprocess it with the transformer

# load the image from disk with load_image; the result is an ndarray of shape (360, 480, 3)
image = caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')
# preprocess the image; the resulting ndarray has shape (3, 227, 227)
transformed_image = transformer.preprocess('data', image)
print transformed_image.shape
# display the image
plt.imshow(image)

Classify the input image with the network

# copy the preprocessed image into the memory allocated for the net
net.blobs['data'].data[...] = transformed_image
# compute the network output; it is a dict whose 'prob' key holds the probabilities for the input
output = net.forward()
# take the prob vector for the input image out of the dict; its shape is (1000,)
output_prob = output['prob'][0]  # the output probability vector for the first image in the batch
print 'predicted class is:', output_prob.argmax()

The input is a picture of a cat, and this code prints:

predicted class is: 281
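One detail worth spelling out: the input blob was reshaped to a batch of 50, yet a single preprocessed image is assigned to it with `data[...] = transformed_image`. NumPy broadcasting copies that one image into every batch slot, which is why reading only index 0 of the output is enough. The sketch below illustrates this with a plain array standing in for the blob; the 8x8 spatial size is an arbitrary small value chosen to keep the example light.

```python
import numpy as np

# Stand-in for net.blobs['data'].data: a batch of 50 three-channel images.
# The 8x8 spatial size is an illustrative assumption (the real blob is 227x227).
batch = np.zeros((50, 3, 8, 8), dtype=np.float32)

# Stand-in for a single preprocessed image with shape (3, 8, 8).
single = np.arange(3 * 8 * 8, dtype=np.float32).reshape(3, 8, 8)

# Ellipsis assignment broadcasts the single image across the batch axis.
batch[...] = single
```

After the assignment every slot holds the same image, so a forward pass would compute 50 identical results and only the first needs to be inspected.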

Find the label that corresponds to the position of the largest prob

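import os
# load the label file of the ImageNet dataset
labels_file = caffe_root + 'data/ilsvrc12/synset_words.txt'
# check whether the label file exists; if it does not, download it
if not os.path.exists(labels_file):
    !../data/ilsvrc12/get_ilsvrc_aux.sh
# load the labels from the txt file; the result is an ndarray of shape (1000,)
labels = np.loadtxt(labels_file, str, delimiter='\t')
print 'output label:', labels[output_prob.argmax()]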

The result is:

output label: n02123045 tabby, tabby cat
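Each label line, as the output above shows, starts with an identifier followed by one or more comma-separated names. If the two parts are needed separately, a small helper can split them. `split_label` is a hypothetical name introduced here, and the sketch assumes every line follows the "identifier, space, names" pattern seen in the output above.

```python
def split_label(line):
    """Split a label line into its identifier and a list of names.

    Assumes the format "<id> <name>, <name>, ..." seen in the output above.
    """
    synset_id, names = line.split(' ', 1)  # split only at the first space
    return synset_id, [name.strip() for name in names.split(',')]
```

For the label printed above, this gives the identifier `n02123045` and the names `tabby` and `tabby cat`.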

View the top-5 predictions

# sort top five predictions from softmax output
top_inds = output_prob.argsort()[::-1][:5]  # reverse sort and take five largest items
print 'probabilities and labels:'
zip(output_prob[top_inds], labels[top_inds])

The result is as follows:

probabilities and labels:
Out[27]:
[(0.31243625, 'n02123045 tabby, tabby cat'),
 (0.23797157, 'n02123159 tiger cat'),
 (0.12387245, 'n02124075 Egyptian cat'),
 (0.10075716, 'n02119022 red fox, Vulpes vulpes'),
 (0.070957333, 'n02127052 lynx, catamount')]
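The argsort idiom used for the top-5 listing can be wrapped into a small reusable function. This is a sketch on plain NumPy arrays: `top_k` is a hypothetical helper name, and it needs neither Caffe nor the real label file.

```python
import numpy as np


def top_k(probs, labels, k=5):
    """Return the k most probable (probability, label) pairs, highest first."""
    probs = np.asarray(probs)
    # argsort is ascending, so reverse it and keep the first k indices
    top_inds = probs.argsort()[::-1][:k]
    return [(float(probs[i]), labels[i]) for i in top_inds]
```

Building the pairs in a list comprehension gives a concrete list regardless of how `zip` behaves in the running Python version.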

0 0