Image classification with a Caffe pre-trained model


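This post walks through how to classify an image with a Caffe pre-trained model.
Caffe's examples include a complete program for this task, so reading that program is enough to understand the process.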

Setup

Configure the Python environment: import numpy and matplotlib, and set the display defaults.

# set up Python environment: numpy for numerical routines, and matplotlib for plotting
import numpy as np
import matplotlib.pyplot as plt
# display plots in this notebook
%matplotlib inline

# set display defaults
plt.rcParams['figure.figsize'] = (10, 10)        # large images
plt.rcParams['image.interpolation'] = 'nearest'  # don't interpolate: show square pixels
plt.rcParams['image.cmap'] = 'gray'  # use grayscale output rather than a (potentially misleading) color heatmap

Import caffe (strictly speaking, pycaffe)

# The caffe module needs to be on the Python path;
#  we'll add it here explicitly.
import sys
caffe_root = '../'  # this file should be run from {caffe_root}/examples (otherwise change this line)
sys.path.insert(0, caffe_root + 'python')

import caffe
# If you get "No module named _caffe", either you have not built pycaffe or you have the wrong path.

Download the model

Next, check whether the caffemodel file exists under the models directory inside caffe_root. If it does not, download it with the download_model_binary.py script in the scripts folder under caffe_root.
For example, the CaffeNet caffemodel is named bvlc_reference_caffenet.caffemodel and lives in the bvlc_reference_caffenet folder under models in caffe_root (models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel).

caffe_root   Level-2 directory   Level-3 directory/file       Level-4 directory/file
             /models             /bvlc_reference_caffenet     /bvlc_reference_caffenet.caffemodel
             /scripts            /download_model_binary.py
             /examples           /(the program being run)
             /python             /caffe                       /imagenet/…

Here '../' denotes the parent directory of the running program; in the table above, that is the caffe_root folder.
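
The existence check described above can be sketched in plain Python. This is a minimal sketch, not the exact code from the Caffe example: the helper names `caffenet_weights_path` and `download_command` are hypothetical, and the sketch only prints the download command instead of running it. It assumes the script takes the model directory as its argument.

```python
import os

def caffenet_weights_path(caffe_root):
    """Expected location of the CaffeNet weights under caffe_root (hypothetical helper)."""
    return os.path.join(caffe_root, 'models', 'bvlc_reference_caffenet',
                        'bvlc_reference_caffenet.caffemodel')

def download_command(caffe_root):
    """Command that would fetch the weights; assumes the script accepts the model directory."""
    script = os.path.join(caffe_root, 'scripts', 'download_model_binary.py')
    model_dir = os.path.join(caffe_root, 'models', 'bvlc_reference_caffenet')
    return 'python {} {}'.format(script, model_dir)

caffe_root = '../'  # same convention as above: run from {caffe_root}/examples
weights = caffenet_weights_path(caffe_root)
if os.path.isfile(weights):
    print('CaffeNet found.')
else:
    print('CaffeNet weights missing; run: ' + download_command(caffe_root))
```

Building the paths with os.path.join keeps the check independent of whether caffe_root ends with a slash.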

Load the model and set up preprocessing

Read the net from disk

# set Caffe's mode; here we use CPU mode
caffe.set_mode_cpu()

# prototxt file describing the CaffeNet network structure
model_def = caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt'
# CaffeNet pre-trained model, i.e. all of the trained CaffeNet parameters
model_weights = caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'

# read CaffeNet from disk
net = caffe.Net(model_def,      # defines the structure of the model
                model_weights,  # contains the trained weights
                caffe.TEST)     # use test mode (e.g., don't perform dropout)

Set up the preprocessing transformer

# load the mean ImageNet image (as distributed with Caffe) for subtraction
mu = np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy')
mu = mu.mean(1).mean(1)  # average over pixels to obtain the mean (BGR) pixel values
print 'mean-subtracted values:', zip('BGR', mu)

# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})

transformer.set_transpose('data', (2,0,1))  # move image channels to outermost dimension
transformer.set_mean('data', mu)            # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255)      # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0))  # swap channels from RGB to BGR

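CaffeNet applied some preprocessing to its training images, so a new image must go through the same preprocessing before the pre-trained model can classify it. This program uses caffe.io.Transformer for that.
The code is shown above; a brief explanation follows (I have not fully understood it yet, and will refine this over time).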

Load the mean over all ImageNet images

This is the mean over all images in the ImageNet dataset.

Here ilsvrc_2012_mean.npy is a NumPy data file; np.load returns its contents as an ndarray.
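
To see what `mu.mean(1).mean(1)` does to the loaded mean image, here is a small sketch on a synthetic array. The array is a stand-in, not the real file: it assumes the mean image is laid out as (channels, height, width), and its values are made up for illustration.

```python
import numpy as np

# Stand-in for the loaded mean image: a synthetic (3, 4, 4) array whose
# three channels are filled with 1, 2 and 3 (illustrative values only).
mean_image = np.ones((3, 4, 4)) * np.array([1.0, 2.0, 3.0]).reshape(3, 1, 1)

# The first .mean(1) averages over height, leaving shape (3, 4);
# the second .mean(1) averages over width, leaving one value per channel.
mu = mean_image.mean(1).mean(1)

print(mu.shape)
print(list(zip('BGR', mu.tolist())))
```

The result has one number per channel, which is the form that `set_mean` receives in the code above.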

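Create the transformer

The transformer does four things:
(1) It reorders the dimensions of the array that holds the image.
To classify an image we first read it in Python (here with caffe.io.load_image), which yields the layout: image height, image width, image channel.
Caffe's data layout is different, so the array is converted to: image channel, image height, image width.
(2) It subtracts from each channel of the input image the mean of that channel over all ImageNet images, i.e. mu.
(3) It rescales the test image. The loaded image has pixel values in [0, 1], while the Caffe model expects them in [0, 255], so the values are scaled back up.
(4) It reorders the three color channels. Ordinary images are stored as R-G-B, but this Caffe model works with B-G-R.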
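
The four steps can be mimicked in plain NumPy to make them concrete. This is a sketch under stated assumptions, not Caffe's implementation: `preprocess_sketch` is a hypothetical name, resizing to the network input size is left out, and the mean values are made up. The mean is in BGR order on the [0, 255] scale, so the sketch swaps channels and rescales before subtracting it.

```python
import numpy as np

def preprocess_sketch(image_hwc_rgb, mean_bgr):
    """Apply steps (1)-(4) to an H x W x 3 RGB image with values in [0, 1]."""
    chw = image_hwc_rgb.transpose(2, 0, 1)   # (H, W, C) -> (C, H, W)
    bgr = chw[[2, 1, 0], :, :]               # RGB -> BGR
    scaled = bgr * 255.0                     # [0, 1] -> [0, 255]
    # subtract one mean value per channel, broadcast over height and width
    return scaled - mean_bgr[:, np.newaxis, np.newaxis]

# Illustrative mean values only, not the real ImageNet mean
mean_bgr = np.array([100.0, 110.0, 120.0])

image = np.zeros((4, 5, 3))
image[..., 0] = 1.0  # a pure red 4 x 5 image

out = preprocess_sketch(image, mean_bgr)
print(out.shape)
print(out[:, 0, 0])
```

For the pure red test image, the red value ends up in the last channel of the output, which is a quick way to confirm the RGB to BGR swap.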

Load an image and classify it

Set the net's input shape

# set the size of the input (we can skip this if we're happy
#  with the default; we can also change it later, e.g., for different batch sizes)
net.blobs['data'].reshape(50,        # batch size
                          3,         # 3-channel (BGR) images
                          227, 227)  # image size is 227x227

Load an image and preprocess it with the transformer

# load the image from disk with load_image; the result is a (360, 480, 3) ndarray
image = caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')
# preprocess the image; the resulting ndarray has shape (3, 227, 227)
transformed_image = transformer.preprocess('data', image)
print transformed_image.shape
# display the image
plt.imshow(image)

Classify the input image with the network

# copy the preprocessed image into the memory allocated for the net
net.blobs['data'].data[...] = transformed_image
# compute the network output; it is a dict whose 'prob' key holds the probabilities for the input
output = net.forward()
# take the prob vector for the input image from the dict; its shape is (1000,)
output_prob = output['prob'][0]  # the output probability vector for the first image in the batch
print 'predicted class is:', output_prob.argmax()

The input is an image of a cat, and running this code gives:

predicted class is: 281
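
One detail is easy to miss: the input blob was reshaped to a batch of 50, yet only a single image is assigned to it. The sketch below uses an ordinary NumPy array as a stand-in for `net.blobs['data'].data` (an assumption made for illustration) to show how the `[...] =` assignment broadcasts one image into every slot of the batch.

```python
import numpy as np

# Stand-in for net.blobs['data'].data after the reshape to a batch of 50
batch = np.zeros((50, 3, 227, 227), dtype=np.float32)
# Stand-in for transformed_image (random values, illustration only)
one_image = np.random.rand(3, 227, 227).astype(np.float32)

batch[...] = one_image  # broadcast: each of the 50 slots receives the same image

print(np.array_equal(batch[0], one_image))
print(np.array_equal(batch[49], one_image))
```

Because every slot holds the same image, reading only index 0 of the 'prob' output is sufficient.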

Find the label at the position with the highest probability

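# load the ImageNet label file
import os
labels_file = caffe_root + 'data/ilsvrc12/synset_words.txt'
# check whether the label file exists; if not, download it
if not os.path.exists(labels_file):
    !../data/ilsvrc12/get_ilsvrc_aux.sh
# load the labels from the txt file; the result is a (1000,) ndarray
labels = np.loadtxt(labels_file, str, delimiter='\t')
print 'output label:', labels[output_prob.argmax()]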

The result is:

output label: n02123045 tabby, tabby cat
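
Each label string combines a synset identifier with one or more human-readable names. If only the names are needed, the string can be split at the first space. A minimal sketch, where `split_label` is a hypothetical helper and not part of Caffe:

```python
def split_label(label):
    """Split a label such as 'n02123045 tabby, tabby cat' into (identifier, names)."""
    wnid, names = label.split(' ', 1)  # split only at the first space
    return wnid, names

wnid, names = split_label('n02123045 tabby, tabby cat')
print(wnid)
print(names)
```

Splitting only once keeps names that contain spaces or commas, such as "tabby, tabby cat", in one piece.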

View the top-5 predictions

# sort top five predictions from softmax output
top_inds = output_prob.argsort()[::-1][:5]  # reverse sort and take five largest items
print 'probabilities and labels:'
zip(output_prob[top_inds], labels[top_inds])

The result is as follows:

probabilities and labels:
Out[27]:
[(0.31243625, 'n02123045 tabby, tabby cat'),
 (0.23797157, 'n02123159 tiger cat'),
 (0.12387245, 'n02124075 Egyptian cat'),
 (0.10075716, 'n02119022 red fox, Vulpes vulpes'),
 (0.070957333, 'n02127052 lynx, catamount')]
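
The `argsort()[::-1][:5]` idiom is easier to follow on a tiny example. The probabilities and labels below are synthetic stand-ins for `output_prob` and `labels`, chosen only for illustration, and the sketch takes the top three instead of the top five.

```python
import numpy as np

# Synthetic stand-ins for output_prob and labels (illustrative values only)
probs = np.array([0.05, 0.40, 0.10, 0.25, 0.20])
labels = np.array(['ant', 'bee', 'cat', 'dog', 'eel'])

# argsort returns indices in ascending order of probability;
# reversing gives descending order, and the slice keeps the first three.
top_inds = probs.argsort()[::-1][:3]

top = list(zip(probs[top_inds].tolist(), labels[top_inds].tolist()))
print(top)
```

Indexing both arrays with the same `top_inds` keeps each probability paired with its own label.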
