Caffe Python Interface in Practice: Annotated Source of the Official 00-classification Tutorial

Source: Internet  Editor: 程序博客网  Date: 2024/06/05 15:18

This article is one of a series of annotated-source notes on the official documentation.

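Note 1: This article annotates the source of the ipynb files under caffe_root/examples/. The goal is to speed up beginners' learning through code comments.
Note 2: The English comments in each part are left untranslated. Beginners should get used to reading the original English tutorials: it improves your ability to read technical literature, which is an essential skill for a senior programmer.
Note 3: I recommend running the code in a Jupyter Notebook while following the source comments.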

Classification: Instant Recognition with Caffe

In this example we’ll classify an image with the bundled CaffeNet model (which is based on the network architecture of Krizhevsky et al. for ImageNet).

We’ll compare CPU and GPU modes and then dig into the model to inspect features and the output.

1. Setup

  • First, set up Python, numpy, and matplotlib.
# set up Python environment: numpy for numerical routines, and matplotlib for plotting
import numpy as np
import matplotlib.pyplot as plt
# display plots in this notebook
%matplotlib inline

# set display defaults
plt.rcParams['figure.figsize'] = (10, 10)        # large images
plt.rcParams['image.interpolation'] = 'nearest'  # don't interpolate: show square pixels
plt.rcParams['image.cmap'] = 'gray'  # use grayscale output rather than a (potentially misleading) color heatmap
# image.cmap is the default colormap for 2-D (single-channel) arrays: their values are
# normalized and mapped to gray levels; RGB(A) arrays are drawn in their own colors
  • Load caffe.
import sys
# change caffe_root to your actual path; watch out for stray spaces in the string
caffe_root = '/home/slb103/softwares/caffe/'  # this file should be run from {caffe_root}/examples (otherwise change this line)
sys.path.insert(0, caffe_root + 'python')  # sys.path.append() also works

import caffe
# If you get "No module named _caffe", either you have not built pycaffe or you have the wrong path.
  • If needed, download the reference model (“CaffeNet”, a variant of AlexNet).
import os
# check whether the CaffeNet weights exist; if not, run the script that downloads them into the model directory
if os.path.isfile(caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'):
    print 'CaffeNet found.'
else:
    print 'Downloading pre-trained CaffeNet model...'
    # '!' is IPython syntax for running a shell command; {caffe_root} expands the Python variable
    !{caffe_root}scripts/download_model_binary.py {caffe_root}models/bvlc_reference_caffenet

2. Load net and set up input preprocessing

  • Set Caffe to CPU mode and load the net from disk.
caffe.set_mode_cpu()

model_def = caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt'
model_weights = caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'

net = caffe.Net(model_def,      # defines the structure of the model
                model_weights,  # contains the trained weights
                caffe.TEST)     # use test mode (e.g., don't perform dropout in the forward pass)
  • Set up input preprocessing. (We’ll use Caffe’s caffe.io.Transformer to do this, but this step is independent of other parts of Caffe, so any custom preprocessing code may be used).

    Our default CaffeNet is configured to take images in BGR format. Values are expected to start in the range [0, 255] and then have the mean ImageNet pixel value subtracted from them. In addition, the channel dimension is expected as the first (outermost) dimension.

    As matplotlib will load images with values in the range [0, 1] in RGB format with the channel as the innermost dimension, we are arranging for the needed transformations here.

# load the mean ImageNet image (as distributed with Caffe) for subtraction
mu = np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy')
# mu has shape (C, H, W): the first mean(1) averages over H and leaves (C, W),
# the second mean(1) averages over W and leaves one mean value per channel
mu = mu.mean(1).mean(1)  # average over pixels to obtain the mean (BGR) pixel values
print 'mean-subtracted values:', zip('BGR', mu)  # prints three values

# images loaded by caffe.io.load_image are RGB, HWC, with values in [0, 1]
# create transformer for the input called 'data' (the order of the set_* calls does not matter)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})

transformer.set_channel_swap('data', (2,1,0))  # swap channels from RGB to BGR
transformer.set_transpose('data', (2,0,1))  # move image channels to outermost dimension (HWC becomes CHW)
transformer.set_raw_scale('data', 255)      # rescale from [0, 1] to [0, 255], the scale the mean is expressed in
transformer.set_mean('data', mu)            # subtract the dataset-mean value in each channel
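To make the transformer settings concrete, here is a minimal NumPy-only sketch that applies the same steps by hand. It assumes the image already has the network's input size (the real Transformer also resizes), and the helper name `preprocess_by_hand` and the mean values are illustrative placeholders, not part of Caffe.

```python
import numpy as np

def preprocess_by_hand(image_rgb_hwc, mean_bgr):
    """Hypothetical helper: RGB/HWC/[0, 1] -> BGR/CHW/[0, 255] minus the mean."""
    chw = image_rgb_hwc.transpose(2, 0, 1)      # HWC -> CHW
    bgr = chw[[2, 1, 0], :, :]                  # RGB -> BGR along the channel axis
    scaled = bgr * 255.0                        # [0, 1] -> [0, 255]
    return scaled - mean_bgr[:, np.newaxis, np.newaxis]  # per-channel mean subtraction

# a (C, H, W) mean image reduces to one value per channel, as mu.mean(1).mean(1) does
channel_means = np.array([104.0, 117.0, 123.0])           # illustrative BGR means
mean_image = np.ones((3, 4, 4)) * channel_means[:, np.newaxis, np.newaxis]
mu_demo = mean_image.mean(1).mean(1)                      # shape (3,)

image = np.zeros((227, 227, 3))
image[..., 0] = 1.0                             # a pure red image in RGB order
out = preprocess_by_hand(image, mu_demo)
print(out.shape)                                # (3, 227, 227)
print(out[:, 0, 0])                             # B, G, R values of the first pixel
```

In this sketch the red channel ends up last (BGR order) and is the only positive value after mean subtraction, which is a quick way to check that the channel swap went in the intended direction.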

3. CPU classification

  • Now we’re ready to perform classification. Even though we’ll only classify one image, we’ll set a batch size of 50 to demonstrate batching.
# deploy.prototxt sets the default shape of 'data' to 10x3x227x227; reshape changes the batch size here.
# The batch size may be greater than or equal to the number of images actually fed to the data layer, but not smaller.
# set the size of the input (we can skip this if we're happy
#  with the default; we can also change it later, e.g., for different batch sizes)
# the input gains a leading batch axis, giving 50x3x227x227
net.blobs['data'].reshape(50,        # batch size
                          3,         # 3-channel (BGR) images
                          227, 227)  # image size is 227x227
  • Load an image (that comes with Caffe) and perform the preprocessing we’ve set up.
# the loaded image is RGB, HWC, with values in [0, 1]; cat.jpg is 480x360
image = caffe.io.load_image(caffe_root + 'examples/images/cat.jpg')
transformed_image = transformer.preprocess('data', image)
plt.imshow(image)
print image  # dumps the raw pixel array
  • Adorable! Let’s classify it!
# copy the image data into the memory allocated for the net
# (transformer.preprocess has already resized the image to 227x227)
net.blobs['data'].data[...] = transformed_image

### perform classification
output = net.forward()  # one forward pass over the whole 50x3x227x227 blob

output_prob = output['prob'][0]  # the output probability vector for the first image in the batch

print 'predicted class is:', output_prob.argmax()
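The assignment to `data[...]` above relies on NumPy broadcasting: a single (3, H, W) image is copied into every slot of the (50, 3, H, W) batch. A small sketch with stand-in arrays (the 4x4 spatial size is an assumption chosen only to keep the example small):

```python
import numpy as np

batch = np.zeros((50, 3, 4, 4), dtype=np.float32)   # stand-in for net.blobs['data'].data
single = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)  # one preprocessed image

batch[...] = single          # broadcast: the image is written into all 50 slots

all_slots_equal = bool((batch == single).all())
print(all_slots_equal)
```

Because every slot then holds the same image, each row of the output probabilities is identical, which is why reading index 0 is enough.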
  • The net gives us a vector of probabilities; the most probable class was the 281st one. But is that correct? Let’s check the ImageNet labels…
# load ImageNet labels
labels_file = caffe_root + 'data/ilsvrc12/synset_words.txt'
if not os.path.exists(labels_file):  # os.path.isfile(labels_file) would work as well
    !{caffe_root}data/ilsvrc12/get_ilsvrc_aux.sh

labels = np.loadtxt(labels_file, str, delimiter='\t')  # each line of the file becomes one str element

print 'output label:', labels[output_prob.argmax()]  # the line of the file at the predicted index
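Each label is a single string that combines a WordNet ID with one or more comma-separated names. As a rough sketch, such a line can be split as follows; the helper `split_label` is hypothetical, and the two sample lines are written in the file's format rather than read from disk.

```python
sample_lines = [
    'n02123045 tabby, tabby cat',
    'n02123159 tiger cat',
]

def split_label(line):
    """Hypothetical helper: split 'wnid name1, name2' into (wnid, [names])."""
    wnid, names = line.split(' ', 1)
    return wnid, [name.strip() for name in names.split(',')]

parsed = [split_label(line) for line in sample_lines]
print(parsed[0])
```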
  • “Tabby cat” is correct! But let’s also look at other top (but less confident) predictions.
# sort top five predictions from softmax output
top_inds = output_prob.argsort()[::-1][:5]  # reverse sort and take five largest items

print 'probabilities and labels:'
zip(output_prob[top_inds], labels[top_inds])  # pairs the two sequences into a list of (probability, label) tuples
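The `argsort()[::-1][:k]` idiom can be checked on a toy vector. The probabilities and labels below are made up for illustration:

```python
import numpy as np

probs = np.array([0.05, 0.40, 0.10, 0.25, 0.20])       # made-up softmax output
names = np.array(['ant', 'bee', 'cat', 'dog', 'eel'])  # made-up labels

top_inds = probs.argsort()[::-1][:3]   # ascending sort, reversed, first three
top3 = list(zip(probs[top_inds], names[top_inds]))
print(top_inds)
```

Note that under Python 3 `zip` returns an iterator, so the sketch wraps it in `list(...)`; under Python 2, as used in the notebook, `zip` already returns a list.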
  • We see that less confident predictions are sensible.

4. Switching to GPU mode

  • Let’s see how long classification took, and compare it to GPU mode.
# %timeit chooses the number of loops and repeats automatically; pass -n and -r to set them explicitly
%timeit net.forward()
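Outside a notebook, the standard-library `timeit` module gives a comparable measurement. The sketch below times a hypothetical stand-in function, `fake_forward`, because a real net is not available here; the repeat and loop counts are arbitrary choices.

```python
import timeit
import numpy as np

def fake_forward(batch):
    """Hypothetical stand-in for net.forward(): one matrix product per batch."""
    weights = np.ones((batch.shape[1], 10))
    return batch.dot(weights)

batch = np.ones((50, 256))
# 3 independent timings, each covering 100 calls
timings = timeit.repeat(lambda: fake_forward(batch), repeat=3, number=100)
best_per_call = min(timings) / 100
print('best of 3: %.6f s per call' % best_per_call)
```

Taking the minimum of the repeats, as `%timeit` historically did, reduces the influence of other processes on the measurement.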
  • That’s a while, even for a batch of 50 images. Let’s switch to GPU mode.
caffe.set_device(0)  # if we have multiple GPUs, pick the first one
caffe.set_mode_gpu()
net.forward()  # run once before timing to set up memory
%timeit net.forward()
  • That should be much faster!

5. Examining intermediate output

  • A net is not just a black box; let’s take a look at some of the parameters and intermediate activations.

First we’ll see how to read out the structure of the net in terms of activation and parameter shapes.

  • For each layer, let’s look at the activation shapes, which typically have the form (batch_size, channel_dim, height, width).

    The activations are exposed as an OrderedDict, net.blobs.

# for each layer, show the output shape (prints each blob's name and the shape of its data)
for layer_name, blob in net.blobs.iteritems():
    print layer_name + '\t' + str(blob.data.shape)
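The spatial sizes printed above follow from the usual convolution size formula. As a sketch, assuming conv1 uses an 11x11 kernel with stride 4 and no padding (the values in the standard CaffeNet deploy.prototxt; check your own file), a 227x227 input gives:

```python
def conv_output_size(in_size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution along one dimension (floor division)."""
    return (in_size + 2 * pad - kernel) // stride + 1

conv1_size = conv_output_size(227, kernel=11, stride=4, pad=0)
print(conv1_size)  # 55
```

With 96 output channels and a batch of 50, this corresponds to a conv1 blob of shape (50, 96, 55, 55).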
  • Now look at the parameter shapes. The parameters are exposed as another OrderedDict, net.params. We need to index the resulting values with either [0] for weights or [1] for biases.

    The param shapes typically have the form (output_channels, input_channels, filter_height, filter_width) (for the weights) and the 1-dimensional shape (output_channels,) (for the biases).

for layer_name, param in net.params.iteritems():
    # param[0] holds the weights, param[1] the biases
    print layer_name + '\t' + str(param[0].data.shape), str(param[1].data.shape)
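The parameter shapes translate directly into parameter counts. A small sketch, assuming the conv1 shapes of the standard CaffeNet, (96, 3, 11, 11) for weights and (96,) for biases; the helper `count_params` is illustrative:

```python
import numpy as np

def count_params(weight_shape, bias_shape):
    """Total number of learnable values in one layer."""
    return int(np.prod(weight_shape)) + int(np.prod(bias_shape))

conv1_total = count_params((96, 3, 11, 11), (96,))
print(conv1_total)  # 34944
```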
  • Since we’re dealing with four-dimensional data here, we’ll define a helper function for visualizing sets of rectangular heatmaps.
# (annotator's note: I do not fully understand this function)
# data is a numpy array: axis 0 indexes the filters or output feature maps,
# the remaining axes are H, W (and optionally C), which is convenient for plt
def vis_square(data):
    """Take an array of shape (n, height, width) or (n, height, width, 3)
       and visualize each (height, width) thing in a grid of size approx. sqrt(n) by sqrt(n)"""

    # normalize data for display
    data = (data - data.min()) / (data.max() - data.min())

    # force the number of filters to be square
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = (((0, n ** 2 - data.shape[0]),
               (0, 1), (0, 1))                 # add some space between filters
               + ((0, 0),) * (data.ndim - 3))  # don't pad the last dimension (if there is one)
    data = np.pad(data, padding, mode='constant', constant_values=1)  # pad with ones (white)

    # tile the filters into an image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])

    plt.imshow(data); plt.axis('off')
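The pad, reshape and transpose steps are easier to follow by looking at shapes alone. The sketch below repeats only the tiling part of the helper, without normalization or plotting, under the hypothetical name `tile_square`, and applies it to small all-zero arrays:

```python
import numpy as np

def tile_square(data):
    """Return the tiled grid that the helper above would hand to plt.imshow."""
    n = int(np.ceil(np.sqrt(data.shape[0])))            # grid is n x n tiles
    padding = (((0, n ** 2 - data.shape[0]), (0, 1), (0, 1))
               + ((0, 0),) * (data.ndim - 3))
    data = np.pad(data, padding, mode='constant', constant_values=1)
    extra_axes = tuple(range(4, data.ndim + 1))         # color axis, if present
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + extra_axes)
    return data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])

gray = tile_square(np.zeros((5, 3, 3)))       # five 3x3 single-channel maps
color = tile_square(np.zeros((5, 3, 3, 3)))   # five 3x3 RGB filters
print(gray.shape)
print(color.shape)
```

Five 3x3 maps are padded to nine 4x4 tiles (one extra row and column of ones as a separator, plus four all-white filler tiles) and laid out as a 3x3 grid, so the result is 12x12.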
  • First we’ll look at the first layer filters, conv1
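# the parameters are a list of [weights, biases]
# conv1 produces 96 feature maps: it convolves with 96 kernels, each with 3 channel slices
# because the input has 3 channels, so the weights have shape 96x3xkxk
filters = net.params['conv1'][0].data
# 96x3xkxk becomes 96xkxkx3 for plt; the three channels of each kernel make the visualization colored
vis_square(filters.transpose(0, 2, 3, 1))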
  • The first layer output, conv1 (rectified responses of the filters above, first 36 only)
feat = net.blobs['conv1'].data[0, :36]  # image 0 of the batch, first 36 channel feature maps
vis_square(feat)  # input has shape 36xHxW
  • The fifth layer after pooling, pool5
feat = net.blobs['pool5'].data[0]  # image 0 of the batch, all channel feature maps
vis_square(feat)  # input has shape (channels)xHxW
  • The first fully connected layer, fc6 (rectified)

    We show the output values and the histogram of the positive values

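# plot the fc6 feature values of the first image in the batch, and a histogram of them
feat = net.blobs['fc6'].data[0]
plt.subplot(2, 1, 1)
plt.plot(feat.flat)  # feat.flat iterates over the values as a 1-D vector
plt.subplot(2, 1, 2)
_ = plt.hist(feat.flat[feat.flat > 0], bins=100)  # '_' marks the return value as unused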
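The histogram keeps only the positive values because a rectified layer outputs exact zeros for many units. A minimal sketch of that selection on made-up activations:

```python
import numpy as np

feat = np.array([[0.0, 1.5, 0.0], [2.0, 0.0, 0.5]])  # made-up rectified activations
flat = feat.ravel()                 # 1-D view of the same values
positive = flat[flat > 0]           # boolean mask keeps only the non-zero responses
sparsity = 1.0 - positive.size / float(flat.size)    # fraction of exact zeros
print(positive)
```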
  • The final probability output, prob
feat = net.blobs['prob'].data[0]  # output of the prob layer for the first image in the batch
plt.figure(figsize=(15, 3))
plt.plot(feat.flat)

Note the cluster of strong predictions; the labels are sorted semantically. The top peaks correspond to the top predicted labels, as shown above.

6. Try your own image

Now we’ll grab an image from the web and classify it using the steps above.

  • Try setting my_image_url to any JPEG image URL.
# download an image
#my_image_url = "..."  # paste your URL here
# for example:
my_image_url = "https://upload.wikimedia.org/wikipedia/commons/b/be/Orang_Utan%2C_Semenggok_Forest_Reserve%2C_Sarawak%2C_Borneo%2C_Malaysia.JPG"
# -O (--output-document=FILE) saves the download under the given name, here image.jpg
!wget -O image.jpg $my_image_url

# transform it and copy it into the net
image = caffe.io.load_image('image.jpg')  # returns an RGB, HWC image with values in [0, 1]
net.blobs['data'].data[...] = transformer.preprocess('data', image)

# deploy.prototxt defaults 'data' to 10x3x227x227; the blob keeps the 50x3x227x227 shape set earlier.
# The batch size may exceed the number of images fed in, but may not be smaller.
# To change it, uncomment:
#net.blobs['data'].reshape(50,        # batch size
#                          3,         # 3-channel (BGR) images
#                          227, 227)  # image size is 227x227

# perform classification
net.forward()  # the batch size is still 50 here; every slot holds this image

# obtain the output probabilities
output_prob = net.blobs['prob'].data[0]

# sort top five predictions from softmax output
top_inds = output_prob.argsort()[::-1][:5]

plt.imshow(image)

print 'probabilities and labels:'
zip(output_prob[top_inds], labels[top_inds])  # list of (probability, label) tuples