R-CNN detection


R-CNN is an excellent object detection model. Although its accuracy and efficiency fall somewhat short of today's state-of-the-art methods, it provides the foundational ideas behind many later algorithms. This post follows the official notebook, adding only the changes needed to run it locally plus explanatory notes: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/detection.ipynb

For details, see the authors' paper: Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. CVPR 2014 (arXiv 2013).

--- Last updated June 8, 2015

Preparation

1. Download the trained R-CNN model (or train your own with Caffe). The reference model is pretrained on ImageNet, fine-tuned on ILSVRC13, and outputs scores for 200 detection classes.

Download command: ~/caffe-master$ ./scripts/download_model_binary.py models/bvlc_reference_rcnn_ilsvrc13

2. Download Selective Search and compile its MEX files in MATLAB.

(1) Download: https://github.com/sergeyk/selective_search_ijcv_with_python. Unpack the archive, rename the directory, and copy it to ~/caffe-master/python/selective_search_ijcv_with_python/.

(2) Compile: start the MATLAB client and run ~/caffe-master/python/selective_search_ijcv_with_python/demo.m. If it finishes without errors, close MATLAB. A quick sanity check from Python is sketched below.
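
Optionally, you can exercise the Python wrapper directly instead of relying only on demo.m. This is a minimal sketch under the assumption that the bundled selective_search.py exposes a get_windows() function (as the version used by Caffe's detector does); adjust the paths for your setup.

# Optional sanity check for the Selective Search wrapper (run from ~/caffe-master).
# Assumes selective_search.py provides get_windows(); MATLAB must be callable.
import sys
sys.path.insert(0, 'python/selective_search_ijcv_with_python')
import selective_search

windows = selective_search.get_windows(['examples/images/fish-bike.jpg'])
print('proposals for first image:', len(windows[0]))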

3. If python/detect.py fails, the following fixes may help:

(1) Error: OSError: [Errno 2] No such file or directory

Edit ~/caffe-master/python/selective_search_ijcv_with_python/selective_search.py
Before: mc = "matlab -nojvm -r \"try; {}; catch; exit; end; exit\"".format(command)
After: mc = "/usr/local/MATLAB/R2014a/bin/matlab -nojvm -r \"try; {}; catch; exit; end; exit\"".format(command)
(i.e., replace the bare matlab command with the full path to your MATLAB binary.)

(2) Error: ValueError: 'axis' entry 2 is out of bounds (-2, 2)

Edit ~/caffe-master/python/caffe/detector.py
Before: predictions = out[self.outputs[0]].squeeze(axis=(2, 3))
After: predictions = out[self.outputs[0]].squeeze()
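
The error appears when the output blob comes back as a 2-D array, so axes 2 and 3 simply do not exist; a plain squeeze() drops whatever singleton dimensions happen to be present. A small NumPy illustration (the shapes here are made up for the example):

import numpy as np

out_4d = np.zeros((10, 200, 1, 1))   # older-style (N, C, 1, 1) output blob
out_2d = np.zeros((10, 200))         # some builds already return (N, C)

print(out_4d.squeeze(axis=(2, 3)).shape)  # (10, 200) -- works on the 4-D case
# out_2d.squeeze(axis=(2, 3))             # raises: 'axis' entry is out of bounds
print(out_2d.squeeze().shape)             # (10, 200) -- plain squeeze() is safe either way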

Generating Region Proposals and Extracting Features

Create a temporary directory and list the detection inputs. Several images can be listed at once, but they are processed together as a single job; this is convenient when fusing multiple pre-processed versions of an input.

%cd '/home/ouxinyu/caffe-master'
!mkdir -p _temp
!echo examples/images/fish-bike.jpg > _temp/det_input.txt
/home/ouxinyu/caffe-master

Run Selective Search to generate region proposals, then run Caffe to classify each proposal. By default the command below uses the GPU; remove --gpu to run in CPU mode.

! python/detect.py --crop_mode=selective_search --pretrained_model=models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5
GPU mode
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0608 10:32:38.067106  6131 net.cpp:42] Initializing net from parameters: name: "R-CNN-ilsvrc13"
[... layer-by-layer net construction log omitted ...]
I0608 10:32:38.154227  6131 net.cpp:247] Network initialization done.
I0608 10:32:38.154232  6131 net.cpp:248] Memory required for data: 62425920
E0608 10:32:38.221285  6131 upgrade_proto.cpp:618] Attempting to upgrade input file specified using deprecated V1LayerParameter: models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel
I0608 10:32:38.324671  6131 upgrade_proto.cpp:626] Successfully upgraded file specified using deprecated V1LayerParameter
Loading input...
selective_search_rcnn({'/home/ouxinyu/caffe-master/examples/images/fish-bike.jpg'}, '/tmp/tmpu85WGa.mat')
Processed 1570 windows in 17.131 s.
/usr/lib/python2.7/dist-packages/pandas/io/pytables.py:2487: PerformanceWarning: your performance may suffer as PyTables will pickle object types that it cannot map directly to c-types [inferred_type->mixed,key->block1_values] [items->['prediction']]
  warnings.warn(ws, PerformanceWarning)
Saved to _temp/det_output.h5 in 0.025 s.
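
If you prefer to stay inside Python rather than shelling out to detect.py, the same pipeline can be driven through the Detector class that detect.py wraps. The sketch below is only a rough equivalent: it assumes caffe.Detector is importable in your build and that the constructor arguments and detect_selective_search() method behave as in this Caffe snapshot, with values mirroring detect.py's defaults.

# Rough Python equivalent of the detect.py call above (run from ~/caffe-master).
# Assumes caffe.Detector and detect_selective_search() behave as in this Caffe snapshot.
import caffe

MODEL_DEF = 'models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt'
WEIGHTS = 'models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel'

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()
detector = caffe.Detector(MODEL_DEF, WEIGHTS,
                          raw_scale=255,           # same effect as --raw_scale=255
                          channel_swap=(2, 1, 0),  # RGB -> BGR, detect.py's default
                          context_pad=16)          # detect.py's default window padding

# detect_selective_search() returns one entry per proposed window.
detections = detector.detect_selective_search(['examples/images/fish-bike.jpg'])
print('%d windows scored' % len(detections))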

The content below works without further changes (apart from adjusting paths as needed); the explanations are taken directly from the original notebook.

Running this outputs a DataFrame with the filenames, selected windows, and their detection scores to an HDF5 file. (We only ran on one image, so the filenames will all be the same.)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_hdf('_temp/det_output.h5', 'df')
print(df.shape)
print(df.iloc[0])
(1570, 5)
prediction    [-2.64134, -2.90464, -2.84325, -3.23465, -1.97...
ymin                                                     79.846
xmin                                                       9.62
ymax                                                     246.31
xmax                                                    339.624
Name: /home/ouxinyu/caffe-master/examples/images/fish-bike.jpg, dtype: object

1570 regions were proposed with the R-CNN configuration of selective search. The number of proposals will vary from image to image based on its contents and size -- selective search isn't scale invariant.

In general, detect.py is most efficient when running on a lot of images: it first extracts window proposals for all of them, batches the windows for efficient GPU processing, and then outputs the results. Simply list an image per line in the images_file, and it will process all of them.
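
For example, a multi-image input file can be written like this (the second path is illustrative; cat.jpg is another sample image that ships with Caffe):

# Write one image path per line; detect.py will batch proposals across all of them.
image_paths = [
    'examples/images/fish-bike.jpg',
    'examples/images/cat.jpg',        # any additional images you want scored
]
with open('_temp/det_input.txt', 'w') as f:
    f.write('\n'.join(image_paths) + '\n')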

Although this guide gives an example of R-CNN ImageNet detection, detect.py is clever enough to adapt to different Caffe models’ input dimensions, batch size, and output categories. You can switch the model definition and pretrained model as desired. Refer to python detect.py --help for the parameters to describe your data set. There's no need for hardcoding.

Anyway, let's now load the ILSVRC13 detection class names and make a DataFrame of the predictions. Note you'll need the auxiliary ilsvrc2012 data fetched by data/ilsvrc12/get_ilsvrc12_aux.sh.

with open('data/ilsvrc12/det_synset_words.txt') as f:
    labels_df = pd.DataFrame([
        {
            'synset_id': l.strip().split(' ')[0],
            'name': ' '.join(l.strip().split(' ')[1:]).split(',')[0]
        }
        for l in f.readlines()
    ])
labels_df.sort('synset_id')
predictions_df = pd.DataFrame(np.vstack(df.prediction.values), columns=labels_df['name'])
print(predictions_df.iloc[0])
name
accordion      -2.641338
airplane       -2.904639
ant            -2.843245
antelope       -3.234649
apple          -1.976960
armadillo      -2.488007
artichoke      -2.218568
axe            -2.338795
baby bed       -2.755479
backpack       -2.180768
bagel          -2.697270
balance beam   -2.780527
banana         -2.433329
band aid       -1.631823
banjo          -2.317316
...
trombone        -2.587927
trumpet         -2.396858
turtle          -2.376043
tv or monitor   -2.763605
unicycle        -2.254395
vacuum          -1.918464
violin          -2.746913
volleyball      -2.758842
waffle iron     -2.421376
washer          -2.415665
water bottle    -2.175697
watercraft      -2.949454
whale           -3.157514
wine bottle     -2.790261
zebra           -2.768192
Name: 0, Length: 200, dtype: float32

Let's look at the activations.

plt.gray()
plt.matshow(predictions_df.values)
plt.xlabel('Classes')
plt.ylabel('Windows')
<matplotlib.text.Text at 0x7faa0d591f50>
<matplotlib.figure.Figure at 0x7faa365216d0>

Now let's take max across all windows and plot the top classes.

max_s = predictions_df.max(0)
max_s.sort(ascending=False)
print(max_s[:10])
name
person          1.839884
bicycle         0.855625
unicycle        0.068060
motorcycle      0.003604
banjo          -0.001440
turtle         -0.030387
electric fan   -0.220595
cart           -0.225192
lizard         -0.365948
helmet         -0.477555
dtype: float32

The top detections are in fact a person and bicycle. Picking good localizations is a work in progress; we pick the top-scoring person and bicycle detections.

# Find, print, and display the top detections: person and bicycle.
i = predictions_df['person'].argmax()
j = predictions_df['bicycle'].argmax()

# Show top predictions for top detection.
f = pd.Series(df['prediction'].iloc[i], index=labels_df['name'])
print('Top detection:')
print(f.order(ascending=False)[:5])
print('')

# Show top predictions for second-best detection.
f = pd.Series(df['prediction'].iloc[j], index=labels_df['name'])
print('Second-best detection:')
print(f.order(ascending=False)[:5])

# Show top detection in red, second-best top detection in blue.
im = plt.imread('examples/images/fish-bike.jpg')
plt.imshow(im)
currentAxis = plt.gca()

det = df.iloc[i]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))

det = df.iloc[j]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))
Top detection:
name
person             1.839884
swimming trunks   -1.157806
turtle            -1.168884
tie               -1.217268
rubber eraser     -1.246662
dtype: float32

Second-best detection:
name
bicycle     0.855625
unicycle   -0.334367
scorpion   -0.824552
lobster    -0.965544
lamp       -1.076225
dtype: float32
<matplotlib.patches.Rectangle at 0x7faa0d746c50>

That's cool. Let's take all 'bicycle' detections and NMS them to get rid of overlapping windows.

def nms_detections(dets, overlap=0.3):
    """
    Non-maximum suppression: Greedily select high-scoring detections and
    skip detections that are significantly covered by a previously
    selected detection.

    This version is translated from Matlab code by Tomasz Malisiewicz,
    who sped up Pedro Felzenszwalb's code.

    Parameters
    ----------
    dets: ndarray
        each row is ['xmin', 'ymin', 'xmax', 'ymax', 'score']
    overlap: float
        minimum overlap ratio (0.3 default)

    Output
    ------
    dets: ndarray
        remaining after suppression.
    """
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    ind = np.argsort(dets[:, 4])

    w = x2 - x1
    h = y2 - y1
    area = (w * h).astype(float)

    pick = []
    while len(ind) > 0:
        i = ind[-1]
        pick.append(i)
        ind = ind[:-1]

        xx1 = np.maximum(x1[i], x1[ind])
        yy1 = np.maximum(y1[i], y1[ind])
        xx2 = np.minimum(x2[i], x2[ind])
        yy2 = np.minimum(y2[i], y2[ind])

        w = np.maximum(0., xx2 - xx1)
        h = np.maximum(0., yy2 - yy1)

        wh = w * h
        o = wh / (area[i] + area[ind] - wh)

        ind = ind[np.nonzero(o <= overlap)[0]]

    return dets[pick, :]
scores = predictions_df['bicycle']
windows = df[['xmin', 'ymin', 'xmax', 'ymax']].values
dets = np.hstack((windows, scores[:, np.newaxis]))
nms_dets = nms_detections(dets)

Show top 3 NMS'd detections for 'bicycle' in the image and note the gap between the top scoring box (red) and the remaining boxes.

plt.imshow(im)
currentAxis = plt.gca()
colors = ['r', 'b', 'y']
for c, det in zip(colors, nms_dets[:3]):
    currentAxis.add_patch(
        plt.Rectangle((det[0], det[1]), det[2]-det[0], det[3]-det[1],
        fill=False, edgecolor=c, linewidth=5)
    )
print 'scores:', nms_dets[:3, 4]
scores: [ 0.85562468 -0.73134422 -1.33959854]

This was an easy instance for bicycle as it was in the class's training set. However, the person result is a true detection since this was not in the set for that class.

You should try out detection on an image of your own next!

(Remove the temp directory to clean up, and we're done.)

!rm -rf _temp