[Object Detection] R-CNN in Practice


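Preface

Following my "Deep Learning" study plan, the main tasks for November are: getting familiar with the major DL network models, mainly for classification and detection; reading papers; and getting familiar with pathology data. We are a two-person team. This month I am focusing on the classic object detection algorithms and on running some classic examples with TensorFlow or Keras, mainly R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, and YOLO. My teammate is studying the classic classification networks, mainly the GoogLeNet family (Inception-v1, Inception-v2, Inception-v3) and ResNet. Each of us will put together a detailed report, one on detection and one on classification, and then we will keep refining them, discussing, and sharing to make the most of working as a team.

This post organizes my notes on R-CNN based on hands-on practice. The R stands for region (as in region proposal), and the CNN part uses the classic AlexNet model. Let's first look at a diagram of AlexNet to get a feel for it.

In 2012, Hinton's student Alex Krizhevsky proposed the deep convolutional neural network AlexNet, which can be seen as a deeper and wider version of LeNet. The key points are: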

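1) It was the first to successfully apply ReLU (a nonlinear activation function), Dropout (to prevent overfitting), LRN, and similar tricks in a CNN.
2) It used GPUs to accelerate computation, and the authors open-sourced their CUDA code for training convolutional neural networks on GPUs.
3) It contains 630 million connections, 60 million parameters, and 650,000 neurons. It has 5 convolutional layers, 3 of which are followed by max pooling layers, and finally 3 fully connected layers.
4) AlexNet won the highly competitive ILSVRC 2012 by a wide margin, bringing the top-5 error rate down to 16.4%, a huge improvement over the runner-up's 26.2%.
5) AlexNet can be seen as the first major statement from neural networks after their low period. It established the dominance of deep learning (deep convolutional networks) in computer vision and also spurred the expansion of deep learning into speech recognition, natural language processing, reinforcement learning, and other fields.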

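This post does not go deep into theory. Once I understand it better, I will write a dedicated theory post.

AlexNet model architecture

Now let's look at a diagram of R-CNN:

Ross B. Girshick (RBG) designed the R-CNN framework by replacing the sliding window + hand-crafted features of traditional object detection with region proposals + CNN. The key terms related to R-CNN are: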

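1) Region proposals: candidate regions that may contain an object.

2) Selective Search: finds boxes that are likely to contain objects. These boxes may overlap and contain one another, which avoids brute-force enumeration of every possible box.

3) Warp and Crop: resizes each region to fit the CNN input (the input layer of the AlexNet used here takes 224 x 224 x 3 images, where 3 is the number of RGB color channels).

4) Supervised pre-training: pre-training on labeled data, often described as a form of transfer learning.

5) IoU: intersection over union, IoU = area(A∩B) / area(A∪B)

6) NMS: non-maximum suppression (see: http://blog.csdn.net/H2008066215019910120/article/details/25917609)

7) DPM: Deformable Part Model, object detection with discriminatively trained part-based models.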
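The IoU definition in item 5 can be sketched in a few lines. This is a minimal illustration, assuming boxes are given as (x, y, w, h), the layout this post uses for location labels; the function name `iou` is my own and is not taken from the repository.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two 10 x 10 boxes offset by (5, 5): intersection 25, union 175.
print(iou((0, 0, 10, 10), (5, 5, 10, 10)))
```

Identical boxes give 1.0 and disjoint boxes give 0.0, so the value can be compared directly against thresholds such as the 0.3 used later when building the SVM training set.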

As for region proposal algorithms, the figure below shows that there are many of them; "Selective Search" is just one.

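Drawbacks of R-CNN:

1) It requires fixed-size input, so each region of the original image must be cropped or warped. This loses information and distorts the image to some extent, which limits recognition accuracy.
2) Redundant computation: R-CNN no longer enumerates exhaustively, but there are still about two thousand candidate boxes, and each one needs a CNN pass. The computation is still heavy, and much of it is repeated.
3) SVM model: a linear model, which is clearly not the best choice when labeled data is plentiful.
4) Training and testing are split into multiple stages: region proposal, feature extraction, classification, and regression are trained separately, and intermediate data must be saved separately, so training is expensive in both space and time.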
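Drawback 1 is easy to see on a toy example. The sketch below is my own illustration, not code from the repository: it crops a box from an image stored as a list of rows and warps it to a square with nearest-neighbor sampling. The helper name `warp_nearest` is hypothetical.

```python
def warp_nearest(image, box, out_size):
    """Crop box=(x, y, w, h) from image (a list of rows) and warp it
    to out_size x out_size using nearest-neighbor sampling."""
    x, y, w, h = box
    warped = []
    for r in range(out_size):
        src_r = y + r * h // out_size      # source row for output row r
        row = []
        for c in range(out_size):
            src_c = x + c * w // out_size  # source column for output column c
            row.append(image[src_r][src_c])
        warped.append(row)
    return warped


# A 4-wide, 2-tall region warped to 4 x 4: every source row is duplicated,
# so the content is stretched vertically by a factor of 2.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
for row in warp_nearest(image, (0, 0, 4, 2), 4):
    print(row)
```

The horizontal scale here is 1 and the vertical scale is 2, so the aspect ratio of the object changes. The same happens when an elongated proposal is forced into a 224 x 224 input.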

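This post focuses on putting R-CNN into practice rather than on theory. There are many tutorials online that cover the theory, and I have listed some links in the Reference section.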

Reference

GitHub source (1): https://github.com/edwardbi/DeepLearningModels/tree/master/RCNN

GitHub source (2), the code used in this post: http://download.csdn.net/download/houchaoqun_xmu/10138539

selectivesearch: https://github.com/AlpacaDB/selectivesearch

The practical part of this post mainly follows: https://www.cnblogs.com/edwardbi/p/5647522.html

My take on AlexNet [Jianshu]: http://www.jianshu.com/p/58168fec534d

[Convolutional Neural Networks: A History] From LeNet to AlexNet: http://blog.csdn.net/cyh_24/article/details/51440344

Deep Learning (18): Object Detection Based on R-CNN: http://blog.csdn.net/hjimce/article/details/50187029

Deep learning and computer vision, this one article is enough: https://www.leiphone.com/news/201605/zZqsZiVpcBBPqcGG.html#rd

tflearn official site: http://tflearn.org/installation/

Installing common software on Ubuntu: http://blog.csdn.net/houchaoqun_xmu/article/details/72461592

Preparation

1) Environment used in this post: python3 + tensorflow-1.2.0 + tflearn

- tensorflow.__version__ = 1.2.0
- tflearn: install tensorflow first, then install tflearn directly with pip install tflearn

2) Download the GitHub source:

git clone https://github.com/Houchaoqun/MachineLearning_DeepLearning.git

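3) Download the dataset and put it in the root directory of the project (see the code structure below):

- Link: https://pan.baidu.com/s/1hrSKz56

- Password: gput

4) Install the Python SelectiveSearch package with the following command: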

pip install selectivesearch
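Once the package is installed, candidate boxes can be obtained roughly as follows. This is a sketch under assumptions: the call to `selectivesearch.selective_search` uses the parameter values from that package's README, while `propose_regions`, `filter_regions`, and the filtering thresholds are illustrative names and values of my own, not the repository's settings.

```python
def propose_regions(img, scale=500, sigma=0.9, min_size=10):
    """Run Selective Search on an RGB image array (H x W x 3).
    Requires `pip install selectivesearch`."""
    import selectivesearch  # imported lazily so filter_regions has no dependencies
    _, regions = selectivesearch.selective_search(
        img, scale=scale, sigma=sigma, min_size=min_size)
    return regions  # list of dicts with 'rect' (x, y, w, h), 'size', 'labels'


def filter_regions(regions, min_pixels=500, max_aspect=4.0):
    """Keep distinct, reasonably sized boxes. Thresholds are illustrative."""
    kept = []
    seen = set()
    for region in regions:
        rect = tuple(region['rect'])
        x, y, w, h = rect
        if rect in seen:
            continue  # Selective Search can return the same box more than once
        if region['size'] < min_pixels or w == 0 or h == 0:
            continue  # too small to be a useful proposal
        if max(w / h, h / w) > max_aspect:
            continue  # extremely elongated boxes distort badly when warped
        seen.add(rect)
        kept.append(rect)
    return kept


# Typical use (not executed here):
#   img = skimage.io.imread('testimg7.jpg')
#   boxes = filter_regions(propose_regions(img))
```

Filtering before the CNN step matters because every surviving box costs one forward pass.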

Flower datasets

1) 2-flowers: 2 flower classes (different stages, poses, and colors), 30 images per class

The directory structure is as follows:

hcq@hcq-home:~/document/deepLearning/github/rcnn-tflearn/2flowers$ tree -L 2
.
└── jpg
    ├── 0
    └── 1

2) 17-flowers: 17 flower classes, 80 images per class

The directory structure is as follows:

hcq@hcq-home:~/document/deepLearning/github/rcnn-tflearn/17flowers$ tree -L 2
.
└── jpg
    ├── 0
    ├── 1
    ├── 10
    ├── 11
    ├── 12
    ├── 13
    ├── 14
    ├── 15
    ├── 16
    ├── 2
    ├── 3
    ├── 4
    ├── 5
    ├── 6
    ├── 7
    ├── 8
    ├── 9
    └── files.txt
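Because each class lives in a folder named after its numeric label, an image list with labels can be built directly from the directory layout. The helper below is a hypothetical sketch (the repository ships its own list files such as train_list.txt, whose exact format is not shown in this post).

```python
import os


def list_images(jpg_root):
    """Return (path, label) pairs from a jpg_root/<label>/<image> layout."""
    samples = []
    for name in sorted(os.listdir(jpg_root)):
        class_dir = os.path.join(jpg_root, name)
        # Only numeric folders are classes; this skips entries such as files.txt.
        if not (os.path.isdir(class_dir) and name.isdigit()):
            continue
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(('.jpg', '.jpeg')):
                samples.append((os.path.join(class_dir, fname), int(name)))
    return samples


# Typical use (not executed here):
#   samples = list_images('17flowers/jpg')
```

For the 17-flowers layout above this should yield 17 x 80 = 1360 pairs, which matches the 1224 training plus 136 validation samples reported in the training log.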

tflearn-rcnn code structure

hcq@hcq-home:~/document/deepLearning/github/rcnn-tflearn$ tree -L 2
.
├── 17flowers
│   └── jpg
├── 2flowers
│   └── jpg
├── fine_tune_RCNN.py
├── output
│   └── alexnet_oxflowers17
├── preprocessing_RCNN.py
├── RCNN.md
├── RCNN_output.py
├── refine_backup.txt
├── refine_list.txt
├── svm_train
│   ├── 1.txt
│   └── 2.txt
├── testimg7.jpg
├── train_alexnet.py
└── train_list.txt

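Core code for building AlexNet with tflearn

from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression


def create_alexnet(num_classes):
    # Building 'AlexNet'
    network = input_data(shape=[None, 224, 224, 3])
    # Conv block 1: 96 filters of size 11 with stride 4, then pooling and LRN
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    # Conv block 2: 256 filters of size 5, then pooling and LRN
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    # Conv blocks 3-5: three 3 x 3 convolutions, then pooling and LRN
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    # Two 4096-unit fully connected layers with dropout
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    # Softmax classifier over num_classes
    network = fully_connected(network, num_classes, activation='softmax')
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=0.001)
    return network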
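To check how the 224 x 224 input shrinks through this network, the spatial sizes can be worked out by hand. The sketch below assumes that tflearn's conv_2d and max_pool_2d use their default 'same' padding, in which case each layer's output size is ceil(input / stride). The helper names are my own.

```python
import math


def same_padding_out(size, stride):
    """Output size along one spatial axis under 'same' padding."""
    return math.ceil(size / stride)


def alexnet_feature_map_sizes(input_size=224):
    """Spatial size after each conv/pool layer of create_alexnet above."""
    layers = [("conv1", 4), ("pool1", 2), ("conv2", 1), ("pool2", 2),
              ("conv3", 1), ("conv4", 1), ("conv5", 1), ("pool5", 2)]
    sizes = []
    size = input_size
    for name, stride in layers:
        size = same_padding_out(size, stride)
        sizes.append((name, size))
    return sizes


sizes = alexnet_feature_map_sizes()
for name, size in sizes:
    print(name, size)

# The last pooling layer has 256 channels, so the first fully connected
# layer sees final_size * final_size * 256 inputs.
final_size = sizes[-1][1]
print("flattened features:", final_size * final_size * 256)
```

Under that assumption the feature map goes 224 → 56 → 28 → 14 → 7, and the first 4096-unit layer receives 7 x 7 x 256 = 12544 values. Local response normalization does not change the shape.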

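Steps

1) Complete the preparation above;

2) Run the following command to pre-train on the flower17 dataset (labels contain only the class) and obtain a classifier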

python train_alexnet.py

With epoch = 200, the training output is as follows:

hcq@hcq-home:~/document/deepLearning/github/rcnn-tflearn$ python train_alexnet.py 
2017-11-27 20:07:56.694124: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-27 20:07:56.694144: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-27 20:07:56.694162: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-11-27 20:07:56.694166: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-27 20:07:56.694170: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-11-27 20:07:57.024534: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-11-27 20:07:57.025026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:01:00.0
Total memory: 10.90GiB
Free memory: 10.29GiB
2017-11-27 20:07:57.025053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2017-11-27 20:07:57.025063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2017-11-27 20:07:57.025081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
2017-11-27 20:08:05.490085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
loading previous parameters
2017-11-27 20:08:05.569326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
2017-11-27 20:08:06.435150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
---------------------------------
Run id: alexnet_oxflowers17
Log directory: output/
---------------------------------
Training samples: 1224
Validation samples: 136
--
Training Step: 200  | total loss: 3.37247 | time: 9.586ss
| Momentum | epoch: 004 | loss: 3.37247 - acc: 0.0630 | val_loss: 2.98178 - val_acc: 0.0515 -- iter: 0160/1224
--
Training Step: 400  | total loss: 3.32241 | time: 17.929s
| Momentum | epoch: 009 | loss: 3.32241 - acc: 0.0576 | val_loss: 2.77432 - val_acc: 0.0588 -- iter: 0320/1224
--
Training Step: 600  | total loss: 3.00749 | time: 26.467s
| Momentum | epoch: 014 | loss: 3.00749 - acc: 0.0976 | val_loss: 2.46418 - val_acc: 0.1544 -- iter: 0480/1224
--
 ... ... ... 
Training Step: 7600  | total loss: 0.00295 | time: 66.383s
| Momentum | epoch: 193 | loss: 0.00295 - acc: 0.9999 | val_loss: 0.78864 - val_acc: 0.8088 -- iter: 1088/1224
--
Training Step: 7800  | total loss: 0.00561 | time: 76.434s
| Momentum | epoch: 198 | loss: 0.00561 - acc: 0.9998 | val_loss: 0.73804 - val_acc: 0.8088 -- iter: 1224/1224
--
Training Step: 7878  | total loss: 0.00451 | time: 75.050s
| Momentum | epoch: 200 | loss: 0.00451 - acc: 0.9998 -- iter: 1224/1224

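As the output shows, after 200 epochs the training accuracy rose from 0.0630 to 0.9998, while the validation accuracy reached about 0.81. Up to this point we have not yet touched the core part of R-CNN: extracting candidate boxes. This post uses the Selective Search algorithm, which can be installed directly with pip install selectivesearch; interested readers can also write their own Python implementation.

3) Run the following command to fine-tune on the flower2 dataset (labels contain both the class and the location [x, y, w, h])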

python fine_tune_RCNN.py
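The location labels are what turn classification data into detection training data: each Selective Search proposal is compared with the ground-truth box and labeled as the object's class or as background. The sketch below illustrates that idea only. The 0.5 threshold follows the R-CNN paper's fine-tuning setup and is an assumption here, and `label_proposal` is a hypothetical name, not the repository's function.

```python
def iou(a, b):
    """IoU of two (x, y, w, h) boxes."""
    iw = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def label_proposal(proposal, gt_box, gt_class, num_classes, threshold=0.5):
    """One-hot label with num_classes + 1 slots; slot 0 means background.
    gt_class is expected to be in 1..num_classes."""
    label = [0] * (num_classes + 1)
    if iou(proposal, gt_box) >= threshold:
        label[gt_class] = 1   # enough overlap: positive for the object's class
    else:
        label[0] = 1          # otherwise treated as background
    return label


gt = (10, 10, 100, 100)
print(label_proposal((12, 8, 100, 100), gt, gt_class=2, num_classes=2))
print(label_proposal((300, 300, 50, 50), gt, gt_class=2, num_classes=2))
```

With two flower classes plus background this gives three outputs, which is consistent with the create_alexnet(3) call used at test time later in this post.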

The fine_tune_Alexnet function is as follows:

import os

import tflearn


def fine_tune_Alexnet(network, X, Y):
    # Training
    model = tflearn.DNN(network, checkpoint_path='rcnn_model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='output_RCNN')
    # Resume from a previous fine-tuning run if one exists
    if os.path.isfile('fine_tune_model_save.model'):
        print("Loading the fine tuned model")
        model.load('fine_tune_model_save.model')
        # saver.restore(sess, './alexnet-cnn.model')
    # Otherwise start from the pre-trained AlexNet weights
    elif os.path.isfile('model_save.model.meta'):
        print("Loading the alexnet")
        # saver = tf.train.Saver()
        try:
            model.load('model_save.model')
            print("successful loaded [model_save.model]...")
        except Exception as e:
            print(e)
        # saver.restore(model.session,'./model_save.model')
    else:
        print("No file to load, error")
        return False
    model.fit(X, Y, n_epoch=10, validation_set=0.1, shuffle=True,
              show_metric=True, batch_size=32, snapshot_step=200,
              snapshot_epoch=False, run_id='alexnet_rcnnflowers2')  # epoch = 1000
    # Save the model
    model.save('fine_tune_model_save.model')

When fine-tuning finishes, the output looks like this (the fine_tune_model_save.model files are generated):

successful loaded [model_save.model]...
---------------------------------
Run id: alexnet_rcnnflowers2
Log directory: output_RCNN/
---------------------------------
Training samples: 2034
Validation samples: 226
--
Training Step: 8000  | total loss: 0.18893 | time: 112.817s
| Momentum | epoch: 002 | loss: 0.18893 - acc: 0.9366 | val_loss: 0.20319 - val_acc: 0.9336 -- iter: 1856/2034
--
Training Step: 8200  | total loss: 0.18021 | time: 4.906s8s
| Momentum | epoch: 006 | loss: 0.18021 - acc: 0.9413 | val_loss: 0.19566 - val_acc: 0.9336 -- iter: 0064/2034
--
Training Step: 8400  | total loss: 0.18054 | time: 20.381ss
| Momentum | epoch: 009 | loss: 0.18054 - acc: 0.9347 | val_loss: 0.16186 - val_acc: 0.9336 -- iter: 0320/2034
--
Training Step: 8518  | total loss: 0.11383 | time: 122.717s
| Momentum | epoch: 010 | loss: 0.11383 - acc: 0.9564 -- iter: 2034/2034

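As the output shows, after 10 epochs the training accuracy is 0.9564. Increasing the number of epochs and training again may give better results. I plan to set epoch to 1000 and retrain the model to see the effect.

At this point the model is trained. One difference here is that the network does not classify directly with a softmax after the CNN; it uses SVMs instead. SVMs are well suited to training on small samples, so doing this can improve accuracy. For a detailed explanation, see the post: http://blog.csdn.net/hjimce/article/details/50187029

The code for training the SVMs is as follows: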

import os

import numpy as np
from sklearn import svm

import preprocessing_RCNN as prep  # helper module from the same repository


# Construct cascade svms
def train_svms(train_file_folder, model):
    listings = os.listdir(train_file_folder)
    svms = []
    for train_file in listings:
        # Skip cached .pkl files; only the .txt lists are training inputs
        if "pkl" in train_file:
            continue
        X, Y = generate_single_svm_train(train_file_folder + train_file)
        # Extract a CNN feature vector for every proposal image
        train_features = []
        for i in X:
            feats = model.predict([i])
            train_features.append(feats[0])
        print("feature dimension")
        print(np.shape(train_features))
        # Train one linear SVM per class file
        clf = svm.LinearSVC()
        print("fit svm")
        clf.fit(train_features, Y)
        svms.append(clf)
    return svms


# Load training images
def generate_single_svm_train(one_class_train_file):
    trainfile = one_class_train_file
    savepath = one_class_train_file.replace('txt', 'pkl')
    images = []
    Y = []
    if os.path.isfile(savepath):
        # Reuse the cached proposals if they were generated before
        print("restoring svm dataset " + savepath)
        images, Y = prep.load_from_pkl(savepath)
    else:
        print("loading svm dataset " + savepath)
        images, Y = prep.load_train_proposals(trainfile, 2, threshold=0.3, svm=True, save=True, save_path=savepath)
    return images, Y

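The generate_single_svm_train function reads "1.txt" and "2.txt" under the ./svm_train directory and generates the corresponding "1.pkl" and "2.pkl", both of which are large files (> 1 GB).

Test results

Run the following command, which:

- trains the SVMs

- tests the model on "testimg7.jpg"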

python RCNN_output.py

At this point you may run into the following problem:

Traceback (most recent call last):
  File "RCNN_output.py", line 151, in <module>
    pred = i.predict(f)
  File "/home/hcq/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 324, in predict
    scores = self.decision_function(X)
  File "/home/hcq/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 300, in decision_function
    X = check_array(X, accept_sparse='csr')
  File "/home/hcq/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[-0.62386084  0.64800894  0.54052156 ...,  0.29537463  0.49037218  0.40998983].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

You need to add the statement f = f.reshape(1, -1) to the RCNN_output.py script, as shown below:

if __name__ == '__main__':
    train_file_folder = 'svm_train/'
    img_path = 'testimg7.jpg'
    imgs, verts = image_proposal(img_path)
    net = create_alexnet(3)
    model = tflearn.DNN(net)
    model.load('fine_tune_model_save.model')
    svms = train_svms(train_file_folder, model)
    print("Done fitting svms")
    features = model.predict(imgs)
    print("predict image:")
    print(np.shape(features))   # (107, 4096)
    results = []
    results_label = []
    count = 0
    for f in features:
        f = f.reshape(1, -1)    # add by hcq 20171128
        for i in svms:
            pred = i.predict(f)
            print(pred)
            if pred[0] != 0:
                results.append(verts[count])
                results_label.append(pred[0])
        count += 1
    print("result:")
    print(results)
    print("result label:")
    print(results_label)
    img = skimage.io.imread(img_path)
    fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
    ax.imshow(img)
    for x, y, w, h in results:
        rect = mpatches.Rectangle(
            (x, y), w, h, fill=False, edgecolor='red', linewidth=1)
        ax.add_patch(rect)
    plt.show()
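Why the reshape is needed can be seen from the array shapes alone. In this small illustration a zero array stands in for the CNN features; the (107, 4096) shape is taken from the comment in the script above.

```python
import numpy as np

# Stand-in for model.predict(imgs): 107 proposals, 4096 features each.
features = np.zeros((107, 4096))

f = features[0]
print(f.shape)                  # one row taken out of the matrix is 1D

f2d = f.reshape(1, -1)          # one sample, all features
print(f2d.shape)                # now 2D, as scikit-learn's predict expects
```

Iterating over a 2D array yields 1D rows, while scikit-learn estimators expect input of shape (n_samples, n_features), hence the error. An alternative that avoids the per-row reshape would be to pass the whole feature matrix to each SVM once, for example i.predict(features), and then loop over the returned predictions.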

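After debugging, the result looks like this (using the model after 200 epochs of fine-tuning):

Run the following command to see the result after adding non-maximum suppression (NMS):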

python RCNN_output_nms.py
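The contents of RCNN_output_nms.py are not shown in this post, so the following is only a generic sketch of greedy NMS, not that script's code. It assumes each detected box comes with a confidence score (for a LinearSVC this could be taken from decision_function); the 0.3 threshold and the function names are illustrative.

```python
def iou(a, b):
    """IoU of two (x, y, w, h) boxes."""
    iw = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: return the indices of the boxes to keep, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box is far away and survives.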

Explanation of the results (to be added later)

Problems encountered during practice:

1) Running under python2 (python3 is recommended). A .pkl file written by python3 with pickle protocol 3 cannot be read by python2:

restoring svm dataset svm_train/2.pkl
Traceback (most recent call last):
  File "RCNN_output.py", line 139, in <module>
    svms = train_svms(train_file_folder, model)
  File "RCNN_output.py", line 119, in train_svms
    X, Y = generate_single_svm_train(train_file_folder + train_file)
  File "RCNN_output.py", line 80, in generate_single_svm_train
    images, Y = prep.load_from_pkl(savepath)
  File "/home/hcq/document/deepLearning/github/rcnn-tflearn/preprocessing_RCNN.py", line 133, in load_from_pkl
    X, Y = pickle.load(open(dataset_file, 'rb'))
  File "/home/hcq/anaconda2/lib/python2.7/pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "/home/hcq/anaconda2/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/home/hcq/anaconda2/lib/python2.7/pickle.py", line 892, in load_proto
    raise ValueError, "unsupported pickle protocol: %d" % proto
ValueError: unsupported pickle protocol: 3

Solution:

http://blog.csdn.net/u013828589/article/details/72848192
https://stackoverflow.com/questions/25843698/valueerror-unsupported-pickle-protocol-3-python2-pickle-can-not-load-the-file
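The simplest fix is to run everything under python3, or to delete the cached .pkl files and regenerate them with the interpreter you actually use. If a file really must be readable from python2, it can be written with protocol 2, the highest protocol python2 understands. The sketch below uses a hypothetical helper name and toy data; objects more complex than plain lists may still hit other python2/python3 differences.

```python
import io
import pickle


def dump_for_python2(obj, fileobj):
    """Pickle obj with protocol 2, which both python2 and python3 can read."""
    pickle.dump(obj, fileobj, protocol=2)


# Toy stand-in for the (images, labels) pair cached by the SVM data loader.
buf = io.BytesIO()
dump_for_python2(([1, 2, 3], [0, 1, 0]), buf)
data = buf.getvalue()

# A protocol-2 pickle starts with the PROTO opcode followed by the byte 2.
print(data[:2])
print(pickle.loads(data))
```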

2) Problem caused by the sklearn version: ValueError: Expected 2D array, got 1D array instead

Traceback (most recent call last):
  File "RCNN_output.py", line 149, in <module>
    pred = i.predict(f)
  File "/home/hcq/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 324, in predict
    scores = self.decision_function(X)
  File "/home/hcq/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 300, in decision_function
    X = check_array(X, accept_sparse='csr')
  File "/home/hcq/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[-0.62386084  0.64800894  0.54052156 ...,  0.29537463  0.49037218  0.40998983].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Solution:

http://blog.csdn.net/llx1026/article/details/77940880
