【object detection】R-CNN in Practice

Source: Internet | Editor: 程序博客网 | Time: 2024/06/03 22:04

Preface

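We drew up a "Deep Learning" study plan, and my main tasks for November under it were to get familiar with the major DL network models (mainly classification and detection), read papers, and get familiar with pathology data. We are a two-person team. This month I concentrated on the classic object detection algorithms and on running some classic examples with TensorFlow or Keras, mainly R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN and YOLO. The other member concentrated on classic classification networks, mainly the GoogLeNet family (inception-v1, inception-v2, inception-v3) plus ResNet. Each of us will write a detailed report on detection or classification. We will then keep improving the reports and discuss and share them with each other, which is where a team has the advantage.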


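This article organizes what I learned about R-CNN through hands-on practice. The "R" stands for Region (region proposals), and the CNN part uses the classic AlexNet model. First, here is a diagram of AlexNet to get a feel for it.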


In 2012, Hinton's student Alex Krizhevsky proposed AlexNet, a deep convolutional neural network. It can be viewed as a deeper and wider version of LeNet. Its key points are:

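1) It was the first CNN to successfully use tricks such as ReLU (a non-linear activation function), Dropout (to prevent overfitting) and LRN (local response normalization).
2) It used GPUs to speed up computation, and the authors open-sourced their CUDA code for training convolutional neural networks on GPUs.
3) It has 630 million connections, 60 million parameters and 650,000 neurons. It has 5 convolutional layers, 3 of which are followed by max-pooling layers, and ends with 3 fully connected layers.
4) AlexNet won the highly competitive ILSVRC 2012 by a large margin. Its top-5 error rate was 16.4% (15.3% for its best submitted entry), compared with 26.2% for the second-place entry.
5) AlexNet was arguably the first strong signal from neural networks after their long low period. It established deep learning (deep convolutional networks) as the dominant approach in computer vision and drove its expansion into speech recognition, natural language processing, reinforcement learning and other fields.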
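You can check the "60 million parameters" figure in point 3 by adding up the weights and biases layer by layer. The sketch below is a worked calculation, not code from this project. It assumes the original two-GPU AlexNet layout, where conv2, conv4 and conv5 connect only to the half of the previous layer's channels on the same GPU, and 1,000 output classes.

```python
# Each conv entry: (kernel_h, kernel_w, input_channels_seen_per_filter, num_filters).
# conv2, conv4 and conv5 are "grouped": in the two-GPU original, each filter only
# sees the half of the previous layer's channels that lives on its own GPU.
CONV_LAYERS = [
    (11, 11, 3, 96),    # conv1
    (5, 5, 48, 256),    # conv2 (96 / 2 = 48 input channels per filter)
    (3, 3, 256, 384),   # conv3 (connected across both GPUs)
    (3, 3, 192, 384),   # conv4 (384 / 2 = 192)
    (3, 3, 192, 256),   # conv5 (384 / 2 = 192)
]
# Each fully connected entry: (input_features, output_features).
FC_LAYERS = [
    (6 * 6 * 256, 4096),  # fc6: flattened 6 x 6 x 256 pool5 output
    (4096, 4096),         # fc7
    (4096, 1000),         # fc8: 1000 ImageNet classes
]


def count_params():
    total = 0
    for kh, kw, cin, cout in CONV_LAYERS:
        total += kh * kw * cin * cout + cout  # weights + one bias per filter
    for fin, fout in FC_LAYERS:
        total += fin * fout + fout            # weights + one bias per output
    return total


print(count_params())  # 60965224, i.e. roughly 61 million
```

About 96% of these parameters (roughly 58.6 million) sit in the three fully connected layers.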


This article does not go deep into theory. Once I understand it better, I will write a separate article on the theory.



AlexNet model architecture


Next, here is a diagram of R-CNN:




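Ross B. Girshick (RBG) designed the R-CNN framework. It replaces the sliding windows and hand-crafted features of traditional object detection with region proposals plus a CNN. Several key terms related to R-CNN: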

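1) Region proposals

2) Selective Search: finds boxes that may contain objects. The boxes may overlap or contain one another, and searching this way avoids brute-force enumeration of every possible box.

3) Warp and Crop: resize each region to fit the CNN input. The AlexNet input layer here takes 224 x 224 x 3 images, where 3 is the number of RGB color channels; the R-CNN paper itself warps proposals to 227 x 227.

4) Supervised pre-training: also called transfer learning. The network is pre-trained on a large labeled dataset and then fine-tuned on the target task.

5) IoU: intersection over union, IoU = area(A ∩ B) / area(A ∪ B)

6) NMS: non-maximum suppression (see: http://blog.csdn.net/H2008066215019910120/article/details/25917609)

7) DPM: Deformable Part Models, i.e. object detection with discriminatively trained part-based models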
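The IoU in item 5 can be computed directly from two boxes. Below is a minimal sketch using the same [x, y, w, h] box format as the 2flowers labels. `iou` is a hypothetical helper, not a function from the project.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap (0 if the boxes do not overlap on that axis).
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    # |A ∪ B| = |A| + |B| - |A ∩ B|
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 5, 10, 10)))  # 25 / 175 = 0.142857...
```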


There are many region proposal algorithms, as the figure below shows. Selective Search is only one of them.



Drawbacks of R-CNN:


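1) It requires fixed-size input, so regions of the original image must be cropped or warped (resized). This loses and distorts some image information, which limits recognition accuracy.
2) Redundant computation: R-CNN no longer enumerates boxes exhaustively, but there are still about 2,000 proposals per image. Each one goes through the CNN separately, so the computation is heavy, and much of it is repeated because the proposals overlap.
3) SVM classifier: a linear model, which is clearly not the best choice when labeled data is plentiful.
4) Multi-stage training and testing: region proposal, feature extraction, classification and regression are trained as separate steps, and the intermediate data must be saved to disk. Training is therefore expensive in both space and time.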
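Drawback 1 is easy to see in code: stretching an arbitrary box to the fixed 224 x 224 input changes its aspect ratio. The sketch below uses plain NumPy nearest-neighbour sampling so that it needs no image library. `warp` is a hypothetical helper, and the project's own preprocessing may resize differently.

```python
import numpy as np


def warp(image, box, out_size=224):
    """Crop box = (x, y, w, h) from an H x W x C image and stretch it to
    out_size x out_size with nearest-neighbour sampling.
    The aspect ratio is NOT preserved, which is the distortion described above."""
    x, y, w, h = box
    crop = image[y:y + h, x:x + w]
    # Map every output row/column back to a source row/column of the crop.
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    return crop[rows[:, None], cols]


# Deterministic dummy 300 x 400 RGB image.
img = (np.arange(300 * 400 * 3) % 256).astype(np.uint8).reshape(300, 400, 3)
patch = warp(img, (50, 20, 200, 50))  # a 200 x 50 (w x h) box, aspect ratio 4:1
print(patch.shape)                    # (224, 224, 3)
```

The 4:1 box comes out square, so its content is stretched about four times more vertically (224 / 50) than horizontally (224 / 200).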


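This article focuses on putting R-CNN into practice rather than on theory. Many tutorials online cover the theory, and I have listed some of them in the Reference section.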




Reference


GitHub source code (1): https://github.com/edwardbi/DeepLearningModels/tree/master/RCNN

GitHub source code (2), the code used in this article: http://download.csdn.net/download/houchaoqun_xmu/10138539

selectivesearch: https://github.com/AlpacaDB/selectivesearch

Main reference for the practice part of this article: https://www.cnblogs.com/edwardbi/p/5647522.html

My take on AlexNet [Jianshu]: http://www.jianshu.com/p/58168fec534d

[Evolution of CNNs] From LeNet to AlexNet: http://blog.csdn.net/cyh_24/article/details/51440344

Deep Learning (18): Object detection based on R-CNN: http://blog.csdn.net/hjimce/article/details/50187029

Deep learning and computer vision: this one article is all you need: https://www.leiphone.com/news/201605/zZqsZiVpcBBPqcGG.html#rd

tflearn official site: http://tflearn.org/installation/

Installing common software on Ubuntu: http://blog.csdn.net/houchaoqun_xmu/article/details/72461592




Preparation


1) Environment used in this article: python3 + tensorflow-1.2.0 + tflearn

tensorflow.__version__ = 1.2.0. For tflearn, install TensorFlow first and then run pip install tflearn.


2) Download the source code from GitHub:

git clone https://github.com/Houchaoqun/MachineLearning_DeepLearning.git

3) Download the dataset and put it in the project's root directory (see the code structure below):

- Link: https://pan.baidu.com/s/1hrSKz56

- Password: gput


4) Install the selectivesearch Python package with the following command:

pip install selectivesearch




Flower datasets

1) 2-flowers: 2 flower categories (different growth stages, poses and colors), with 30 images per category



Its directory structure:

hcq@hcq-home:~/document/deepLearning/github/rcnn-tflearn/2flowers$ tree -L 2
.
└── jpg
    ├── 0
    └── 1


2) 17-flowers: 17 flower categories, with 80 images per category




Its directory structure:

hcq@hcq-home:~/document/deepLearning/github/rcnn-tflearn/17flowers$ tree -L 2
.
└── jpg
    ├── 0
    ├── 1
    ├── 10
    ├── 11
    ├── 12
    ├── 13
    ├── 14
    ├── 15
    ├── 16
    ├── 2
    ├── 3
    ├── 4
    ├── 5
    ├── 6
    ├── 7
    ├── 8
    ├── 9
    └── files.txt
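train_alexnet.py is not shown in this article. Before training, it has to turn the layout above into images with one-hot labels. The sketch below covers only the directory-walking part and assumes that each numbered folder under jpg/ holds the images of one class. `list_flower_images` and `one_hot` are hypothetical names.

```python
import os


def list_flower_images(root):
    """Collect (image_path, label) pairs from root/<label>/<image> folders,
    e.g. root = "17flowers/jpg". Non-directory entries such as files.txt are
    skipped. Assumes the folder names are the integers 0 .. N-1."""
    labels = sorted(
        (d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))),
        key=int,  # numeric order 0, 1, 2, ..., 10; string order would give 0, 1, 10, ...
    )
    samples = []
    for label in labels:
        folder = os.path.join(root, label)
        for name in sorted(os.listdir(folder)):
            samples.append((os.path.join(folder, name), int(label)))
    return samples, len(labels)


def one_hot(label, num_classes):
    """One-hot vector, the label format used with categorical_crossentropy."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec
```

Usage would be `samples, num_classes = list_flower_images("17flowers/jpg")`. The images themselves still have to be read and resized to 224 x 224.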




tflearn-rcnn code structure

hcq@hcq-home:~/document/deepLearning/github/rcnn-tflearn$ tree -L 2
.
├── 17flowers
│   └── jpg
├── 2flowers
│   └── jpg
├── fine_tune_RCNN.py
├── output
│   └── alexnet_oxflowers17
├── preprocessing_RCNN.py
├── RCNN.md
├── RCNN_output.py
├── refine_backup.txt
├── refine_list.txt
├── svm_train
│   ├── 1.txt
│   └── 2.txt
├── testimg7.jpg
├── train_alexnet.py
└── train_list.txt


Core tflearn code for building AlexNet


from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression


def create_alexnet(num_classes):
    # Building 'AlexNet'
    network = input_data(shape=[None, 224, 224, 3])
    network = conv_2d(network, 96, 11, strides=4, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 256, 5, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 384, 3, activation='relu')
    network = conv_2d(network, 256, 3, activation='relu')
    network = max_pool_2d(network, 3, strides=2)
    network = local_response_normalization(network)
    # Note: the original AlexNet uses ReLU in the fully connected layers; this code uses tanh.
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, 4096, activation='tanh')
    network = dropout(network, 0.5)
    network = fully_connected(network, num_classes, activation='softmax')
    network = regression(network, optimizer='momentum',
                         loss='categorical_crossentropy',
                         learning_rate=0.001)
    return network
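tflearn's conv_2d and max_pool_2d default to padding='same', so the feature maps in create_alexnet shrink differently from those in the original paper's network. The sketch below traces the spatial size through the eight conv and pooling layers; the LRN layers do not change the size. The paddings listed for the original network (2 for conv2, 1 for conv3 to conv5) are an assumption taken from the widely used Caffe reference model, because the paper does not list them explicitly.

```python
import math


def same_out(size, stride):
    # padding='same' (tflearn's default for conv_2d and max_pool_2d):
    # output = ceil(input / stride), whatever the kernel size.
    return math.ceil(size / stride)


def padded_out(size, kernel, stride, pad):
    # Explicit zero padding: output = floor((input + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, padding of the original network) in create_alexnet order;
# LRN layers are omitted because they do not change the spatial size.
LAYERS = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]


def final_size(input_size, tflearn_same=True):
    size = input_size
    for _name, kernel, stride, pad in LAYERS:
        if tflearn_same:
            size = same_out(size, stride)
        else:
            size = padded_out(size, kernel, stride, pad)
    return size


print(final_size(224))                      # 7: create_alexnet, 224 x 224 input
print(final_size(227, tflearn_same=False))  # 6: original network, 227 x 227 input
```

So the first fully_connected layer here receives 7 x 7 x 256 = 12,544 inputs, compared with 6 x 6 x 256 = 9,216 in the original network.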



Steps

1) Complete the preparation above.


2) Run the following command to pre-train on the 17flowers dataset, whose labels contain only the class, and obtain a classifier:

python train_alexnet.py

With epoch = 200, the training output is:

hcq@hcq-home:~/document/deepLearning/github/rcnn-tflearn$ python train_alexnet.py
2017-11-27 20:07:56.694124: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-27 20:07:56.694144: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-27 20:07:56.694162: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-11-27 20:07:56.694166: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-27 20:07:56.694170: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-11-27 20:07:57.024534: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-11-27 20:07:57.025026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:01:00.0
Total memory: 10.90GiB
Free memory: 10.29GiB
2017-11-27 20:07:57.025053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2017-11-27 20:07:57.025063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2017-11-27 20:07:57.025081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
2017-11-27 20:08:05.490085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
loading previous parameters
2017-11-27 20:08:05.569326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
2017-11-27 20:08:06.435150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
---------------------------------
Run id: alexnet_oxflowers17
Log directory: output/
---------------------------------
Training samples: 1224
Validation samples: 136
--
Training Step: 200  | total loss: 3.37247 | time: 9.586s
| Momentum | epoch: 004 | loss: 3.37247 - acc: 0.0630 | val_loss: 2.98178 - val_acc: 0.0515 -- iter: 0160/1224
--
Training Step: 400  | total loss: 3.32241 | time: 17.929s
| Momentum | epoch: 009 | loss: 3.32241 - acc: 0.0576 | val_loss: 2.77432 - val_acc: 0.0588 -- iter: 0320/1224
--
Training Step: 600  | total loss: 3.00749 | time: 26.467s
| Momentum | epoch: 014 | loss: 3.00749 - acc: 0.0976 | val_loss: 2.46418 - val_acc: 0.1544 -- iter: 0480/1224
--
... ... ...
Training Step: 7600  | total loss: 0.00295 | time: 66.383s
| Momentum | epoch: 193 | loss: 0.00295 - acc: 0.9999 | val_loss: 0.78864 - val_acc: 0.8088 -- iter: 1088/1224
--
Training Step: 7800  | total loss: 0.00561 | time: 76.434s
| Momentum | epoch: 198 | loss: 0.00561 - acc: 0.9998 | val_loss: 0.73804 - val_acc: 0.8088 -- iter: 1224/1224
--
Training Step: 7878  | total loss: 0.00451 | time: 75.050s
| Momentum | epoch: 200 | loss: 0.00451 - acc: 0.9998 -- iter: 1224/1224

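The output shows that after 200 epochs the training accuracy rose from 0.0630 to 0.9998, while validation accuracy levelled off at about 0.81 (val_acc: 0.8088). That large gap suggests the model overfits the 1,224 training images. The core part of R-CNN, extracting candidate boxes, is not involved yet. This article uses the Selective Search algorithm, which you can install directly with pip install selectivesearch; interested readers can also write their own Python implementation.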
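Selective Search usually returns many repeated and tiny regions, so they are normally filtered before being warped and fed to the CNN. The sketch below assumes the output format of the selectivesearch package. Per its README, `selectivesearch.selective_search(img, scale=500, sigma=0.9, min_size=10)` returns the labelled image and a list of region dicts, each with a 'rect' of (x, y, w, h) and a 'size'. `filter_proposals` and its thresholds are hypothetical, not taken from this project.

```python
def filter_proposals(regions, min_size=220, min_area=500):
    """Keep unique, reasonably large boxes from selectivesearch output.

    regions: list of dicts with 'rect' = (x, y, w, h) and 'size' (pixel count
    of the merged segment). The thresholds are illustrative, not this project's.
    """
    seen = set()
    boxes = []
    for r in regions:
        rect = tuple(r["rect"])
        if rect in seen:              # same rectangle reached via different merges
            continue
        if r["size"] < min_size:      # segment too small
            continue
        _, _, w, h = rect
        if w * h < min_area:          # bounding box too small
            continue
        seen.add(rect)
        boxes.append(rect)
    return boxes
```

With the package installed, usage would look like `_, regions = selectivesearch.selective_search(img, scale=500, sigma=0.9, min_size=10)` followed by `boxes = filter_proposals(regions)`.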


3) Run the following command to fine-tune on the 2flowers dataset, whose labels contain both the class and the position [x, y, w, h]:

python fine_tune_RCNN.py

The fine_tune_Alexnet function:

import os
import tflearn


def fine_tune_Alexnet(network, X, Y):
    # Training
    model = tflearn.DNN(network, checkpoint_path='rcnn_model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='output_RCNN')
    # Note: TensorFlow's V2 checkpoints are saved as *.meta / *.index / *.data-* files,
    # so this check may need '.meta' appended, as in the elif branch below.
    if os.path.isfile('fine_tune_model_save.model'):
        print("Loading the fine tuned model")
        model.load('fine_tune_model_save.model')
        # saver.restore(sess, './alexnet-cnn.model')
    elif os.path.isfile('model_save.model.meta'):
        print("Loading the alexnet")
        # saver = tf.train.Saver()
        try:
            model.load('model_save.model')
            print("successful loaded [model_save.model]...")
        except Exception as e:
            print(e)
            pass
        # saver.restore(model.session,'./model_save.model')
    else:
        print("No file to load, error")
        return False
    model.fit(X, Y, n_epoch=10, validation_set=0.1, shuffle=True,
              show_metric=True, batch_size=32, snapshot_step=200,
              snapshot_epoch=False, run_id='alexnet_rcnnflowers2')  # epoch = 1000
    # Save the model
    model.save('fine_tune_model_save.model')

When fine-tuning finishes, the output looks like this, and the fine_tune_model_save.model files are generated:

successful loaded [model_save.model]...
---------------------------------
Run id: alexnet_rcnnflowers2
Log directory: output_RCNN/
---------------------------------
Training samples: 2034
Validation samples: 226
--
Training Step: 8000  | total loss: 0.18893 | time: 112.817s
| Momentum | epoch: 002 | loss: 0.18893 - acc: 0.9366 | val_loss: 0.20319 - val_acc: 0.9336 -- iter: 1856/2034
--
Training Step: 8200  | total loss: 0.18021 | time: 4.906s
| Momentum | epoch: 006 | loss: 0.18021 - acc: 0.9413 | val_loss: 0.19566 - val_acc: 0.9336 -- iter: 0064/2034
--
Training Step: 8400  | total loss: 0.18054 | time: 20.381s
| Momentum | epoch: 009 | loss: 0.18054 - acc: 0.9347 | val_loss: 0.16186 - val_acc: 0.9336 -- iter: 0320/2034
--
Training Step: 8518  | total loss: 0.11383 | time: 122.717s
| Momentum | epoch: 010 | loss: 0.11383 - acc: 0.9564 -- iter: 2034/2034

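The output shows that after 10 epochs the training accuracy is 0.9564, with a validation accuracy of 0.9336 at the last reported evaluation. Increasing the number of epochs may give better results. I plan to retrain with epoch set to 1000 to see the effect.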
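The fine-tuning log above starts at Training Step 8000 instead of near 0. Its numbers match a step counter that continues from 7878, the last step of pre-training, after model.load restored the saved model. With batch_size=32 and 2,034 training samples, each epoch takes ceil(2034 / 32) = 64 steps, and 7878 + 10 x 64 = 8518. The sketch below reproduces every epoch/iter pair in that log under this assumption; `locate_step` is a hypothetical helper.

```python
import math


def locate_step(step, start_step, n_samples, batch_size):
    """Map a global 'Training Step' to (epoch, iter) as tflearn prints them,
    assuming the counter resumed from start_step and each epoch runs
    ceil(n_samples / batch_size) batches."""
    steps_per_epoch = math.ceil(n_samples / batch_size)  # 2034 / 32 -> 64
    done = step - start_step                              # steps run in this session
    epoch = (done - 1) // steps_per_epoch + 1             # 1-based epoch of this step
    batches_in_epoch = done - (epoch - 1) * steps_per_epoch
    return epoch, min(batches_in_epoch * batch_size, n_samples)


for step in (8000, 8200, 8400, 8518):
    print(step, locate_step(step, start_step=7878, n_samples=2034, batch_size=32))
# 8000 (2, 1856)
# 8200 (6, 64)
# 8400 (9, 320)
# 8518 (10, 2034)
```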


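The model is now trained. One difference from a plain classifier is that this network does not classify with the CNN followed by softmax; it uses SVMs instead. The usual explanation is that SVMs suit small training sets, which improves accuracy here. The R-CNN paper itself reports that using the fine-tuned softmax instead of the SVMs lowers mAP on VOC 2007 (from 54.2% to 50.9%). It attributes this mainly to the looser definition of positive examples used during fine-tuning and to the SVMs being trained on hard negatives. For a detailed explanation, see: http://blog.csdn.net/hjimce/article/details/50187029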


Code for training the SVMs:

import os

import numpy as np
from sklearn import svm

import preprocessing_RCNN as prep  # the project's preprocessing_RCNN.py


# Construct cascade svms
def train_svms(train_file_folder, model):
    listings = os.listdir(train_file_folder)
    svms = []
    for train_file in listings:
        if "pkl" in train_file:
            continue
        X, Y = generate_single_svm_train(train_file_folder + train_file)
        train_features = []
        for i in X:
            feats = model.predict([i])
            train_features.append(feats[0])
        print("feature dimension")
        print(np.shape(train_features))
        clf = svm.LinearSVC()
        print("fit svm")
        clf.fit(train_features, Y)
        svms.append(clf)  # one SVM per class file
    return svms


# Load training images
def generate_single_svm_train(one_class_train_file):
    trainfile = one_class_train_file
    savepath = one_class_train_file.replace('txt', 'pkl')
    images = []
    Y = []
    if os.path.isfile(savepath):
        print("restoring svm dataset " + savepath)
        images, Y = prep.load_from_pkl(savepath)
    else:
        print("loading svm dataset " + savepath)
        images, Y = prep.load_train_proposals(trainfile, 2, threshold=0.3, svm=True, save=True, save_path=savepath)
    return images, Y

generate_single_svm_train builds 1.pkl and 2.pkl from 1.txt and 2.txt in the ./svm_train directory. Both .pkl files are very large (> 1 GB).
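load_train_proposals is called with threshold=0.3 and svm=True, but this article does not show how it uses the threshold. For comparison, the R-CNN paper builds each class's SVM training set as follows. Ground-truth boxes are the positives. Proposals whose IoU with every ground-truth instance of that class is below 0.3 are negatives. Proposals in between are ignored. The sketch below implements that rule; `rcnn_svm_samples` is a hypothetical name, and feature extraction is left out.

```python
def rcnn_svm_samples(gt_features, proposal_features, proposal_max_ious,
                     class_index, neg_threshold=0.3):
    """Build one class's SVM training set following the R-CNN paper's rule.

    gt_features: CNN features of the ground-truth boxes of this class (positives).
    proposal_features / proposal_max_ious: features of the Selective Search
    proposals and each proposal's highest IoU with any ground-truth box of the class.
    """
    X = list(gt_features)
    y = [class_index] * len(X)
    for feat, max_iou in zip(proposal_features, proposal_max_ious):
        if max_iou < neg_threshold:
            X.append(feat)  # clear background: negative example
            y.append(0)
        # proposals with max_iou >= neg_threshold are ignored ("grey zone")
    return X, y
```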


Test results

Run the following command, which:

- trains the SVMs

- tests the model on "testimg7.jpg"

python RCNN_output.py

At this point you may run into the following error:

Traceback (most recent call last):
  File "RCNN_output.py", line 151, in <module>
    pred = i.predict(f)
  File "/home/hcq/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 324, in predict
    scores = self.decision_function(X)
  File "/home/hcq/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 300, in decision_function
    X = check_array(X, accept_sparse='csr')
  File "/home/hcq/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[-0.62386084  0.64800894  0.54052156 ...,  0.29537463  0.49037218  0.40998983].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

To fix it, add the statement f = f.reshape(1, -1) to RCNN_output.py, as follows:

import numpy as np
import tflearn
import skimage.io
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
# image_proposal, create_alexnet and train_svms are defined earlier in RCNN_output.py

if __name__ == '__main__':
    train_file_folder = 'svm_train/'
    img_path = 'testimg7.jpg'
    imgs, verts = image_proposal(img_path)
    net = create_alexnet(3)
    model = tflearn.DNN(net)
    model.load('fine_tune_model_save.model')
    svms = train_svms(train_file_folder, model)
    print("Done fitting svms")
    features = model.predict(imgs)
    print("predict image:")
    print(np.shape(features))   # (107, 4096)
    results = []
    results_label = []
    count = 0
    for f in features:
        f = f.reshape(1, -1)    # add by hcq 20171128
        for i in svms:
            pred = i.predict(f)
            print(pred)
            if pred[0] != 0:
                results.append(verts[count])
                results_label.append(pred[0])
        count += 1
    print("result:")
    print(results)
    print("result label:")
    print(results_label)
    img = skimage.io.imread(img_path)
    fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6))
    ax.imshow(img)
    for x, y, w, h in results:
        rect = mpatches.Rectangle(
            (x, y), w, h, fill=False, edgecolor='red', linewidth=1)
        ax.add_patch(rect)
    plt.show()
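The error occurs because each f in the loop is a 1D vector of shape (4096,), while scikit-learn's predict expects a 2D array of shape (n_samples, n_features). Since features is already 2D, an alternative to reshaping row by row is to call each SVM once on the whole array (clf.predict(features)). To make the shapes concrete without depending on scikit-learn, the NumPy sketch below imitates what a binary linear classifier's predict does: it computes the decision value X · coef^T + intercept and picks the positive class where that value is > 0. `linear_predict` is a hypothetical stand-in, not scikit-learn's implementation.

```python
import numpy as np


def linear_predict(X, coef, intercept, classes):
    """Mimic a fitted binary linear classifier's predict().

    X: (n_samples, n_features), or a single (n_features,) vector.
    coef: (1, n_features); intercept: (1,); classes: the two labels, e.g. [0, k].
    """
    X = np.atleast_2d(X)               # (n_features,) -> (1, n_features)
    scores = X @ coef.T + intercept    # (n_samples, 1) decision values
    return classes[(scores.ravel() > 0).astype(int)]


# Demo: 107 proposals x 4096 features, the shape printed by RCNN_output.py.
rng = np.random.default_rng(0)
features = rng.standard_normal((107, 4096))
coef = rng.standard_normal((1, 4096))
labels = linear_predict(features, coef, np.array([0.0]), np.array([0, 1]))
print(features[0].shape, np.atleast_2d(features[0]).shape, labels.shape)
# (4096,) (1, 4096) (107,)
```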


Once it runs, the result looks like this (using the model after 200 epochs of fine-tuning):




Run the following command to see the result with non-maximum suppression (NMS) added:

python RCNN_output_nms.py 
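RCNN_output_nms.py itself is not shown here. For reference, below is a minimal greedy NMS sketch in plain Python. It assumes each box has a score, for example from an SVM's decision_function, since predict only returns labels. `nms` and the 0.3 threshold are illustrative assumptions; R-CNN applies NMS to each class independently.

```python
def iou(a, b):
    """IoU of two (x, y, w, h) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy non-maximum suppression; returns indices of kept boxes, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 5, 5)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```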




Explanation of the results (to be added later)



Problems encountered during practice:


1) Using Python 2 (Python 3 is recommended):

restoring svm dataset svm_train/2.pkl
Traceback (most recent call last):
  File "RCNN_output.py", line 139, in <module>
    svms = train_svms(train_file_folder, model)
  File "RCNN_output.py", line 119, in train_svms
    X, Y = generate_single_svm_train(train_file_folder + train_file)
  File "RCNN_output.py", line 80, in generate_single_svm_train
    images, Y = prep.load_from_pkl(savepath)
  File "/home/hcq/document/deepLearning/github/rcnn-tflearn/preprocessing_RCNN.py", line 133, in load_from_pkl
    X, Y = pickle.load(open(dataset_file, 'rb'))
  File "/home/hcq/anaconda2/lib/python2.7/pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "/home/hcq/anaconda2/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/home/hcq/anaconda2/lib/python2.7/pickle.py", line 892, in load_proto
    raise ValueError, "unsupported pickle protocol: %d" % proto
ValueError: unsupported pickle protocol: 3

Solution:


http://blog.csdn.net/u013828589/article/details/72848192
https://stackoverflow.com/questions/25843698/valueerror-unsupported-pickle-protocol-3-python2-pickle-can-not-load-the-file
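The error means the .pkl files were written by Python 3 with pickle protocol 3 or higher, which Python 2.7 cannot read; 2.7 supports protocols up to 2. Besides switching to Python 3, there are two workarounds. First, delete svm_train/*.pkl so that generate_single_svm_train rebuilds them under Python 2; it only reloads a .pkl when the file already exists. Second, convert the existing files under Python 3 with a helper like the hypothetical `resave_for_python2` below. Files that hold NumPy arrays also need NumPy installed on the Python 2 side.

```python
import pickle


def resave_for_python2(src_path, dst_path):
    """Run this under Python 3: load a pickle written with protocol 3 or higher
    and write it again with protocol 2, the highest protocol Python 2.7 can read."""
    with open(src_path, "rb") as f:
        data = pickle.load(f)
    with open(dst_path, "wb") as f:
        pickle.dump(data, f, protocol=2)
```

For example, `resave_for_python2("svm_train/2.pkl", "svm_train/2_py2.pkl")`, after which the new file replaces the old one.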

2) A problem caused by the scikit-learn version: ValueError: Expected 2D array, got 1D array instead

Traceback (most recent call last):
  File "RCNN_output.py", line 149, in <module>
    pred = i.predict(f)
  File "/home/hcq/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 324, in predict
    scores = self.decision_function(X)
  File "/home/hcq/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 300, in decision_function
    X = check_array(X, accept_sparse='csr')
  File "/home/hcq/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[-0.62386084  0.64800894  0.54052156 ...,  0.29537463  0.49037218  0.40998983].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Solution:

http://blog.csdn.net/llx1026/article/details/77940880

