Notes on installing and running the RON object detection network


Paper link
Code link

RON (Reverse Connection with Objectness Prior Networks for Object Detection) is an effective and efficient framework for generic object detection. It combines the strengths of region-based methods (e.g., Faster R-CNN) and region-free methods (e.g., SSD). Within a fully convolutional architecture, RON focuses on two fundamental problems: (a) multi-scale object localization and (b) negative sample mining. To address (a), the authors design reverse connections that let the network detect objects on multiple CNN layers. To handle (b), they propose an objectness prior that significantly reduces the object search space. The reverse connections, objectness prior, and object detector are jointly optimized with a multi-task loss, so RON can directly predict the final detection results from all locations of the various feature maps.
[Figure: overview of the RON framework]
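To make the objectness prior concrete: at test time it acts as a cheap filter and a score multiplier. The sketch below mirrors what the detect() function in the demo script further down does; the function name and the threshold value here are illustrative, not part of the RON code base.

import numpy as np

def fuse_objectness_and_class_scores(scores, boxes, obj_thresh=0.03):
    # scores: (N, C) array; column 0 holds the objectness prior and
    # columns 1..C-1 hold the per-class scores (as in the demo script below).
    # boxes: (N, 4) candidate boxes.
    # 1) The objectness prior prunes the search space: drop low-objectness boxes.
    keep = np.where(scores[:, 0] > obj_thresh)[0]
    scores, boxes = scores[keep, :], boxes[keep, :]
    # 2) Final detection score = objectness prior * per-class score.
    fused = scores * scores[:, 0][:, np.newaxis]
    return fused, boxes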

1. Installing RON

1. Download the RON repository

git clone https://github.com/taokong/RON.git

2. Build Caffe
First, enter the RON directory:

cd ./RON
git clone https://github.com/taokong/caffe-ron.git
cd caffe-ron

Note: the released code is built with cuDNN acceleration, but cuDNN would not compile on my server, so I changed USE_CUDNN := 1 in Makefile.config to #USE_CUDNN := 1 (i.e., commented it out) and then built with:

make -j8 && make pycaffe

If OpenCV errors appear, comment out OPENCV_VERSION := 3 in Makefile.config, enable USE_OPENCV := 1, and build again.
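For reference, after these two edits the relevant lines of caffe-ron/Makefile.config look roughly like this (a sketch; the rest of the file depends on your environment):

# USE_CUDNN := 1          # commented out: build without cuDNN
USE_OPENCV := 1           # make sure OpenCV support is on
# OPENCV_VERSION := 3     # left commented, so the OpenCV 2.x code path is used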

3. Build the Cython modules

cd ./RON/lib
make
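A quick way to confirm the Cython extensions built is to import the compiled NMS wrapper that the demo script below depends on (assuming RON keeps the py-faster-rcnn style lib/ layout):

cd ./RON
python -c "import sys; sys.path.insert(0, 'lib'); from fast_rcnn.nms_wrapper import nms; print('Cython modules OK')"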

2. Camera test with demo_video.py

1. Download the caffemodel
Download RON320_VOC0712_VOC07.caffemodel from Baidu Cloud and put it in RON/data/RON_models.
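Before moving on, a trivial check that the weights are where the demo expects them (path taken from the step above):

python -c "import os; print(os.path.exists('./RON/data/RON_models/RON320_VOC0712_VOC07.caffemodel'))"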

2. Modify /pascalvoc/VGG16/test320cudnn.prototxt
Note: since Caffe was built without cuDNN here, remove every engine: CUDNN entry from the prototxt (replacing them with spaces is the simplest way).
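Deleting the entries by hand (or with editor find-and-replace) works; a small throwaway script such as the one below does the same thing. The file path is the one mentioned above and may need adjusting for your checkout:

# strip_cudnn.py -- remove every "engine: CUDNN" entry so the prototxt runs without cuDNN
path = './models/pascalvoc/VGG16/test320cudnn.prototxt'  # adjust to your actual prototxt path
with open(path) as f:
    text = f.read()
with open(path, 'w') as f:
    f.write(text.replace('engine: CUDNN', ''))
print('removed engine: CUDNN entries from ' + path)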

3. Switch to the RON directory and run the demo_camera.sh script to get the detection results.
I modified tools/demo_video.py to measure the per-frame time; it averages about 16 fps on a TITAN.

import _init_paths
from fast_rcnn.config import cfg
from fast_rcnn.test import im_detect
from fast_rcnn.nms_wrapper import nms
from utils.timer import Timer
import matplotlib.pyplot as plt
import numpy as np
import scipy.io as sio
import caffe, os, sys, cv2
import argparse
from fast_rcnn.bbox_transform import clip_boxes, filter_boxes
import time

CLASSES = ('__background__', # always index 0
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair',
           'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor')

def detect(net, im):
    # Detect all object classes and regress object bounds
    ims = []
    ims.append(im)
    scores, boxes = im_detect(net, ims)
    scores = scores[0]
    boxes = boxes[0]
    # filter boxes according to prob scores
    keeps = np.where(scores[:, 0] > cfg.TEST.PROB)[0]
    scores = scores[keeps, :]
    boxes = boxes[keeps, :]
    # change boxes according to the input size and the original image size
    im_shape = np.array(im.shape[0:2])
    im_scales = float(cfg.TEST.SCALES[0]) / im_shape
    boxes[:, 0::2] = boxes[:, 0::2] / im_scales[1]
    boxes[:, 1::2] = boxes[:, 1::2] / im_scales[0]
    # filter boxes with small sizes
    boxes = clip_boxes(boxes, im_shape)
    keeps = filter_boxes(boxes, cfg.TEST.RON_MIN_SIZE)
    scores = scores[keeps, :]
    boxes = boxes[keeps, :]
    scores = np.tile(scores[:, 0], (len(CLASSES), 1)).transpose() * scores
    return scores, boxes

def parse_args():
    """Parse input arguments."""
    parser = argparse.ArgumentParser(description='Train a grasp network')
    parser.add_argument('--video', dest='video', help='video used to test',
                        default='', type=str)
    parser.add_argument('--model', dest='model', help='model used to test',
                        default='', type=str)
    parser.add_argument('--weights', dest='weights', help='weights used to test',
                        default='', type=str)
    args = parser.parse_args()
    return args

if __name__ == '__main__':
    args = parse_args()
    _t = {'im_detect': Timer()}

    prototxt = os.path.join(cfg.ROOT_DIR, args.model)
    caffemodel = os.path.join(cfg.ROOT_DIR, args.weights)

    cfg.MINANCHOR = 24

    caffe.set_mode_gpu()
    caffe.set_device(0)
    net = caffe.Net(prototxt, caffemodel, caffe.TEST)
    print '\n\nLoaded network {:s}'.format(caffemodel)

    top_N = 30
    max_score = 0.6

    # videoCam = cv2.VideoCapture(os.path.join(cfg.ROOT_DIR, args.video))
    videoCam = cv2.VideoCapture(0)

    while cv2.waitKey(1) != 0x20:
        start = time.time()
        success, frame = videoCam.read()
        if not success:
            print "Error getting frame"
            break

        _t['im_detect'].tic()
        scores, boxes = detect(net, frame)
        _t['im_detect'].toc()

        for j in xrange(1, len(CLASSES)):
            inds = np.where(scores[:, j] > max_score)[0]
            cls_boxes = boxes[inds, :]
            cls_scores = scores[inds, j]
            cls_dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])) \
                .astype(np.float32, copy=False)
            keep = nms(cls_dets, cfg.TEST.NMS)
            cls_dets = cls_dets[keep, :]
            if len(keep) > top_N:
                keep = keep[:top_N]
            for k in xrange(len(keep)):
                bbox = cls_dets[k, 0:4]
                cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), (255, 0, 0), 3)
                cv2.putText(frame, CLASSES[j], (bbox[0], bbox[1]),
                            fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=1,
                            color=(0, 0, 255), thickness=2)

        end = time.time()
        # print 'time:', (end - start)
        print 'frame:', int(1 / (end - start))
        cv2.imshow("RON detection results", frame)

    videoCam.release()
    cv2.destroyAllWindows()
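For reference, demo_camera.sh presumably wraps an invocation along these lines; the prototxt and caffemodel paths below are illustrative, so check the script for the exact ones in your checkout:

python ./tools/demo_video.py --model models/pascalvoc/VGG16/test320cudnn.prototxt --weights data/RON_models/RON320_VOC0712_VOC07.caffemodel

With the timing code above, the printed 'frame:' value is the instantaneous frames per second; on a TITAN it averaged about 16 fps.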

3. Testing and training on the PASCAL VOC dataset
