DL Open-Source Framework Caffe | Object Detection: A Full Walkthrough of Faster R-CNN Issues

Source: Internet | Editor: 程序博客网 | Time: 2024/05/16 02:47

I. Project Directory

After cloning the code from GitHub, the root directory contains the following folders; output only appears once training has finished.

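caffe-fast-rcnn: the Caffe framework directory;
data: stores pretrained models (for example, ImageNet ones) and the cache of loaded dataset files;
experiments: stores configuration files and run logs; its scripts subdirectory can launch training in either end2end or alt_opt mode;
lib: stores Python interface files; for example, datasets handles dataset loading and config holds CNN training options;
models: contains three model definitions: the small ZF network, the large VGG16 network, and the medium VGG_CNN_M_1024 network. VGG16 is recommended; with end-to-end approximate joint training and cuDNN enabled, 3GB of GPU memory is enough;
output: the output directory written after training, by default under the faster_rcnn_end2end folder;
tools: contains the Python files for training and testing.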

II. Training Methods

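1. Alternating training (alt-opt)
2. Approximate joint training (end-to-end)

The second method is recommended: it uses less GPU memory and trains faster, with about the same accuracy. The code changes required differ between the two methods. Faster R-CNN also provides three models to train: the small ZF model, the medium VGG_CNN_M_1024, and the large VGG16. According to the paper, VGG16 performs better than the other two but also uses more GPU memory (~11GB, compared with the roughly 3GB noted above for end-to-end training with cuDNN).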

III. Training Code

cd py-faster-rcnn
# First GPU (0), model VGG16, dataset pascal_voc
./experiments/scripts/faster_rcnn_alt_opt.sh 0 VGG16 pascal_voc

cd $FRCN_ROOT
./experiments/scripts/faster_rcnn_end2end.sh [GPU_ID] [NET] [--set ...]

python ./tools/train_net.py --gpu 1 --solver models/pascal_voc/VGG_CNN_M_1024/faster_rcnn_end2end/solver.prototxt --weights data/imagenet_models/VGG_CNN_M_1024.v2.caffemodel --imdb voc_2012_trainval --iters 70000 --cfg experiments/cfgs/faster_rcnn_end2end.yml

Question 1: How to draw bounding boxes in a different color for each class on the same image?

Modify the code in demo.py as follows:

```python
    # Inside demo() in tools/demo.py: visualize detections for each class
    CONF_THRESH = 0.7
    NMS_THRESH = 0.3
    font = cv2.FONT_HERSHEY_SIMPLEX
    for cls_ind, cls in enumerate(CLASSES[1:]):
        cls_ind += 1  # because we skipped background
        cls_boxes = boxes[:, 4*cls_ind:4*(cls_ind + 1)]
        cls_scores = scores[:, cls_ind]
        dets = np.hstack((cls_boxes,
                          cls_scores[:, np.newaxis])).astype(np.float32)
        keep = nms(dets, NMS_THRESH)
        dets = dets[keep, :]

        # Draw with OpenCV instead of vis_detections(im, cls, dets, thresh=CONF_THRESH)
        # One BGR color per class
        if cls_ind == 1:    # motorbike
            color = (0, 0, 255)
        elif cls_ind == 2:  # car
            color = (0, 255, 0)
        elif cls_ind == 3:  # bus
            color = (255, 0, 0)
        else:               # truck
            color = (255, 255, 255)

        inds = np.where(dets[:, -1] >= CONF_THRESH)[0]
        for i in inds:
            # OpenCV drawing functions expect integer pixel coordinates
            x1, y1, x2, y2 = [int(v) for v in dets[i, :4]]
            score = dets[i, -1]
            cv2.rectangle(im, (x1, y1), (x2, y2), color, 2)
            cv2.putText(im, '{:s} {:.3f}'.format(cls, score),
                        (x1, y1 - 2), font, 0.5, (0, 255, 0), 1)

    # Display the resulting frame
    cv2.imshow('{:s}'.format(image_name), im)
```
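The if/elif chain in the snippet grows with every added class. As a minimal sketch of an alternative, a lookup table can map class names to colors. The names `CLASS_COLORS`, `DEFAULT_COLOR`, and `color_for` are hypothetical and not part of demo.py, and the class names are assumed from the comments in the snippet (motorbike, car, bus, truck).

```python
# Hypothetical helper, not part of py-faster-rcnn's demo.py.
# OpenCV uses BGR channel order, so (0, 0, 255) is red.
CLASS_COLORS = {
    'motorbike': (0, 0, 255),  # red
    'car': (0, 255, 0),        # green
    'bus': (255, 0, 0),        # blue
}
DEFAULT_COLOR = (255, 255, 255)  # white, used for truck and any unlisted class


def color_for(cls_name):
    """Return the BGR color for a class name, falling back to white."""
    return CLASS_COLORS.get(cls_name, DEFAULT_COLOR)
```

Inside the loop, `color = color_for(cls)` would then replace the whole if/elif block.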

IV. Application Scenarios

Question 1: What should be done to detect small objects?
Answer: Change the parameters of the anchor_target_layer and proposal_layer layers, [link here]

scales: decrease these values to account for smaller boxes
ratios: adjust them depending on the shape of your ground-truth boxes
feat_stride: supposedly this can be modified to improve accuracy of the generated anchors
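To see what decreasing scales does in practice, it helps to print the anchor sizes in pixels. The following is a minimal sketch, assuming the width/height rounding in lib/rpn/generate_anchors.py works as reproduced here; `anchor_sizes` is a hypothetical helper, and the smaller scale set (2, 4, 8) is only an illustration, not a value recommended by the source.

```python
import math


def anchor_sizes(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Return (width, height) in pixels for every ratio/scale combination.

    Assumption: for each ratio, the base box keeps roughly the same area,
    with width and height rounded to integers before scaling.
    """
    area = float(base_size * base_size)
    sizes = []
    for ratio in ratios:
        w = int(round(math.sqrt(area / ratio)))
        h = int(round(w * ratio))
        for scale in scales:
            sizes.append((w * scale, h * scale))
    return sizes


if __name__ == '__main__':
    print(anchor_sizes())                  # default scales 8, 16, 32
    print(anchor_sizes(scales=(2, 4, 8)))  # illustrative smaller scales
```

With the default scales the square anchors span 128 to 512 pixels per side; with the illustrative smaller scales they span 32 to 128 pixels.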

Question 2: How to run detection on video in real time? (#578)
Answer: Modify the original demo.py as follows:

def demo_video(net, videoFile):
    """Run detection on the next frame of an already opened cv2.VideoCapture."""
    global frameRate
    # Read the next frame; ret is False once the stream ends
    ret, im = videoFile.read()
    if not ret:
        return False

    # Detect all object classes and regress object bounds
    timer = Timer()
    timer.tic()
    scores, boxes = im_detect(net, im)
    timer.toc()
    print('Detection took {:.3f}s for {:d} object proposals'.format(
        timer.total_time, boxes.shape[0]))
    frameRate = 1.0 / timer.total_time
    print('fps: ' + str(frameRate))

    # Visualize detections for each class
    CONF_THRESH = 0.65
    NMS_THRESH = 0.2
    for cls_ind, cls in enumerate(CLASSES[1:]):
        cls_ind += 1  # because we skipped background
        cls_boxes = boxes[:, 4*cls_ind:4*(cls_ind + 1)]
        cls_scores = scores[:, cls_ind]
        dets = np.hstack((cls_boxes,
                          cls_scores[:, np.newaxis])).astype(np.float32)
        keep = nms(dets, NMS_THRESH)
        dets = dets[keep, :]
        # vis_detections_video is expected to draw on the frame and return it
        im = vis_detections_video(im, cls, dets, thresh=CONF_THRESH)

    cv2.putText(im, '{:s} {:.2f}'.format('FPS:', frameRate), (1750, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255))
    cv2.imshow(videoFilePath.split('/')[-1], im)
    cv2.waitKey(20)
    return True


# Open the video once; reopening it for every frame would keep reading the first frame
videoFile = cv2.VideoCapture(videoFilePath)
while demo_video(net, videoFile):
    pass
videoFile.release()
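The FPS value in the video code is computed from a single frame, so the on-screen number can jump around. As a minimal sketch, an exponential moving average smooths it; `FpsMeter` and its `alpha` parameter are hypothetical and not part of py-faster-rcnn.

```python
class FpsMeter(object):
    """Exponentially smoothed frames-per-second estimate (hypothetical helper)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # weight given to the newest sample
        self.fps = None     # no estimate before the first update

    def update(self, seconds_per_frame):
        """Feed the detection time of one frame and return the smoothed FPS."""
        sample = 1.0 / seconds_per_frame
        if self.fps is None:
            self.fps = sample
        else:
            self.fps = self.alpha * sample + (1.0 - self.alpha) * self.fps
        return self.fps
```

In the loop, a single meter created before the first frame could be fed `timer.total_time` after each detection, and its return value drawn instead of the raw frame rate.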

Question 3: How to detect small objects? (#443)

To detect small objects within a large image, modify the anchor parameters in generate_anchors.py.
From this:
def generate_anchors(base_size=16, ratios=[0.5, 1, 2], scales=2**np.arange(3, 6)):
To this:
def generate_anchors(base_size=16, ratios=[0.3, 0.75, 1], scales=2**np.arange(3, 6)):
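The change keeps three ratios and three scales, so the number of anchors per feature-map position stays at 9. If the number of ratios or scales were changed as well, the RPN output layers would presumably need matching sizes. The sketch below computes them under the assumption, based on the stock prototxt files, that the RPN predicts 2 scores and 4 box offsets per anchor (layers rpn_cls_score and rpn_bbox_pred); `rpn_num_outputs` is a hypothetical helper.

```python
def rpn_num_outputs(ratios, scales):
    """Return (num_anchors, cls_outputs, bbox_outputs) per feature-map position.

    Assumption: 2 objectness scores (background/foreground) and
    4 box regression targets are predicted for each anchor.
    """
    num_anchors = len(ratios) * len(scales)
    return num_anchors, 2 * num_anchors, 4 * num_anchors


if __name__ == '__main__':
    # Ratios from the modified signature, scales 2**3, 2**4, 2**5
    print(rpn_num_outputs([0.3, 0.75, 1], [8, 16, 32]))
```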

Reference links: [link 1], [link 2]

V. Training Issues

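Question 1: Training finished, but the model detects nothing on the original images?

Cause: most likely some labels extend beyond the image boundary in the annotations. Two recommended ways to verify annotations: [check the boxes] and the latest version of LabelImg.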
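A quick programmatic check can complement the visual tools. The following is a minimal sketch, assuming boxes are given as (xmin, ymin, xmax, ymax) in 1-based inclusive pixel coordinates as in Pascal VOC annotations; `find_invalid_boxes` is a hypothetical helper, not part of py-faster-rcnn.

```python
def find_invalid_boxes(boxes, width, height):
    """Return indices of boxes that are degenerate or leave the image.

    Assumption: 1-based inclusive coordinates, so valid x values lie in
    [1, width] and valid y values lie in [1, height].
    """
    bad = []
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        inside = xmin >= 1 and ymin >= 1 and xmax <= width and ymax <= height
        ordered = xmin < xmax and ymin < ymax
        if not (inside and ordered):
            bad.append(i)
    return bad


if __name__ == '__main__':
    boxes = [
        (1, 1, 100, 100),   # fine
        (0, 10, 50, 60),    # xmin below the first pixel
        (10, 10, 700, 60),  # xmax beyond a 640-pixel-wide image
        (30, 30, 20, 40),   # xmin greater than xmax
    ]
    print(find_invalid_boxes(boxes, 640, 480))
```

Running this over every annotation file before training would list the boxes to fix or drop.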

Question 2: How to train an RPN model (#364)

First, it helps to know how alt_opt works:

1. Train RPN
2. Write out the RPN proposals
3. Train Fast-RCNN using the generated RPN proposals
4. Repeat 1-3 again for optimising weights for RPN & Fast-RCNN

Then only steps 1-2 are needed to generate proposals. To visualize these proposals, set visualisation to 1 in lib/rpn/generate.py.
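The alt_opt steps can be laid out as a schedule to make clear which part is needed for an RPN alone. This is a schematic sketch only: the step labels paraphrase the list, the functions `alt_opt_schedule` and `rpn_only` are hypothetical, and nothing here calls into the actual training scripts.

```python
def alt_opt_schedule(rounds=2):
    """List the alternating-optimization steps as (round, step) tuples."""
    steps = (
        'train RPN',
        'generate RPN proposals',
        'train Fast R-CNN on the proposals',
    )
    return [(r, s) for r in range(1, rounds + 1) for s in steps]


def rpn_only(schedule):
    """Keep the first two steps of round 1, which already yield proposals."""
    return schedule[:2]


if __name__ == '__main__':
    for rnd, step in alt_opt_schedule():
        print('round {}: {}'.format(rnd, step))
```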

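Question 3: How can Faster R-CNN be trained on multiple GPUs?

The short answer is that it cannot: the Python version does not support multi-GPU training. There are workarounds, though:
1. https://github.com/315386775/py-R-FCN-multiGPU (this fork supports multiple GPUs)
2. MXNet supports multi-GPU training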

Update (05/26)
Question 4: bbox_loss becomes 0 during training

The corresponding issue is linked here: [the zero-loss issue]

VI. Training Logs

The script faster_rcnn_end2end.sh under experiments/scripts in $FRCN_ROOT shows where the log is written:

LOG="experiments/logs/faster_rcnn_end2end_${NET}_${EXTRA_ARGS_SLUG}.txt.`date +'%Y-%m-%d_%H-%M-%S'`"
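To locate a log afterwards, it can help to see what file name that LOG line produces. A minimal sketch that rebuilds the same pattern in Python follows; `end2end_log_path` is a hypothetical helper, and the NET value and timestamp in the example are illustrative.

```python
from datetime import datetime


def end2end_log_path(net, extra_args_slug, now=None):
    """Build the log path following the LOG= pattern of the shell script."""
    now = now or datetime.now()
    stamp = now.strftime('%Y-%m-%d_%H-%M-%S')  # same format as the date call
    return 'experiments/logs/faster_rcnn_end2end_{}_{}.txt.{}'.format(
        net, extra_args_slug, stamp)


if __name__ == '__main__':
    # Illustrative values: VGG16 with no extra arguments
    print(end2end_log_path('VGG16', '', datetime(2017, 5, 26, 10, 30, 0)))
```

Note that with an empty EXTRA_ARGS_SLUG the name contains a trailing underscore before `.txt`.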
