DSOD: Learning Deeply Supervised Object Detectors from Scratch


Key Problems

  • Limited structure design space: pre-trained models restrict the detector to existing classification architectures, leaving little flexibility to adjust the network structure.
  • Learning bias
    • As both the loss functions and the category distributions of the classification and detection tasks differ, we argue that this leads to different searching/optimization spaces. Therefore, learning may be biased towards a local minimum that is not the best for the detection task.
  • Domain mismatch
    • State-of-the-art object detectors rely heavily on off-the-shelf networks pre-trained on large-scale classification datasets such as ImageNet.
    • Transferring pre-trained models from classification to detection is even more difficult when the source and target domains differ greatly (e.g., from natural images to depth or medical images).

Architecture

[Figures: DSOD network architecture and building blocks (images not reproduced)]

Principles

  • Proposal-free: training a detection network from scratch requires a proposal-free framework.
  • Deep Supervision
    • Dense layer-wise connections, as in DenseNet, provide implicit deep supervision: earlier layers receive supervision signals directly through the shortcut paths.
    • Transition w/o Pooling Layer. We introduce this layer in order to increase the number of dense blocks without reducing the final feature map resolution (see the sketch after this list).
  • Stem Block
    • The stem block can reduce the information loss from the raw input images.
  • Dense Prediction Structure
    • Each prediction scale fuses features learned from the previous scale with features directly down-sampled from the adjacent higher-resolution scale.
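A minimal PyTorch-style sketch of two of these building blocks, the stem block and the transition w/o pooling layer. The Conv-BN-ReLU ordering and the channel widths below are illustrative assumptions, not the paper's exact configuration.

```python
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, kernel, stride=1, padding=0):
    """Conv -> BatchNorm -> ReLU, the basic unit used in this sketch."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel, stride=stride, padding=padding, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class StemBlock(nn.Module):
    """Stacked 3x3 convolutions plus pooling at the network entry,
    intended to lose less information than a single large-stride,
    large-kernel convolution on the raw image."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.stem = nn.Sequential(
            conv_bn_relu(in_ch, 64, 3, stride=2, padding=1),
            conv_bn_relu(64, 64, 3, stride=1, padding=1),
            conv_bn_relu(64, 128, 3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

    def forward(self, x):
        return self.stem(x)

class TransitionWithoutPooling(nn.Module):
    """Transition layer that only compresses channels with a 1x1 conv;
    with no pooling step, the spatial resolution is kept, so more dense
    blocks can be stacked without shrinking the final feature map."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.transition = conv_bn_relu(in_ch, out_ch, 1)

    def forward(self, x):
        return self.transition(x)
```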

Contributions

  • DSOD is a simple yet efficient framework that can learn object detectors from scratch.
  • DSOD is fairly flexible, so that we can tailor various network structures for different computing platforms such as server, desktop, mobile and even embedded devices.
  • We present DSOD, to the best of our knowledge, the first framework that can train object detection networks from scratch with state-of-the-art performance.
  • We introduce and validate a set of principles to design efficient object detection networks from scratch through step-by-step ablation studies.
  • We show that our DSOD can achieve state-of-the-art performance on three standard benchmarks (PASCAL VOC 2007, 2012 and MS COCO) with real-time processing speed and more compact models.

Experiments

[Figures/tables: experimental results and ablation studies on PASCAL VOC and MS COCO (images not reproduced)]

Others

  • A well-designed network structure can outperform state-of-the-art solutions without using pre-trained models.
  • Only the proposal-free methods (the third category) can converge successfully without pre-trained models.
    • RoI pooling generates features for each region proposal, which hinders the gradients from being smoothly back-propagated from the region level to the convolutional feature maps.
    • Proposal-based methods work well with pre-trained network models because the parameter initialization is good for the layers before RoI pooling, while this is not true when training from scratch.
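A minimal sketch (my own illustration, not the authors' code) of why proposal-free prediction sidesteps this problem: class scores and box offsets come from ordinary convolutions applied densely over a feature map, so there is no RoI pooling between the backbone and the loss, and gradients flow back to every backbone layer.

```python
import torch
import torch.nn as nn

class DensePredictionHead(nn.Module):
    """SSD-style proposal-free head: per-location, per-anchor predictions
    produced by plain convolutions over one backbone feature map."""
    def __init__(self, in_ch, num_anchors, num_classes):
        super().__init__()
        self.cls = nn.Conv2d(in_ch, num_anchors * num_classes, 3, padding=1)
        self.loc = nn.Conv2d(in_ch, num_anchors * 4, 3, padding=1)

    def forward(self, feat):
        # feat: (N, C, H, W) feature map at one scale; no region proposals needed.
        return self.cls(feat), self.loc(feat)

# Example: predictions at one scale for 6 anchors and 21 VOC classes.
head = DensePredictionHead(in_ch=256, num_anchors=6, num_classes=21)
scores, boxes = head(torch.randn(1, 256, 38, 38))
```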