Scene Parsing



Problem

Segment and parse an image into regions, each associated with a semantic category.

Evaluation

Pixel-wise accuracy
The ratio of correctly predicted pixels to all labeled pixels.
Class-wise IoU
The Intersection over Union (IoU) of pixels, computed per class and averaged over all semantic categories.
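
The two metrics above, together with the Class Acc., mean IU and f.w. IU columns used in the result tables, can all be derived from a single confusion matrix. The sketch below is illustrative only: it assumes the common confusion-matrix definitions of these metrics, assumes labels are given as flat sequences of integer class ids, and skips classes that never occur in the ground truth. The function names and the `ignore_label` parameter are hypothetical and do not come from any benchmark toolkit.

```python
def confusion_matrix(gt, pred, num_classes, ignore_label=-1):
    """cm[i][j] = number of pixels with true class i predicted as class j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_label:  # assumption: unlabeled pixels are not scored
            continue
        cm[g][p] += 1
    return cm


def parsing_metrics(cm):
    """Return pixel accuracy, class accuracy, mean IoU and f.w. IoU."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix contains no labeled pixels")
    correct = sum(cm[i][i] for i in range(n))

    accs, ious, fw_iou = [], [], 0.0
    for i in range(n):
        t_i = sum(cm[i])                       # ground-truth pixels of class i
        p_i = sum(cm[j][i] for j in range(n))  # pixels predicted as class i
        if t_i == 0:
            continue  # assumption: classes absent from ground truth are skipped
        iou = cm[i][i] / (t_i + p_i - cm[i][i])  # intersection / union
        accs.append(cm[i][i] / t_i)
        ious.append(iou)
        fw_iou += t_i * iou / total              # weight IoU by class frequency

    return {
        "pixel_acc": correct / total,
        "class_acc": sum(accs) / len(accs),
        "mean_iou": sum(ious) / len(ious),
        "fw_iou": fw_iou,
    }


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 2, 2]    # toy ground-truth labels
    pred = [0, 1, 1, 1, 2, 0]  # toy predictions
    print(parsing_metrics(confusion_matrix(gt, pred, num_classes=3)))
```

For a whole dataset, the confusion matrices of the individual images would typically be summed before calling `parsing_metrics`, so that every pixel carries equal weight.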

Dataset

Stanford Background
S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. In IEEE 12th International Conference on Computer Vision (ICCV), pages 1–8, Sept 2009.
SIFT Flow
C. Liu, J. Yuen, and A. Torralba. Nonparametric scene parsing via label transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2368–2382, Dec 2011.
PASCAL-Context
R. Mottaghi et al. The role of context for object detection and semantic segmentation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
ADE20K
B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba. Semantic understanding of scenes through ADE20K dataset. arXiv:1608.05442.

| Dataset | Stanford Background | SIFT Flow | ADE20K |
| --- | --- | --- | --- |
| No. of images | 715 | 2688 | 25562 |
| No. of train set | 572 | 2488 | 20210 |
| No. of val set | 0 | 0 | 2000 |
| No. of test set | 143 | 200 | 3352 |
| No. of classes | 8 | 33 | 150 |

Samples of ADE20K

http://sceneparsing.csail.mit.edu/browse.php/?dirname=training/

Result

Stanford Background

| Method | Pixel Acc. (%) | Class Acc. (%) | Avg. computing time per image (s) |
| --- | --- | --- | --- |
| Single-scale ConvNet | 66 | 56.5 | 0.35 (GPU) |
| Augmented CNNs | 71.97 | 66.16 | - |
| Superparsing | 77.5 | - | 10 to 300 |
| Deep 2D LSTM (window 5x5) | 77.73 | 68.26 | 1.3 (CPU) |
| Deep 2D LSTM (window 3x3) | 78.56 | 68.79 | 3.7 (CPU) |
| Multi-scale ConvNet | 78.8 | 72.4 | 0.6 (CPU) |
| RCNN2 (3 instances) | 80.2 | 69.9 | 10.7 (GPU) |
| N-ReNet | 80.4 | 71.8 | 0.07 (GPU) |
| Multi-CNN + rCPN Fast | 80.9 | 78.8 | 0.37 (GPU) |
| multiscale net + CRF on gPb | 81.4 | 76.0 | 60.5 (CPU) |
| Zoom-out | 82.1 | 77.3 | - |
| HGDN | 82.41 | 72.98 | 0.02 (GPU) |
| RCNN_NIPS2015 | 83.1 | 74.8 | 0.03 (GPU) |

SIFT Flow

| Method | Pixel Acc. (%) | Class Acc. (%) | mean IU (%) | f.w. IU (%) | Avg. computing time per image (s) |
| --- | --- | --- | --- | --- | --- |
| Augmented CNNs | 49.39 | 44.54 | - | - | - |
| Deep 2D LSTM (window 5x5) | 68.74 | 22.59 | - | - | 1.2 (CPU) |
| Deep 2D LSTM (window 3x3) | 70.11 | 20.90 | - | - | 3.1 (CPU) |
| RCNN2 (3 instances) | 77.7 | 29.8 | - | - | - |
| multiscale net + cover1 | 72.3 | 50.8 | - | - | - |
| multiscale net + cover2 | 78.5 | 29.6 | - | - | - |
| RCNN (balanced) | 79.3 | 57.1 | - | - | 0.03 (GPU) |
| HGDN | 79.68 | 51.26 | - | - | 0.03 (GPU) |
| RCNN-large | 84.3 | 41.0 | - | - | 0.04 (GPU) |
| FCN-16s | 85.2 | 51.7 | - | - | 0.175 (GPUs) |
| VGG-conv5-DAG-RNN(8) | 85.3 | 55.7 | - | - | - |
| FCN-8s | 85.9 | 53.9 | 41.2 | 77.2 | - |
| patch CRF+CNN | 88.1 | 53.4 | - | - | - |

PASCAL-Context

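| Method | Pixel Acc. (%) | Class Acc. (%) | mean IU (%) | f.w. IU (%) |
| --- | --- | --- | --- | --- |
| CFM | - | - | 18.1 | - |
| CFM | - | - | 34.4 | - |
| FCN-32s | 65.5 | 49.1 | 36.7 | 50.9 |
| FCN-16s | 66.9 | 51.3 | 38.4 | 52.3 |
| FCN-8s | 67.5 | 52.3 | 39.1 | 53.0 |
| patch CRF+CNN | 71.5 | 53.9 | - | - |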

Reference

| Method | Year | Venue | Reference Paper |
| --- | --- | --- | --- |
| Superparsing | 2010 | ECCV | Superparsing: Scalable nonparametric image parsing with superpixels |
| Single-scale ConvNet | 2013 | PAMI | Learning hierarchical features for scene labeling |
| multiscale net | 2013 | PAMI | Learning hierarchical features for scene labeling |
| Augmented CNNs | 2014 | BMVC | Contextually constrained deep networks for scene labeling |
| RCNN2 (3 instances) | 2014 | ICML | Recurrent convolutional neural networks for scene labeling |
| Multi-CNN + rCPN Fast | 2014 | NIPS | Recursive context propagation network for semantic scene labeling |
| RCNN (balanced) | 2015 | NIPS | Convolutional Neural Networks with Intra-layer |
| RCNN-large | 2015 | NIPS | Convolutional Neural Networks with Intra-layer |
| Deep 2D LSTM | 2015 | CVPR | Scene Labeling with LSTM Recurrent Neural Networks |
| Zoom-out | 2015 | CVPR | Feedforward semantic segmentation with zoom-out features |
| FCN-16s | 2015 | CVPR | Fully convolutional networks for semantic segmentation |
| N-ReNet | 2016 | - | Combining the Best of Convolutional Layers and Recurrent Layers: A Hybrid Network for Semantic Segmentation |
| HGDN | 2016 | CVPR | Hierarchically Gated Deep Networks for Semantic Segmentation |
| VGG-conv5-DAG-RNN(8) | 2016 | CVPR | DAG-Recurrent Neural Networks For Scene Labeling |
| patch CRF+CNN | 2016 | CVPR | Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation |

0 0