Reading Notes - TextProposals

I forget how I found this paper. I got interested because:

  • it is recent (2015-16)
  • complete C++ source code is available (TextProposals)
  • it compares against other important state-of-the-art methods (e.g. Jaderberg)

segTracking.m --> precompAux() --> myTSP.m --> TSP()

Artifacts cached by this precomputation (detailed in the steps below): flow, image info, Iunsp, ISall, seqinfo
1. compute optical flow for each frame, save to the image directory
2. run SP (superpixel) segmentation for each frame, save 'sp_labels' to the image directory
3. save optical flow in .mat format under ./tmp/seqinfo/flowinfo-xxx.mat
4. save normalised image data (im=double(imread(filename))/255;) to ./tmp/seqinfo/iminfo-xxx.mat; all images in one array
5. save independent superpixels for each frame to sprintf('./tmp/Iunsp/%04d-%d-%d-K%d.mat', scenario, frames(1), frames(end), K); takes a multi-frame segmentation and creates a unique one per frame via unspliceSeg(imseg)
6. gather all superpixel info into one single matrix, saved to sprintf('./tmp/ISall/%04d-%d-%d-K%d.mat', scenario, frames(1), frames(end), K)
7. concatenate sequence info into a struct array
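
A minimal MATLAB sketch of the precompute-and-cache pattern behind steps 3-6. The frame filenames and the scenario/frames/K values are assumed examples, and the 'xxx' part of the iminfo name is not captured in my notes; this is not the actual precompAux() code.

scenario = 1; frames = 1:10; K = 500;        % assumed example values
ims = [];
for t = frames
    im  = double(imread(sprintf('frame-%04d.png', t))) / 255;  % normalisation as in step 4
    ims = cat(4, ims, im);                                     % all images in one array
end
save('./tmp/seqinfo/iminfo-xxx.mat', 'ims');                   % real 'xxx' naming unknown
% sequence-level caches are keyed by scenario, frame range and K (steps 5-6)
fIunsp = sprintf('./tmp/Iunsp/%04d-%d-%d-K%d.mat', scenario, frames(1), frames(end), K);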

%% perform foreground background segmentation
detSeg;
svmSeg;
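
detSeg and svmSeg are not expanded in my notes, so here is only a generic sketch of what SVM-based foreground/background classification of superpixel features could look like; the feature matrix, labels and kernel choice are all stand-ins, nothing from the actual scripts.

X = randn(200, 5);                                % stand-in per-superpixel features
y = [ones(100,1); -ones(100,1)];                  % +1 = foreground, -1 = background
mdl  = fitcsvm(X, y, 'KernelFunction', 'linear'); % train a linear SVM
pred = predict(mdl, randn(10, 5));                % classify unseen superpixels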

%% generate initial set of trajectory hypotheses
generateHypotheses; (crashed on step 1)
1. DP: sprintf('./tmp/hyps/DPHyp-%04d-%d-%d.mat', scenario, frames(1), frames(end))
2. MFT: sprintf('./tmp/hyps/MFTHyp-%04d-%d-%d-%d.mat', scenario, frames(1), frames(end), opt.maxMFTHyp)
3. MFTDP: sprintf('./tmp/hyps/MFTDPHyp-%04d-%d-%d-%d.mat', scenario, frames(1), frames(end), opt.maxMFTDPHyp)
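
These cache files suggest the usual MATLAB load-or-compute idiom; a sketch for the DP hypotheses, with a stand-in for the real computation (runDP's actual output format is a guess):

scenario = 1; frames = 1:10;                      % assumed example values
hypFile = sprintf('./tmp/hyps/DPHyp-%04d-%d-%d.mat', scenario, frames(1), frames(end));
if exist(hypFile, 'file')
    S = load(hypFile); DPHyp = S.DPHyp;           % reuse cached hypotheses
else
    DPHyp = struct('tracks', {{}});               % stand-in for the real runDP(...) result
    save(hypFile, 'DPHyp');
end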

generateHypotheses.m --> runDP(…) to get DPHyp --> generateHypothesesMFT.m to get MFTHyp --> generateHypothesesMFTDP.m to get MFTDPHyp

key functions in the main loop of generateHypothesesMFT.m
- justTrack(…)
- rmOlDets(…)
- evaluateHypothesesSet(…)
- detStructToArray(…)

Summary

The related-work discussion (Sections 2.1 and 2.2) points to some good survey papers.

Paper novelty

  • Regarding Scene Text Detection (2.1)
    • all methods mentioned in Section 2.1, whether region-based or texture-based, rely on generating individual character candidates, complemented by a post-processing step where regions assessed to be characters are grouped into words or text lines based on spatial, similarity, and/or collinearity constraints
    • in other words, these grouping processes all assume that their atomic elements are well-segmented individual characters
    • the paper takes inspiration from existing connected-component-based methods but does not assume that the individual connected components are well-segmented characters
  • Regarding Generic Object Proposal (Section 2.2)
    • Overall, all generic object proposal algorithms seem designed to target objects that can be isolated from their background as a single body: grouping-based methods do it by agglomerating adjacent regions, and most sliding-window-based methods do it intrinsically, as they are trained with examples of exactly that kind of object. Thus, for the most part these generic methods are not adequate for text detection, simply because they are designed for a different task.
    • However, the use of generic object proposal techniques for scene text understanding has recently been exploited by Jaderberg et al. [3] with impressive results. Their end-to-end pipeline combines object proposals from the EdgeBoxes [13] algorithm and a trained aggregate channel features detector [12] with a powerful deep Convolutional Neural Network for holistic word recognition. Still, their method needs a CNN-based bounding-box regression module on top of the region proposals in order to improve their quality.
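
Not from the paper, but useful context for "proposal quality": proposals are conventionally judged by their intersection-over-union (IoU) with ground-truth boxes. A one-liner in MATLAB using rectint (box format [x y w h] assumed):

iou   = @(a, b) rectint(a, b) / (a(3)*a(4) + b(3)*b(4) - rectint(a, b));
gtBox = [10 10 50 20];                 % ground-truth word box
prop  = [16 12 50 20];                 % a proposal
fprintf('IoU = %.3f\n', iou(gtBox, prop));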

Scene text detection

  • two categories
    • Sliding window search methods
      • drawbacks: high computational cost
      • limited to detecting the single language and orientation they have been trained on
    • Connected component based approaches
      • a typical bottom-up pipeline: over-segment the image into regions (CCs) --> classify the resulting regions as character or background --> group the identified characters into longer sequences (i.e. words or text lines); a toy version is sketched right after this list
  • SWT
    • detects multi-script and arbitrarily oriented text
  • MSER
    • Neumann and Matas [1] perform individual MSER classification using hand-crafted region-based features (e.g. aspect ratio, compactness, etc.)
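
A toy MATLAB illustration of that bottom-up pipeline: crude binarisation as the over-segmentation, then two hand-crafted features of the kind used in [1] to keep character-like components. The input file and thresholds are arbitrary assumptions, not the method of any cited paper.

I  = imread('scene.png');                    % assumed input image
bw = imbinarize(rgb2gray(I));                % crude over-segmentation into CCs
cc = bwconncomp(bw);
st = regionprops(cc, 'BoundingBox', 'Solidity');
keep = false(cc.NumObjects, 1);
for i = 1:cc.NumObjects
    b  = st(i).BoundingBox;                  % [x y w h]
    ar = b(3) / b(4);                        % aspect-ratio feature
    keep(i) = ar > 0.1 && ar < 2 && st(i).Solidity > 0.3;  % character vs background
end
% surviving CCs would then be grouped into words/text lines by collinearity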

Generic object proposal algorithms

  • Two major types
    • evaluate a fast-to-compute objectness measure via exhaustive search
      • Alexe et al. [2] combine several image cues such as saliency score, color contrast, edge density and straddling contours, using integral images for fast computation
      • BING [3] uses the norm of image gradients (the NG feature)
        • at proper scales and aspect ratios, the Normed Gradients (NG) features of object and non-object windows share a strong correlation (see the NG sketch after this list)
      • EdgeBoxes [4]: see below
    • the search is driven by a segmentation and grouping process
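
A sketch of the 64-D Normed Gradients feature at the core of BING [3]: resize any window to 8x8 and take a clipped gradient magnitude. Here it is computed on a double image (so clipping at 1 rather than 255) and the window file is an assumed example:

win = imresize(im2double(rgb2gray(imread('window.png'))), [8 8]);
[gx, gy] = gradient(win);          % horizontal / vertical gradients
ng   = min(abs(gx) + abs(gy), 1);  % normed-gradient map, clipped
feat = ng(:)';                     % the 64-D NG feature vector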

Key insights

Different methods

  • Binarized Normed Gradients (BING)
    • the BING proposal algorithm is trained to detect single-body objects with compact shapes
  • EdgeBoxes
    • a sliding-window-driven algorithm
    • a box's objectness score is measured as the number of edges [5] wholly contained in the box minus those that are members of contours overlapping the box's boundary (toy version after this list)
    • uses efficient data structures to evaluate millions of candidate boxes in a fraction of a second
    • not well suited for detecting non-horizontal and small-sized text
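
A toy version of that scoring rule; the real EdgeBoxes [4] groups structured-forest edges [5] into contours with affinities and efficient data structures, while this crude sketch only separates edge pixels on connected components that touch the box border from those wholly inside:

E   = edge(rgb2gray(imread('scene.png')), 'canny');   % stand-in for structured edges
bb  = [60 40 120 80];                                 % assumed box, [x y w h]
sub = E(bb(2):bb(2)+bb(4)-1, bb(1):bb(1)+bb(3)-1);
Lbl = bwlabel(sub);                                   % edge components inside the crop
onB = unique([Lbl(1,:), Lbl(end,:), Lbl(:,1)', Lbl(:,end)']);
straddle = ismember(Lbl, setdiff(onB, 0));            % components touching the boundary
score = nnz(sub & ~straddle) - nnz(straddle);         % inside edges minus straddling ones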

Data sets

  • ICDAR 2013
    • mostly well focused and flatly illuminated
  • SVT
    • more challenging text, with lower quality and often bad illumination conditions
    • SVT ground-truth annotations are less consistent in terms of the extra padding allowed around word instances
  • MLe2e
    • contains well-focused and horizontal text instances in four different scripts
  • ICDAR2015 “Incidental Scene Text”
    • contains a large number of non-horizontal and very small text instances

  1. L. Neumann, J. Matas, A method for text localization and recognition in real-world images, in: Computer Vision – ACCV 2010, Springer, 2010, pp. 770–783.
  2. B. Alexe, T. Deselaers, V. Ferrari, Measuring the objectness of image windows, IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (11) (2012) 2189–2202.
  3. M.-M. Cheng, Z. Zhang, W.-Y. Lin, P. Torr, BING: Binarized normed gradients for objectness estimation at 300fps, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3286–3293.
  4. C. L. Zitnick, P. Dollár, Edge Boxes: Locating object proposals from edges, in: Computer Vision – ECCV 2014, Springer, 2014, pp. 391–405.
  5. P. Dollár, C. Zitnick, Structured forests for fast edge detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1841–1848.