Reading Notes - TextProposals

I forget how I found this paper. I got interested because:

  • it is recent (2015-16)
  • complete C++ source code is available (TextProposals)
  • it compares against other important state-of-the-art methods (e.g. Jaderberg)

segTracking.m --> precompAux() --> myTSP.m --> TSP()

Artifacts cached by this precomputation (detailed in the steps below): flow, image info, Iunsp, ISall, seqinfo
1. compute optical flow for each frame, save to the image directory
2. run SP (superpixel) segmentation for each frame, save 'sp_labels' to the image directory
3. save optical flow in .mat format under ./tmp/seqinfo/flowinfo-xxx.mat
4. save normalised image data (im=double(imread(filename))/255;) to ./tmp/seqinfo/iminfo-xxx.mat; all images in one array
5. save independent superpixels for each frame to sprintf('./tmp/Iunsp/%04d-%d-%d-K%d.mat', scenario, frames(1), frames(end), K); takes a multi-frame segmentation and creates a unique one per frame via unspliceSeg(imseg)
6. gather all superpixel info into one single matrix, saved to sprintf('./tmp/ISall/%04d-%d-%d-K%d.mat', scenario, frames(1), frames(end), K)
7. concatenate sequence info into a struct array
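
A minimal MATLAB sketch of the precompute-and-cache pattern behind steps 3-6. The frame filenames and the scenario/frames/K values are assumed examples, and the 'xxx' part of the iminfo name is not captured in my notes; this is not the actual precompAux() code.

scenario = 1; frames = 1:10; K = 500;        % assumed example values
ims = [];
for t = frames
    im  = double(imread(sprintf('frame-%04d.png', t))) / 255;  % normalisation as in step 4
    ims = cat(4, ims, im);                                     % all images in one array
end
save('./tmp/seqinfo/iminfo-xxx.mat', 'ims');                   % real 'xxx' naming unknown
% sequence-level caches are keyed by scenario, frame range and K (steps 5-6)
fIunsp = sprintf('./tmp/Iunsp/%04d-%d-%d-K%d.mat', scenario, frames(1), frames(end), K);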

%% perform foreground background segmentation
detSeg;
svmSeg;
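
detSeg and svmSeg are not expanded in my notes, so here is only a generic sketch of what SVM-based foreground/background classification of superpixel features could look like; the feature matrix, labels and kernel choice are all stand-ins, nothing from the actual scripts.

X = randn(200, 5);                                % stand-in per-superpixel features
y = [ones(100,1); -ones(100,1)];                  % +1 = foreground, -1 = background
mdl  = fitcsvm(X, y, 'KernelFunction', 'linear'); % train a linear SVM
pred = predict(mdl, randn(10, 5));                % classify unseen superpixels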

%% generate initial set of trajectory hypotheses
generateHypotheses; (crashed on step 1)
1. DP: sprintf('./tmp/hyps/DPHyp-%04d-%d-%d.mat', scenario, frames(1), frames(end))
2. MFT: sprintf('./tmp/hyps/MFTHyp-%04d-%d-%d-%d.mat', scenario, frames(1), frames(end), opt.maxMFTHyp)
3. MFTDP: sprintf('./tmp/hyps/MFTDPHyp-%04d-%d-%d-%d.mat', scenario, frames(1), frames(end), opt.maxMFTDPHyp)
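
These cache files suggest the usual MATLAB load-or-compute idiom; a sketch for the DP hypotheses, with a stand-in for the real computation (runDP's actual output format is a guess):

scenario = 1; frames = 1:10;                      % assumed example values
hypFile = sprintf('./tmp/hyps/DPHyp-%04d-%d-%d.mat', scenario, frames(1), frames(end));
if exist(hypFile, 'file')
    S = load(hypFile); DPHyp = S.DPHyp;           % reuse cached hypotheses
else
    DPHyp = struct('tracks', {{}});               % stand-in for the real runDP(...) result
    save(hypFile, 'DPHyp');
end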

generateHypotheses.m --> runDP(…) to get DPHyp --> generateHypothesesMFT.m to get MFTHyp --> generateHypothesesMFTDP.m to get MFTDPHyp

key functions in the main loop of generateHypothesesMFT.m
- justTrack(…)
- rmOlDets(…)
- evaluateHypothesesSet(…)
- detStructToArray(…)

Summary

The related-work discussion (Sections 2.1 and 2.2) points to some good survey papers.

Paper novelty

  • Regarding Scene Text Detection (2.1)
    • all methods mentioned in Section 2.1, whether region-based or texture-based, rely on generating individual character candidates, complemented by a post-processing step where regions assessed to be characters are grouped into words or text lines based on spatial, similarity, and/or collinearity constraints
    • in other words, these grouping processes all assume that their atomic elements are well-segmented individual characters
    • the paper takes inspiration from existing connected-component-based methods but does not assume that the individual connected components are well-segmented characters
  • Regarding Generic Object Proposal (Section 2.2)
    • Overall, all generic object proposal algorithms seem designed to target objects that can be isolated from their background as a single body: grouping-based methods do it by agglomerating adjacent regions, and most sliding-window-based methods do it intrinsically, as they are trained with examples of exactly that kind of object. Thus, for the most part these generic methods are not adequate for text detection, simply because they are designed for a different task.
    • However, the use of generic object proposal techniques for scene text understanding has recently been exploited by Jaderberg et al. [3] with impressive results. Their end-to-end pipeline combines object proposals from the EdgeBoxes [13] algorithm and a trained aggregate channel features detector [12] with a powerful deep Convolutional Neural Network for holistic word recognition. Still, their method needs a CNN-based bounding-box regression module on top of the region proposals in order to improve their quality.
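
Not from the paper, but useful context for "proposal quality": proposals are conventionally judged by their intersection-over-union (IoU) with ground-truth boxes. A one-liner in MATLAB using rectint (box format [x y w h] assumed):

iou   = @(a, b) rectint(a, b) / (a(3)*a(4) + b(3)*b(4) - rectint(a, b));
gtBox = [10 10 50 20];                 % ground-truth word box
prop  = [16 12 50 20];                 % a proposal
fprintf('IoU = %.3f\n', iou(gtBox, prop));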

Scene text detection

  • two categories
    • Sliding window search methods
      • drawbacks: high computational cost
      • limited to detecting the single language and orientation they have been trained on
    • Connected component based approaches
      • a typical bottom-up pipeline: over-segment the image into regions (CCs) --> classify the resulting regions as character or background --> group the identified characters into longer sequences (i.e. words or text lines); a toy version is sketched right after this list
  • SWT
    • detects multi-script and arbitrarily oriented text
  • MSER
    • Neumann and Matas [1] perform individual MSER classification using hand-crafted region-based features (e.g. aspect ratio, compactness, etc.)
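
A toy MATLAB illustration of that bottom-up pipeline: crude binarisation as the over-segmentation, then two hand-crafted features of the kind used in [1] to keep character-like components. The input file and thresholds are arbitrary assumptions, not the method of any cited paper.

I  = imread('scene.png');                    % assumed input image
bw = imbinarize(rgb2gray(I));                % crude over-segmentation into CCs
cc = bwconncomp(bw);
st = regionprops(cc, 'BoundingBox', 'Solidity');
keep = false(cc.NumObjects, 1);
for i = 1:cc.NumObjects
    b  = st(i).BoundingBox;                  % [x y w h]
    ar = b(3) / b(4);                        % aspect-ratio feature
    keep(i) = ar > 0.1 && ar < 2 && st(i).Solidity > 0.3;  % character vs background
end
% surviving CCs would then be grouped into words/text lines by collinearity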

Generic object proposal algorithms

  • Two major types
    • evaluate a fast-to-compute objectness measure via exhaustive search
      • Alexe et al. [2] combine several image cues such as saliency score, color contrast, edge density and straddling contours, using integral images for fast computation
      • BING [3] uses the norm of image gradients (the NG feature)
        • at proper scales and aspect ratios, the Normed Gradients (NG) features of object and non-object windows share a strong correlation (see the NG sketch after this list)
      • EdgeBoxes [4]: see below
    • the search is driven by a segmentation and grouping process
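
A sketch of the 64-D Normed Gradients feature at the core of BING [3]: resize any window to 8x8 and take a clipped gradient magnitude. Here it is computed on a double image (so clipping at 1 rather than 255) and the window file is an assumed example:

win = imresize(im2double(rgb2gray(imread('window.png'))), [8 8]);
[gx, gy] = gradient(win);          % horizontal / vertical gradients
ng   = min(abs(gx) + abs(gy), 1);  % normed-gradient map, clipped
feat = ng(:)';                     % the 64-D NG feature vector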

Key insights

Different methods

  • Binarized Normed Gradients (BING)
    • the BING proposal algorithm is trained to detect single-body objects with compact shapes
  • EdgeBoxes
    • a sliding-window-driven algorithm
    • a box's objectness score is measured as the number of edges [5] wholly contained in the box minus those that are members of contours overlapping the box's boundary (toy version after this list)
    • uses efficient data structures to evaluate millions of candidate boxes in a fraction of a second
    • not well suited for detecting non-horizontal and small-sized text
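
A toy version of that scoring rule; the real EdgeBoxes [4] groups structured-forest edges [5] into contours with affinities and efficient data structures, while this crude sketch only separates edge pixels on connected components that touch the box border from those wholly inside:

E   = edge(rgb2gray(imread('scene.png')), 'canny');   % stand-in for structured edges
bb  = [60 40 120 80];                                 % assumed box, [x y w h]
sub = E(bb(2):bb(2)+bb(4)-1, bb(1):bb(1)+bb(3)-1);
Lbl = bwlabel(sub);                                   % edge components inside the crop
onB = unique([Lbl(1,:), Lbl(end,:), Lbl(:,1)', Lbl(:,end)']);
straddle = ismember(Lbl, setdiff(onB, 0));            % components touching the boundary
score = nnz(sub & ~straddle) - nnz(straddle);         % inside edges minus straddling ones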

Data sets

  • ICDAR 2013
    • mostly well focused and flatly illuminated
  • SVT
    • more challenging text, with lower quality and often bad illumination conditions
    • SVT ground-truth annotations are less consistent in terms of the extra padding allowed around word instances
  • MLe2e
    • contains well-focused and horizontal text instances in four different scripts
  • ICDAR2015 “Incidental Scene Text”
    • contains a large number of non-horizontal and very small text instances

  1. L. Neumann, J. Matas, A method for text localization and recognition in real-world images, in: Computer Vision – ACCV 2010, Springer, 2010, pp. 770–783.
  2. B. Alexe, T. Deselaers, V. Ferrari, Measuring the objectness of image windows, IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (11) (2012) 2189–2202.
  3. M.-M. Cheng, Z. Zhang, W.-Y. Lin, P. Torr, BING: Binarized normed gradients for objectness estimation at 300fps, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3286–3293.
  4. C. L. Zitnick, P. Dollár, Edge Boxes: Locating object proposals from edges, in: Computer Vision – ECCV 2014, Springer, 2014, pp. 391–405.
  5. P. Dollár, C. Zitnick, Structured forests for fast edge detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1841–1848.