[Paper Notes] Single Shot Text Detector with Regional Attention


Single Shot Text Detector with Regional Attention

Paper: https://arxiv.org/abs/1709.00138

Contributions:

Proposes an attention mechanism, i.e. an automatically learned attention map, that suppresses background interference.

Model architecture:

- text-specific component: Text Attention Module (TAM) and Hierarchical Inception Module (HIM)

- convolutional component: extended from SSD

- box prediction component: extended from SSD

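Text Attention Mechanism:

Principle: uses a pixel-level binary mask of the text.

Steps:

1. Learn the spatial regions of text from the convolutional features.

2. Encode the text information back into the convolutional features to enhance them.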

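Aggregated Inception Feature (AIF):

The attention model is built on the AIF, which is used to produce a heatmap. The heatmap is a pixel-wise probability map that gives the probability of text at each pixel. The resulting attention map has the same size as the input image and is down-sampled for each prediction layer.

How the heatmap is produced from the AIF: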

-> input: 512*512

-> F(AIF1): 64*64*512

-> D(AIF1): 512*512*512

-> D'(AIF1): 512*512*2

-> alpha: softmax over 2 classes; the positive part, alpha+, is the pixel-wise probability of text

-> ^alpha+ = resize(alpha+): 64*64

-> the resulting feature maps: ^F(AIF1) = ^alpha+ * F(AIF1)
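
The last three steps above (softmax, resize, element-wise product) can be sketched in plain Python on toy sizes. This is only an illustration under stated assumptions: the function names are hypothetical, "resize" is implemented here as block averaging (the notes do not say which interpolation is used), and feature maps are stored channel-first as nested lists. At full size, a 512*512 probability map reduced by a factor of 8 gives the 64*64 map.

```python
import math


def text_probability(logits):
    """Per-pixel softmax over (background, text) logits; returns P(text)."""
    probs = []
    for row in logits:
        out_row = []
        for bg, tx in row:
            m = max(bg, tx)  # subtract the max for numerical stability
            e_bg, e_tx = math.exp(bg - m), math.exp(tx - m)
            out_row.append(e_tx / (e_bg + e_tx))
        probs.append(out_row)
    return probs


def resize_avg(grid, factor):
    """Down-sample a 2D grid by averaging factor x factor blocks (assumed resize)."""
    h, w = len(grid), len(grid[0])
    return [
        [
            sum(grid[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
            / factor ** 2
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]


def apply_attention(features, attention):
    """Scale every channel of features[c][y][x] by attention[y][x]."""
    return [
        [
            [a * v for a, v in zip(att_row, feat_row)]
            for att_row, feat_row in zip(attention, channel)
        ]
        for channel in features
    ]


# Toy example: 4x4 two-class logits, one 2x2 feature channel, resize factor 2.
logits = [[(0.0, 2.0)] * 2 + [(2.0, 0.0)] * 2 for _ in range(4)]
alpha_pos = text_probability(logits)        # 4x4 map of P(text)
alpha_hat = resize_avg(alpha_pos, 2)        # 2x2 attention map
features = [[[1.0, 1.0], [1.0, 1.0]]]       # one channel, 2x2
attended = apply_attention(features, alpha_hat)
```

In this toy input the left half has high text probability and the right half low, so the left column of `attended` keeps most of its activation while the right column is suppressed.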

In essence: extract low-dimensional information and, through deconvolution (deconv), retain coarse-grained information.

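How it is trained:

Proposes an auxiliary loss that uses the binary mask to decide whether the pixel at each location belongs to text.

Main selling point:

Uses pixel-wise and box-wise information together.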
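
A minimal sketch of such a pixel-wise auxiliary loss, assuming it is a mean cross-entropy between the predicted text probability and the 0/1 mask. The function name is hypothetical, and how this term is weighted against the detection loss is not covered in these notes.

```python
import math


def pixel_cross_entropy(text_prob, mask, eps=1e-7):
    """Mean binary cross-entropy between predicted P(text) and a 0/1 mask."""
    total, count = 0.0, 0
    for prob_row, mask_row in zip(text_prob, mask):
        for p, m in zip(prob_row, mask_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -math.log(p) if m == 1 else -math.log(1.0 - p)
            count += 1
    return total / count


# Toy 2x2 example: mask marks the top row as text.
mask = [[1, 1], [0, 0]]
good = [[0.9, 0.8], [0.1, 0.2]]   # agrees with the mask
bad = [[0.2, 0.1], [0.9, 0.8]]    # disagrees with the mask
print(pixel_cross_entropy(good, mask) < pixel_cross_entropy(bad, mask))  # True
```

With a two-class softmax, the cross-entropy over the (background, text) pair reduces to this binary form on the positive probability.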

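Hierarchical Inception Module:

Principle:

Low-level convolutional features focus on fine detail, while high-level convolutional features focus on more abstract information.

Inception module:

Four convolutional layers

Four 128-channel features -> 512-channel features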
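
The channel bookkeeping can be checked with a toy sketch. It assumes the four branch outputs are joined by channel concatenation, which is what the 4 x 128 = 512 count suggests; `concat_channels` and `constant_branch` are hypothetical helpers, and the branch contents are placeholders rather than real convolution outputs.

```python
def concat_channels(*branches):
    """Concatenate branch outputs (each a list of 2D channels) along the channel axis."""
    height, width = len(branches[0][0]), len(branches[0][0][0])
    fused = []
    for branch in branches:
        for channel in branch:
            if len(channel) != height or len(channel[0]) != width:
                raise ValueError("all branches must share the same spatial size")
            fused.append(channel)
    return fused


def constant_branch(value, channels=128, size=2):
    """Hypothetical stand-in for one branch: `channels` maps filled with `value`."""
    return [[[value] * size for _ in range(size)] for _ in range(channels)]


branches = [constant_branch(v) for v in (0.1, 0.2, 0.3, 0.4)]
inception_features = concat_channels(*branches)
print(len(inception_features))  # 512
```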

Dilated convolutions:

Support exponential growth of the receptive field without loss of resolution.
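
The growth of the receptive field can be checked with a little arithmetic. The sketch below assumes stacked stride-1 convolutions with a 3-wide kernel and a dilation rate that doubles at every layer; that schedule is an illustrative assumption, not something these notes specify.

```python
def receptive_field(dilations, kernel_size=3):
    """Receptive field along one axis of stacked stride-1 convolutions."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d  # each layer adds (k - 1) * dilation
    return field


plain = receptive_field([1, 1, 1, 1])     # four ordinary 3x3 layers
dilated = receptive_field([1, 2, 4, 8])   # same depth, doubling dilation
print(plain, dilated)  # 9 31
```

Both stacks have the same number of layers and weights and keep the feature-map resolution, but the plain stack grows linearly with depth while the dilated one reaches 2^(n+1) - 1 pixels after n layers.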

Final AIFs:

Each AIF is computed by fusing the inception features of current layer with two directly adjacent layers.

lower layer: Down-sampling

higher layer: Up-sampling
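
The fusion can be sketched on toy feature maps as follows. Several details are assumptions made for illustration: max pooling for down-sampling, nearest-neighbour repetition for up-sampling, a factor of 2 between adjacent layers, and channel concatenation as the fusion operation. All function names are hypothetical.

```python
def downsample_max(channel, factor=2):
    """Max-pool a 2D map over non-overlapping factor x factor blocks."""
    h, w = len(channel), len(channel[0])
    return [
        [
            max(channel[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]


def upsample_nearest(channel, factor=2):
    """Repeat every value `factor` times along both axes."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse_aif(lower, current, higher, factor=2):
    """Bring both neighbours to the current resolution, then concatenate channels."""
    lower_ds = [downsample_max(c, factor) for c in lower]
    higher_us = [upsample_nearest(c, factor) for c in higher]
    return lower_ds + current + higher_us


# Toy example: one channel per layer at 4x4, 2x2 and 1x1 resolution.
lower = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
current = [[[0, 0], [0, 0]]]
higher = [[[7]]]
aif = fuse_aif(lower, current, higher)
print(len(aif), len(aif[0]), len(aif[0][0]))  # 3 2 2
```

After fusion every channel sits at the current layer's resolution, so the result carries fine detail from the lower layer and more abstract context from the higher one.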
