Paper notes: Going Deeper with Convolutions


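After reading the paper there are still many details I don't understand, but its main idea is the Inception layer, which has two benefits. First, the Inception layer reduces the dimensionality of the data before the expensive computations, so a more complex network can be built with limited computational resources (GoogLeNet has 22 layers). Second, the Inception layer combines features from multiple scales, which gives better results. The paper puts it this way: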

One of the main beneficial aspects of this architecture is that it allows for increasing the number of units at each stage significantly without an uncontrolled blow-up in computational complexity. The ubiquitous use of dimension reduction allows for shielding the large number of input filters of the last stage to the next layer, first reducing their dimension before convolving over them with a large patch size. Another practically useful aspect of this design is that it aligns with the intuition that visual information should be processed at various scales and then aggregated so that the next stage can abstract features from different scales simultaneously.
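The dimension reduction mentioned in the quote is done in the paper with 1x1 convolutions. As a minimal sketch of what that means, a 1x1 convolution is just a linear map over the channels applied independently at every spatial position, so it changes the channel count and leaves height and width alone. The function name `conv1x1` and the toy sizes below are my own choices for illustration, not something from the paper, and the bias term and ReLU are left out.

```python
import random

def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution (no bias) to an H x W x C_in feature map.

    weights is a C_out x C_in matrix. Every spatial position is mapped
    independently, so height and width are unchanged.
    """
    return [
        [
            [sum(w * x for w, x in zip(row_w, pixel)) for row_w in weights]
            for pixel in row
        ]
        for row in feature_map
    ]

random.seed(0)
H, W, C_IN, C_OUT = 4, 4, 8, 2  # toy sizes, chosen only for illustration
feature_map = [
    [[random.random() for _ in range(C_IN)] for _ in range(W)]
    for _ in range(H)
]
weights = [[random.gauss(0, 1) for _ in range(C_IN)] for _ in range(C_OUT)]

reduced = conv1x1(feature_map, weights)
shape = (len(reduced), len(reduced[0]), len(reduced[0][0]))
print(shape)  # (4, 4, 2): same spatial size, fewer channels
```

Because the later 3x3 and 5x5 filters then run over 2 channels instead of 8 in this toy case, their cost drops by the same factor.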

Now let's look at the structure of the Inception layer:

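You can see that it combines features at four scales. The three yellow 1x1 convolution layers all serve to reduce dimensionality; the authors believe that most of the original information is still preserved after the reduction. The output of the previous layer passes through the 1x1, 3x3, and 5x5 conv branches and the pooling branch in parallel, and the results are concatenated to form the output of the Inception layer. Next, let's look at the structure of GoogLeNet to explain the Inception layer further: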

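Take inception (3a) as an example. Its input is 28x28x192, i.e. 192 channels. The 1x1 conv outputs only 64 channels, which reduces the dimensionality while keeping most of the information. The 3x3 and 5x5 convs are also applied after dimension reduction, and the pooling branch is likewise projected down to only 32 channels. The final output is 28x28x256, so the number of channels grows only slightly. This is why the Inception layer allows a CNN to go deeper.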
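To check the channel arithmetic for inception (3a), here is a rough back-of-the-envelope sketch. The branch widths (64, 96 then 128, 16 then 32, and 32) are taken from Table 1 of the paper. The helper names and the "naive" variant, which applies the 3x3 and 5x5 convs directly to all 192 input channels, are my own assumptions added for comparison. Only multiply-accumulates of the convolutions are counted; biases, ReLU, and the pooling itself are ignored.

```python
# Branch widths for inception (3a), as listed in Table 1 of the paper.
IN_CHANNELS = 192
SPATIAL = 28 * 28  # number of output positions per channel

BRANCHES = {
    "1x1": {"reduce": None, "kernel": 1, "out": 64},
    "3x3": {"reduce": 96, "kernel": 3, "out": 128},
    "5x5": {"reduce": 16, "kernel": 5, "out": 32},
    "pool_proj": {"reduce": None, "kernel": 1, "out": 32},
}

def conv_macs(c_in, c_out, kernel, positions=SPATIAL):
    """Multiply-accumulates of a stride-1 convolution that keeps the spatial size."""
    return positions * kernel * kernel * c_in * c_out

def branch_macs(branch, c_in=IN_CHANNELS, use_reduction=True):
    """Cost of one branch, with or without its 1x1 reduction layer."""
    reduce = branch["reduce"]
    if reduce is None or not use_reduction:
        return conv_macs(c_in, branch["out"], branch["kernel"])
    return conv_macs(c_in, reduce, 1) + conv_macs(reduce, branch["out"], branch["kernel"])

# The branch outputs are concatenated along the channel axis.
out_channels = sum(b["out"] for b in BRANCHES.values())
macs_reduced = sum(branch_macs(b) for b in BRANCHES.values())
# Hypothetical variant for comparison only: no 1x1 reduction before 3x3 and 5x5.
macs_naive = sum(branch_macs(b, use_reduction=False) for b in BRANCHES.values())

print(out_channels)                # 256
print(macs_reduced)                # 128049152
print(macs_naive)                  # 308281344
```

Under these assumptions the reduced version needs about 128M multiply-accumulates against about 308M for the naive one, while both produce the same 28x28x256 output.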

Regarding the last few layers of GoogLeNet, the paper says:

The use of average pooling before the classifier is based on [12], although our implementation differs in that we use an extra linear layer. This enables adapting and fine-tuning our networks for other label sets easily, but it is mostly convenience and we do not expect it to have a major effect. It was found that a move from fully connected layers to average pooling improved the top-1 accuracy by about 0.6%, however the use of dropout remained essential even after removing the fully connected layers.

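Roughly speaking, replacing the fully connected layers with average pooling works better. The linear layer serves as a mapping when the network is used on other datasets: for example, when we train a face verification model with more than 10,000 classes, this layer has to map the 1024-dimensional input to more than 10,000 dimensions. The dropout layer sets each input to 0 with a certain probability (40% here), which helps avoid overfitting; essentially all current CNNs use it.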
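The order of these last layers (average pooling, then dropout, then the linear layer) can be sketched in plain Python as below. This is only an illustration under my own assumptions: the function names are made up, the sizes are toy values (16 channels and 5 classes instead of 1024 and 10,000+), and the dropout shown is the common "inverted" variant that rescales the kept values during training, which the paper does not spell out.

```python
import random

def global_avg_pool(feature_map):
    """Average an H x W x C feature map over all spatial positions -> length-C vector."""
    pixels = [pixel for row in feature_map for pixel in row]
    channels = len(pixels[0])
    return [sum(p[c] for p in pixels) / len(pixels) for c in range(channels)]

def dropout(vector, drop_prob, training, rng):
    """Inverted dropout (assumed variant): zero each entry with probability
    drop_prob while training and rescale the rest; identity at test time."""
    if not training:
        return list(vector)
    keep = 1.0 - drop_prob
    return [x / keep if rng.random() < keep else 0.0 for x in vector]

def linear(vector, weights, bias):
    """Map a length-C vector to len(weights) class scores."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

rng = random.Random(0)
H, W, C, NUM_CLASSES = 7, 7, 16, 5  # toy sizes, chosen only for illustration
feature_map = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
weights = [[rng.gauss(0, 0.1) for _ in range(C)] for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

pooled = global_avg_pool(feature_map)                    # length C
dropped = dropout(pooled, 0.4, training=True, rng=rng)   # each entry zeroed with prob. 0.4
logits = linear(dropped, weights, bias)                  # length NUM_CLASSES
print(len(pooled), len(logits))  # 16 5
```

Adapting to a different label set then only means replacing `weights` and `bias` with ones sized for the new number of classes; the pooled feature vector stays the same length.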

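One thing I am still unclear about: the paper mentions clustering by analyzing correlation statistics. How is this clustering done? Does the clustering result have anything to do with the Inception module? (For example, is the output of the previous layer first analyzed statistically and clustered, and are the 1x1, 3x3, and 5x5 convolutions then applied to the clustering result to extract features at multiple scales? A friend said the clustering here is just pooling, which I don't quite understand.) I'll come back to this when I get the chance, or maybe read reference 2.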

2 0