A Large Contextual Dataset for Classification, Detection and Counting of Cars with Deep Learning


Points worth noting in this paper:

1. This network combines residual learning with Inception-style layers and is used to count cars in one look. This is a new way to count objects rather than by localization or density estimation.

2. The counting method is not car or scene specific. It would be easy to train this method to count other kinds of objects, and counting over new scenes requires no extra setup or assumptions about object locations.

3. The COWC dataset:
3.1 The image set is annotated by single pixel points. Scale invariance for vehicles does not have to be handled, because the spatial resolution of the imagery is known a priori.
3.2 Large trucks are completely omitted since it can be unclear when something stops being a light vehicle and starts to become a truck. Vans and pickups are included as cars even if they are large. All boats, trailers and construction vehicles are always added as negatives.
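3.3 In the data-split figure, the regions marked in red are the test set and the regions marked in blue are the training set. Doesn't this way of choosing the split seem odd?
[Figure: training (blue) and test (red) regions]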
3.4 The validation images come from Utah, and their content differs considerably from the training set.
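3.5 Training patches are 256×256, and the training set is augmented by rotating every 15 degrees. This yielded a set of 308,988 training patches and 79,447 testing patches. A patch counts as containing a car only if a car lies in its central 48×48 region, so the negative samples can also contain cars. In addition, a margin of 32 pixels at the edge of each patch is grayed out, because the comparison experiments showed the best model performance with this setting ("An edge margin of 32 pixels was grayed out in each patch").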
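The patch rules in 3.5 can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names, the mid-gray fill value of 128, and the choice to apply the margin directly to the 256×256 patch are assumptions made for illustration.

```python
import numpy as np

PATCH = 256    # patch side length in pixels (from the notes)
MARGIN = 32    # grayed-out edge margin in pixels (from the notes)
CENTER = 48    # side of the central box that decides the label (from the notes)
GRAY = 128     # assumed fill value; the notes do not state one


def gray_out_margin(patch, margin=MARGIN):
    """Return a copy of the patch whose edge margin is filled with gray."""
    out = np.full_like(patch, GRAY)
    out[margin:-margin, margin:-margin] = patch[margin:-margin, margin:-margin]
    return out


def has_car_in_center(points, size=PATCH, center=CENTER):
    """A patch is positive only if an annotated point falls in the central box."""
    lo = (size - center) // 2   # 104 for a 256-pixel patch
    hi = lo + center            # 152
    return any(lo <= x < hi and lo <= y < hi for x, y in points)


# One orientation every 15 degrees gives 24 rotations per location.
rotation_angles = list(range(0, 360, 15))
```

A patch whose only car sits at (10, 10) is therefore labeled negative even though a car is visible in it, which is the point made above about negatives containing cars.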
4. The ResCeption network: "We created a third network to synthesize Residual Learning [26] with Inception. We called this one ResCeption network." It differs from the standard residual network in that a projection shortcut is added.
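To make "Inception-style branches plus a projection shortcut" concrete, here is a toy NumPy sketch. It is not the ResCeption architecture: real Inception branches mix 1×1, 3×3, 5×5 convolutions and pooling, whereas this sketch reduces every branch to a 1×1 convolution for brevity, and all names and channel sizes are hypothetical.

```python
import numpy as np


def conv1x1(x, w):
    """1x1 convolution. x has shape (C_in, H, W), w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def relu(x):
    return np.maximum(x, 0.0)


def toy_resception_block(x, branch_weights, proj_w):
    # Inception-style part: parallel branches concatenated along channels.
    branches = [relu(conv1x1(x, w)) for w in branch_weights]
    merged = np.concatenate(branches, axis=0)
    # Projection shortcut: a 1x1 convolution so the input matches the
    # channel count of the concatenated branches before the addition.
    shortcut = conv1x1(x, proj_w)
    return relu(merged + shortcut)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))                 # 8 input channels
branch_weights = [rng.standard_normal((4, 8)),     # branch with 4 output channels
                  rng.standard_normal((6, 8))]     # branch with 6 output channels
proj_w = rng.standard_normal((10, 8))              # 4 + 6 = 10 output channels
out = toy_resception_block(x, branch_weights, proj_w)
```

The projection is needed because concatenating the branches changes the channel count (8 in, 10 out here), so an identity shortcut could not be added element-wise.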
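5. The comparison experiments and the paper's own method were all run in Caffe, using AlexNet, GoogLeNet/Inception, and the ResCeption network.
6. Graying out the patch edges changes performance by about 1%. The authors argue that too much surrounding context gives misleading hints about vehicles ("That we can have too much context might be a result of too much irrelevant information or bad hints from objects that are too far from a car.").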
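7. Testing uses 10 images with a sliding window: window size 224×224, a 32-pixel grayed margin, and a stride of 8 pixels. The non-maximum suppression threshold is 0.75. Under the verification criterion, splits and merges are not counted, giving (TP - x)/(TP + FP - 2x), so the value comes out larger than the usual TP/(TP + FP).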
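The sliding-window scan can be sketched as follows. This is a minimal illustration that only enumerates window positions; the function name is hypothetical, and how the image border is handled is an assumption (only windows that fit fully inside the image are kept).

```python
def window_origins(height, width, window=224, stride=8):
    """Top-left (y, x) corners of every full window that fits in the image."""
    ys = range(0, height - window + 1, stride)
    xs = range(0, width - window + 1, stride)
    return [(y, x) for y in ys for x in xs]


# With a 32-pixel gray margin on each side, the part of a 224x224 window
# that still shows image content is 224 - 2 * 32 = 160 pixels wide.
visible = 224 - 2 * 32
```

With a stride of 8 and a 160-pixel visible region, neighboring windows overlap heavily, which is why a suppression step is needed afterwards.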

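The claim in item 7 that the verification-style ratio is larger than ordinary precision can be checked directly. The notes do not define x; the sketch below assumes it is the number of split/merge cases left out of the tally, and the counts are made up. Algebraically, the difference between the two ratios is x(TP - FP) / ((TP + FP)(TP + FP - 2x)), so the verification value is larger only when TP > FP (and TP + FP > 2x).

```python
def standard_precision(tp, fp):
    return tp / (tp + fp)


def verification_precision(tp, fp, x):
    """Ratio as quoted in the notes: (TP - x) / (TP + FP - 2x)."""
    return (tp - x) / (tp + fp - 2 * x)


# Made-up counts for illustration only.
tp, fp, x = 90, 10, 5
usual = standard_precision(tp, fp)             # 90 / 100
verified = verification_precision(tp, fp, x)   # 85 / 90
```

For a detector with more false positives than true positives the inequality flips, so "larger" holds for reasonably good detectors rather than unconditionally.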

Q: Notice that correct = (TP+TN)/(TP+TN+FP+FN) while F = 2*P*R/(P+R), yet Table 2 actually compares these two metrics against each other (Ideally the verification score should …).
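A small example shows why these two numbers are not directly comparable: the "correct" rate counts true negatives, while the F-score ignores them. The counts below are made up for illustration.

```python
def correct_rate(tp, tn, fp, fn):
    """correct = (TP + TN) / (TP + TN + FP + FN)"""
    return (tp + tn) / (tp + tn + fp + fn)


def f_score(tp, fp, fn):
    """F = 2 * P * R / (P + R), with P = precision and R = recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Many easy negatives, few positives (made-up counts).
tp, tn, fp, fn = 10, 980, 5, 5
acc = correct_rate(tp, tn, fp, fn)   # 990 / 1000
f = f_score(tp, fp, fn)              # P = R = 10 / 15
```

Here the correct rate is 0.99 while F is about 0.67, even though both describe the same predictions.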

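8. Counting: within each patch, a car is counted only if its center is at least eight pixels inside. The whole network then becomes a regression problem with 64 outputs, meaning a patch can report at most 64 cars. The experiments use GoogLeNet (22 layers),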
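One possible reading of the counting rule in item 8 is sketched below. Several things are assumptions here: that "at least eight pixels" is measured from the patch border, that the patch side is 224, and that counts are simply capped at 64. The function name and parameters are hypothetical.

```python
MAX_COUNT = 64   # the notes say a patch can report at most 64 cars


def count_target(points, size=224, margin=8):
    """Count annotated car centers lying at least `margin` pixels inside the patch."""
    inside = sum(
        margin <= x < size - margin and margin <= y < size - margin
        for x, y in points
    )
    return min(inside, MAX_COUNT)


# Two of these four made-up centers are far enough from the border.
example = count_target([(4, 100), (100, 100), (216, 100), (215, 215)])
```

The returned integer is the per-patch training target; how the network encodes it across its 64 outputs is not specified in the notes.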

0 0