PASCAL Visual Object Classes: the PASCAL VOC dataset (with the DeepLab dataset at the end)

Source: Internet. Editor: 程序博客网. Time: 2024/06/06 11:02

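Background

The challenge was held every year starting in 2005. Official site: http://host.robots.ox.ac.uk/pascal/VOC/

Datasets

The datasets are used for recognizing specific objects in realistic images.

There are eight releases, 2005 through 2012. The ones that can currently be used are 2007 and later: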

Each release is listed by year with its statistics, new developments, and notes.

2005
Statistics: Only 4 classes: bicycles, cars, motorbikes, people. Train/validation/test: 1578 images containing 2209 annotated objects.
New developments: Two competitions: classification and detection.
Notes: Images were largely taken from existing public datasets, and were not as challenging as the flickr images subsequently used. This dataset is obsolete.

2006
Statistics: 10 classes: bicycle, bus, car, cat, cow, dog, horse, motorbike, person, sheep. Train/validation/test: 2618 images containing 4754 annotated objects.
New developments: Images from flickr and from Microsoft Research Cambridge (MSRC) dataset.
Notes: The MSRC images were easier than flickr as the photos often concentrated on the object of interest. This dataset is obsolete.

2007
Statistics: 20 classes:
Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
Train/validation/test: 9,963 images containing 24,640 annotated objects.
New developments:
Number of classes increased from 10 to 20.
Segmentation taster introduced.
Person layout taster introduced.
Truncation flag added to annotations.
Evaluation measure for the classification challenge changed to Average Precision. Previously it had been ROC-AUC.
Notes: This year established the 20 classes, and these have been fixed since then. This was the final year that annotation was released for the testing data.

2008
Statistics: 20 classes. The data is split (as usual) around 50% train/val and 50% test. The train/val data has 4,340 images containing 10,363 annotated objects.
New developments:
Occlusion flag added to annotations.
Test data annotation no longer made public.
Notes: The segmentation and person layout data sets include images from the corresponding VOC2007 sets.

2009
Statistics: 20 classes. The train/val data has 7,054 images containing 17,218 ROI annotated objects and 3,211 segmentations.
New developments:
From now on the data for all tasks consists of the previous years' images augmented with new images. In earlier years an entirely new data set was released each year for the classification/detection tasks.
Augmenting allows the number of images to grow each year, and means that test results can be compared on the previous years' images.
Segmentation becomes a standard challenge (promoted from a taster).
Notes:
No difficult flags were provided for the additional images (an omission).
Test data annotation not made public.

2010
Statistics: 20 classes. The train/val data has 10,103 images containing 23,374 ROI annotated objects and 4,203 segmentations.
New developments:
Action Classification taster introduced.
Associated challenge on large scale classification introduced based on ImageNet.
Amazon Mechanical Turk used for early stages of the annotation.
Notes:
Method of computing AP changed. Now uses all data points rather than TREC style sampling.
Test data annotation not made public.

2011
Statistics: 20 classes. The train/val data has 11,530 images containing 27,450 ROI annotated objects and 5,034 segmentations.
New developments: Action Classification taster extended to 10 classes + "other".
Notes: Layout annotation is now not "complete": only people are annotated and some people may be unannotated.

2012
Statistics: 20 classes. The train/val data has 11,530 images containing 27,450 ROI annotated objects and 6,929 segmentations.
New developments:
Size of segmentation dataset substantially increased.
People in action classification dataset are additionally annotated with a reference point on the body.
Notes: Datasets for classification, detection and person layout are the same as VOC2011.
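
The 2010 note about the change in how AP is computed can be made concrete with a small sketch. It assumes that "TREC style sampling" refers to the common 11-point interpolated AP (precision sampled at recall 0, 0.1, ..., 1.0) and that "all data points" means the area under the interpolated precision/recall curve. The function names are hypothetical, and the inputs are assumed to be recall and precision values already computed down a ranked list of predictions, with recall non-decreasing.

```python
def ap_11_point(recall, precision):
    """11-point interpolated AP: average, over recall thresholds
    0.0, 0.1, ..., 1.0, of the best precision at recall >= threshold."""
    total = 0.0
    for k in range(11):
        threshold = k / 10
        candidates = [p for r, p in zip(recall, precision) if r >= threshold]
        total += max(candidates) if candidates else 0.0
    return total / 11


def ap_all_points(recall, precision):
    """AP as the area under the interpolated precision/recall curve,
    using every point where recall changes."""
    # Pad with sentinel values so the curve spans recall 0 to 1.
    r = [0.0] + list(recall) + [1.0]
    p = [0.0] + list(precision) + [0.0]
    # Make precision non-increasing from left to right (the envelope).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# Toy curve: two correct detections, the second found at lower precision.
recall = [0.5, 1.0]
precision = [1.0, 0.5]
print(ap_11_point(recall, precision))
print(ap_all_points(recall, precision))
```

The two measures generally give slightly different numbers for the same curve, so scores reported under the older and newer protocols should not be compared directly.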

The 2012 release

1. Dataset

The dataset has 20 classes:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

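2. Competitions and use cases

(1) Classification/Detection Competitions

Classification: for each of the 20 classes, predict whether a test image contains an object of that class.

Detection: for a test image, predict the bounding box and the label (one of the 20 classes) of each object.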
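
To show what the ground truth for these two tasks looks like in practice, here is a minimal sketch that reads object labels and bounding boxes from a VOC-style XML annotation. The embedded XML is a hand-written illustration, not a real file from the dataset, and the tag names (`object`, `name`, `bndbox`, `truncated`, `occluded`, `difficult`) are assumed from the commonly distributed annotation layout; check them against the files in your own copy. `parse_objects` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed VOC annotation layout (not real data).
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>dog</name>
    <truncated>0</truncated>
    <occluded>1</occluded>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <truncated>1</truncated>
    <difficult>0</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""


def parse_objects(xml_text):
    """Return one dict per annotated object: label, flags, bounding box."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            # Flags may be absent (e.g. no occlusion flag in older releases).
            "truncated": obj.findtext("truncated", default="0") == "1",
            "occluded": obj.findtext("occluded", default="0") == "1",
            "difficult": obj.findtext("difficult", default="0") == "1",
            "bbox": tuple(int(float(box.findtext(tag)))
                          for tag in ("xmin", "ymin", "xmax", "ymax")),
        })
    return objects


objects = parse_objects(SAMPLE_XML)
# Classification target: which classes are present in the image.
print(sorted({o["name"] for o in objects}))
# Detection target: one (label, box) pair per object.
print([(o["name"], o["bbox"]) for o in objects])
```

The same parsed list serves both tasks: the set of names is the classification target, while the per-object boxes are the detection target.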

(2) Segmentation Competition

Segmentation: pixel-level segmentation, with labels at both the class level and the individual object level.
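
The difference between class-level and object-level labels is easier to see on a toy example. The sketch below uses two tiny hand-made masks, not real data. It assumes the usual convention that class masks store a class index per pixel (0 for background, 1 to 20 for the classes in alphabetical order, 255 for void or boundary pixels) and that object masks store an instance index per pixel. Treat those index values as assumptions to verify against the devkit. `instance_classes` is a hypothetical helper.

```python
from collections import Counter

BACKGROUND = 0
VOID = 255  # assumed marker for unlabeled or boundary pixels

# Toy 3x4 masks: class 12 is assumed to be "dog", class 15 "person".
class_mask = [
    [0,  0,  15,  15],
    [0,  12, 15,  15],
    [12, 12, 255, 15],
]
# Same image, but each pixel carries an instance index instead.
object_mask = [
    [0, 0, 1,   1],
    [0, 2, 1,   1],
    [2, 2, 255, 3],
]


def instance_classes(class_mask, object_mask):
    """Map each instance index to its class index by majority vote
    over the pixels that belong to that instance."""
    votes = {}
    for class_row, object_row in zip(class_mask, object_mask):
        for cls, inst in zip(class_row, object_row):
            if inst in (BACKGROUND, VOID) or cls in (BACKGROUND, VOID):
                continue
            votes.setdefault(inst, Counter())[cls] += 1
    return {inst: counts.most_common(1)[0][0]
            for inst, counts in sorted(votes.items())}


print(instance_classes(class_mask, object_mask))
```

In the class mask the two people are indistinguishable because both carry the same class index. The object mask separates them into instances 1 and 3.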

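(3) Action Classification Competition

Action classification: predict the action of a person in a given image. (The person is indicated either by a bounding box or by a single point on the body.)

(4) ImageNet Large Scale Visual Recognition Competition

Recognize objects in the ImageNet dataset. The goal is to identify the main objects in an image, not to locate them.

(5) Person Layout Taster Competition

Predict the bounding box and label of each part of a person.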

DeepLab in practice

Use the 2012 release.

train & val: VOCtrainval_11-May-2012.tar

label: SegmentationClassAug.tar

(For releases other than 2012, follow the link to the official site at the top.)
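
As a minimal sketch of preparing the two archives named above, the helper below unpacks a tar file into a destination folder and reports the top-level names it produced. The function name `extract_archive` and the destination folder are hypothetical. The sketch makes no assumption about the directory layout inside either archive, which is why it lists what was extracted instead of hard-coding paths.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a tar archive into dest_dir and return the sorted
    top-level names now present there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        # Only extract archives obtained from a source you trust.
        tar.extractall(dest)
    return sorted(p.name for p in dest.iterdir())
```

For example, calling `extract_archive("VOCtrainval_11-May-2012.tar", "data")` and then `extract_archive("SegmentationClassAug.tar", "data")` would place both in a hypothetical `data` folder. The returned names show where the images and labels actually ended up, so that any training configuration can point at them.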
