How to make labels for semantic segmentation


For a while I have wanted to make my own labels for image segmentation. Every approach I came up with involved a lot of manual work. I then found a scheme in a paper from the Amazon picking challenge: photograph the empty background, place the object, and let an algorithm produce the segmentation label. The original text follows:


VI. SELF-SUPERVISED TRAINING
By bringing deep learning into the approach we gain robustness. This, however, comes at the expense of amassing
quality training data, which is necessary to learn high-capacity models with many parameters. Gathering and manually labeling such large amounts of training data is expensive.
The existing large-scale datasets used by deep learning (e.g.
ImageNet [20]) are mostly Internet photos, which have very
different object and image statistics from our warehouse
setting.
To automatically capture and pixel-wise label images, we
propose a self-supervised method, based on three observations:
· Batch-training on scenes with a single object can yield
deep models that perform well on scenes with multiple
objects [17] (i.e., simultaneous training on cat-only or
dog-only images enables successful testing on cat-with-dog images);
· An accurate robot arm and accurate camera calibration give us at-will control over the camera viewpoint;
· For single object scenes, with known background and
known camera viewpoint, we can automatically obtain
precise segmentation labels by foreground masking.
The captured training dataset contains 136,575 RGB-D images of 39 objects, all automatically labeled.
Semi-automatic data gathering. To semi-autonomously
gather large quantities of training data, we place single
known objects inside the shelf bins or tote in arbitrary poses,
and configure the robot to move the camera and capture
RGB-D images of the objects from a variety of different
viewpoints. The position of the shelf/tote is known to the

robot, as is the camera viewpoint, which we use to transform
the collected RGB-D images in shelf/or tote frame. After
capturing several hundred RGB-D images, the objects are
manually re-arranged to different poses, and the process is
repeated several times. Human involvement amounts to re-arranging the objects and labeling which objects correspond
to which bin/tote. Selecting and changing the viewpoint,
capturing sensor data, and labeling each image by object
is automated. We collect RGB-D images of the empty shelf
and tote from the same exact camera viewpoints to model the
background, in preparation for the automatic data labeling.
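Since the camera viewpoint and the shelf/tote pose are known, expressing each captured image in the shelf or tote frame is essentially a back-projection plus a rigid transform. Below is a minimal sketch of that step, assuming a pinhole camera model; the function name, inputs, and use of numpy are my own illustration, not code from the paper:

```python
import numpy as np

def depth_to_shelf_frame(depth, K, T_shelf_from_camera):
    """Back-project a depth image to 3D points and express them in the
    shelf/tote frame. K is the 3x3 camera intrinsic matrix and
    T_shelf_from_camera the known 4x4 camera-to-shelf rigid transform
    (both hypothetical inputs for this sketch)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)                     # depth in meters
    x = (u - K[0, 2]) * z / K[0, 0]                  # pinhole back-projection
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    pts_shelf = (T_shelf_from_camera @ pts_cam.T).T  # rigid transform into shelf frame
    return pts_shelf[:, :3].reshape(h, w, 3)
```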
Automatic data labeling. To obtain pixel-wise object segmentation labels, we create an object mask that separates
foreground from background. The process is composed of 2D
and 3D pipelines. The 2D pipeline is robust to thin objects
(objects without sufficient volume to be reliably segmented in 3D
when placed too close to walls or the ground) and objects with
no depth information, while the 3D pipeline is robust to large
misalignments between the pre-scanned shelf bin and tote.
Results from both pipelines are combined to automatically
label an object mask for each training RGB-D image.
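The paper does not spell out how the two pipelines' results are merged. Assuming each pipeline yields a binary foreground mask (sketched after the next two paragraphs), one plausible fusion is a simple union, with the 2D mask alone deciding wherever depth is missing; the rule and function below are my assumptions:

```python
import numpy as np

def combine_masks(mask_2d, mask_3d, depth):
    """Fuse the 2D and 3D pipeline masks into a single object mask.
    Where depth is missing (zero), only the 2D mask can decide; elsewhere a
    pixel is foreground if either pipeline marks it (assumed fusion rule)."""
    no_depth = depth == 0
    return np.where(no_depth, mask_2d, mask_2d | mask_3d).astype(bool)
```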
The 2D pipeline starts by correcting possible minor image misalignments using multimodal 2D intensity-based registration to align the two RGB-D images [21]. We then convert
the aligned color image from RGB to HSV, and do pixel-wise comparisons of the HSV and depth channels to separate
and label foreground from background.
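A rough sketch of the pixel-wise HSV/depth comparison against the empty-background capture follows; the OpenCV calls are standard, but the thresholds, the per-channel comparison rule, and ignoring hue wrap-around are my simplifications, not the paper's exact procedure:

```python
import cv2
import numpy as np

def foreground_mask_2d(rgb, depth, bg_rgb, bg_depth,
                       hsv_thresh=(10, 60, 60), depth_thresh=0.02):
    """Mark a pixel as foreground when its HSV color or depth differs enough
    from the aligned empty-shelf background capture (illustrative thresholds)."""
    hsv = cv2.cvtColor(rgb, cv2.COLOR_RGB2HSV).astype(np.int16)
    bg_hsv = cv2.cvtColor(bg_rgb, cv2.COLOR_RGB2HSV).astype(np.int16)
    color_fg = (np.abs(hsv - bg_hsv) > np.array(hsv_thresh)).any(axis=-1)  # hue wrap-around ignored
    valid = (depth > 0) & (bg_depth > 0)               # pixels with depth in both images
    depth_fg = valid & (np.abs(depth - bg_depth) > depth_thresh)
    return color_fg | depth_fg
```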
The 3D pipeline uses multiple views of an empty shelf bin
and tote to create their pre-scanned 3D models. We then use
ICP to align all training images to the background model,
and remove points too close to the background to identify the
foreground mask. Finally, we project the foreground points
back to 2D to retrieve the object mask.
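The 3D pipeline can be approximated with an off-the-shelf ICP implementation. The sketch below uses Open3D; the distance threshold, the assumption that the scene points are already expressed in the shelf/tote frame, and the function names are mine, not the paper's:

```python
import numpy as np
import open3d as o3d

def foreground_mask_3d(points_shelf, bg_model_pcd, dist_thresh=0.01):
    """Align the scene to the pre-scanned background model with ICP, keep
    points farther than dist_thresh from the background as foreground, and
    project the decision back onto the 2D pixel grid. points_shelf is an
    (H, W, 3) array in the shelf/tote frame with NaNs at invalid pixels."""
    h, w, _ = points_shelf.shape
    flat = points_shelf.reshape(-1, 3)
    valid = ~np.isnan(flat).any(axis=1)

    scene = o3d.geometry.PointCloud()
    scene.points = o3d.utility.Vector3dVector(flat[valid])

    # Refine the roughly known pose with point-to-point ICP.
    reg = o3d.pipelines.registration.registration_icp(
        scene, bg_model_pcd, max_correspondence_distance=0.02, init=np.eye(4))
    scene.transform(reg.transformation)

    # Distance of every scene point to its nearest background point.
    dists = np.asarray(scene.compute_point_cloud_distance(bg_model_pcd))

    # Points far from the background are foreground; map back to pixels.
    mask = np.zeros(h * w, dtype=bool)
    mask[np.flatnonzero(valid)] = dists > dist_thresh
    return mask.reshape(h, w)
```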

