DiscoGAN


Abstract

  • Humans can easily discover the relation (or common features) between two things without supervision, but for a machine to learn it, humans must supply paired examples as ground truth for training.
  • To avoid this pairing effort, DiscoGAN is proposed.
  • We propose a method based on generative adversarial networks that learns to discover relations between different domains
  • Successfully transfers style from one domain to another while preserving key attributes such as orientation and face identity

Introduction

  • This paper recasts "finding the relation between two kinds of images" as "generating images of one style from images of another style" (using a GAN); this is its approach to "discovering relations".
  • No manual image pairing (as a supervised training set) is required; the method is unsupervised.
  • A key intuition we rely on is to constrain all images in one domain to be representable by images in the other domain. For example, when learning to generate a shoe image based on each handbag image, we force this generated image to be an image-based representation of the handbag image (and hence reconstruct the handbag image) through a reconstruction loss (the reconstruction loss is what makes the generated image an image-based representation; I don't fully follow this), and to be as close to images in the shoe domain as possible through a GAN loss. We use these two properties to encourage the mapping between two domains to be well covered on both directions (i.e. encouraging one-to-one rather than many-to-one or one-to-many).

Model

Two constraints

  1. As noted above, we want the mapping to be a bijection; that is, G_AB is the inverse mapping of G_BA.
  2. The output of G_AB must lie in domain B, and vice versa.

These two constraints are implemented by the following two losses, respectively:

  1. L_CONST_A = d(G_BA(G_AB(x_A)), x_A)
  2. L_GAN_B = -E_{x_A ~ P_A}[log D_B(G_AB(x_A))]
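As a minimal numpy sketch (assuming d is the mean squared L2 distance and that G_AB, G_BA, D_B are hypothetical callables, with D_B returning probabilities in (0, 1]), the two losses could be written as:

```python
import numpy as np

def reconstruction_loss(G_AB, G_BA, x_A):
    """L_CONST_A = d(G_BA(G_AB(x_A)), x_A), with d the mean squared L2 distance."""
    x_ABA = G_BA(G_AB(x_A))            # map A -> B -> back to A
    return np.mean((x_ABA - x_A) ** 2)

def gan_loss_B(D_B, G_AB, x_A, eps=1e-8):
    """L_GAN_B = -E[log D_B(G_AB(x_A))]: the generator wants D_B to score its fakes as real."""
    return -np.mean(np.log(D_B(G_AB(x_A)) + eps))
```

With identity mappings the reconstruction loss is exactly zero, which is the bijection ideal this constraint pushes toward.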

Analysis of several architectures

Standard GAN

Drawbacks:
1. The model only maps from A to B, not the reverse.
2. A bijection is not guaranteed, i.e. mode collapse can occur.
3. The generated image is not an image-based representation (still unclear to me).


GAN with a reconstruction loss

Advantages:
1. A second generator G_BA is added. (Any form of metric function (L1, L2, Huber loss) can be used as d.)
2. A reconstruction loss is added, pushing x_A and x_ABA = G_BA(G_AB(x_A)) close together, i.e. closer to a bijection.

Each generator in the model can learn mapping from its input domain to output domain and discover relations between them

缺点:

First.

During training, the generator G_AB learns the mapping from domain A to domain B under two relaxed constraints:
1. domain A maps to domain B (L_GAN_B);
2. the mapping to domain B is reconstructed back to domain A (L_CONST_A).
However, this model lacks a constraint on the mapping from B to A, and these two conditions alone do not guarantee a cross-domain relation (as defined in section 2.1) because the mapping satisfying them is one-directional. In other words, the mapping is an injection, not a bijection, and one-to-one correspondence is not guaranteed.

Judging from the sentence above, L_CONST_A is only used to update G_AB's parameters, not G_BA's, which is why the model lacks a constraint on the mapping from B to A. But this is not the key point; the important issue is the second drawback below.

Second.

  1. In some sense, the addition of a reconstruction loss to a standard GAN is an attempt to remedy the mode collapse problem.
  2. In Figure 3c, two domain A modes are matched with the same domain B mode, but the domain B mode can only direct to one of the two domain A modes.
  3. Although the additional reconstruction loss LCONSTA forces the reconstructed sample to match the original (Figure 3c), this change only leads to a similar symmetric problem. The reconstruction loss leads to an oscillation between the two states and does not resolve mode-collapsing.

The problem is illustrated in (c); note the last sentence. If two A modes A1 and A2 are mapped to the same B mode, then, since G_AB and G_BA are both functions (each input maps to exactly one output), G_BA can only map back to a single A mode (call it A_reconst). L_CONST_A therefore pushes A_reconst toward either A1 or A2, i.e. an oscillation between the two states.
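A tiny 1-D numeric illustration of this oscillation argument (hypothetical values, not from the paper): if G_AB collapses two A modes onto one B mode, G_BA can reconstruct at most one of them, so the total reconstruction loss stays bounded away from zero no matter which mode G_BA favors.

```python
# Two A modes a1, a2 collapse to the same B mode b.
a1, a2, b = 0.0, 2.0, 5.0
G_AB = lambda a: b        # collapsed, many-to-one mapping
G_BA = lambda y: a1       # a function can return only one preimage; here it favors a1

recon = lambda a: (G_BA(G_AB(a)) - a) ** 2   # per-sample L_CONST_A with d = squared error

# a1 reconstructs perfectly while a2 cannot; retraining G_BA to favor a2 just swaps
# the roles, so optimization oscillates between the two states instead of resolving
# the collapse.
total = recon(a1) + recon(a2)
```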


DiscoGAN


Loss function:
L_G_AB = L_GAN_B + L_CONST_A, and symmetrically L_G_BA = L_GAN_A + L_CONST_B; the total generator loss is L_G = L_G_AB + L_G_BA, and the discriminator loss is L_D = L_D_A + L_D_B.

优点:

  • This model is constrained by two L_GAN losses and two L_CONST losses.
  • Therefore a bijective mapping is achieved, and a one-to-one correspondence, which we defined as cross-domain relation, can be discovered.
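Under the same sketch assumptions as before (numpy, hypothetical generator/discriminator callables, d = mean squared error), the symmetric DiscoGAN generator objective with its two GAN losses and two reconstruction losses could look like:

```python
import numpy as np

def disco_generator_loss(G_AB, G_BA, D_A, D_B, x_A, x_B, eps=1e-8):
    """L_G = L_GAN_B + L_CONST_A + L_GAN_A + L_CONST_B: both directions constrained."""
    x_AB, x_BA = G_AB(x_A), G_BA(x_B)
    l_gan_B   = -np.mean(np.log(D_B(x_AB) + eps))     # fakes in B must fool D_B
    l_gan_A   = -np.mean(np.log(D_A(x_BA) + eps))     # fakes in A must fool D_A
    l_const_A = np.mean((G_BA(x_AB) - x_A) ** 2)      # A -> B -> A reconstruction
    l_const_B = np.mean((G_AB(x_BA) - x_B) ** 2)      # B -> A -> B reconstruction
    return l_gan_B + l_gan_A + l_const_A + l_const_B
```

Coupling reconstruction terms with GAN terms in both directions is what penalizes many-to-one mappings either way.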

Experiment

Toy Experiment

I think this is one of the paper's highlights. The toy experiment not only shows how the GAN's G and D shape the generated data, but also illustrates why DiscoGAN works.
Setup:
1. Source and target samples are drawn from Gaussian mixture models (GMMs).
2. All plots in Fig. 4 show the B domain; the background color encodes D_B(x_AB), and the black "x" marks are the target samples.
Some points were unclear to me at first; after some thought, my understanding is:
1. Source and target use two different GMMs; specifically, the target is its own GMM, and the 10 black "x" marks in Fig. 4 are its 10 means μ, representing 10 kinds of data.
2. The source is another GMM with 5 Gaussians, so 5 classes of data are drawn, one per Gaussian.
3. But I still cannot see how a bijection can arise when the source has 5 classes and the target has 10.
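The toy data can be sketched roughly as follows (a guess at the setup; the exact means and variances are placeholders, not the paper's values): two separate 2-D GMMs, 5 modes for the source and 10 for the target.

```python
import numpy as np

def sample_gmm(means, sigma, n_per_mode, rng):
    """Draw n_per_mode 2-D points around each mean (isotropic Gaussian mixture)."""
    return np.concatenate(
        [m + sigma * rng.standard_normal((n_per_mode, 2)) for m in means])

rng = np.random.default_rng(0)
# placeholder layout: 5 source means on a unit circle, 10 target means on a radius-2 circle
src_means = np.array([[np.cos(t), np.sin(t)]
                      for t in np.linspace(0, 2 * np.pi, 5, endpoint=False)])
tgt_means = np.array([[2 * np.cos(t), 2 * np.sin(t)]
                      for t in np.linspace(0, 2 * np.pi, 10, endpoint=False)])
source = sample_gmm(src_means, 0.1, 100, rng)   # domain A samples, 5 modes
target = sample_gmm(tgt_means, 0.1, 100, rng)   # domain B targets, 10 modes
```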

Initial state


  • All source points are mapped by G_AB onto the same point in the B domain. (Unless G_AB's parameters are all initialized to zero, how could this happen?)
  • I don't understand why the whole plot is white (D_B(x_B) = 0.5); shouldn't D_B output 1 for real data?

Standard GAN model


  • Many translated points of different colors are located around the same B domain mode.
  • This result illustrates the mode-collapse problem of GANs since points of multiple colors (multiple A domain modes) are mapped to the same B domain mode.
  • Regions around all B modes are leveled in a green colored plateau in the baseline, allowing translated samples to freely move between modes
  • I don't understand why the plot is all green (D_B(x_AB) = 1); how can D_B output 1 for fake data?

GAN with reconstruction loss


  • The collapsing problem is less prevalent, but navy, green and light-blue points still overlap at a few modes
  • The regions between B modes are clearly separated

Summary of the two models above: both the standard GAN and the GAN with L_CONST fail to cover all modes in the B domain, since the mapping from A domain to B domain is injective.


DiscoGAN


  • It not only prevents mode collapse, by translating into distinct well-bounded regions that do not overlap, but also generates B samples in all ten modes, since the mapping in this model is bijective.
  • The discriminator for B domain is perfectly fooled.

Real Domain Experiment

I didn't find much else noteworthy in this part; the one thing worth highlighting is the figure below:
Fig.5

In standard GAN and GAN with reconstruction (5a and 5b), most of the red dots are grouped in a few clusters, indicating that most of the input images are translated into images with same azimuth, and that these models suffer from mode collapsing problem as predicted.

For example, in (a), different input angles produce the same output angle, which is why multiple points line up in a horizontal row in the upper left. The same is visible in the car images below (a): in the first two pairs, the input angles differ but the output angles are almost identical. In (c), by contrast, all points are spread evenly along the blue diagonal, indicating a bijection.

Note that this unsupervised relation discovery does not pin down which relation will be found. In the figure above, for instance, the discovered relation maps an azimuth to its opposite.

Summary

Looking over the whole paper, the highlights I find most worth learning from are:
1. Visualizing high-dimensional behavior in two dimensions via dimensionality reduction.
2. The paper's plotting methods, e.g. the toy experiment and Fig. 5 in the real-domain experiment.
3. Recasting "finding the relation between two domains" as "generating one domain from the other".
