[Deep Learning Paper Notes][Adversarial Examples] Deep Neural Networks are Easily Fooled: High Confidence Predictions
A. Nguyen, J. Yosinski, and J. Clune. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. In Computer Vision and Pattern Recognition (CVPR). IEEE, 2015. (Citations: 190).
The paper produces images that are completely unrecognizable to humans, yet CNNs classify them as recognizable objects with 99.99% confidence.
2 Implication of Adversarial Examples
For example, one can imagine a security camera that relies on face or voice recognition being compromised. Swapping white noise for a face, fingerprint, or voice might be especially pernicious, since other humans nearby might not recognize that someone is attempting to compromise the system.
Another area of concern is image-based search engine rankings: background patterns that a visitor does not notice could fool a CNN-driven search engine into thinking a page is about an altogether different topic.
3 Gradient Based
As in [Simonyan et al. 2013], adversarial examples are computed by gradient ascent on the input image, but here the optimization is with respect to the posterior probability (softmax output) of the target class.
Starting from random noise, the update is repeated until the CNN's confidence for the target class reaches 99.99%. Adding regularization makes the images more recognizable, at the cost of slightly lower confidence scores. See the results in Fig.
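The ascent step can be sketched with a toy linear classifier standing in for the CNN (the paper uses full convolutional networks; the dimensions and names here are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy stand-in for a trained classifier: a fixed random linear layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))          # 10 classes, 64-dim "image"

def confidence(x, target):
    return softmax(W @ x)[target]

def ascend(target, steps=500, lr=0.5, reg=1e-3):
    x = rng.normal(size=64)            # start from random noise
    for _ in range(steps):
        p = softmax(W @ x)
        onehot = np.eye(10)[target]
        # gradient of log p(target | x) w.r.t. x, minus an L2 penalty
        grad = W.T @ (onehot - p) - reg * x
        x += lr * grad
    return x

x = ascend(target=3)
print(confidence(x, 3))                # climbs far above the 0.1 chance level
```

The regularization term plays the same role as in the paper: it trades a small amount of confidence for a less extreme image.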
4 GA Based
4.1 GA Approach
See Fig.
• Organisms: synthetic images used to fool the CNN.
• Fitness function: the highest prediction value the CNN gives for that image belonging to a class.
• Selection: MAP-Elites: keep the best individual found so far for each objective (each class).
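The MAP-Elites archive update is simple to sketch (a minimal version; the real archive stores evolved genomes, and `map_elites_update` is an illustrative name):

```python
import numpy as np

def map_elites_update(archive, image, scores):
    """Keep, per class, the single best image found so far.

    archive: dict class_id -> (fitness, image)
    scores:  the CNN's prediction vector for `image`, one value per class.
    """
    for cls, fit in enumerate(scores):
        if cls not in archive or fit > archive[cls][0]:
            archive[cls] = (fit, image)
    return archive

archive = {}
map_elites_update(archive, "img_a", np.array([0.9, 0.1]))
map_elites_update(archive, "img_b", np.array([0.2, 0.8]))
# img_a stays the elite for class 0; img_b becomes the elite for class 1
```

Each generation, every offspring is scored for every class, so one mutation can improve the elites of several classes at once.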
4.2 Direct Encoding
Three integers (H, S, V) for each pixel of the image. Each pixel value is initialized with uniform random noise in the range [0, 255].
Each pixel value has probability p of being mutated by the polynomial mutation operator; p starts at 0.1 and halves every 1000 generations.
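The direct encoding and its mutation schedule can be sketched as follows (a sketch using a simplified, clipped form of Deb's polynomial mutation; the distribution index `eta` and function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_image(h, w):
    # three channels (H, S, V) per pixel, uniform noise in [0, 255]
    return rng.integers(0, 256, size=(h, w, 3)).astype(float)

def mutation_rate(generation, p0=0.1, half_every=1000):
    # p starts at 0.1 and halves every 1000 generations
    return p0 * 0.5 ** (generation // half_every)

def polynomial_mutation(img, p, eta=15, lo=0.0, hi=255.0):
    # simplified (unbounded) polynomial mutation, applied per value
    # with probability p, then clipped back into [lo, hi]
    u = rng.random(img.shape)
    mask = rng.random(img.shape) < p
    delta = np.where(u < 0.5,
                     (2 * u) ** (1 / (eta + 1)) - 1,
                     1 - (2 * (1 - u)) ** (1 / (eta + 1)))
    out = img + mask * delta * (hi - lo)
    return np.clip(out, lo, hi)

img = polynomial_mutation(init_image(8, 8), p=mutation_rate(0))
```

The decaying schedule means early generations explore aggressively while later ones fine-tune images the archive already rates highly.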
4.3 Indirect Encoding
Use a compositional pattern-producing network (CPPN). A CPPN is similar to a neural network: it takes the (x, y) position of a pixel as input and outputs the tuple of HSV values for that pixel. CPPNs can evolve complex, regular images that resemble natural and man-made objects.
Evolution determines the topology, weights, and activation functions of each CPPN in the population. Thus, elements of the genome can affect multiple parts of the image. CPPNs start with no hidden nodes, and nodes are added over time, encouraging evolution to first search for simple, regular images before adding complexity.
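A hand-wired, fixed-topology CPPN already illustrates the key property (the paper evolves topology, weights, and activations with NEAT; the weights and activation choices below are purely illustrative):

```python
import numpy as np

def cppn(h, w, weights):
    # the image is a smooth, regular function of pixel coordinates
    ys, xs = np.mgrid[0:h, 0:w]
    x = xs / (w - 1) * 2 - 1               # normalize coords to [-1, 1]
    y = ys / (h - 1) * 2 - 1
    r = np.sqrt(x**2 + y**2)               # distance from center, a common CPPN input
    a, b, c = weights
    hue = np.sin(a * x) * np.cos(b * y)    # periodic activations -> repeated patterns
    sat = np.exp(-(c * r) ** 2)            # Gaussian activation -> radial symmetry
    val = np.tanh(x * y)
    hsv = np.stack([hue, sat, val], axis=-1)
    return (hsv * 0.5 + 0.5) * 255         # map to [0, 255]

img = cppn(64, 64, weights=(4.0, 4.0, 2.0))
```

Because every pixel is computed from the same small network, a single genome element (one weight) changes a global, regular structure in the image rather than one pixel, which is exactly what the direct encoding lacks.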
4.4 Results
See Fig. 10.4. The GA exploits specific discriminative features that the CNN has learned for each class. This is because evolution need only produce features that are unique to, or discriminative for, a class, rather than an image that contains all of the typical features of the class. Many of the CPPN images feature a pattern repeated many times; the extra copies make the CNN more confident that the image belongs to the target class. This also shows that CNNs tend to learn low- and mid-level features rather than the global structure of objects.
Larger datasets are one way to make CNNs more difficult to fool. Because there are many related cat and dog classes, the GA had difficulty finding an image that scores high for one specific dog category (e.g. Japanese spaniel) but low for all related categories (e.g. Blenheim spaniel), which is necessary to produce high confidence given that the final CNN layer is a softmax. This explanation suggests that datasets with more classes can help ameliorate fooling.
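The softmax argument can be checked numerically: to score high confidence, a logit must beat the logits of all related classes, not just unrelated ones (the numbers below are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# One strong logit among unrelated classes: near-certain prediction.
few = softmax(np.array([10.0, 0.0, 0.0]))

# The same strong logit, but with closely related classes (e.g. many dog
# breeds) that the fooling image also excites: confidence is split.
many = softmax(np.array([10.0, 9.5, 9.0, 9.5, 9.0, 0.0]))

print(few[0], many[0])   # the first is near 1.0, the second is far lower
```

This is why the GA's low/mid-level feature tricks, which excite whole families of similar classes at once, stop paying off as the class count grows.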
It is not easy to prevent CNNs from being fooled by retraining them on fooling images (labeled as a new "fooling images" class). While the retrained CNNs learn to classify these negative examples as fooling images, a new batch of fooling images can be produced that fools the new networks, even after many retraining iterations.
5 Analysis
Discriminative models learn p(y|X) directly. They create decision boundaries that partition the input space into classification regions, see Fig. In a high-dimensional input space, the area a discriminative model allocates to a class may be much larger than the area occupied by training examples for that class. Synthetic images far from the decision boundary and deep inside a classification region can produce high-confidence predictions even though they are far from the natural images of the class. These large regions of high confidence exist in certain discriminative models due to a combination of their locally linear nature and the high-dimensional input space.
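The locally linear intuition can be illustrated with logistic regression, the simplest discriminative model: confidence is a monotone function of signed distance to the boundary, so moving deep into a class region drives it toward 1 regardless of how far the point is from any training data (the weight vector below is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = np.array([1.0, 1.0])               # a fixed linear decision boundary w.x = 0

for scale in [1, 10, 100]:
    x = scale * np.array([1.0, 1.0])   # same direction, ever farther from the boundary
    print(scale, sigmoid(w @ x))       # confidence grows monotonically with distance
```

A CNN built from ReLU layers is piecewise linear in its input, so the same effect appears locally throughout its input space.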
In contrast, a generative model represents the complete joint density p(X, y) = p(y|X)p(X). Such models may be more difficult to fool, because fooling images could be recognized by their low marginal probability p(X), and the CNN's confidence in a label prediction for such images could be discounted when p(X) is low.
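A minimal sketch of this defense, assuming a two-class Gaussian mixture as the generative model (all densities and thresholds here are illustrative, not from the paper):

```python
import numpy as np

# Two isotropic unit-variance Gaussian class-conditionals in 2-D, equal priors.
means = [np.zeros(2), np.full(2, 4.0)]

def log_density(x, mean):
    # log of an isotropic unit-variance Gaussian density in 2-D
    return -0.5 * np.sum((x - mean) ** 2) - np.log(2 * np.pi)

def marginal_log_px(x):
    # log p(X) = log sum_y p(X | y) p(y), with p(y) = 1/2
    logs = [log_density(x, m) for m in means]
    return np.logaddexp(logs[0], logs[1]) + np.log(0.5)

natural = np.array([0.2, -0.1])        # near the class-0 mode
fooling = np.array([40.0, -35.0])      # far from all training data

print(marginal_log_px(natural), marginal_log_px(fooling))
# the fooling input has vanishingly small p(X), so any p(y|X) it
# produces can be discounted before trusting the label
```

The catch, as the paper notes, is that modeling p(X) well for full-resolution natural images is itself an open problem.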
6 References
[1]. Evolving AI Lab. https://www.youtube.com/watch?v=M2IebCN9Ht4.
[2]. CVPR 2015. http://techtalks.tv/talks/deep-neural-networks-are-easily-fooled-high-confidence-predictions-for-unrecog61573/.