[Deep Learning Paper Notes] [Adversarial Examples] Intriguing properties of neural networks

Szegedy, Christian, et al. “Intriguing properties of neural networks.” arXiv preprint arXiv:1312.6199 (2013). (Citations: 251).


1 Representation of High-Level Neurons

1.1 Motivation


Previous works analyzed the semantic meaning of individual neurons by finding the set of inputs that maximally activate a given unit. Inspecting individual units in this way makes the implicit assumption that the neurons in high-level layers form a distinguished basis which is particularly useful for extracting semantic information.


1.2 Observation

We found that random linear combinations of high-level unit activations are semantically indistinguishable from the activations of the units themselves. This calls into question the notion that neural networks disentangle factors of variation across individual units. It suggests that it is the entire space of activations, rather than any individual neuron, that contains the semantic information in the high layers of neural networks.
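To make the experiment concrete, here is a minimal sketch of the inspection procedure, assuming a pretrained feature extractor phi (a network truncated at a high layer) and a batch of images; the names phi, images, and top_activating are illustrative, not from the paper.

```python
# A minimal sketch of the unit-inspection experiment, assuming a
# feature extractor `phi` returning (N, D) activations for a tensor
# of images with shape (N, C, H, W). All names are illustrative.
import torch

def top_activating(phi, images, direction, k=8):
    """Return the k images whose activations project most strongly onto `direction`."""
    with torch.no_grad():
        feats = phi(images)            # (N, D) high-layer activations
        scores = feats @ direction     # projection of each image onto the direction
    return images[scores.topk(k).indices]

# Natural-basis direction (a single unit i) vs. a random direction:
# the paper's observation is that the retrieved image sets look
# equally semantically coherent in both cases.
# e_i = torch.nn.functional.one_hot(torch.tensor(i), D).float()
# v   = torch.randn(D)
# top_activating(phi, images, e_i); top_activating(phi, images, v)
```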


2 Adversarial Examples
2.1 Motivation
Adding an imperceptibly small perturbation to a correctly classified input image can cause the network to misclassify it.


2.2 Optimization

For a given image X, find the smallest perturbation ε ∈ ℝ^(D×H×W) such that X + ε is classified as a chosen target class k:

minimize ‖ε‖₂ subject to f(X + ε) = k and X + ε ∈ [0, 1]^(D×H×W).

This is a box-constrained optimization problem. Since solving it exactly is hard, it is approximated by box-constrained L-BFGS: perform a line search over c > 0 to find the smallest ε for which the minimizer of c‖ε‖₂ + loss_f(X + ε, k), subject to X + ε ∈ [0, 1]^(D×H×W), satisfies f(X + ε) = k.
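Below is a minimal PyTorch sketch of this approximation, assuming a pretrained classifier model that maps [0, 1]-valued image batches to logits. The function name lbfgs_attack and the fixed penalty weight c are illustrative (the paper instead line-searches over c), and the clamp only approximates the box constraint, since torch.optim.LBFGS is unconstrained.

```python
# A minimal sketch of the box-constrained L-BFGS attack, assuming a
# pretrained `model` and a batched input image x of shape (1, C, H, W).
import torch
import torch.nn.functional as F

def lbfgs_attack(model, x, target, c=0.1, steps=20):
    """Search for a small perturbation r so that x + r is classified as `target`."""
    r = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.LBFGS([r], max_iter=steps)
    target_t = torch.tensor([target])

    def closure():
        opt.zero_grad()
        # Clamp approximates the box constraint x + r ∈ [0, 1],
        # since torch's L-BFGS has no built-in bound handling.
        adv = (x + r).clamp(0.0, 1.0)
        # Penalized objective: c·‖r‖₂ + classification loss toward `target`.
        loss = c * r.norm() + F.cross_entropy(model(adv), target_t)
        loss.backward()
        return loss

    opt.step(closure)
    return (x + r).clamp(0.0, 1.0).detach()
```

In the paper's full procedure, this inner solve is repeated while line-searching over c for the minimum-norm perturbation that still achieves f(X + ε) = k.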



2.3 Results
See Fig. 10.1. Adversarial examples show that inputs in the vicinity of training-set images can be assigned unexpected classification labels.

Cross-model generalization: a relatively large fraction of adversarial examples is misclassified by networks trained from scratch with different hyper-parameters (number of layers, regularization, or initial weights).


Cross-training-set generalization: a relatively large fraction of adversarial examples is misclassified by networks trained from scratch on a disjoint training set.

This suggests that adversarial examples are somewhat universal and not just the result of overfitting to a particular model or to the specific selection of the training set.

The set of adversarial negatives has extremely low probability, and thus is never (or rarely) observed in the test set; yet it is dense (much like the rational numbers), and so it is found near virtually every test case.
